This is a rather short article, by design. I browse a lot of popular self-publishing websites, and I wanted to take some time to refute some of the data they present as true, which is unintentionally misleading authors and publishers alike. First off, these market share graphs, created by a guy named Paul Abbassi, are everywhere. I guess that is because there is no central authority that really collects this data, and the private companies that do collect it hold it tight to their chest, so we, the mildly interested populace, are left guessing. Paul projects an air of authority, and through his Author Earnings Report (now known as Bookstat, as his original website authorearnings.com has been deleted) he creates a lot of official-looking facts and figures that are spread far and wide. However, a cursory glance shows them to be a bit fantastical at best, as I will show in no particular order. I’m not exaggerating how widespread his data is. Here are some graphics from various extremely popular publishing websites. You might recognize them.
Above is a graph of e-book market share, by Author Earnings Reports. The graph was taken from an article by PublishDrive (whom I love, btw, great company). The same graph was redesigned and displayed in multiple articles on Kindlepreneur, and similar info is on Idealog, Lulu.com, Geekwire, Janefriedman, QZ, Observer, ElectricLiterature, and virtually every website that runs articles on the self-publishing industry. Before I get into just how inaccurate a lot of his data must be, keep in mind that Amazon releases NO e-book sales data. None. Zilch. All sales data, including Bookstat/AER’s, comes from websites trawling Amazon, collecting sales ranks, and making assumptions. Ditto for the other retailers, most of whom don’t release hard or detailed data. It could be that Paul himself is not the source of the inaccuracy. It could simply be that piecing together secondary scraps of information from so many secretive data sources will, by definition, be wildly inaccurate.
I would also like to remind people that most websites where “data is money” skew numbers intentionally. Reddit stopped displaying downvotes and started fudging the exact upvote counts, both so that anyone who wants accurate data must pay for it, and to retain a degree of control by letting themselves promote or punish certain behaviors with a higher ranking. If you think Amazon does not do this, you are naive. Amazon has incredibly complex algorithms, and I certainly do not understand them. Any author who has been publishing with good results over a long period will tell you that the same number of sales can give you a wildly different sales rank, and I’m not talking about subcategory rank. Jeff Bezos essentially pioneered scraping his competitors’ data and undercutting them, and the sales rank is an unofficial ranking that Amazon assigns for its own purposes, so I think it’s safe to assume that your rank is not determined as a simple function of sales per day. As such, any data analysis collecting “sales” information purely from sales rank is doomed to fail. Sales rank is great for making a rough estimate of how much a book is selling, but it’s obviously insufficient for estimating total sales. Otherwise Amazon would stop protecting their e-book sales data like it’s the 11 herbs and spices.
Anyway, on to the bad data, in no particular order.
I will start with a silly one: he’s written JK Rowling twice into the list of top-selling audiobook authors. This list looks almost official enough to base my entire writing strategy on, right?
Check this one out. Ignore the fact that, in this fantasy world, there exists a generic genre called “Literature and Fiction” which outsells all of the other 44 genres (most of which are literature and fiction) by a mile. The numbers are also ridiculous compared to his own figures. He says (shown in the next picture) that from April to December 2017 there were 25,425,137 e-books sold, at a value of $187,673,044.
Then he says (picture after next) that sales in Mystery, Thriller & Suspense were 215,519,384 e-books sold for $1,101,587,355 over the 18-month period from April 2017 to September 2018.
To believe his figures, you would have to believe that sales jumped from 25 million units in the last 9 months of 2017 to 190 million units in the first 9 months of 2018. Clearly, that’s untrue.
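For anyone who wants to check the subtraction, here is a minimal sketch in Python. It uses only the two unit figures quoted above and assumes, as the comparison does, that both totals describe the same slice of the market. The variable names are mine, not anything from the reports.

```python
# Unit figures as quoted from the two report summaries.
units_apr_dec_2017 = 25_425_137      # 9 months: April-December 2017
units_apr17_sep18 = 215_519_384      # 18 months: April 2017-September 2018

# Whatever is left after removing the first 9 months
# has to belong to January-September 2018.
units_jan_sep_2018 = units_apr17_sep18 - units_apr_dec_2017

# How many times larger the second 9 months would have to be.
growth_factor = units_jan_sep_2018 / units_apr_dec_2017

print(f"Implied Jan-Sep 2018 units: {units_jan_sep_2018:,}")
print(f"Jump versus the previous 9 months: {growth_factor:.1f}x")
```

Taken at face value, the two figures imply sales multiplying several times over from one 9-month stretch to the next.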
This one’s confusing. He claims e-book sales are “pretty flat from month to month” for 2017, and even includes this graph showing them steady for some months (e-books in green). The picture after that is his summary for the 9 months, which shows 1.3 billion in e-book sales. The graph very clearly shows almost exactly 150 million per month, for 9 months of the year. Yet Paul, in the prior year’s report, had estimated 2016’s e-book sales at 3.2 billion, when basic logic would dictate around 1.8 billion (150 million times 12 months), give or take a little since it was the prior year. So 3.2 billion in e-book sales for 2016, and 1.8 billion in e-book sales for 2017? In the comments section months later, he tries to defend his data, seemingly by making up more data (third pic).
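The extrapolation is simple enough to write down. This is a rough sketch, assuming the roughly 150 million per month that I read off the graph really does hold flat for the whole year, which is his claim, not mine. The names are made up for illustration.

```python
monthly_rate = 150_000_000          # approximate value read off the graph
reported_9_months = 1_300_000_000   # his summary for the 9 months
reported_2016 = 3_200_000_000       # prior year's report, full-year 2016

# If sales are flat, 9 and 12 months are just multiples of the monthly rate.
nine_month_estimate = monthly_rate * 9
full_year_estimate = monthly_rate * 12

# Gap between the 2016 figure and a flat full year at the 2017 rate.
shortfall = reported_2016 - full_year_estimate
drop_pct = 100 * shortfall / reported_2016

print(f"9 months at the flat rate:  {nine_month_estimate:,}")
print(f"12 months at the flat rate: {full_year_estimate:,}")
print(f"Implied drop from 2016:     {drop_pct:.2f}%")
```

The 9-month estimate lands close to his 1.3 billion summary, so the graph and the summary agree with each other. It is the 2016 figure that does not fit.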
So to recap, he is saying the 2016 data is less accurate, and is putting 2017 e-book sales at 3 billion (where did he get this figure from? It’s not in the report). The fact that he clearly listed 9 months of 2017 e-book sales here at a total value of 1.3 billion? Ignored completely.
So the last 9 months of 2017’s e-book sales were 1.3 billion, and that covers over 90% of the market, but all of 2017’s e-book sales were 3 billion, which by subtraction leaves 1.7 billion for the first 3 months. However, he only gives the 3 billion figure in the comments, with absolutely no explanation of how 1 + 1 = 5.
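Here is a short sketch of what those two totals imply per month. It only uses the 3 billion and 1.3 billion figures quoted above, and the variable names are hypothetical.

```python
total_2017 = 3_000_000_000      # full-year figure given only in the comments
apr_dec_2017 = 1_300_000_000    # 9-month figure from the report summary

# Whatever the last 9 months don't account for must come from January-March.
jan_mar_2017 = total_2017 - apr_dec_2017

monthly_jan_mar = jan_mar_2017 / 3
monthly_apr_dec = apr_dec_2017 / 9
ratio = monthly_jan_mar / monthly_apr_dec

print(f"Jan-Mar per month: {monthly_jan_mar:,.0f}")
print(f"Apr-Dec per month: {monthly_apr_dec:,.0f}")
print(f"Ratio: {ratio:.1f}x")
```

For both totals to be true, the first quarter would have to run at several times the monthly rate of the rest of the year, in a market he himself describes as pretty flat from month to month.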
His explanation, that he was capturing less of the market towards the beginning of the year, is weak. He never mentioned it when he released the report, only in a reply to a comment many months afterwards. He also, according to many forum posts and comments, went back and added 50-100 million to various figures, which is an absurd amount to simply add without explanation.
Some of what he says could make sense, if it had been said upfront. Editing, adding to and subtracting from your figures in real time, as criticism of those figures comes in, is shady and sloppy at best. Throwing out large numbers without explanation is also absurd. Not to mention that, without fail, every year’s report he releases contradicts the prior year’s report in such extreme ways as to make it clear that one of them must be wrong. The actual data analysis seems sloppy, and the underlying data would be better described as the underlying assumptions. Throw in the fact that even with a perfect data collector he could never directly collect sales data detailed enough to be useful for targeting, and you’re left with the notion that letting this guy inform us is wrong. People make real, life-changing decisions as authors and publishers based on this information. What if the commonly cited 85% share of the e-book market that Amazon supposedly has isn’t true? Amazon doesn’t claim it. The implications for decisions such as whether to go exclusive or wide, or what category to publish in, are massive.
I suppose the obvious next step is to ask: how can we get accurate data on the market?
To be honest, I’m unsure. I feel that a combination of the little hard data we have, such as from the Association of American Publishers and NPD PubTrack, plus financial statements from the companies selling the books themselves, along with niche analysis inferred from a rough reading of sales ranks, would give a relatively decent picture, good enough for any small publisher to rely upon. Who knows, maybe if enough people are interested I will do my best to paint a decent picture of the current e-book market. Until then, I hope this helped some people understand just how sad the state of “common knowledge” e-book market analysis is. Stay aware!