There’s a huge misconception in our industry…
That those Chrome extensions you use to research products, niches, and keywords are actually accurate.
The truth is, those tools are wildly inaccurate and imprecise.
And we’re going to pull back the veil…
Taking Matters Into Our Own Hands
This is an exciting post for me to write. In fact, this is about as excited as I’ve been since we launched ZonBlast 2.0.
For the last six months or so, we’ve put nearly all of our efforts into shaking up the Amazon product and keyword research space. We’re nearing release of, yes, yet another Chrome extension among the dozen or more on the market.
Because you have been and still are relying on inaccurate data in your business. It’s the industry’s dirty little secret.
Let me explain a bit…
Like many of you reading this, I, along with much of the team here at SixLeaf and in the #BrandOwners FB Group, got started on Amazon in 2014. At that time, there was really only one tool out there (that I was aware of, at least) that took a stab at guessing the number of daily and monthly sales for a given product on Amazon. It came from a certain educational course, which shall remain nameless, and was used primarily as a marketing tool to show the potential in buying the course and getting into the “game” that is Amazon-centric Brand development. It also grossly inflated sales by a factor of 4-10x. I know this because my own products would eventually appear in their list of suggested products to launch.
Then, Jungle Scout came along to revolutionize the space. It was the first, or nearly the first, Chrome-focused extension that provided a quick and easy way to get important data points, including estimated sales. It was and in many ways still is a data junkie’s dream.
Following in their footsteps, Viral Launch, Helium10, ASINInspector, ZonGuru, et al. all established their own Chrome extensions. And all premised their tools’ usefulness on estimated sales.
And all were and still are quite inaccurate and imprecise in our opinion.
I want to be clear here at the outset…I have an immense respect for Greg Mercer and Jungle Scout for having grown what is in my opinion one of if not the most professional and “grown up” of the SAAS in our space. And the team at Helium10 with Manny and his massive passionate audience, and ZonSquad alum Bradley Sutton brilliantly leading the charge in delivering stellar content to that audience, deserve endless kudos. And Viral Launch has of course built a meaningful company that has garnered investments from VC firms that see potential. These three companies in particular have earned their spots at the top of the research space for good reason.
In the past, we’ve focused primarily on entirely unique tools that created a new space, starting right from ZonBlast back in mid 2014.
But after watching this industry suggest for almost 5 years that relying in part or in full on data known to be incredibly inaccurate and quite imprecise is somehow conducive to making good business decisions…
…well, we decided to take matters into our own hands.
Enter Phoenix & the RISE Algorithm
We had toyed with getting into the Chrome extension space since 2017. But two core issues held us back from moving forward.
First, we were [likely stubbornly] committed to creating entirely new tools, services, and features in the marketplace. While those tools and services were and are incredibly valuable to our end users, they were often rather “nichey” and served the Brand Owner at a very specific point in their journey in creating and growing a Brand.
Second, we didn’t know how or even if we could get more accurate estimated sales figures than what was already out there. And we certainly didn’t want to just be yet another Chrome extension on the block that offered nothing better or unique.
Fast forward to late Q4 2018. By then we had been able to massively augment the amount of data coming into our systems through some key acquisitions, and we started conceptualizing techniques that just might get us more accurate estimated sales figures as part of the Chrome extension that would later be coined Phoenix.
The key, as it turned out, was machine learning in combination with the data we have access to via our own accounts, our close network of 7 and 8 figure sellers who agreed to share data over the years, and clients who have specifically invited and opted in to share data.
If a dose of geeky tech-ish speak is your style, our team uses buy box data, BSR history, search volume, and a few more key metrics to build prediction sets to estimate the sales data. It’s a dynamic/organic process that taps hundreds of millions of data points to, in essence, reverse engineer BSR.
That’s the technical (ish) lingo for this algorithm driving the forthcoming SixLeaf Phoenix extension. We’ve coined that algorithm RISE: Real-Time Integrated Sales Engine.
In my lingo, we basically made magic happen.
We’re about to set a new bar when it comes to accuracy and precision in this space.
Accuracy vs Precision…and Why Both are Important
Before we dig into the data, let’s discuss those two terms, accuracy and precision…because they’re important but often confused as being the same.
Accuracy is a measure of how close an estimate, measurement, or guess is to its real value.
Precision is a measure of how consistent and repeatable those estimates, measurements, or guesses are.
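For the code-minded, here's a minimal sketch of the difference. All of the numbers are made up for illustration, and treating "average error" as accuracy and "spread of the errors" as precision is just one common way to put figures on the two ideas.

```python
from statistics import mean, pstdev

def bias_and_spread(estimates, true_value):
    """Return (bias, spread) of a set of estimates.

    bias   ~ accuracy: how far the estimates are from the truth on average
    spread ~ precision: how tightly the estimates cluster together
    """
    errors = [e - true_value for e in estimates]
    return mean(errors), pstdev(errors)

true_sales = 100  # hypothetical actual sales figure

# Hypothetical tool A: centered on the truth, but all over the place.
accurate_not_precise = [70, 130, 85, 115]
# Hypothetical tool B: very consistent, but consistently wrong.
precise_not_accurate = [139, 140, 141, 140]

bias_a, spread_a = bias_and_spread(accurate_not_precise, true_sales)
bias_b, spread_b = bias_and_spread(precise_not_accurate, true_sales)

print(f"Tool A: bias={bias_a}, spread={spread_a:.1f}")
print(f"Tool B: bias={bias_b}, spread={spread_b:.1f}")
```

Tool A is accurate on average but not precise. Tool B is precise but not accurate. You want a tool that is both.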
If you’d like to dive in a bit more, here are two worthwhile resources to understand the differences between the two and how both are important:
Analysis Part 1: Phoenix and the RISE Algorithm vs our Beta Users’ Data
In April, having entered the third and final phase for what is our best-performing and latest iteration of RISE, we recruited a dozen beta users from our #BrandOwners FB Group to test the extension and, more importantly, to evaluate the accuracy and precision of RISE vs actual, known sales.
We simply asked for ASIN, Buy Box %, actual sales, and category. Side note for those wondering: our beta group’s data in no way shaped or affected our RISE algorithm. Testing our data with data that helped create an algorithm would be statistically improper and ethically dishonest.
We then applied the RISE algorithm to that data to generate estimated sales figures. We then, of course, examined the nominal difference (RISE – Actual Sales), the percent error (nominal difference / Actual Sales), and the Absolute Error (ABS(percent error)).
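As a rough sketch, those three per-row figures could be computed as follows. The function name and the choice to return `None` when actual sales are 0 are our own assumptions for illustration. Note that with this paragraph's sign convention (RISE – Actual Sales), an overestimate comes out positive, which is the opposite of the sign used for the plots later in the post.

```python
def error_metrics(rise_estimate, actual_sales):
    """Return (nominal difference, percent error, absolute error) for one row.

    Hypothetical helper: percent and absolute error are undefined when
    actual sales are 0, so None is returned for both in that case.
    """
    nominal = rise_estimate - actual_sales
    if actual_sales == 0:
        return nominal, None, None
    percent_error = nominal / actual_sales
    return nominal, percent_error, abs(percent_error)

# Example: RISE estimated 1 unit, 2 units were actually sold.
print(error_metrics(1, 2))
# Example: no actual sales, so only the nominal difference is defined.
print(error_metrics(3, 0))
```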
While Jungle Scout was only interested in median error (for good reason, as you’ll see in Analysis Part 2), we dove significantly deeper.
First off, we had 11,353 rows of data. Plenty to deep dive into. We had 280 unique ASINs giving us at least 30 days of data, but mostly 90 days and in some cases almost a year’s worth of data.
Second, we did not exclude outliers the way Jungle Scout did in their recent analysis (brief discussion on that in Part 2). We used the entirety of the data available to us from our Beta users and only cleaned/aggregated the category variable.
Percentage Error = Absolute(Actual Sales – Estimated Sales) / Actual Sales
For 121 products from the Health & Household category, the observed sales number was 0, so the percentage error could not be calculated. That is why the number of rows for both Mean and Median above is n=11232, whereas the percent of perfect predictions uses n=11353.
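To make the two different row counts concrete, here's a small sketch with made-up rows (not our Beta data). Rows with 0 actual sales are dropped from the percentage-based figures but still count toward the share of perfect predictions. The function and key names are hypothetical.

```python
from statistics import mean, median

def summarize(rows):
    """rows: list of (actual_sales, estimated_sales) pairs."""
    # Percentage error is undefined when actual sales are 0, so skip those rows.
    ape = [abs(actual - est) / actual for actual, est in rows if actual != 0]
    # A "perfect" prediction can be checked on every row, including zeros.
    perfect = sum(1 for actual, est in rows if actual == est)
    return {
        "n_percentage": len(ape),
        "mean_ape": mean(ape),
        "median_ape": median(ape),
        "n_all": len(rows),
        "share_perfect": perfect / len(rows),
    }

# Hypothetical rows: one of them has 0 actual sales.
rows = [(10, 10), (2, 1), (0, 1), (4, 4), (5, 6)]
summary = summarize(rows)
print(summary)
```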
Getting to an average error rate of 9% and a median error rate of 0% was incredible in itself. But the mind-boggling part of this for me was that 61% of the time we’re EXACTLY nailing the actual sales number.
The above plot shows the entire range of observed Percentage Errors, here with the sign kept, that is (Actual Sales – Estimated Sales) / Actual Sales rather than its absolute value. Errors less than 0 indicate that the sales are overestimated. Errors higher than 0 indicate that the sales are underestimated. The maximum Percentage Error was 0.5 (336 cases in which actual units sold was 2 and Phoenix estimated units sold was 1). The minimum Percentage Error was -1.5.
Statistics By Category
We felt it was necessary to dig into categories to identify if there were any anomalies where one category might perform far worse than the other. Consistent quality was shown across all categories, with the relatively biggest errors residing in the Health & Household category. Note that 13 rows of data were for ASINs for which we were not able to identify category due to the ASIN no longer existing or being available at the time of this writing.
Errors in Units Sold
The prior analyses focused on Percentage Error. Let’s analyze Unit Error calculated as:
Unit Error = Units Sold – RISE Estimated Sales
It’s obviously easy to calculate for all 11,353 data points (even when the actual sales figure is 0) and is easy to understand.
Below is the distribution of Unit Error on our Beta group’s data set.
For 95% of cases the error is within ONE unit sold.
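If you want to run the same check on your own numbers, a minimal sketch looks like this. The rows are invented for illustration and the function name is hypothetical.

```python
def unit_errors_and_share(rows, tolerance=1):
    """rows: list of (units_sold, estimated_sales) pairs.

    Returns the list of Unit Errors and the share of rows whose
    error is within +/- `tolerance` units.
    """
    unit_errors = [units_sold - est for units_sold, est in rows]
    within = sum(1 for err in unit_errors if abs(err) <= tolerance)
    return unit_errors, within / len(unit_errors)

# Hypothetical rows, including one with 0 actual sales.
rows = [(0, 1), (2, 1), (10, 10), (7, 4)]
errors, share = unit_errors_and_share(rows)
print(errors, share)
```

Unlike Percentage Error, this works on every row, because nothing is divided by the actual sales figure.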
In sum, we’re absurdly accurate and precise when running our RISE algorithm against our Beta users’ data 🙂
Analysis Part 2 – Phoenix and the RISE algorithm vs Jungle Scout, Viral Launch, Helium10, et al
For this part of our analysis we were actually planning on using Jungle Scout’s case study data from mid 2018. That is, until they published their update just a short while ago on May 29. Suffice it to say we won in every possible metric with their Case Study 1 data set. But it wouldn’t be fair to any of the tools compared in that study to publish that since of course as of the Case Study Update a few of the tools, including Jungle Scout, seemed to have improved a bit. Quick note: each row of data in Jungle Scout’s spreadsheet comprises a month of data. Ours above in our Beta group comprises a day of data.
OK, so now we have a head to head using the data that Jungle Scout published and made available. Let’s dive in.
Case Study Methodology
We followed the same data cleansing steps as described by Jungle Scout in their case study. It consisted of 3 steps:
- Removing records with buy box controlled less than 99% of the time
- Recoding the “<5” estimates to 4
- Filtering out child ASINs that shared rank with other child products
To see their initial case study methodology click here. And to see their recent May 29 update click here.
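For illustration only, the three cleansing steps listed above could look something like this. The field names (`buy_box_pct`, `estimate`, `child_shares_rank`) are hypothetical and are not the column names in Jungle Scout's spreadsheet.

```python
def clean(records):
    """Apply the three cleansing steps to a list of dict records."""
    cleaned = []
    for rec in records:
        # Step 1: drop records where the buy box was controlled < 99% of the time.
        if rec["buy_box_pct"] < 99:
            continue
        # Step 3: drop child ASINs that share rank with other child products.
        if rec["child_shares_rank"]:
            continue
        # Step 2: recode the "<5" estimates to 4 so they can be treated as numbers.
        estimate = 4 if rec["estimate"] == "<5" else int(rec["estimate"])
        cleaned.append({**rec, "estimate": estimate})
    return cleaned

# Hypothetical records.
records = [
    {"asin": "A1", "buy_box_pct": 100, "estimate": "<5", "child_shares_rank": False},
    {"asin": "A2", "buy_box_pct": 80, "estimate": "120", "child_shares_rank": False},
    {"asin": "A3", "buy_box_pct": 99, "estimate": "35", "child_shares_rank": True},
    {"asin": "A4", "buy_box_pct": 99, "estimate": "35", "child_shares_rank": False},
]
cleaned = clean(records)
print([(r["asin"], r["estimate"]) for r in cleaned])
```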
Median Absolute Percentage Error
As a reminder, this is the metric Jungle Scout chose to utilize in their case study. There’s nothing particularly wrong with using that metric. It’s a perfectly reasonable measure. It just doesn’t provide full context in our opinion which is why we’ll be presenting more than just Median Absolute Percentage Error.
I’m pretty sure that speaks for itself, but if it doesn’t, it effectively means that about 50% of our predictions have an absolute error rate of less than 1%
Mean Absolute Percentage Error
Again, just for the record, I have an immense respect for Jungle Scout and what they’ve done for sellers, for SAAS in this space, and just for the market in general. They created a genre in 2015, much as we did with ZonBlast in 2014, and the contributions they and their leadership have made have been phenomenal. So I don’t intend this to be a jab in their direction.
However, we can see here why they preferred median to mean in publishing their case study…Viral Launch beat them in this metric with 31% average error rate vs Jungle Scout’s 44%.
Buuuuut we beat everyone by a factor of more than 5, coming in at 6% error rate.
A Mean Absolute Percentage Error of 6% means that the average error on estimated sales was only 6%. Several of the tools/algorithms above have mean errors that are so high because of many huge outliers. For example, the biggest error made by Jungle Scout’s algorithm was 3518% (ASIN B079YVN2GZ: 11 units sold with an estimated figure of 398; Phoenix’s RISE estimated 11).
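Here's the arithmetic behind that 3518% figure, plus a quick sketch of why a single outlier like it drags the mean up while barely moving the median. Only the 11 and 398 come from the case study data. The other four error values are invented for illustration.

```python
from statistics import mean, median

def absolute_percentage_error(actual, estimate):
    return abs(actual - estimate) / actual

# The example quoted above: 11 units sold, estimate of 398.
outlier = absolute_percentage_error(11, 398)
print(round(outlier * 100))  # 3518

# Four hypothetical "ordinary" errors plus that one outlier.
errors = [0.05, 0.10, 0.15, 0.20, outlier]
print(f"mean:   {mean(errors):.2f}")
print(f"median: {median(errors):.2f}")
```

The median stays at one of the ordinary values, while the mean jumps to several hundred percent. That's why we think both are worth reporting.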
Percent Perfect Estimates
OK, this is where The RISE algorithm really shines, obviously. We’re correctly estimating the EXACT actual sales number 31% of the time.
Next best? Viral Launch, Helium10, and Jungle Scout at about 2% each.
Phoenix vs Jungle Scout vs Viral Launch
The density plot below illustrates the distribution of Percentage Errors from the top 3 (Phoenix, Jungle Scout, and Viral Launch).
Note here that to remain consistent with Jungle Scout’s presentation our calculation here is:
Error = (Units Sold – Estimated Sales)/Units Sold
Errors less than 0 indicate that the sales are overestimated and errors higher than 0 indicate that the sales are underestimated.
Note: Only errors from -100% to 100% are shown here. These are all the data points for the Phoenix RISE algorithm, since the maximum errors we had were less than 100%. Thus some data points are not shown above for Jungle Scout and Viral Launch since some error percentages were below -100% and higher than 100%.
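As a sketch of what that means in practice, the signed error and the -100% to 100% window can be written like this. The (units sold, estimate) pairs are hypothetical, apart from the 11 vs 398 example mentioned earlier, and the function names are our own.

```python
def signed_error(units_sold, estimated_sales):
    """Error = (Units Sold - Estimated Sales) / Units Sold."""
    return (units_sold - estimated_sales) / units_sold

def within_plot_window(errors, low=-1.0, high=1.0):
    """Keep only the errors that fall inside the plotted range."""
    return [e for e in errors if low <= e <= high]

pairs = [(10, 10), (10, 5), (10, 25), (11, 398)]
errors = [signed_error(sold, est) for sold, est in pairs]
shown = within_plot_window(errors)

print(errors)  # negative values are overestimates
print(shown)   # the two large overestimates fall outside the window
```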
As you can see, Phoenix’s RISE algorithm is untouchable when it comes to precision (remember, precision is how repeatable and consistent results are). 31% of estimates match perfectly and half of the estimates lie within 1% absolute error. The distribution of our algorithm is clearly concentrated around 0, with the highest probability mass at 0.
To compare the distribution of errors between all algorithms, we selected the 25% most accurate and the 25% least accurate estimates for each algorithm separately.
Note that not all ASINs have predictions available for all algorithms, so the actual number of ASINs in each group can vary (but always reflects the percentage).
Below are the means and medians of percentage errors for the groups. Once again, Phoenix proves to be the most accurate in each, showing an incredibly stable and consistent quality of predictions.
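One way to build those two groups for a single algorithm is sketched below. This is an assumption about the mechanics, not a copy of our data scientist's code, and the error values are made up. Because each algorithm is ranked on its own available estimates, the group size follows that algorithm's own row count.

```python
from statistics import mean, median

def best_and_worst_quarter(errors):
    """Split one algorithm's percentage errors into its 25% most
    accurate and 25% least accurate estimates (ranked by absolute error)."""
    ranked = sorted(errors, key=abs)
    k = max(1, len(ranked) // 4)  # 25% of this algorithm's own rows
    return ranked[:k], ranked[-k:]

# Hypothetical signed percentage errors for one algorithm.
errors = [0.0, -0.1, 0.2, -0.4, 0.05, 1.2, -0.02, 0.6]
best, worst = best_and_worst_quarter(errors)

print("best 25%: ", best, "mean", mean(best), "median", median(best))
print("worst 25%:", worst, "mean", mean(worst), "median", median(worst))
```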
See below for box plots with outliers.
And to duplicate the presentation and appearance of Jungle Scout’s box plot which excluded outliers, here’s where we stand…
It’s apparent that we have incredibly precise estimations (very small box and short whiskers and smallest outliers) and the ability to produce the least biased estimates (median closest to zero).
There are some very small differences in Jungle Scout’s box plot vs ours which as of now we’re unable to explain.
Normally a box plot works like this.
From Jungle Scout’s Case Study:
Here are a few tips to help you read this chart:
- The orange line represents the median and should be as close to zero as possible.
- The height of the box shows the “inter-quartile range”, which is a fancy way of saying the middle 50% of the results. The smaller the box, the greater the accuracy.
- The T-shaped lines or “whiskers” show the lower 25% and upper 25% of the results. Again, the smaller the whiskers, the more accurate the tool is.
Per our data scientist, what lies between the lower 25% and the upper 25% is exactly the middle 50% of results, so this description appears to be incorrect (otherwise the points described in the 2nd and 3rd bullet points would be overlapping). It’s a bit difficult to say what they’ve plotted and what the source of the very small (and arguably negligible) differences between the two plots is, but we are more than happy to be corrected by the Jungle Scout team or by commenters who notice an error on our end. We felt a need to mention this purely to provide full transparency on the analysis of the provided data and our findings.
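For reference, here is a minimal sketch of how the pieces of a conventional box plot are usually computed. It assumes the common Tukey convention of whiskers reaching to the furthest data point within 1.5 times the inter-quartile range. We don't know which convention Jungle Scout's plot used, and different quartile methods can produce exactly the kind of small differences mentioned above. The error values are made up.

```python
from statistics import quantiles

def box_stats(values, whisker=1.5):
    """Quartiles, IQR, whisker ends, and outliers (Tukey convention)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # the box: the middle 50% of results
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical signed percentage errors with one large outlier.
errors = [-0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 3.0]
stats = box_stats(errors)
print(stats)
```

Under this convention the whiskers cover the lower and upper 25% of results (minus any outliers), and the box covers the middle 50%, so the two never overlap.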
Additional Thoughts & Caveats
We fully expect to, well, for lack of a better word, #trigger some services and die-hard fans when it comes to presenting the above data. What we’ve done here is try to present all of our data from all sides in the most holistic and complete way possible. We’re not just cherry picking one metric we’re great at and using that as a point of comparison. We’re not just being selective with the data we’re using. And we’re obviously not using small sample sizes.
We specifically had our contracted data scientist simply go wild with her analysis to present everything we have. If any services can skew our data – without removing or excluding data points – in a way that presents us in a worse light, by all means, please let us know because our data scientist couldn’t do that. (And yes, I asked her to try.)
Further, the beautiful part of this is all of this is about leveling up the space. It’s about making not just our platform, but all platforms that you as a seller rely on, better. Without competition and new players continuing to challenge the status quo and improve the market, there is no movement forward.
Additionally, it’s important to understand the danger in looking at a single data point or small number of data points in drawing conclusions about ANY tool. Generally speaking, the more data the better. All things being equal, the more the data, the higher the degree of certainty and statistical significance.
The inverse is that the fewer the data points, the lower the statistical significance and the higher the chance of inaccurate or imprecise results. When we go live with Phoenix, there will inevitably be spaces where our numbers are just off. This is an unavoidable fact. Some categories/niches have far less data to plug into our machine learning algorithm, thus there are bound to be off-the-wall estimates on occasion. But on the whole, as you’ve seen above, well, we’re kicking arse. Further, our RISE algorithm is not and never will be static. It will continue to be refined so that we can get our analyses even better than what you see above.
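One simple way to see the "more data, more certainty" point is the textbook standard error of an average, which shrinks with the square root of the number of data points. This is a generic statistics illustration with made-up numbers, not a description of how RISE works internally.

```python
from math import sqrt

def standard_error(std_dev, n):
    """Standard error of a mean: spread of the data / sqrt(sample size)."""
    return std_dev / sqrt(n)

spread = 0.30  # hypothetical standard deviation of percentage errors

se_small = standard_error(spread, 9)    # a niche with very little data
se_large = standard_error(spread, 900)  # a niche with 100x more data

print(f"n=9:   +/- {se_small:.3f}")
print(f"n=900: +/- {se_large:.3f}")
```

With 100 times the data, the uncertainty around the average error is about 10 times smaller.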
What This All Means & What To Do About It
We’ve known since 2014 that the data that sales estimation tools put out was inaccurate. You’ve seen the discussions, or maybe you’ve even taken part in ones, where the premise becomes essentially “use the estimated sales numbers as a relative measure”…that is, not actually looking at the number reported, but rather treating it as a relative measure vs other products.
Because, of course, you’ve known all along that the sales estimates were wrong. You and Brand Owners like you have been making key business decisions based on metrics that simply shouldn’t be a component of the decision-making process.
We felt that there was a better solution than simply throwing our hands up in the air, settling for mediocrity, and letting you run off with sales estimates that were neither precise nor accurate.
We didn’t settle, and we don’t think you should either.
That’s why we’ll be introducing Phoenix very soon.
Stay tuned for a revolution in the keyword and product research space.
If you scrolled to the bottom, I don’t blame you. It’s a long post and we kinda geek out here. We dove pretty darn deep into analysis of two separate sets of data that yielded the same conclusion:
Our Phoenix extension and the RISE algorithm that it runs on are, it seems, now the most accurate and most precise in the space. No tool on the market touches the accuracy and precision that Phoenix is exhibiting.
If you’d like to run your own data analysis, here are the two spreadsheets referenced here:
- Jungle Scout’s original spreadsheet with our Phoenix values included
- Our Beta users’ data
Note: to recruit Beta users to provide data, we agreed to keep ASINs private. Specific only to the good folks at Jungle Scout, Viral Launch, and Helium10, we are open to sharing ASINs so that you can pull your algorithm’s numbers and run your own analyses provided three conditions are met:
- Our Beta users need to agree to disclosure. I agreed to keep their ASINs private to protect their business. Unless they explicitly state that we can share their ASINs, I intend on keeping that promise.
- The ASINs would need to be hidden/removed from any raw spreadsheets or analyses you present. ie: it can only be used/analyzed internally. Only your analysis would be published.
- You actually publish your analysis on your blog. ie: we will not be handing over data unless you agree to publish that analysis to your blog, regardless of the result.
For those 3 companies, if this is of interest, get in touch with our CSO, Barcus Patty, and we’ll open the conversation with our Beta users to get permission, if they so choose.