Analytics, and why they don't work like you think

Matt Basta
Published in Pinecast
6 min read · Jan 17, 2019


We get a lot of questions about our analytics at Pinecast. Many of these questions are very fair: Pinecast analytics don't work much like the analytics for other podcast hosts, nor do they work like most analytics features in other non-podcasting applications. This article explains how Pinecast analytics work, and why they work that way.

Analytics that change

The most important difference between Pinecast analytics and other analytics products is that Pinecast analytics are dynamic. We keep a living record of every listen and subscription, each tagged with metadata about the circumstances of the corresponding event. When you look at a chart on Pinecast, we're scanning our database for analytics events that match a particular pattern. There is no pre-compiled report that we draw from.
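To make "dynamic" concrete, here is a minimal Python sketch of the idea, assuming a simple in-memory list of events. `ListenEvent` and `count_matching` are hypothetical names, and a real system would run the equivalent as a database query rather than a Python loop. The point is that a chart value is computed at query time from raw events, not read from a stored report.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


# Hypothetical event record: every listen is stored individually,
# tagged with metadata about the request that produced it.
@dataclass(frozen=True)
class ListenEvent:
    episode_id: str
    timestamp: datetime
    user_agent: str
    country: str


def count_matching(events: Iterable[ListenEvent],
                   predicate: Callable[[ListenEvent], bool]) -> int:
    """Compute a chart value at query time by scanning raw events.

    Nothing is pre-aggregated, so changing the predicate (or the stored
    events) changes the answer immediately, including for past dates.
    """
    return sum(1 for event in events if predicate(event))


# Hypothetical sample data.
events = [
    ListenEvent("ep1", datetime(2019, 1, 14, 9, 30), "Overcast/3.0", "US"),
    ListenEvent("ep1", datetime(2019, 1, 15, 12, 0), "AppleCoreMedia/1.0", "CA"),
    ListenEvent("ep2", datetime(2019, 1, 16, 8, 15), "Overcast/3.0", "US"),
]

us_listens = count_matching(events, lambda e: e.country == "US")
ep1_listens = count_matching(events, lambda e: e.episode_id == "ep1")
```

Any new breakdown (by country, by app, by date range) is just another predicate over the same events, which is what makes real-time and retroactive views possible.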

This gives us a great deal of flexibility with the types of data we can offer, including real-time analytics data. It also means that we can (and do) adjust our data retroactively.

Why we adjust

There are a few reasons why we make adjustments. First and foremost, we fix errors. If we find that we've been over- or under-counting listens, we can correct that data after the fact. Our massive corpus of data gives us the power not only to find these errors but also to fix them retroactively.

Aside from errors, we also find bots and retroactively adjust analytics to exclude them. This is a requirement for IAB compliance: analytics data must be routinely audited. We go above and beyond by not only adding and modifying filters for future analytics but also applying them to our historical data. Much of this work involves filtering automated listens, which we call “bots.” Every few months, we perform an audit that flags these in our database.
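Because the raw events are kept, a new bot flag can be applied to the past simply by re-running the same query. The sketch below illustrates this under simple assumptions: listens are `(episode_id, user_agent)` pairs, bots are flagged by exact user-agent string, and `ExampleCrawler/1.0` is a made-up bot. Pinecast's actual filters are not described in this article.

```python
from collections import Counter

# Hypothetical raw log of (episode_id, user_agent) listen events.
raw_listens = [
    ("ep1", "Overcast/3.0"),
    ("ep1", "ExampleCrawler/1.0"),
    ("ep2", "Overcast/3.0"),
    ("ep2", "ExampleCrawler/1.0"),
    ("ep2", "AppleCoreMedia/1.0"),
]


def listens_per_episode(listens, flagged_agents):
    """Count listens per episode, skipping user agents flagged as bots."""
    return Counter(ep for ep, ua in listens if ua not in flagged_agents)


before_audit = listens_per_episode(raw_listens, flagged_agents=set())

# An audit flags a new bot. Re-running the same query over the stored
# events applies the flag to historical data as well as future data.
after_audit = listens_per_episode(raw_listens, flagged_agents={"ExampleCrawler/1.0"})

print(dict(before_audit))  # {'ep1': 2, 'ep2': 3}
print(dict(after_audit))   # {'ep1': 1, 'ep2': 2}
```

This is also why your historical numbers can drop after an audit: the events are still stored, but they no longer match the filter that defines a "listen."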

A lot of folks find this jarring, and rightly so. Seeing your numbers change can be alarming, especially if you make decisions based on them. I hope that these next few sections can shed some light on how analytics at Pinecast work, so you can better understand why your data changes like it does. First, let’s talk about bots.

What is a bot, and why is it listening to my podcast?

We use the term "bot" to refer to any analytics events that are not the result of a human listener's intent to consume an episode. In most cases, this is a computer program that makes automated requests that show up as listens.

These bots are often assumed to be malicious, but in almost all cases they are not. In fact, many of the bots that we filter are owned by companies like Google. Understanding why they're downloading your episodes requires a bit of background on how they work.

A web crawler, like those that search engines use, visits web pages to vacuum up information. It might be looking for metadata, marketing data, or other information. On each page, it finds links to other pages, which it then visits in turn. In the process of crawling the web, bots find podcast RSS feeds. They treat these as web pages and follow the links found in them, including the links to your episode audio files. This, in turn, registers listen events. The bots are only doing their job: they likely had no intention of mucking with your analytics.
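Here is a minimal sketch of why this happens, assuming a naive crawler that treats every URL in a document as something to fetch. In an RSS 2.0 feed, each episode's audio file is referenced by the `url` attribute of an `<enclosure>` element, so a crawler that collects URLs from the feed ends up requesting the audio itself. The feed and the `crawlable_links` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical podcast feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <link>https://example.com/</link>
    <item>
      <title>Episode 1</title>
      <link>https://example.com/ep1</link>
      <enclosure url="https://example.com/ep1.mp3" length="1234" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def crawlable_links(feed_xml: str) -> list[str]:
    """Collect every URL a naive crawler might follow from an RSS feed."""
    root = ET.fromstring(feed_xml)
    # Ordinary page links, like any web page would have.
    urls = [el.text.strip() for el in root.iter("link") if el.text]
    # Enclosure URLs point at the episode audio; fetching one looks
    # exactly like a download to the podcast host.
    urls += [el.get("url") for el in root.iter("enclosure") if el.get("url")]
    return urls


links = crawlable_links(SAMPLE_FEED)
```

The crawler has no idea that the `.mp3` link is "special"; from its point of view it is just another URL on the page.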

Why bots are hard to filter

Some bots are easy to detect and filter out. Many identify themselves as bots. Others have signatures similar to those of known bots. Many bots are unknown, however, and new ones appear every day.
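The two easy cases above can be sketched as user-agent checks. The patterns below are hypothetical examples, not Pinecast's filter list, and `ExampleFetcher` is a made-up bot name. Note that the fallback result is "unknown" rather than "human": a request that passes both checks may still come from a bot nobody has seen yet.

```python
import re

# Hypothetical rules; a production filter list would be much larger and
# maintained from observed traffic.
SELF_IDENTIFIED = re.compile(r"bot|crawler|spider", re.IGNORECASE)
KNOWN_SIGNATURES = [
    re.compile(r"^ExampleFetcher/\d+"),  # hypothetical known bot
]


def classify_user_agent(user_agent: str) -> str:
    """Classify a request by its user agent string."""
    if SELF_IDENTIFIED.search(user_agent):
        return "self-identified bot"
    if any(sig.search(user_agent) for sig in KNOWN_SIGNATURES):
        return "known bot signature"
    # Passing both checks does not prove a human listener.
    return "unknown"


print(classify_user_agent("Googlebot/2.1 (+http://www.google.com/bot.html)"))
print(classify_user_agent("ExampleFetcher/2.0"))
print(classify_user_agent("Overcast/3.0 (+http://overcast.fm/)"))
```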

We use an elaborate system of filters to minimize the number of bots that trigger listens, but those filters are only as good as the data that’s gone into creating them. If we’ve never seen a bot before, and it doesn’t appear obviously botty, we need to manually flag it as a bot once we get enough signal that it is not a genuine listener. This process is complicated by bots that disguise themselves as legitimate apps, usually to avoid special treatment by websites that try to game search engines or block certain browsers.

A recent bot that we blacklisted had generated over 330,000 listens across Pinecast over the last three years. Its slow but steady trickle of listens kept it from showing up high on our filter lists. It also disguised itself as a version of Chrome that was current a few years ago but now sticks out like a sore thumb. Finally, it periodically changed the name that it reported itself as, which helped it evade detection.
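One signal from this story, a stale browser version, can be expressed as a simple heuristic. This is a hedged sketch, not Pinecast's detection logic: it assumes the Chrome major version can be read from a `Chrome/NN.` token in the user agent, and `current_major` and `max_lag` are hypothetical tuning knobs. Real browsers auto-update, so a steady stream of requests claiming a years-old Chrome is suspicious, though not conclusive on its own.

```python
import re
from typing import Optional

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def chrome_major_version(user_agent: str) -> Optional[int]:
    """Return the Chrome major version claimed by a user agent, if any."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def looks_like_stale_chrome(user_agent: str, current_major: int, max_lag: int = 10) -> bool:
    """Flag user agents claiming a Chrome far older than the current release.

    `current_major` and `max_lag` are hypothetical parameters chosen
    for illustration, not published Pinecast values.
    """
    version = chrome_major_version(user_agent)
    return version is not None and current_major - version > max_lag


OLD_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36")
```

A check like this catches the disguise but not the name changes or the low volume; in practice several weak signals have to be combined before a bot is confidently flagged.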

In cases where bots do tricky things like this, it can be difficult to figure out what’s a legitimate listen and what’s not. We’re continuously improving our detection mechanisms, but no filter will ever be 100% foolproof.

Quality over quantity

Our philosophy around analytics is, first and foremost, to provide the highest quality data possible. Data that provides an inaccurate view of your past and present audience size is not useful in any way: it can’t be used for decision-making. While some podcasters might view analytics as a measure of success, we view analytics as a tool for improvement, allowing you to market your show and test new ways of engaging your audience while being able to track the results.

Analytics consistency is always a priority, too, but we will never prioritize consistency over quality. Many of our customers have expressed frustration with other hosts who do not make an effort to adjust filters, let alone correct historical data. We take great pride in our ability to make sure that the analytics you see are what we believe to be the most accurate view of your show’s reach. Looking back at 2018, over half of our engineering effort and infrastructure investment was put towards improving our analytics quality.

Analytics that are not homogeneous

One frequently asked question is why analytics do not behave smoothly across all charts and totals. The answer is unfortunately not a simple one, and stems from how listener and subscriber data is collected and measured.

Our analytics come from a variety of sources, both internal and external. Most of the data that we collect ourselves is available in our database almost instantly. Some data is less readily available. Spotify data, for instance, takes up to 36 hours to become available. When it is available, it’s added as part of a daily batch job. This can be jarring, since data will seem to suddenly appear: listen totals can jump by the hundreds in a matter of moments.
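The sudden jump can be illustrated with a small sketch. It assumes per-day listen totals kept as a dictionary and an external report that arrives later as a batch keyed by date; the numbers and the `merge_batch` helper are hypothetical, and the real import format is not described here. Because the batch is attributed to the day the listens happened, an earlier day's total changes all at once.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-day totals from data collected directly by the host,
# available almost immediately.
internal_daily = {date(2019, 1, 15): 120, date(2019, 1, 16): 95}


def merge_batch(totals, batch):
    """Add a late-arriving external batch to per-day totals (returns a copy)."""
    merged = defaultdict(int, totals)
    for day, count in batch.items():
        merged[day] += count
    return dict(merged)


# Hypothetical external report for Jan 15 that only arrives on a later day.
late_batch = {date(2019, 1, 15): 300}

before_import = sum(internal_daily.values())
merged = merge_batch(internal_daily, late_batch)
after_import = sum(merged.values())
```

Here the overall total jumps from 215 to 515 the moment the batch job runs, even though no new listening happened at that moment.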

Subscribers and what they mean

Heterogeneous data is also a common source of confusion for new users who are not familiar with how subscribers are measured (though they may not realize it!). Subscribers and listeners are measured differently:

  • Listens are discrete events that happen at a point in time, representing a single person consuming an episode. This is the number of times your episodes have been heard.
  • Subscribers represent the number of unique users checking your feed on a day-to-day basis. This is the number of people who will automatically receive new episodes when you release them.

There’s no “ping” that a host like Pinecast gets when a user subscribes or unsubscribes, so we instead measure the number of distinct devices that contact us looking for a feed every day. If we see the same device twice in one day, it only counts as one subscriber.
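Counting distinct devices per day can be sketched as grouping feed requests by calendar date and counting unique device identifiers. The article doesn't say how a device is identified, so this sketch assumes each request already carries an opaque device identifier; the log and the `subscribers_per_day` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical feed-request log: (device identifier, time of the request).
feed_requests = [
    ("device-a", datetime(2019, 1, 16, 6, 0)),
    ("device-a", datetime(2019, 1, 16, 18, 0)),  # same device, same day: counted once
    ("device-b", datetime(2019, 1, 16, 7, 30)),
    ("device-a", datetime(2019, 1, 17, 6, 0)),
]


def subscribers_per_day(requests):
    """Count distinct devices that requested the feed on each calendar day."""
    devices_by_day = defaultdict(set)
    for device_id, requested_at in requests:
        devices_by_day[requested_at.date()].add(device_id)
    return {day: len(devices) for day, devices in devices_by_day.items()}


counts = subscribers_per_day(feed_requests)
```

Using a set per day is what makes the second request from `device-a` on Jan 16 a no-op.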

Listens are easily tracked, because they’re events that happen at a particular point in time. Subscribers are trickier, because we want to know whether a device checked a feed on a particular day. We’re also measuring a number that should remain fairly consistent day-to-day (that is, the number of devices checking your feed shouldn’t change dramatically) rather than a number like listens, which will spike when a new episode is released.

The Subscribers total represents the number of subscribers from the previous full day.

Because of the way this is measured, however, each day’s subscriber count starts from zero and counts up over the course of the day. On your dashboard, you’ll notice that the “Subscribers” total at the top is yesterday’s subscriber count. In the chart, today’s subscriber count often dips below the previous day’s, since the number is still counting up from zero as we see more devices.
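The difference between the headline total and the chart can be sketched as follows, assuming per-day distinct-device counts are already computed. The counts and the `dashboard_subscribers` helper are hypothetical; the only rule taken from the text is that the headline uses the last complete day.

```python
from datetime import date, timedelta


def dashboard_subscribers(daily_counts: dict, today: date) -> int:
    """Headline total: the count for the last complete day, i.e. yesterday."""
    return daily_counts.get(today - timedelta(days=1), 0)


# Hypothetical counts; today's value is still climbing because the day isn't over.
daily_counts = {
    date(2019, 1, 15): 480,
    date(2019, 1, 16): 492,
    date(2019, 1, 17): 173,  # partial day
}
today = date(2019, 1, 17)
headline = dashboard_subscribers(daily_counts, today)
```

Reading yesterday's value for the headline keeps it stable, while the chart shows today's partial bar, which is why that last bar usually looks low until the day is over.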

We’ve written a bit more on our support site. If you’re interested, I recommend you check it out.

Going forward

We’re continuing to make large infrastructure improvements to our analytics, including the systems that filter bots and bring in new data. As we grow and add new features, this will continue to be a complicated part of Pinecast’s platform. If you have questions or concerns, please don’t hesitate to reach out.
