This is the first in a series of blog posts on building time-series forecasting models. At clypd, we use forecasting models to help media owners and buyers forecast future TV audiences. A successful forecasting model depends on many factors. In this post, we focus on algorithms, and on how we tap into both modern Machine Learning (ML) models and classical statistical models to take advantage of what each offers.
Advances in Machine Learning and Artificial Intelligence have been producing amazing stories every day, from AI assistants and self-driving vehicles to computer programs beating professional Go players. At clypd, we have also had many successes with ML models. Because they deliver better accuracy and better automation, these ML models are an integral part of our forecasting models. At the same time, we continue to find great value in "conventional" statistical models. So, instead of pitting Data Scientist against Statistician, let us look at ML models vs. statistical models, and how we can leverage both approaches in building a TV audience forecasting model.
The answers to these questions have a common origin that takes us back 80 years, to 1936. Commercial consumer and social research was a new business then: AC Nielsen had been founded 13 years earlier (in 1936 it was about the same age as Facebook is now), and Gallup, Inc. was just one year old. As it turned out, events in 1936 would soon make Gallup front-page news.
Like 2016, 1936 was a general election year in the US, and opinion polls were being conducted. One organization that considered itself an expert in this field was The Literary Digest, a magazine that had been in the polling business since 1916.
In our last post, we introduced the importance of data management platforms (DMPs) in the television industry. This month, we'll discuss why set-top box (STB) data matters to DMPs and programmatic TV.
As we know, data is a core tenet of programmatic TV. Layering data sources on top of media activity is essential to understanding audience composition and making the best data-driven decisions.
In the linear TV world, of the many data sources available, perhaps none is more important, or more distinctive, than the second-by-second viewership activity captured by the set-top box. STB data can be used to measure all viewing activity, including activity that Nielsen does not measure. This long-tail inventory, consumed primarily on cable networks, accounts for more than 40% of TV viewership. The challenge lies in the differing rules, technologies, and protocols across STB data sources, which make it difficult to use that data in a consistent, coherent manner.