The Amazing Race of Forecast Models

February 27, 2018 | clypd Blog

Deal or No Deal

To plan a TV advertising campaign, it is important to have a good understanding of the trends in TV audiences: who will be watching what, when, and on which screen. An accurate and reliable forecast is an integral component of clypd’s advanced targeting platform. For this reason, the Data Science team at clypd is always looking for ways to make our forecast models more useful, accurate, and reliable.

In the last decade or so, with the help of more data and better technology, many new algorithms have been developed. My previous blog post discusses the benefits of using both statistical and machine learning models. Recently, the Data Science team held an internal competition to examine how various models stack up against each other. This post provides some details of the competition.

Iron Chefs

The participation and contributions of the following Data Science team members are acknowledged:

Who Wants to Be the Million-dollar Model?

In this competition, we were interested in testing several types of models. Broadly speaking, they can be classified into these types:

Baseline Models

To forecast future audiences, there are several simple yet effective approaches based upon past performance – usually using the same period a year ago, or a recent time window. When a TV audience is stable, these approaches can provide satisfactory results. Additional adjustments can also be layered on top of the baseline forecast to account for seasonality or other special events. These are considered baseline models, against which all other models can be compared.
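For illustration, here is a minimal sketch of two such baselines; the series values and window lengths are hypothetical, not clypd's production settings:

```python
import pandas as pd

def trailing_average_forecast(history: pd.Series, window: int = 4) -> float:
    """Forecast the next period as the mean of the most recent `window` observations."""
    return float(history.tail(window).mean())

def same_period_last_year_forecast(history: pd.Series, periods_per_year: int = 52) -> float:
    """Forecast the next period as the value observed one year earlier."""
    return float(history.iloc[-periods_per_year])

# Hypothetical weekly audience history (e.g., millions of viewers).
weekly_audience = pd.Series([1.20, 1.15, 1.18, 1.22, 1.19, 1.25, 1.21, 1.17])
print(trailing_average_forecast(weekly_audience))  # mean of the last 4 weeks
```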

Regression Models

Regression analysis has been around for a long time, and could be considered a standard approach to forecasting across many disciplines. Depending on the use case and the available datasets, there are many different types of regression models. We are particularly interested in time-series models: since we have many years of past TV viewing history, they are a natural choice to consider.
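As an illustrative sketch of one common time-series approach (the data below are synthetic, not actual audience numbers), exponential smoothing with an additive seasonal component can be fit with statsmodels:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly audience series with a yearly seasonal pattern.
np.random.seed(0)
idx = pd.date_range("2014-01-01", periods=48, freq="MS")
audience = pd.Series(
    1.0 + 0.1 * np.sin(2 * np.pi * np.arange(48) / 12) + np.random.normal(0, 0.02, 48),
    index=idx,
)

# Exponential smoothing with additive trend and a 12-month seasonal component,
# roughly in the spirit of the TS1 model described later in this post.
model = ExponentialSmoothing(audience, trend="add", seasonal="add", seasonal_periods=12)
fit = model.fit()
print(fit.forecast(6))  # forecast the next six months
```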

Non-parametric Models

Compared to parametric models (of which regression models are a good example), non-parametric models make no a priori assumptions about the population distribution or the structure of the data. They therefore offer much more flexibility. When working with small datasets, non-parametric models can be very effective.
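As a generic illustration of the idea (not one of the models in the competition), a nearest-neighbor or "analog" forecast simply looks for the past windows that most resemble the most recent one and averages what happened next:

```python
import numpy as np

def analog_forecast(history: np.ndarray, window: int = 4, k: int = 3) -> float:
    """Nearest-neighbor forecast: find the k past windows most similar to the latest
    one, and average the value that followed each of them."""
    recent = history[-window:]
    candidates = []
    for start in range(len(history) - window):  # the latest window itself is excluded
        segment = history[start:start + window]
        distance = np.linalg.norm(segment - recent)
        candidates.append((distance, history[start + window]))
    candidates.sort(key=lambda pair: pair[0])
    return float(np.mean([next_value for _, next_value in candidates[:k]]))

# Hypothetical audience history.
history = np.array([1.00, 1.10, 1.05, 1.20, 1.15, 1.10, 1.25, 1.20, 1.10, 1.30])
print(analog_forecast(history))
```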

Machine Learning (ML) Models

For this competition, we were also interested in testing a service recently launched by AWS called SageMaker. According to AWS, “Amazon SageMaker is a fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.” The model we tested on SageMaker is DeepAR. Built on recurrent neural network (RNN) models, DeepAR extends classical time-series methods to learn from multiple related time series.
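A rough sketch of launching a DeepAR training job with the current SageMaker Python SDK might look like the following; the IAM role, S3 paths, instance type, and hyperparameter values are placeholders rather than the settings we actually used:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.image_uris import retrieve

session = sagemaker.Session()
region = session.boto_region_name

# Placeholder role and S3 locations; substitute your own.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
s3_train = "s3://your-bucket/deepar/train/"
s3_output = "s3://your-bucket/deepar/output/"

# Built-in DeepAR container image for the current region.
image_uri = retrieve("forecasting-deepar", region)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path=s3_output,
    sagemaker_session=session,
)

# DeepAR reads JSON Lines records such as {"start": "2014-01-01", "target": [...], "cat": [...]}.
estimator.set_hyperparameters(
    time_freq="M",         # monthly observations
    context_length=24,     # months of history fed to the RNN
    prediction_length=6,   # months to forecast
    epochs=100,
)
estimator.fit({"train": s3_train})
```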

Is Your Model Smarter than the Baseline Model?

How do we compare the performance of each model? We looked at two types of statistics: bias and variance. Bias measures how far, on average, a model’s forecasts deviate from the actual values in one direction; variance measures how widely the forecast errors are spread around that average.

We also considered the scalability of each model. Ideally, we would like to find a model with low bias and low variance that can also scale well to deliver forecasts by network, stream, hour (or program), and advanced target or age/gender building block.
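As a minimal sketch with made-up numbers, the two metrics can be computed from a model’s forecast errors like this:

```python
import pandas as pd

def bias_and_variance(actual: pd.Series, forecast: pd.Series) -> tuple[float, float]:
    """Bias: the average signed error (systematic over- or under-forecasting).
    Variance: how widely the errors spread around that average."""
    errors = forecast - actual
    return float(errors.mean()), float(errors.var())

# Hypothetical actuals and forecasts from two models.
actual  = pd.Series([1.00, 1.10, 0.95, 1.05, 1.20])
model_a = pd.Series([1.02, 1.12, 0.97, 1.07, 1.22])  # consistently a bit high: biased
model_b = pd.Series([0.90, 1.25, 0.80, 1.20, 1.10])  # centered on the truth but noisy: high variance

for name, fc in [("A", model_a), ("B", model_b)]:
    b, v = bias_and_variance(actual, fc)
    print(f"model {name}: bias={b:+.3f}, variance={v:.4f}")
```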

We collected monthly TV audience data from 2014 to 2017 for 129 networks, including both broadcast and cable networks. Each observation is also characterized by these features:

  • Demographics/Age
  • Stream (e.g., Live)
  • Dayparts (e.g., primetime)

Altogether, we have about 700k observations to work with. This is not a large dataset, but it provides a good framework for comparing the performance of multiple models.
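To make the structure concrete, here is a hypothetical slice of such a dataset; the column names and values are invented for illustration:

```python
import pandas as pd

# Illustrative layout of the competition data (column names and values are assumptions).
observations = pd.DataFrame({
    "month":    ["2017-11", "2017-11", "2017-12"],
    "network":  ["NET_A", "NET_B", "NET_A"],
    "demo":     ["F18-34", "M35-49", "F18-34"],
    "stream":   ["Live", "Live+3", "Live"],
    "daypart":  ["Primetime", "Daytime", "Primetime"],
    "audience": [1.24, 0.37, 1.31],  # hypothetical audience estimates
})

# Each (network, demo, stream, daypart) combination forms one monthly time series to forecast.
series = observations.groupby(["network", "demo", "stream", "daypart"])["audience"]
print(series.mean())
```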

Survivor of the Fittest Model

Initially, we tested 6 different models:

  • BM1: 4 Week Trailing Average (Baseline)
  • TS1: 4 Week Exponential Smoothing with Seasonal Adjustment
  • TS2: Generalized Linear Regression model
  • NPM1: Attenuated Trend model
  • NPM2: PUT Share Model
  • Deep AR: SageMaker/DeepAR model

The table below summarizes how these models performed across three different demographic segments:

A few interesting findings:

  • Two of the six models (4 Week Trailing Average and Deep AR) had the worst performance. The other four had similar overall performance, with no clear winner.
  • Between the two non-parametric models, NPM1 (Attenuated Trend model) had smaller variance, while NPM2 (PUT Share Model) had smaller bias. This gave us the idea of combining the two to get even better results.

The heatmap below illustrates the final results after adding the new model (NPM3, the PUT Adjusted Attenuated Trend model). The numbers are the differences between each model and the best model in the same row.

We had a clear winner!
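The exact NPM3 formulation is not spelled out here, but one plausible way to blend a damped (attenuated) trend with a PUT-share forecast (PUT: persons using television) is sketched below; the function names, windows, and the 50/50 weighting are all assumptions:

```python
import pandas as pd

def attenuated_trend_level(history: pd.Series, damping: float = 0.5) -> float:
    """Project the next value from the last observation plus a damped recent trend (NPM1-style)."""
    recent_trend = history.diff().tail(4).mean()
    return float(history.iloc[-1] + damping * recent_trend)

def put_share_forecast(network_history: pd.Series, put_history: pd.Series,
                       put_forecast: float) -> float:
    """Forecast as the network's recent share of total PUT times the forecasted PUT level (NPM2-style)."""
    share = (network_history.tail(4) / put_history.tail(4)).mean()
    return float(share * put_forecast)

def combined_forecast(network_history, put_history, put_forecast, weight: float = 0.5) -> float:
    """One plausible blend of the two non-parametric forecasts; the actual NPM3 recipe may differ."""
    return (weight * attenuated_trend_level(network_history)
            + (1 - weight) * put_share_forecast(network_history, put_history, put_forecast))

# Hypothetical monthly history for one network and for total PUT.
net = pd.Series([0.80, 0.82, 0.85, 0.83, 0.88, 0.90])
put = pd.Series([10.0, 10.2, 10.5, 10.1, 10.6, 10.8])
print(combined_forecast(net, put, put_forecast=11.0))
```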

Fear Factor

George Box famously said in 1976 that “since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration.” The final winning model, the PUT Adjusted Attenuated Trend (PAAT) model, seems to strike a good balance between elaborateness and simplicity, and it also incorporates special events – a “fear factor” of many forecast models.

A good example of special events that impact TV audiences is sporting events. Major sporting events such as the Super Bowl, the Olympics, and the FIFA World Cup can attract huge audiences. While they are held on a regular basis (every year or every four years), they can be shown on different networks. There are also unplanned events (e.g., breaking news) and one-time special events (e.g., the Simpsons Marathon).

A good forecasting model has to address this issue. For the final PAAT model, we incorporated event-based indexes to adjust forecasts. In future development, we envision that a combination of programming metadata and domain expertise can provide the ideal solution.
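In its simplest form, an event-based index adjustment can be a multiplier looked up from an event calendar and applied to a baseline forecast; the sketch below uses invented index values, not measured ones:

```python
# Hypothetical event indexes: how much a scheduled event inflates a baseline forecast.
event_index = {
    "super_bowl": 8.5,   # illustrative multipliers, not measured values
    "olympics":   2.0,
}

def adjust_for_event(baseline_forecast: float, event: str = "none") -> float:
    """Scale a baseline forecast by the index for the scheduled event, if any."""
    return baseline_forecast * event_index.get(event, 1.0)

print(adjust_for_event(1.2, "olympics"))  # 2.4
print(adjust_for_event(1.2))              # 1.2 (no special event)
```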

Ultimate Team Challenge

Through this competition, the Data Science team worked collaboratively to come up with the final model. A few things we have learned through this experience:

  • We were very happy to see that combining two good models produced an even better model. This clearly illustrates the power of ensemble models in forecasting.
  • The bias and variance framework provides a good foundation for understanding the performance of a model. More importantly, it can help us identify ways to develop better models.

It is important to keep in mind the limitations and underlying assumptions of this competition. For example, DeepAR did not perform well in this case, but we believe that, given a bigger dataset, it would prove more valuable.
