Because age and gender demographics have traditionally been used by the advertising industry, viewing patterns have been well established and are mostly consistent over time. With the availability of advanced audiences from a variety of data sources, more precision in reaching an audience is possible. By understanding how advanced audiences differ from traditional demographics, we stand to gain in advertising efficiency.

As an example, a campaign for a new dog treat wants to target dog owners. The target could be created using MRI fusion data or a first-party match of a database of dog owners onto Nielsen respondents. Before creating a proposal for the campaign, clypd users could benefit from exploring how the Dog Owner segment has historically viewed – in particular, which networks and dayparts have higher audience indices for this target and are likely to be effective for ad placement.

To enable this audience exploration, the Data Science team at clypd is building a planning tool that includes this viewing exploration, as well as information on the profile of the audiences and eCPM for a selected target. Within the viewing reports of this prototype, metrics for an age/gender demographic can be displayed as a baseline measure. Armed with this information, a user can find the dayparts and networks that will deliver the optimal reach or impressions for their target. Our prototype also includes a schedule feature which allows the user to see the historical TRPs and reach for a schedule composed of unit counts by network/daypart.

An example of one of the outputs is shown below. It can be seen that Network B, Late Night has the highest rating index for Dog Owners.

**Building the Prototype – Data Structures and Probabilities**

One of the challenges that the team tackled was the need to produce audience and reach values for all ad-supported Nielsen reported networks and dayparts, and to produce these results quickly. A traditional reach calculation using Nielsen minute-level viewing data requires several steps to transform the data to calculate the number of unique viewers to a network during a daypart. The Nielsen panel is also updated daily, which requires us to create a single panel of viewers for a selected time period.

To allow for more efficient calculations, the team developed infrastructure that would produce daily viewing probabilities from the Nielsen viewing data. We also developed a strategy for precomputing as many of the steps as possible to avoid repeating the same calculation steps.

The team created Amazon Redshift Spectrum tables to store the hourly aggregation of minutes that each Nielsen panelist viewed on each network on a particular date. Daypart probabilities were then calculated per respondent by dividing the minutes viewed, taken from the hourly aggregations, by the available minutes in that daypart.
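The daypart probability calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual implementation: the row layout, the `DAYPARTS` definitions, and the function name are all hypothetical, and the real calculation runs inside Redshift rather than in memory.

```python
from collections import defaultdict

# Hypothetical daypart definitions: name -> hours of the day (0-23) covered.
DAYPARTS = {"Prime": range(20, 23), "Late Night": range(23, 24)}


def daypart_probabilities(hourly_rows, dayparts=DAYPARTS, num_days=1):
    """Return {(respondent, network, daypart): viewing probability}.

    hourly_rows: iterable of (respondent_id, network, hour, minutes_viewed),
    an assumed shape for the hourly aggregation table.
    """
    hour_to_daypart = {h: name for name, hours in dayparts.items() for h in hours}

    # Sum minutes viewed per respondent, network, and daypart.
    viewed = defaultdict(float)
    for respondent_id, network, hour, minutes in hourly_rows:
        daypart = hour_to_daypart.get(hour)
        if daypart is not None:
            viewed[(respondent_id, network, daypart)] += minutes

    # Divide by the minutes available in the daypart over the period.
    return {
        key: total / (len(dayparts[key[2]]) * 60 * num_days)
        for key, total in viewed.items()
    }


# Illustrative rows for one respondent on one date (made-up values).
rows = [
    ("r1", "Network A", 20, 30),
    ("r1", "Network A", 21, 60),
    ("r1", "Network A", 23, 15),
]
probs = daypart_probabilities(rows)
print(probs[("r1", "Network A", "Prime")])       # 90 of 180 minutes -> 0.5
print(probs[("r1", "Network A", "Late Night")])  # 15 of 60 minutes -> 0.25
```

For monthly or quarterly probabilities, `num_days` would be set to the number of days in the period so that the denominator covers all available minutes.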

To give the prototype flexibility, the daypart probabilities are produced by month and quarter. Probabilities are produced each week using Python modules that first calculate the probabilities within the team’s Redshift database. The newest probability files are stored in an S3 bucket created for these Spectrum tables. Using the Nielsen respondent-level files, the prototype would have required 18 GB of minute-level viewing data to produce an audience report. By reading 0.141 GB of viewing probabilities instead, we achieved a 99.2% reduction in data volume.

**Using Viewing Probabilities to Calculate Audiences**

We can closely approximate average audiences and reach using the viewing probabilities and calculated sample weights. The tables below present the average audience and reach calculations using probabilities for five respondents viewing Network A for a single hour. Each probability below represents the likelihood that the respondent viewed any particular minute of Network A within that daypart.

From each Nielsen respondent, a portion of the average audience can be estimated. In this example, the first respondent represents 1500 people in the population and has a 98% probability of viewing. The average audience based on the first respondent is approximated as 0.98 x 1500 = 1470 people in the population. Summing across respondents produces the total average audience of the population. A rating is calculated as the average audience multiplied by 100 and divided by the universe estimate. Average time spent can also be calculated: respondent 1 is expected to view 0.98 x 60 = 58.8 minutes.
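As a minimal sketch, the average audience, rating, and time spent calculations can be written as follows. Only two respondents are included: respondent 1 uses the weight and probability quoted above, and respondent 2 uses a weight of 2200 with a probability of 0.012, as implied by the reach example in this post. The universe estimate of 10,000 is a hypothetical value chosen for illustration, and the function names are assumptions.

```python
# Weight (people represented) and per-minute viewing probability per respondent.
respondents = {
    "respondent_1": {"weight": 1500, "p": 0.98},
    "respondent_2": {"weight": 2200, "p": 0.012},
}
UNIVERSE_ESTIMATE = 10_000  # hypothetical universe estimate
DAYPART_MINUTES = 60        # single-hour daypart, as in the example


def average_audience(respondents):
    """Sum of weight x viewing probability across respondents."""
    return sum(r["weight"] * r["p"] for r in respondents.values())


def rating(avg_audience, universe_estimate):
    """Average audience expressed as a percentage of the universe."""
    return avg_audience * 100 / universe_estimate


def expected_minutes(p, daypart_minutes):
    """Expected minutes viewed by one respondent in the daypart."""
    return p * daypart_minutes


aa = average_audience(respondents)
print(round(aa, 1))                                   # 1496.4
print(round(rating(aa, UNIVERSE_ESTIMATE), 2))        # 14.96
print(round(expected_minutes(0.98, DAYPART_MINUTES), 1))  # 58.8
```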

For reach, we first calculate the probability of __not__ viewing a given minute, using the complement (1-p) for each respondent. The overall probability that a respondent did not watch any of the minutes in the daypart is then (1-p)^{m}, where p is the viewing probability and m is the number of minutes in the daypart. This follows from basic probability rules when viewing in each minute is treated as independent.

For respondent 2 below, the probability of not viewing is 0.988^{60} = 0.485, so the probability of viewing at least one minute is 1 - 0.485 = 0.515. Applying this to the respondent weight, we estimate reached persons as 0.515 x 2200, or approximately 1134. Repeating this calculation across all sample respondents and summing produces the total reach estimate in persons. Multiplying the reach estimate in persons by 100 and dividing by the universe estimate gives the reach expressed as a percentage.
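The reach arithmetic can be checked with a short Python sketch. It uses the weights and probabilities for respondents 1 and 2 from this post; the universe estimate of 10,000 and the function names are hypothetical, and the calculation assumes viewing in each minute is independent.

```python
def reach_probability(p, minutes):
    """Probability of viewing at least one minute: 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p) ** minutes


def reach_persons(respondents, minutes):
    """Sum of weight x reach probability across (weight, p) pairs."""
    return sum(w * reach_probability(p, minutes) for w, p in respondents)


def reach_percent(persons, universe_estimate):
    """Reached persons as a percentage of the universe estimate."""
    return persons * 100 / universe_estimate


# (weight, per-minute viewing probability) for respondents 1 and 2.
respondents = [(1500, 0.98), (2200, 0.012)]
UNIVERSE_ESTIMATE = 10_000  # hypothetical universe estimate

p_reach_2 = reach_probability(0.012, 60)
print(round(p_reach_2, 3))        # 0.515
print(round(2200 * p_reach_2))    # 1134

total = reach_persons(respondents, 60)
pct = reach_percent(total, UNIVERSE_ESTIMATE)
```

Respondent 1, with a 98% per-minute probability, is reached with near certainty, so the full weight of 1500 is effectively counted toward reach.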

With this viewing probability solution, the prototype can generate a reach and average audience report across all networks and dayparts in a predefined daypart set within 480 seconds. The schedule report contains a subset of all networks and dayparts and can be generated in even less time. These timings are acceptable for a prototype, and the production version is expected to be quicker. The user can then move from creating the Dog Owners target in the Audience Definition tool to the campaign planning stages, armed with the information they need to create an effective ad schedule to help market their new dog treat.