Another reason we need Bayesian models and experiments in marketing mix models

Spotify don’t care why you like Bananarama, they just care that you do. For passive prediction problems, causality doesn’t matter; it’s all about model performance.

Causality does matter when we want to use models to guide active human decision making. And causal inference is never just about following the data - model design is crucial. 

In a brilliant paper currently in pre-print, ‘Your MMM is Broken: Identification of Nonlinear and Time-varying Effects in Marketing Mix Models’, the authors describe an interesting challenge (to add to the already quite long list).

Diminishing returns

It is customary to assume there are diminishing returns to ad spend. This is captured in MMMs by applying transformations such as Hill or negative exponential functions to ad variables. 
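As a sketch of what those transformations look like (the parameter names and values here are illustrative, not taken from any particular MMM):

```python
import numpy as np

def hill(spend, half_sat, shape):
    # Hill saturation: response climbs steeply at first, then flattens out
    return spend**shape / (spend**shape + half_sat**shape)

def neg_exponential(spend, decay):
    # negative exponential saturation: approaches 1 as spend grows
    return 1.0 - np.exp(-decay * spend)

spend = np.array([0.0, 50.0, 100.0, 150.0])
sat = hill(spend, half_sat=100.0, shape=1.0)
# successive equal spend increments buy smaller and smaller response increments
```

Both curves are concave, which is exactly the diminishing-returns assumption: each extra unit of spend is worth less than the last.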

On its own, this assumption implies that the optimal, profit-maximising spend on a channel is constant over time. (An example I was given at uni: the first pint is more enjoyable than the second, and the second more than the third. So if you’re going to drink 7 pints in a week, your total enjoyment is maximised by having one a day rather than all 7 in one hit… which of course ignores the interaction effect with your friends’ pints consumption!)
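The pint logic can be checked numerically with any concave curve; square root is an arbitrary stand-in for diminishing returns:

```python
import numpy as np

def enjoyment(pints):
    # square root is an arbitrary concave stand-in for diminishing returns
    return np.sqrt(pints)

spread = 7 * enjoyment(1)  # one pint a day for a week
binge = enjoyment(7)       # all seven in one sitting
# spread = 7.0 beats binge ≈ 2.65: under concavity, spacing consumption wins
```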

Time-varying effectiveness

It is also often assumed that effectiveness changes over time. For example, there may be seasonal variations in ad responsiveness or, ideally, growing effectiveness on a channel over time as you learn more about how it works. It is common practice to ‘refresh’ models with new data and more advanced vendors will build models with time-varying effects. 

This assumption implies of course that the optimal spend is not constant over time.

The problem is that, under certain conditions, MMMs can’t distinguish between these two effects, making it impossible to develop good recommendations.

Why is this happening?

The paper explains why: a model can estimate diminishing returns when only time-varying effects exist in the data, and vice versa.

If the data generating process includes only diminishing returns, and spends change gradually, then a model designed to capture only time-varying changes in effectiveness can fit the data well. This is intuitive: when spend moves higher, the marginal return falls, and the time-varying model detects this as a fall in effectiveness at the new time point.

There are also circumstances where the opposite is true. For example, if spend always increases over the modelled period, then there is a unique mapping between time and spend level. If effectiveness varies over time, a diminishing returns model can mistake that variation for a non-linear (e.g. saturating) relationship between spend and response.
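A quick simulation makes this concrete. Here the true DGP is linear in spend with effectiveness decaying over time, but because spend rises in lockstep with time, the response-versus-spend trace looks exactly like a saturating curve (the specific functional forms are made up for illustration):

```python
import numpy as np

t = np.arange(1, 51)
spend = t.astype(float)        # spend only ever increases over the period
beta = 1.0 / (1.0 + 0.1 * t)   # effectiveness decays over time (the true DGP)
response = beta * spend        # response is *linear* in spend at each time

# Because time and spend move in lockstep, plotting response against spend
# traces s / (1 + 0.1 s): a concave, saturating curve that a static
# diminishing-returns model would fit happily, even though no saturation
# exists in the data generating process.
```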

What can we do about it?

If nothing else, the paper highlights that standard MMMs, which usually try to estimate (or just assume) diminishing returns, may generate misleading recommendations. The paper describes how Gaussian Processes (GPs) can be used within a Bayesian modelling framework to estimate time-varying effects, and while they suggest it might be possible to model both of these effects within a single model, it will be difficult to disentangle with standard, aggregate MMM-like data. 

As with many of the challenges of MMM (selection bias, multicollinearity), experiments provide a potential solution. The authors recommend what they call ‘separation tests’, designed to help you identify each effect.

Separation tests

The authors suggest two tests - ‘maximal separation’ and the ‘seesaw’ test. This is the most interesting part of the paper. 

The idea of the maximal separation test is to choose spend levels at each period that generate the maximum predicted difference in sales across the candidate models. So, say model 1 is a diminishing returns model and model 2 is a time-varying effectiveness model: we choose spend such that the difference in predicted response between the two models is at its greatest. The outcome data is then used to update model estimates and we get new model posteriors. As the test evolves, the model that better captures the real ad effect should outperform the other.
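A heavily simplified, point-estimate sketch of that selection rule (the paper works with full posteriors; the candidate models, function names and spend grid below are invented for illustration):

```python
import numpy as np

def pick_max_separation_spend(candidates, predict_m1, predict_m2):
    # choose the spend level where the two candidate models disagree most
    gaps = np.abs(predict_m1(candidates) - predict_m2(candidates))
    return candidates[np.argmax(gaps)]

# toy stand-ins: a saturating model vs a linear "effectiveness today" model
dim_returns = lambda s: 10.0 * s / (s + 50.0)
time_varying = lambda s: 0.12 * s

grid = np.linspace(0.0, 200.0, 201)
best_spend = pick_max_separation_spend(grid, dim_returns, time_varying)
# the outcome observed at best_spend is then used to update both models
```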

True DGP is diminishing returns. Source: Dew et al (2024), figure 14

In the paper there is a simulation showing how the test recommends different spend levels at each iteration and how the separation between the effects occurs.

In the case that the true data generating process is a non-linear relationship, the spends ‘seesaw’ between high and low values. This disrupts the time-varying model’s ability to approximate the non-linear relationship and separation is achieved. 

True DGP is time-varying. Source: Dew et al (2024), figure 14

The opposite occurs when the true data generating process is a time-varying effects model. This time the test converges on a roughly constant spend level.

The non-linear model will generate identical response predictions at the same spend level, whereas the time-varying model will generate different predictions as effectiveness evolves. And again, separation is achieved. 

The conditions under which a diminishing returns model masquerades as a time-varying model are both more intuitive and trickier to identify. For that reason, the authors recommend replicating the pattern that emerges from the maximal separation test: alternating between higher and lower spend values. This is the same idea as ‘pulsing’ incrementality tests.
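A pulsing schedule of that kind is trivial to generate; the base and amplitude values here are arbitrary:

```python
import numpy as np

def seesaw_schedule(base, amplitude, periods):
    # alternate spend high/low each period, breaking any one-to-one
    # mapping between time and spend level
    signs = np.where(np.arange(periods) % 2 == 0, 1.0, -1.0)
    return base + amplitude * signs

schedule = seesaw_schedule(base=100.0, amplitude=40.0, periods=6)
# [140. 60. 140. 60. 140. 60.]
```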

Why Bayesian?

We typically think about using experiments to inform our model (via informative priors). Here we see a different idea - use Bayesian models to capture the real world complexity, use priors to regularise, and use the models to inform the test. The paper also highlights the power of using simulation to guide model design and decision making. Simulation is a key part of any Bayesian workflow and should be used more widely as a tool for understanding uncertainty and guiding decisions.

A core feature of modern, Bayesian MMMs is the use of Gaussian Processes. These are complicated conceptually, but hugely powerful. We’ve been building complex Bayesian models with GPs for clients for a number of years and they can make a big impact on how we learn from data.
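For a flavour of how a GP encodes a smoothly time-varying coefficient, here is a single draw from a squared-exponential GP prior over weekly effectiveness (the kernel parameters and baseline are illustrative assumptions, not values from any model):

```python
import numpy as np

def rbf_kernel(t, length_scale, variance):
    # squared-exponential kernel: weeks close in time get similar effectiveness
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
t = np.arange(52.0)  # one year of weekly periods
K = rbf_kernel(t, length_scale=8.0, variance=0.05) + 1e-9 * np.eye(52)
# one draw from the GP prior: a smooth path for the channel coefficient
beta_t = 0.2 + rng.multivariate_normal(np.zeros(52), K)
```

In a full MMM this smooth path would sit inside the likelihood as the channel coefficient, with the kernel hyperparameters learned from data.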

Get in touch to find out if we can help you on your Bayesian journey!
