Probability models

Marketing models attempt to describe or predict behaviour. Need models that specify a random process for individual behaviour: sum across individuals to get aggregate behaviour, and incorporate differences between individuals where needed.

Uses of probability models

Introductory example

Introduce a new product to market: how do we model the % who have tried the product? Could treat time of first purchase $T$ as a random variable, distributed exponentially with trial rate $\lambda$, so $F(t | \lambda) = 1 - e^{-\lambda t}$. Or assume there are two groups of consumers, triers and never-triers: $P(T \le t) = p F(t | \lambda) = p (1 - e^{-\lambda t})$, where $p$ is the proportion of eventual triers. Need to estimate $p$ and $\lambda$. Use maximum likelihood (nlm or optim in R).
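A minimal Python sketch of the likelihood for the triers/never-triers model. It assumes the data arrive as counts of new triers per week out of a fixed panel, with households that have not tried by the last week treated as censored. The panel size and counts are made up for illustration, and the maximisation step itself (nlm or optim in R) is not shown.

```python
import math

def trial_cdf(t, p, lam):
    """P(T <= t) = p * (1 - exp(-lam * t)); p is the share of eventual triers."""
    return p * (1.0 - math.exp(-lam * t))

def log_likelihood(p, lam, new_triers, panel_size):
    """Log-likelihood for grouped trial data.

    new_triers[i] is the number of households whose first purchase fell in
    week i + 1. Households that have not tried by the final week contribute
    the probability of not having tried yet.
    """
    ll = 0.0
    for week, n in enumerate(new_triers, start=1):
        prob = trial_cdf(week, p, lam) - trial_cdf(week - 1, p, lam)
        ll += n * math.log(prob)
    n_censored = panel_size - sum(new_triers)
    ll += n_censored * math.log(1.0 - trial_cdf(len(new_triers), p, lam))
    return ll

# Hypothetical data: 1000 households, new triers in weeks 1 to 4.
new_triers = [40, 30, 25, 20]
ll_plausible = log_likelihood(0.3, 0.15, new_triers, 1000)
ll_implausible = log_likelihood(0.9, 2.0, new_triers, 1000)
```

Comparing the two values shows the likelihood preferring parameters that roughly match the observed trial pattern; an optimiser would search over $p$ and $\lambda$ for the maximum.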

This model assumes the same trial rate for all households (apart from never-triers): over-simplistic! Can allow for multiple segments with different underlying trial rates, but these finite mixture models are hard to fit, with many local maxima. An easier approach is to use continuous mixture models and assume that trial rates are distributed $\lambda \sim g(\lambda)$.

Assume trial rates are distributed according to a gamma distribution $g(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta \lambda}$, with shape parameter $\alpha$ and inverse scale parameter $\beta$. So $F(t) = P(T \le t) = \int_0^\infty P(T \le t | \lambda) g(\lambda) d\lambda = 1 - (\frac{\beta}{\beta + t})^\alpha$. Again estimate these parameters using ML.
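The closed form can be checked numerically. The sketch below assumes the gamma density uses $\beta$ as an inverse scale (rate) and approximates the mixing integral with a simple midpoint rule; the parameter values, integration limit and step count are arbitrary choices for illustration.

```python
import math

def gamma_pdf(lam, alpha, beta):
    """Gamma density with shape alpha and inverse scale (rate) beta."""
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1.0) * math.log(lam) - beta * lam)
    return math.exp(log_pdf)

def exp_gamma_cdf(t, alpha, beta):
    """Closed form: P(T <= t) = 1 - (beta / (beta + t)) ** alpha."""
    return 1.0 - (beta / (beta + t)) ** alpha

def exp_gamma_cdf_numeric(t, alpha, beta, upper=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral over lambda on (0, upper)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        total += (1.0 - math.exp(-lam * t)) * gamma_pdf(lam, alpha, beta)
    return total * width

# Hypothetical parameter values.
closed = exp_gamma_cdf(3.0, 2.0, 4.0)
numeric = exp_gamma_cdf_numeric(3.0, 2.0, 4.0)
```

The two values should agree to several decimal places, which is a quick sanity check on the algebra before fitting.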

Can extend by:

Counting model

Outdoor advertising exposure: Advertiser can buy a “monthly showing” on a set of billboards. Effectiveness is measured through reach, frequency and gross rating points (GRPs). These are derived from daily traffic maps covering one week for each person; we want to extend to 4 weeks, so model the weekly distribution, then estimate summary statistics for the month.

Let $X$ be the number of billboard exposures in one week, distributed Poisson with rate parameter $\lambda$, which is itself distributed gamma over the whole population. Integrating out $\lambda$ gives $P(X = x) = \frac{\Gamma(\alpha + x)}{\Gamma(\alpha) x!} (\frac{\beta}{\beta + 1})^\alpha (\frac{1}{\beta + 1})^x$: the Poisson-Gamma distribution, aka the negative binomial, with mean $\alpha / \beta$ and variance $\alpha (\beta + 1) / \beta^2$ (taking $\beta$ as the inverse scale parameter, as above). Estimate parameters with ML.

To extend to more than one week, note that if $X(t)$ is the number of exposures over $t$ weeks, then $X(t)$ is also Poisson, with rate $\lambda t$, and $P(X(t) = x) = \frac{\Gamma(\alpha + x)}{\Gamma(\alpha) x!} (\frac{\beta}{\beta + t})^\alpha (\frac{t}{\beta + t})^x$ with mean $\alpha t / \beta$.

Reach = $1 - P(X(t) = 0)$. Average frequency = $E[X(t)] / \text{reach}$. GRP = reach * average frequency.
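A short Python sketch of these summary measures under the negative binomial model. It assumes $\beta$ is the inverse scale parameter of the gamma, so the mean over $t$ weeks is $\alpha t / \beta$; the parameter values are hypothetical, and the results are proportions rather than the percentage points in which GRPs are usually quoted.

```python
import math

def nbd_pmf(x, t, alpha, beta):
    """P(X(t) = x) for the Poisson-gamma (negative binomial) model."""
    log_p = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
             + alpha * math.log(beta / (beta + t))
             + x * math.log(t / (beta + t)))
    return math.exp(log_p)

def exposure_summary(t, alpha, beta):
    """Return (reach, average frequency, GRP) over t weeks, as proportions."""
    reach = 1.0 - nbd_pmf(0, t, alpha, beta)   # share exposed at least once
    mean_exposures = alpha * t / beta          # E[X(t)]
    frequency = mean_exposures / reach         # exposures per person reached
    grp = reach * frequency                    # equals E[X(t)]
    return reach, frequency, grp

# Hypothetical weekly parameters, extended to a 4-week showing.
alpha, beta = 0.9, 0.2
reach, frequency, grp = exposure_summary(4, alpha, beta)
```

Because GRP is reach times average frequency, it collapses to the mean number of exposures; reach is the only quantity here that needs the full distribution.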

Timing model

More accurate to ask people “when did you last purchase blah”, rather than “how many times have you bought blah in the last month”, so need a way of converting timing data to count data.

Poisson purchases correspond to exponential interpurchase times. If $T$ is exponentially distributed, and $\lambda \sim \text{Gamma}(\alpha, \beta)$, then $P(T \le t) = 1 - (\frac{\beta}{\beta + t})^\alpha$. We can then use ML to estimate $\alpha$ and $\beta$, and plug these back into the negative binomial distribution to get estimated count data.
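One way to see why the timing parameters carry over to the count model (a short derivation, assuming $T$ is measured from the start of the observation window): having no purchase by time $t$ is the same event as a zero count over $t$ periods, so the exponential-gamma survival function must equal the negative binomial probability of zero.

$$P(T > t) = \left(\frac{\beta}{\beta + t}\right)^{\alpha} = P(X(t) = 0)$$

Both sides depend on the same $\alpha$ and $\beta$, which is what allows estimates from timing data to be plugged into the count distribution.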

Generalisations

What if the exponential-gamma isn’t the most appropriate? What other options are there? Key characteristic of the exponential is its “memorylessness”: the probability that the event occurs in the interval $(t, t + \Delta)$, given that it has not occurred by $t$, is independent of $t$. How can we make it dependent on $t$?

Could use a hazard function, $h(t) = \frac{f(t)}{1 - F(t)}$, the instantaneous failure rate at time $t$, given survival until time $t$. It uniquely defines the distribution of a non-negative random variable. The exponential distribution has a constant hazard function $\lambda$.

The Weibull distribution is a generalisation of the exponential that can represent decreasing or increasing hazard functions: $F(t) = 1 - e^{-\lambda t^c}$, $h(t) = c \lambda t^{c-1}$. If $\lambda$ is distributed gamma, then we get a Weibull-Gamma model $P(T \le t) = 1 - (\frac{\beta}{\beta + t^c})^\alpha$.
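A small Python sketch of the Weibull and Weibull-gamma functions, assuming the parameterisation in which the cumulative hazard is $\lambda t^c$ and $\beta$ is the gamma inverse scale. Setting $c = 1$ should recover the exponential-gamma model, while $c < 1$ and $c > 1$ give decreasing and increasing hazards respectively.

```python
import math

def weibull_cdf(t, lam, c):
    """F(t) = 1 - exp(-lam * t ** c)."""
    return 1.0 - math.exp(-lam * t ** c)

def weibull_hazard(t, lam, c):
    """h(t) = c * lam * t ** (c - 1); constant when c == 1."""
    return c * lam * t ** (c - 1.0)

def weibull_gamma_cdf(t, alpha, beta, c):
    """P(T <= t) after mixing lam over a gamma(alpha, beta) distribution."""
    return 1.0 - (beta / (beta + t ** c)) ** alpha

# Hypothetical values: compare hazards at t = 1 and t = 2 for two shapes.
decreasing = (weibull_hazard(1.0, 0.3, 0.5), weibull_hazard(2.0, 0.3, 0.5))
increasing = (weibull_hazard(1.0, 0.3, 2.0), weibull_hazard(2.0, 0.3, 2.0))
```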

Intuitively, we might expect the probability of first purchase in week $t$ to be a function of the marketing in week $t$ (and in part a function of the previous marketing activity). This can be formalised with proportional hazards regression. Assume covariates have a multiplicative effect, giving $h(t | \lambda, x(t), \beta) = h_0(t | \lambda) \exp(\beta^{'} x(t))$, where $x(t)$ is the vector of covariates at time $t$. With an exponential baseline, $h_0(t | \lambda) = \lambda$, and covariates assumed constant within each time period, $\int^t_0 h(u) du = \lambda A(t)$, where $A(t) = \sum_{i=1}^{t} \exp(\beta^{'} x(i))$. And $P(T \le t) = 1 - e^{-\lambda A(t)}$.
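A minimal Python sketch of the covariate-adjusted trial curve, assuming an exponential baseline hazard $\lambda$ and covariates that are constant within each week. The single promotion covariate, its coefficient and the value of $\lambda$ are hypothetical.

```python
import math

def cumulative_covariate_effect(covariates, coefs):
    """Return [A(1), ..., A(n)], where A(t) = sum_{i<=t} exp(coefs . x(i))."""
    totals, running = [], 0.0
    for x in covariates:
        running += math.exp(sum(b * xi for b, xi in zip(coefs, x)))
        totals.append(running)
    return totals

def trial_cdf_with_covariates(lam, covariates, coefs):
    """P(T <= t) = 1 - exp(-lam * A(t)) for t = 1..n."""
    effects = cumulative_covariate_effect(covariates, coefs)
    return [1.0 - math.exp(-lam * a) for a in effects]

# Hypothetical: one covariate (promotion on/off) over four weeks.
promo = [[0.0], [1.0], [0.0], [1.0]]
cdf = trial_cdf_with_covariates(0.05, promo, [0.8])
no_effect = trial_cdf_with_covariates(0.05, promo, [0.0])
```

With a zero coefficient, $A(t) = t$ and the curve reduces to the plain exponential model; with a positive coefficient, trial accelerates in the promoted weeks.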

Choice model

Have data on past customer purchases, divided into segments, and believe that some segments are more likely to respond to a mailout than others. Sent test mailout to 3% sample, now want to analyse response by segment to identify most profitable targets (profitable if purchase response rate, PRR, > cost per letter / unit margin). The standard approach is to mail out to all segments with observed PRR above the cutoff, but this is not a very statistical approach!

So we want to develop a model for the number of responses from letters sent, by segment: a binomial distribution. Assume all members of segment $s$ have the same probability of response $p_s$, with $p_s \sim \text{Beta}(\alpha, \beta)$. Integrating out $p_s$ gives $P(X_s = x_s | m_s) = \binom{m_s}{x_s} \frac{B(\alpha + x_s, \beta + m_s - x_s)}{B(\alpha, \beta)}$, where $m_s$ is the size of the mailout to segment $s$ and $B(\cdot, \cdot)$ is the beta function. As usual, estimate $\alpha$ and $\beta$ with ML.
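A Python sketch of the beta-binomial probability and the corresponding log-likelihood across segments, working on the log scale with `math.lgamma` to avoid overflow. The segment data are invented for illustration, and the optimisation over $\alpha$ and $\beta$ is left out.

```python
import math

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x, m, alpha, beta):
    """P(X_s = x | m): x responses from m letters, p_s ~ Beta(alpha, beta)."""
    log_choose = (math.lgamma(m + 1) - math.lgamma(x + 1)
                  - math.lgamma(m - x + 1))
    log_p = (log_choose + log_beta(alpha + x, beta + m - x)
             - log_beta(alpha, beta))
    return math.exp(log_p)

def log_likelihood(alpha, beta, segments):
    """Sum of log-probabilities over (responses, mailout size) pairs."""
    return sum(math.log(beta_binomial_pmf(x, m, alpha, beta))
               for x, m in segments)

# Hypothetical test-mailout results: (responses, letters sent) per segment.
segments = [(4, 100), (1, 20), (0, 50), (7, 300)]
ll = log_likelihood(0.5, 20.0, segments)
```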

Applying the model

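What’s our best estimate of $p_s$ given response $x_s$ to a test mailout of size $m_s$? Intuitively might expect it to be a weighted average of the population mean and the observed response rate of that segment. Can use Bayes theorem to refine this intuition. In particular, we will use empirical Bayes: the prior is the Beta distribution estimated from the data, and the posterior for $p_s$ is again a Beta distribution. This gives $E[p_s | x_s, m_s] = \frac{\alpha + x_s}{\alpha + \beta + m_s}$, a weighted average of the prior mean $\frac{\alpha}{\alpha + \beta}$ and the observed rate $\frac{x_s}{m_s}$.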
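The weighted-average reading can be made concrete with a short Python sketch. It assumes the break-even rule stated for the raw response rate (cost per letter / unit margin) is applied to the posterior mean instead; the prior parameters, costs and segment results are hypothetical.

```python
def posterior_mean(x, m, alpha, beta):
    """E[p_s | x, m] under a Beta(alpha, beta) prior and binomial responses."""
    return (alpha + x) / (alpha + beta + m)

def data_weight(m, alpha, beta):
    """Weight placed on the segment's own rate x / m; the rest goes to the prior mean."""
    return m / (alpha + beta + m)

def should_mail(x, m, alpha, beta, cost_per_letter, unit_margin):
    """Mail the segment if the posterior mean beats the break-even rate."""
    return posterior_mean(x, m, alpha, beta) > cost_per_letter / unit_margin

# Hypothetical prior and economics (break-even response rate of 2.5%).
alpha, beta = 0.5, 50.0
large_test = should_mail(4, 100, alpha, beta, 0.5, 20.0)  # raw rate 4%
small_test = should_mail(1, 20, alpha, beta, 0.5, 20.0)   # raw rate 5%
```

A segment with a small test mailout is pulled more strongly towards the population mean, so a high raw response rate from only a handful of letters may not clear the cutoff once shrinkage is applied.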

General approach

If model doesn’t fit well – generalise! Use likelihood ratio test to check (if nested). Use covariates. Use more flexible distribution. Add point effects.