Choice modelling

When launching a new product (or improving old), want to see how it will impact on market. Need to be able to model effect of each change on range of similar products. Do this with choice modelling, where people are given different scenarious and asked to choose which product they would buy.

Design of experiment

Ideally want design to be orthogonal (effects uncorrelated) and balanced (equal sample sizes per combination) – but often not possible with reasonble number of runs (%lt; 40) per person (especially when including interactions). Use %mktruns and %mktdes SAS macros to help.

In real-life, main effects explain 70-90% of variation, two-interactions 3-6%, three-way <3% – don’t need to worry much 3rd+ order interaction (and are very hard to explain!). Use prior knowledge!

Analysis of data

Typically analysed with multinomial logit model, where $P(c_i | C) = = exp(utility of that choice) / total exp(utility) of all choices. Has same form as proportional hazard regression with Breslow ties, so use SAS PHREG procedure to fit.

As for any regression model, test overall significance test first, then interactions/non-linearity, then individual covariates.

Data mining

Classification trees – used to predict categorical responses. Algorithm splits dataset at each branch to maximise some criterion (eg. entropy, information, etc.). Can also predict continuous data (leaf gives expected value)

Problems:

can be unnecesarrily complicated
structure often unstable (over-fitting)

Validation and standard errors:

applying to training set gives very optimistic errors
cross-validation: separate test dataset (wastes data)
k-fold cross-validating (divide into k parts, keep 1/k out for testing,
train on rest, repeat for all k parts)
bootstrapping, jack-knifing etc.

Many other (more complicated) methods, with various improvement in prediction. Still no consensus on which method is best in different situations.

Ethics

Don’t forget about ethics! Combining information via data warehousing could violate Privacy Act. Data mining raises ethical issues mainly during application – should we use ethnicity if it is a good predictor? Ethics depends on application.

Multivariate analysis

MVA allow analysis of two or more variables simultaneously. Why is this important? See Simpson’s paradox – collapsing over related variable can give misleading results. EDA(exploratory data analysis) usually very worthwhile – will highlight any problems with data etc. Important to think about missing values.

Often have many variables in market research, especially from surveys. MVA can help summarise the data, and reduce chance of obtaining spurious results. Two general techniques: analysis of dependence and analysis of independence.

Principal components: Identify underlying dimension of a distribution. Probably most commmonly used method of deriving factors.

Cluster analysis: identifying separate groups of similar cases. Also used to summarise by defining segments. Two main techniques: hierachical and iterative (eg. k-means). Sometimes do tandem segmentation by clustering on factors – loses information, but makes interpretation easier. Distance measure as important as clustering techqniue.

Structural equation modelling: extracts latent variables, with specified causal structures (confirmatory)

Partial least squares: multivariate genralisation of regression. Extract underlying factors to explain response variation and variation between predictors.

Discriminant analysis: How to best classify observations into (known) groups.

Chi-squared automatic interaction detection (CHAID): descrete response with many discrete predictors, produces tree structure

Correspondence analysis: visual summary of relationship of contingency tables. Vectors in similar directions positively related, opposite directions negatively, distance from origin represents strength of relationship.

Probability models

Marketing model attempt to describe or predict behaviour. Need models that specify a random model for individual behaviour – sum across individuals to get aggregate behaviour, may need to incorporate differences between individuals.

Uses of probability models

understand and profile individual behaviour
understand market level behaviours
provide norms/benchmarks for comparison
prediction

Introductory example

Introduce a new product to market – how do we model % who have tried the product? Could treat time of first purchase $T$ as random variable, distributed exponentially with trial rate $\lambda$ . Or assume there are two groups of consumers, triers and never-triers: $P(T \le t) = p F(t | . Need to estimate $p$ and $\lambda$ . Use maximum likelihood (nlm or optim in R)

This model assumes same trial rate for all households (apart from never triers) – over-simplistic! Can allow for multiple segments with different underlying trial rates, but these finite mixture models are hard to fit with many local maxima. An easier approach is to use continuous mixture models as assume that trial rates are distributed $\lambda ~ g(\lambda)$ .

Assume trial rates are distributeed according to a gamma distribution $g(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} , with shape parameter $\alpha$ and inverse scale parameter $\beta$ . So $F(t) = P(T \le t) = \int P(T \le t | \lambda) g(\lambda) d\lambda) $ = $1 - (\frac{\beta}{\beta + t})^\alpha$ . Again estimate these parameters using ML.

Can extend by:

adding never-try component
incorporate effects of marketing covariates (eg. advertising weight over
time)
incorporate effects of household covariates (eg. presense of children).

Counting model

Outdoor advertising exposure: Advertiser can buy a “monthly showing” on set of billboards. Effectiveness measured through reach, frequency and gross rating points (GPRs). Derived from daily traffic maps, one week for each person, want to extend to 4 weeks – model weekly distribution then estimate summary statistics for the month.

Let $X$ be the number of billboard exposures in one week, and is distributed Poisson with rate parameter $\lambda$ , which is distributed gamma over the whole population. Integrating out $\lambda$ gives $P(X = x) – the Poisson-Gamma distribution, aka the negative binomial, with mean $\alpha \beta$ , and variance $\alpha (\beta + . Estimate parameters with ML.

To extend for more than one week, note that if $X(t)$ is the number of exposures over $t$ weeks, then $X(t)$ is also Poisson with rate $\lambda , and $P(X = x) = \frac{\Gamma(\alpha + x)}{\Gamma(\alpha) x!} {( with mean $\alpha t .

Reach = $1 - P(X = 0)$ . Average frequency = $E[X(t)] / \text{reach}$ GRP = reach * average frequency

Timing model

More accurate to ask people “when did you last purchase blah”, rather than “how many times have you bough blah in the last month”, so need way of converting timing to count data.

Poisson purchases corresponds to expontential interpurchase times. If $T$ is exponentially distributed, and $\lambda ~ \text{Gamma}(\alpha, \beta)$ , then $P(T \le t) = 1 - (\frac{\beta}{\beta + t})^\alpha$ . We can then use ML to estimate $\alpha$ and $\beta$ and then plug back into negative binomial distribution to get estimated count data.

Generalisations

What if the exponential-gamma isn’t the most appropriate? What other options are there? Key characteristic of the exponential is its “memorylessness” – this means that the probability that $x$ occurs in the interval $(t, t+ \Delta)$ is independent of $t$ . How can we make it dependent on $t$ ?

Could use a hazard function, $h(t) = \frac{f(t)}{1 - F(t)}$ , the instanteous failure rate at time $t$ , given that it has survived until time $t$ . Uniquely defines distribution of a non-negative random variable. The exponential distribution has a constant hazard function $\lambda$ .

The Weibull distribution is a generalisation of the exponential can represent decreasing or increasing hazard functions: $F(t) = 1 - , $h(t) = c \lambda t^{c-1}$ . If $\lambda$ distributed gamma, then we get a Weibull-Gamma model $P(T \le t) = 1 - {( .

Intuitively, we might expect the probability of first purchase in week $t$ to be a function of the marketing in week $t$ (and in part a function of the previous marketing activity). This can be formalised with proportional hazards regression. Assume covariates have a multiplicative affect, giving $h(t | \lambda, x(t), \beta) = h_0(t | \lambda) exp(\beta^{'} , where $x(t)$ is vector of covariates at time $t$ . If covariates are assumed constant within each time period then $\int^t_0 h(u) = \lambda , where $A(t) = \sum exp(\beta^{'}x(i))$ . And $P(T \le t) = 1 - .

Choice model

Have data on past customer purchases, divided into segments, and believe that some segments are more likely to respond to a mailout than others. Sent test mailout to 3% sample, now want to analyse response by segment to identify most profitable targets (profitable if purchase response rate > cost per letter / unit margin). The standard approach is to mail out to all segments with PRR above the cut off, but this is not a very statistical approach!

So we want to develop a model which models the responses from letters sent, by segment – a binomial distribution. Assume all members of segment $s$ have same probability of reponse $p_s$ , $p_s ~ Beta(\alpha, \beta)$ . Integrating out $p_s$ gives $P(X_s = x_s | m_s) = {(\binom{m_s}{x_s})} , where $m_s$ size of mailout. As usual, estimate $\alpha$ and $\beta$ with ML.

Applying the model

What’s out best estimate of $p_s$ given response $x_s$ to test mailout of size $m_s$ ? Intuitively might expect it to be a weighted average of population mean and mean of that segment. Can use Bayes theorem to refine this intuition. In particular, we will use empirical Bayes to derive a prior from the data (and the posterior!). This gives $\frac{\alpha + – a weighted average.

General approach

Identify behaviour of interest at individual level
Choose appropriate pdf $f(x | \theta)$
Specify distribution of $\theta$ across population
Obtain market-level distribution by integrating out $\theta$ .
Estimate parameters of mixing distribution and check model fit
Make predictions!

If model doesn’t fit well – generalise! Use likelihood ratio test to check (if nested). Use covariates. Use more flexible distribution. Add point effects.

Surveys

Introduction

Many stages – with different people (and organisations) involved at each stage – good communication is very important! Statistics useful at many points during the survey, but especialy for design and analysis.

Excellent background environment at: amstat.org

Goal setting

Vital to define objectives, may involved specific accuracy requirements (eg < 10% margin of error). Formality depends on:

if sponsor and researcher are separate people/organisations
how transparent survey process needs to be (very for government)
importance of survey

Survey design

Survery design involves:

choosing a data collection methodology (eg. face-to-face, phone, mail, email, WWW)
questionnaire design
sample design and analysis planning
estimating costs

Try to achieve objects in a cost-effective manner – usually involves some compromise between cost, speed and accuracy.

Sample design and selection

Many design options (eg. stratification, clustering, etc.). Has major implications for analysis.

Data collection.

Contact selected respondents, obtain completed quetionnaires. Statistics involved here in design decisions (quotas, estimated time etc).

Data capture and cleaning

Data entry – often from paper questionnaires, typically proportion re-entered to assess quality. Coding – qualitative to quantitative. Data editing – eliminate inconsistent data, deal with outliers

Weighting and imputation

Data analysis and tables

Many techniques available, cross-tabulation ubiquitious. Need to estimate random sampling error.

Survey sampling

Sampling methods

probability sampling (each object has known non-zero probability of being selected, calculate sampling error)
judgement based.
quota sampleing (appeals to idea of representativeness, but can produce substantial bias)
convenience samples

Terminology

Units: the objects to be survey. Survey population: collection of units that results should describe or explain Sample: subset of population that is surveyed Sampling frame: method of contacting selected sample units, including information need to select them

Sampling frames

Simple example is a list, but more generally, is any procedure and data that effective enables selection of a sample. Good frames require effort to maintain, and most frames are inperfect (exhibiting undercoverage, duplicated units, out-of-date or missing data).

Sampling frames for households and individuals

No list of all occupied private dwellings in New Zealand, but some have been developed. Different frame for telephone and face-to-face surveys. Once household has been selected, need someway of selecting people in that house (eg. Kish grid, last birthday technique).

Telephone sampling

Undercoverage a fundamental problem (only 92% of houses have landline, < 80% of Maori/PI houses). Duplicates also occur.

White pages : telecom sells random samples (but doesn’t include unlisted _{15%), but may be cheaper to use (out-of-date) paper directories
Random digit dialing (naive approach: < 10% success; better approaches up to 60% success)}

Household sampling

Multistage approach widely used – area sample from NZ Geo system, list all households in area, select random number. Many variations (eg. random route)

Business frames

Business directory : excellent frame held by StatsNZ (280,000 enterprises), but not available for market research. Dun & Bradstreet : few duplicates, useful auxiliary info UBD : some auxiliary info, substantial undercoverage (~60%) Yellow pages : more duplicates, undercoverage

Probability Sampling

See:

Non-response

Incentives very important! Weighting can help to reduce effects of unit non-response, but need population data. Post-stratification and rim-weighting (for multiple strate) are most common methods.

Data checking and imputation

Checking: for consistency, for outliers Editing: recontact, replace with missing, impute Must document!

Methods for missing data:

drop any units with missing values (end up with no data!)
pairwise deletion (can be severely biased, not all methods can use)
impute (mean, mean + simulated error, mean + random residual, random hot-deck imputation, NN hotdeck imputation)