Data mining

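Classification trees – used to predict categorical responses. The algorithm splits the dataset at each branch to optimise some criterion (e.g. minimise entropy, or equivalently maximise information gain). Trees can also predict continuous responses (regression trees): each leaf gives the expected value, i.e. the mean response of the cases in that leaf.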
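To make the splitting step concrete, here is a minimal sketch of choosing one split on one numeric predictor by information gain. It assumes Shannon entropy as the criterion and midpoints between sorted values as candidate thresholds. The function names and the toy data are hypothetical, for illustration only; real tree software repeats this search over all predictors and recurses on each branch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the cases on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split(values, labels):
    """Return (threshold, gain) for the candidate split with the largest gain."""
    xs = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(
        ((t, information_gain(values, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
        default=(None, 0.0),  # no candidates if all values are identical
    )


# Hypothetical toy data: age as the predictor, a yes/no response.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_split(ages, bought)
print(threshold, gain)
```

On this toy data the classes separate perfectly between ages 30 and 35, so the chosen threshold is 32.5 and the gain equals the full parent entropy of 1 bit.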

Problems:

Validation and standard errors:

Many other (more complicated) methods exist, with varying improvements in prediction. There is still no consensus on which method is best in which situation.

Ethics

Don’t forget about ethics! Combining information via data warehousing could violate the Privacy Act. Data mining raises ethical issues mainly during application – should we use ethnicity if it is a good predictor? The answer depends on the application.