stat480

Homework: week 5

Due 21 Feb.

Write a short report describing problems with the diamonds data. This week we're going to do something a bit different: I want your report to be written purely in R. You should be able to copy and paste your report into R and run it to get the results (this is what I will do). See below for an example.

# Diamonds report
# Hadley Wickham, Stat 480

# Load data
diamonds <- read.csv("diamonds.csv")
 
# Price range of the diamonds
range(diamonds$price)

# This is a heading
# ---------------------------------------------

Use the skills you have learned in class to create new variables, find interesting subsets and graphics which reveal problems with the data. To find problems you will need to use your common sense, and your knowledge of the data. Start by looking for unusual diamonds on each variable individually, and then on combinations of the variables. You might want to create new variables. You will need to present more than a summary of the strange things we have seen in class to get top marks.
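
As a rough illustration of the workflow described above (one variable at a time, then combinations, with new variables along the way), here is a minimal sketch. It runs on a small made-up data frame so that it is self-contained; the column names carat, x, y and z are assumptions about what diamonds.csv contains (only price appears in the example above), and the cut-offs are arbitrary choices, not recommended values.

```r
# Sketch: looking for unusual diamonds
# Toy data standing in for read.csv("diamonds.csv"); the values are made up
# and the column names carat, x, y, z are assumptions.
toy <- data.frame(
  carat = c(0.30, 0.50, 1.00, 0.70, 0.40),
  price = c(450, 1500, 5200, 2400, 700),
  x = c(4.3, 5.1, 6.4, 5.7, 0.0),
  y = c(4.3, 5.1, 6.4, 31.8, 4.7),
  z = c(2.7, 3.2, 4.0, 3.5, 2.9)
)

# One variable at a time
# ---------------------------------------------
summary(toy$y)

# A dimension of zero is physically impossible
zero_dim <- subset(toy, x == 0 | y == 0 | z == 0)
zero_dim

# Combinations of variables
# ---------------------------------------------
# New variable: price per carat
toy$ppc <- toy$price / toy$carat

# New variable: ratio of x to y; the cut-offs 0.8 and 1.25 are arbitrary
toy$xy_ratio <- toy$x / toy$y
odd_shape <- subset(toy, xy_ratio < 0.8 | xy_ratio > 1.25)
odd_shape

# Points far from the diagonal are worth a closer look
plot(toy$x, toy$y)
```

With the real data you would replace toy with diamonds, and each subset you find becomes a candidate section of the report.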

Hand in an electronic copy of the code (by email, with a file name like hadley-wickham.r), and a printed copy of the results when you run it in R. To produce the printed copy, copy and paste the code into R, then copy and paste the results, including graphics, into Word and print it out. Make sure it isn't more than 5 pages!
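
Before handing in, it may be worth checking that the file runs from top to bottom in a fresh R session. One possible way, sketched below, is source() with echo = TRUE, which prints each command followed by its output. This is only a quick check and not a replacement for the copy and paste described above. The tiny report written to a temporary directory is a stand-in so that the sketch is self-contained; in practice you would point report at your own file.

```r
# Hypothetical file, named after the pattern suggested above
report <- file.path(tempdir(), "hadley-wickham.r")

# For this sketch only: write a tiny stand-in report (made-up prices)
writeLines(c(
  "# Diamonds report",
  "prices <- c(326, 2401, 18823)",
  "range(prices)"
), report)

# Run the file, echoing each command followed by its output
res <- source(report, echo = TRUE, max.deparse.length = Inf)
```

If the run stops with an error part way through, the echoed commands show how far it got.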

A good report will motivate the question(s), find a good summary view of the data, and discuss the findings. You might want to speculate on why the results are true (don't forget about where this data came from) and whether your findings agree or disagree with your initial speculation. If you don't find anything, that's interesting too. Don't despair if you don't find anything; just keep looking and report what you found. Make sure to use headings so I can easily navigate through your report, and include a short summary at the start.