Useful SAS code
v = variable, k = keyword, o = options. See the SAS user’s guide for full syntax.
Univariate statistics
PROC UNIVARIATE [o];
VAR v’s; (variables to process)
BY v’s; (separate analysis for each BY group; data must be sorted by these)
CLASS v-1 v-2; (classify by)
FREQ v; (variable represents frequencies)
HISTOGRAM [v’s]; (produce histogram)
ID v-1 …; (observation identifier)
OUTPUT [OUT = data set] keyword-1=name-1 …; (save to data set)
PROBPLOT v’s; (produce probability plot)
QQPLOT v’s; (produce quantile-quantile plot)
RUN;
Will produce everything you could ever possibly want
Options:
- NORMAL; for normality tests (use Shapiro-Wilk for n ≤ 2000, Kolmogorov-Smirnov for n > 2000)
- PLOT; for crappy text graphics (stem-and-leaf, box plot, normal probability plot)
- FREQ; for frequency tables
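A minimal sketch of the statements and options above in one call. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set name and the statistic names (work.class_stats, mean_height, etc.) are made up for illustration.

```sas
/* Assumption: SASHELP.CLASS sample data (name, sex, age, height, weight) */
proc univariate data=sashelp.class normal plot;
    var height weight;              /* variables to process */
    id name;                        /* label extreme observations by name */
    histogram height / normal;      /* histogram with a fitted normal curve */
    qqplot height / normal(mu=est sigma=est);  /* Q-Q plot against fitted normal */
    /* save means and SDs; one output name per VAR variable, in order */
    output out=work.class_stats mean=mean_height mean_weight
                                std=sd_height sd_weight;
run;
```

The NORMAL option adds the tests for normality table; PLOT adds the text stem-and-leaf, box, and normal probability plots.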
Methods for continuous data
Student’s t-test is commonly used for detecting differences in means between two groups.
Unpaired:
PROC TTEST [o];
VAR v’s; (variables to test)
CLASS variable; (classification variable, must have exactly two levels)
BY v’s; (group by)
FREQ v; (frequency variable)
RUN;
Paired:
PROC TTEST [o];
PAIRED pairlist; (variables to compare)
BY v’s;
FREQ v;
RUN;
Pairlist:
- (v’s)*(v’s) = compare every v in the first list with every v in the second
- (v’s):(v’s) = compare first in first list with first in second, second with second, etc.
- pairlist pairlist … = several pair lists can be given in one PAIRED statement
Output: variables, n, mean, SD, standard error of the mean, t-tests, and the folded F-test for equality of variances.
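A short sketch of both forms. The unpaired example assumes the SASHELP.CLASS sample data; the paired example uses a tiny made-up data set (work.bp, with invented before/after values) purely to show the PAIRED syntax.

```sas
/* Unpaired: compare mean height between the two levels of sex */
/* Assumption: SASHELP.CLASS sample data */
proc ttest data=sashelp.class;
    class sex;
    var height;
run;

/* Paired: hypothetical before/after measurements, illustration only */
data work.bp;
    input subject before after;
    datalines;
1 142 138
2 150 145
3 135 136
4 160 151
5 148 144
;
run;

proc ttest data=work.bp;
    paired before*after;   /* tests mean(before - after) = 0 */
run;
```

For the unpaired test, read the folded F-test first: it indicates whether the pooled (equal variance) or Satterthwaite (unequal variance) t-test row is the appropriate one.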
Wilcoxon rank sum
PROC NPAR1WAY [o];
VAR v’s;
CLASS variable;
BY v’s;
FREQ v;
EXACT statistic-groups; (compute exact p-values)
RUN;
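As a sketch, the rank-sum test on the same two-group comparison, again assuming the SASHELP.CLASS sample data. The WILCOXON option restricts the output to the Wilcoxon analysis; without it NPAR1WAY prints several other tests as well.

```sas
/* Assumption: SASHELP.CLASS sample data (19 observations) */
proc npar1way data=sashelp.class wilcoxon;
    class sex;          /* two-level grouping variable */
    var height;
    exact wilcoxon;     /* exact p-value; practical only for small samples */
run;
```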
One-way ANOVA
Same assumptions as the t-test. The non-parametric alternative is the Kruskal-Wallis test.
PROC GLM [o];
CLASS v’s;
MODEL dependents=independents [/ o]; (model formula)
RUN;
Other statements, placed before RUN:
ABSORB v's; (absorb out variable effects)
BY v's; FREQ v; ID v; WEIGHT v;
CONTRAST 'label' effect values ... ; (contrasts)
ESTIMATE 'label' effect values ...; (estimate linear combs of parameters)
LSMEANS effects; (essential for ANOVA!)
MANOVA ...;
MEANS effects;
OUTPUT ...;
RANDOM effects;
REPEATED;
TEST;
Prints the ANOVA table (overall F-test, Type I and Type III sums of squares), the multi-group analogue of the t-test output.
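A minimal one-way ANOVA sketch, followed by the Kruskal-Wallis alternative. It assumes the SASHELP.CARS sample data set (three-level variable origin, numeric mpg_city); the Tukey adjustment is just one reasonable choice for the pairwise comparisons.

```sas
/* Assumption: SASHELP.CARS sample data; origin has three levels */
proc glm data=sashelp.cars;
    class origin;
    model mpg_city = origin;
    /* adjusted group means with Tukey-adjusted pairwise differences */
    lsmeans origin / pdiff adjust=tukey cl;
run;
quit;   /* GLM is interactive, so end it explicitly */

/* Non-parametric version: with more than two groups the   */
/* WILCOXON option produces the Kruskal-Wallis test        */
proc npar1way data=sashelp.cars wilcoxon;
    class origin;
    var mpg_city;
run;
```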
Scatter plots
PROC PLOT [o];
BY v’s;
PLOT y * x; (vertical * horizontal)
RUN;
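A quick sketch, assuming the SASHELP.CLASS sample data. Note the order in the PLOT request: the vertical variable comes first.

```sas
/* Assumption: SASHELP.CLASS sample data */
proc plot data=sashelp.class;
    plot weight*height;       /* weight on the vertical axis, height on the horizontal */
    plot weight*height=sex;   /* same plot, first character of sex as plotting symbol */
run;
quit;   /* PROC PLOT is interactive, so end it explicitly */
```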
Correlation
PROC CORR [o];
VAR v’s; (variables to correlate)
PARTIAL v’s; (variables to adjust for when computing partial correlation coefficients)
RUN;
Other statements, placed before RUN:
BY v's;
FREQ v;
WEIGHT v;
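For example, a sketch of the height-weight correlation adjusted for age, assuming the SASHELP.CLASS sample data. PEARSON and SPEARMAN are procedure options requesting both coefficient types.

```sas
/* Assumption: SASHELP.CLASS sample data */
proc corr data=sashelp.class pearson spearman;
    var height weight;   /* variables to correlate */
    partial age;         /* report correlations with age partialled out */
run;
```

Dropping the PARTIAL statement gives the ordinary (unadjusted) correlation matrix.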
Regression
PROC REG [o];
MODEL y = {x1 x2} x3 x4 / SELECTION = stepwise; (variables in braces enter or leave the model as a group)
RUN;
Other statements, placed before RUN:
BY v's; FREQ v; ID v's; VAR v's; WEIGHT v;
ADD v's; DELETE v's; (interactively add and delete variables from model)
MTEST... ; (test linear combination hypotheses)
OUTPUT...;
PAINT ...; PLOT ...; (generate scatter plots)
PRINT...;
RESTRICT...; (restrict parameters)
Produces typical shitload of output.
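A small stepwise regression sketch, assuming the SASHELP.CLASS sample data. The entry and stay thresholds are written out explicitly as illustrative values, and the output data set and variable names (work.reg_out, pred, resid) are made up.

```sas
/* Assumption: SASHELP.CLASS sample data */
proc reg data=sashelp.class;
    /* stepwise selection among height and age as predictors of weight */
    model weight = height age / selection=stepwise slentry=0.15 slstay=0.15;
    /* save fitted values and residuals from the final model */
    output out=work.reg_out p=pred r=resid;
run;
quit;   /* REG is interactive, so end it explicitly */
```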