Useful SAS code

v = variable, k = keywords, o = options (see the SAS user’s guide)

Univariate statistics

PROC UNIVARIATE [o];
VAR v’s; (variables to process)
BY v’s; (separate analysis per group; data must be sorted by these)
CLASS v-1 v-2; (classify by, up to two variables)
FREQ v; (variable represents frequencies)
HISTOGRAM [v’s]; (produce histogram)
ID v-1 …; (observation identifier)
OUTPUT [OUT = data set] keyword-1=name-1 …; (save statistics to data set)
PROBPLOT v’s; (produce probability plot)
QQPLOT v’s; (produce quantile-quantile plot)
RUN;

Will produce everything you could ever possibly want

Options (these go on the PROC UNIVARIATE statement itself):

NORMAL: for normality tests (use Shapiro-Wilk for n ≤ 2000, Kolmogorov-Smirnov for n > 2000)
PLOT: for crappy text graphics (stem-and-leaf plot, box plot, normal probability plot)
FREQ: for frequency tables
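
A minimal sketch of the statements and options above in use. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set and statistic names (work.height_stats, mean_height, sd_height, n_height) are made up for illustration.

```sas
/* Assumption: SASHELP.CLASS (variables Sex, Age, Height, Weight) is available. */
proc univariate data=sashelp.class normal plot freq;
   var height;                                /* variable to process */
   class sex;                                 /* separate analysis per level of sex */
   histogram height / normal;                 /* histogram with fitted normal curve */
   qqplot height / normal(mu=est sigma=est);  /* Q-Q plot against fitted normal */
   /* Hypothetical names: one row of saved statistics per level of sex */
   output out=work.height_stats mean=mean_height std=sd_height n=n_height;
run;

proc print data=work.height_stats;
run;
```

With only 19 observations the Shapiro-Wilk statistic is the one to read in the normality table.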

Methods for continuous data

Student’s t-test is commonly used to detect differences in means between two groups.

Unpaired:
PROC TTEST [o];
VAR v’s; (variables to test)
CLASS v; (classification variable, must have exactly two levels)
BY v’s; (group by)
FREQ v; (frequency variable)
RUN;

Paired:
PROC TTEST [o];
PAIRED pairlist; (variables to compare)
BY v’s; FREQ v;
RUN;

Pairlist:

(v’s)*(v’s) = compare every v in the first list with every v in the second
(v’s):(v’s) = compare first in first list with first in second, second with second, etc.
pairlist pairlist … = several pair lists can be given, separated by spaces

Output: variables, n, mean, SD, standard error of the mean, t-tests, and folded F-test for equality of variances.
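
A short sketch of both forms. The unpaired example assumes the SASHELP.CLASS sample data set; the paired example uses a small made-up data set (work.bp, with hypothetical before/after values) purely to show the PAIRED syntax.

```sas
/* Unpaired: compare mean height between the two levels of sex.
   Assumption: SASHELP.CLASS is available. */
proc ttest data=sashelp.class;
   class sex;
   var height;
run;

/* Paired: made-up before/after values, for illustration only. */
data work.bp;
   input subject before after;
   datalines;
1 142 138
2 150 147
3 137 139
4 160 151
5 148 144
6 155 150
;
run;

proc ttest data=work.bp;
   paired before*after;    /* tests whether mean(before - after) = 0 */
run;
```

In the unpaired output, check the folded F-test first: if the variances look unequal, read the Satterthwaite row rather than the pooled one.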

Wilcoxon rank sum

PROC NPAR1WAY [o];
VAR v’s;
CLASS v;
BY v’s; FREQ v;
EXACT statistic-groups; (compute exact p-values)
RUN;
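
As a rough example, the rank-sum test on the same two groups, again assuming the SASHELP.CLASS sample data set. The WILCOXON option limits the output to the Wilcoxon analysis; without it PROC NPAR1WAY prints several other tests as well.

```sas
/* Assumption: SASHELP.CLASS is available; sex has two levels. */
proc npar1way data=sashelp.class wilcoxon;
   class sex;
   var height;
   exact wilcoxon;     /* exact p-value; fine for small samples, slow for large ones */
run;
```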

One-way ANOVA

Same assumptions as the t-test. The non-parametric equivalent is the Kruskal-Wallis test.

PROC GLM [o]; CLASS v’s; MODEL dependents=independents [/ o]; (model formula) RUN;

ABSORB v's; (absorb out variable effects)
BY v's; FREQ v; ID v; WEIGHT v;

CONTRAST 'label' effect values ... ; (contrasts)
ESTIMATE 'label' effect values  ...; (estimate linear combs of parameters)
LSMEANS effects; (essential for ANOVA!)
MANOVA ...;
MEANS effects;
OUTPUT ...;
RANDOM effects;
REPEATED;
TEST;

Prints multivariable equivalents of t-test outputs.
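
A one-way ANOVA sketch, followed by its Kruskal-Wallis counterpart. It assumes the SASHELP.CARS sample data set (Origin has three levels: Asia, Europe, USA); the choice of MPG_City as the response is only for illustration.

```sas
/* Assumption: SASHELP.CARS is available. */
proc glm data=sashelp.cars;
   class origin;
   model mpg_city = origin;                  /* one-way ANOVA */
   means origin / hovtest=levene;            /* group means + equal-variance check */
   lsmeans origin / pdiff adjust=tukey cl;   /* pairwise comparisons, Tukey-adjusted */
run;
quit;                                        /* PROC GLM is interactive; QUIT ends it */

/* With more than two groups, the WILCOXON option gives the Kruskal-Wallis test. */
proc npar1way data=sashelp.cars wilcoxon;
   class origin;
   var mpg_city;
run;
```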

Scatter plots

PROC PLOT [o]; BY v’s; PLOT y * x; (vertical * horizontal) RUN;
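
For instance, assuming the SASHELP.CLASS sample data set. Note that in the PLOT statement the first variable goes on the vertical axis and the second on the horizontal axis.

```sas
/* Assumption: SASHELP.CLASS is available. */
proc plot data=sashelp.class;
   plot weight*height;        /* weight vertical, height horizontal */
   plot weight*height=sex;    /* first character of sex used as the plotting symbol */
run;
quit;                         /* PROC PLOT is interactive; QUIT ends it */
```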

Correlation

PROC CORR [o]; VAR v’s; (variables to correlate) PARTIAL v’s; (variables to partial out, i.e. control for, when computing partial correlations) RUN;

BY v's;
FREQ v;
WEIGHT v;
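
A small sketch of plain and partial correlations, assuming the SASHELP.CLASS sample data set. The second step gives the height-weight correlation after controlling for age.

```sas
/* Assumption: SASHELP.CLASS is available. */
proc corr data=sashelp.class pearson spearman;
   var height weight;         /* Pearson and Spearman correlations */
run;

proc corr data=sashelp.class pearson;
   var height weight;
   partial age;               /* age is partialled out of both variables */
run;
```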

Regression

PROC REG [o]; MODEL y = {x1 x2} x3 x4 / SELECTION = stepwise; (braces keep x1 and x2 together during selection) RUN;

BY v's; FREQ v; ID v's; VAR v's; WEIGHT v;
ADD v's; DELETE v's; (interactively add and delete variables from model)

MTEST ...; (multivariate tests of linear hypotheses, for models with several dependent variables)

OUTPUT...;
PAINT ...; PLOT ...; (generate scatter plots)
PRINT...;

RESTRICT...; (restrict parameters)

Produces typical shitload of output.
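
A minimal stepwise-regression sketch, assuming the SASHELP.CLASS sample data set. The entry/stay significance levels are set explicitly only to make them visible, and the output data set and variable names (work.reg_out, pred, resid) are hypothetical.

```sas
/* Assumption: SASHELP.CLASS is available. */
proc reg data=sashelp.class;
   model weight = height age / selection=stepwise slentry=0.15 slstay=0.15;
   /* Hypothetical names: save fitted values and residuals from the final model */
   output out=work.reg_out predicted=pred residual=resid;
run;
quit;                         /* PROC REG is interactive; QUIT ends it */
```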