Creating SAS datasets

You will make up your own names for your SAS datasets and variables. These names must conform to these rules:
  • no longer than 32 characters (8 characters in SAS version 6 and earlier),
  • start with a letter or an underscore, and
  • contain only letters, numbers, or underscores (_).
SAS is not case-sensitive, so you can use capital or lowercase letters in your dataset and variable names. However, when you specify a filename (as you do with the include and file SAS commands), you must type it exactly as it exists in UNIX.
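As a minimal sketch of these rules, the program below uses names that start with a letter and contain only letters, numbers, and underscores, then refers to them with different capitalization. The variable names wt_kg and score2 and the data values are made up for illustration.

```sas
/* Names start with a letter and use only letters, numbers, and underscores */
data hwk1;
   input age wt_kg score2;
   cards;
21 70 88
34 82 91
;
run;

/* SAS ignores case in names, so HWK1 and AGE refer to hwk1 and age */
proc print data=HWK1;
   var AGE Wt_Kg score2;
run;
```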

The DATA step

The data step is used to describe and modify your data. Within the data step you tell SAS how to read the data and generate or delete variables and observations. The data step transforms your raw data into a SAS dataset.
Four statements are commonly used in the DATA step:
  • DATA statement names the dataset.
  • INPUT statement lists the names of the variables.
  • CARDS statement indicates that data lines immediately follow.
  • INFILE statement indicates that the data is in a file and gives the name of the file.
Generally, the data step portion of your program will look either like this:

DATA dataname;
INPUT varname1 varname2 (etc);

. . . In this part of the program, you can create
. . . new variables, use if statements, do loops,
. . . or other data manipulation statements.
CARDS;
lines of data
. . .

or like this:

DATA dataname;
INFILE 'filename';
INPUT varname1 varname2 (etc);

. . . In this part of the program, you can create
. . . new variables, use if statements, do loops,
. . . or other data manipulation statements.
Each style uses the DATA and INPUT statements. However, the first style has the data lines inside the program, so it uses the CARDS statement. The second style uses data from another file: it uses the INFILE statement to let SAS know where to get the file.
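The first style might be filled in as follows. This is only a sketch: the new variable bmi, the assumption that weight is in pounds and height in inches, and the age cutoff are all invented for illustration.

```sas
data hwk1;
   input age weight height;
   /* New variable: body mass index, assuming pounds and inches */
   bmi = 703 * weight / (height ** 2);
   /* Subsetting if statement: keep only observations with age 20 or more */
   if age >= 20;
   /* CARDS must be the last statement in the step, right before the data */
   cards;
27 118 63
24 170 70
19 203 78
;
run;

proc print data=hwk1;
run;
```

The manipulation statements sit between INPUT and CARDS, which matches the layout of the template above.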


The DATA statement

This statement must begin your DATA step. It is used to name your SAS dataset. Like every SAS statement, it must end with a semicolon.
Example:
data hwk1;
This tells SAS to create a new dataset and call it hwk1. The name you choose is up to you, but it must conform to SAS naming conventions.


The INPUT statement

The INPUT statement comes after the data statement. It designates the names of the variables in your dataset. The variable names must follow the SAS naming rules, and a space separates the variable names in an input list.
Example:
input age weight height;
This is an example of free-format input: the values on each data line are separated by spaces and are read in the order in which the variables are listed. It is also possible to specify the columns that the variables occupy.
Example:
input age 1-2 weight 3-5 height 7-8;
This statement tells SAS that the value of the variable age is found in the first two columns of each line, weight occupies columns 3-5 and height is in columns 7 and 8. Notice that column 6 is not used in this example.
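A small sketch of column input with data laid out to match the statement above. The data values are illustrative; each line has age in columns 1-2, weight in columns 3-5, a blank in column 6, and height in columns 7-8.

```sas
data hwk1;
   /* Column input: each variable is read from fixed column positions */
   input age 1-2 weight 3-5 height 7-8;
   cards;
27118 63
24170 70
19203 78
;
run;

proc print data=hwk1;
run;
```

Because the columns are fixed, age and weight do not need a space between them.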
The dollar sign ($) after a variable name tells SAS that the variable has character values (not numbers). If your data contains character variables, you must let SAS know by following the variable name in the INPUT statement with a dollar sign.
SAS reads the raw data one line at a time, reading in each value and putting it into the next variable in the input statement list. When it has filled out the list, SAS moves on to the next line of data. However, sometimes you may want to put several observations on each line. The next example does this, and it also reads a character variable:
Example:
input age sex $ salary @@;
A line of data might look like this:
32 m 150 20 f 108 22 m 200
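The double at sign (@@) at the end of an input statement tells SAS to hold the current data line, so that the next observation is read from the same line instead of a new one. Use it when each line contains the values for more than one observation.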
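Put together as a complete program, the example above might look like the sketch below. The dataset name staff is an assumption; the data line is the one shown above, and it should produce three observations.

```sas
data staff;
   /* $ marks sex as a character variable and @@ holds the line for the next observation */
   input age sex $ salary @@;
   cards;
32 m 150 20 f 108 22 m 200
;
run;

proc print data=staff;
run;
```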

The CARDS statement

The CARDS statement tells SAS that the data immediately follows on the next line. The keyword is named CARDS because years ago data fed into a computer came on real cards with holes punched to represent different characters or numbers.
Use this style of input when you want to enter the data directly into your program (i.e., not when reading in an external file using the INFILE statement).
Example:
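cards;
27 118 63
24 170 70
25 173 73
23 183 68
19 203 78
;
Each data line here holds one observation (age, weight, and height), and the line containing only a semicolon marks the end of the data.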



The INFILE statement

The INFILE statement precedes the INPUT statement in the data step. It tells SAS two things: first, that the data will be coming from an external file, and second, the name of that file.
Example:
infile 'rebound.dat';
Note that the filename must be enclosed in quotes (single quotes are used here) and must be spelled exactly as it exists in the UNIX system (i.e., capitalization matters).
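The sketch below shows the INFILE style end to end. So that it can be run on its own, it first writes a small file with the file command. The contents of rebound.dat and the variable names age, weight, and height are assumptions for illustration, not the real layout of that file.

```sas
/* Write a small raw data file so the example is self-contained */
data _null_;
   file 'rebound.dat';
   put '27 118 63';
   put '24 170 70';
run;

/* Read the external file: INFILE comes before INPUT */
data rebound;
   infile 'rebound.dat';
   input age weight height;
run;

proc print data=rebound;
run;
```

Without a directory in the filename, SAS looks for the file in the current working directory.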
