Creating SAS datasets
You will make up your own names for your SAS datasets and variables. These names must conform to these rules:
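- no longer than 8 characters (the limit in SAS version 6 and earlier; SAS 7 and later allow up to 32),
- start with a letter (SAS 7 and later also allow a leading underscore), and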
- contain only letters, numbers, or underscores (_).
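As a quick illustration, the sketch below uses a few names that satisfy the rules above. The dataset name hwk_1 and the variable names wt2 and ht_in are made up for this example, and the data values are borrowed from the height/weight data later in this section. The comment at the end lists two names SAS would reject.

```sas
/* Valid names: each starts with a letter, is at most 8 characters,
   and uses only letters, digits, or underscores (names are hypothetical) */
data hwk_1;
  input age wt2 ht_in;
  cards;
27 118 63
24 170 70
;
run;

/* Invalid names (would cause errors if used):
   2ndtry  : starts with a digit
   my-data : contains a hyphen */
```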
The DATA step
The DATA step is used to describe and modify your data. Within the DATA step you tell SAS how to read the data and how to create or delete variables and observations. The DATA step transforms your raw data into a SAS dataset. Four statements are commonly used in the DATA step:
- The DATA statement names the dataset.
- The INPUT statement lists the names of the variables.
- The CARDS statement indicates that data lines immediately follow.
- The INFILE statement indicates that the data are in an external file and gives the name of that file.
A DATA step can look like this:
DATA dataname;
INPUT varname1 varname2 (etc);
. . . data manipulation statements . . .
CARDS;
data lines
;
or like this:
DATA dataname;
INFILE 'filename';
INPUT varname1 varname2 (etc);
. . . In this part of the program, you can create
. . . new variables, use if statements, do loops,
. . . or other data manipulation statements.
Each style uses the DATA and INPUT statements. However, the first style has the data lines inside the program, so it uses the CARDS statement. The second style reads data from another file, so it uses the INFILE statement to tell SAS where to find that file.
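A minimal sketch of the first style, assuming the age, weight, and height variables used elsewhere in this section. The variables wtratio and group are hypothetical additions that show where new variables and IF statements go: between INPUT and CARDS, where they run once for each observation read.

```sas
/* First style: the data lines are inside the program */
data hwk1;
  input age weight height;
  /* new variable (hypothetical): weight divided by height */
  wtratio = weight / height;
  /* IF statement (hypothetical grouping at age 21) */
  if age < 21 then group = 'young';
  else group = 'adult';
  cards;
27 118 63
19 203 78
;
run;
```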
The DATA statement
This statement must begin your DATA step. It names your SAS dataset. Like every SAS statement, it must end with a semicolon.
Example:
data hwk1;
This tells SAS to create a new dataset and call it hwk1. The name you choose is up to you, but it must conform to SAS naming conventions.
The INPUT statement
The INPUT statement comes after the DATA statement (and after the INFILE statement, if there is one). It gives the names of the variables in your dataset. The variable names must follow the SAS naming rules, and the names in an input list are separated by spaces.
Example:
input age weight height;
This is an example of free-format (list) input: SAS reads the values in the order the variables are listed, and the values only need to be separated by blanks. It is also possible to specify the columns that the variables occupy.
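A small sketch of list input; the dataset name freefmt and the data lines are made up for this example. Because the values are separated by blanks, the extra spaces on the second line do no harm, and a period stands for a missing numeric value.

```sas
/* List input: values separated by one or more blanks */
data freefmt;
  input age weight height;
  cards;
27 118 63
24   170 70
25 . 73
;
run;
/* the third observation has a missing value for weight */
```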
Example:
input age 1-2 weight 3-5 height 7-8;
This statement tells SAS that the value of the variable age is found in the first two columns of each line, weight occupies columns 3-5 and height is in columns 7 and 8. Notice that column 6 is not used in this example.
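Here is a sketch of the column-input statement above, with two made-up data lines typed to match it (the dataset name colin is also an assumption). The data lines must start in column 1, because with column input SAS reads each value from exactly the positions listed; whatever is in column 6 is ignored.

```sas
/* Column input: age in columns 1-2, weight in 3-5, height in 7-8 */
data colin;
  input age 1-2 weight 3-5 height 7-8;
  cards;
27118 63
24170 70
;
run;
```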
SAS reads raw data one line at a time, assigning each value to the next variable in the INPUT list. When every variable in the list has a value, SAS moves on to the next line of data. Sometimes, however, you may want to put several observations on each line. This is illustrated in the next example.
The dollar sign ($) after a variable name tells SAS that the variable has character values (not numbers). If your data contains character variables, you must let SAS know by following the variable name in the INPUT statement with the dollar sign:
Example:
input age sex $ salary @@;
A line of data might look like this:
32 m 150 20 f 108 22 m 200
The double 'at' sign (@@) is used in an INPUT statement when the values for more than one observation are on each line. Here, the line holds three observations.
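The INPUT statement and data line above can be combined into a complete DATA step like this (the dataset name pay is an assumption). Without the @@, SAS would read only the first observation (32, m, 150) and skip the rest of the line.

```sas
/* sex is a character variable ($); @@ keeps reading the same line
   so that several observations can come from one data line */
data pay;
  input age sex $ salary @@;
  cards;
32 m 150 20 f 108 22 m 200
;
run;
```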
The CARDS statement
The CARDS statement tells SAS that the data lines begin on the next line; they end at the first line that contains a semicolon. The keyword is called CARDS because, years ago, data were fed into computers on real cards with holes punched to represent different characters or numbers.
Use this style of input when you want to enter the data directly into your program (i.e., not when reading an external file with the INFILE statement).
Example:
cards;
27 118 63
24 170 70
25 173 73
23 183 68
19 203 78
;
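A complete sketch of the CARDS style, assuming the numbers above are age, weight, and height for five people, entered one person per line. The line containing only a semicolon ends the data, and PROC PRINT lists the resulting dataset so you can check it. In current SAS releases DATALINES is an alias for CARDS, so either keyword works.

```sas
data hwk1;
  input age weight height;
  cards;
27 118 63
24 170 70
25 173 73
23 183 68
19 203 78
;
run;

/* list the new dataset to check that it was read correctly */
proc print data=hwk1;
run;
```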
The INFILE statement
The INFILE statement precedes the INPUT statement in the DATA step. It tells SAS two things: first, that the data will come from an external file, and second, the name of that file.
Example:
infile 'rebound.dat';
Note that the filename must be enclosed in quotes and must be spelled exactly as it exists on the UNIX system (UNIX file names are case-sensitive, so capitalization matters).
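Because INFILE needs a file that already exists, this sketch first writes a tiny placeholder file and then reads it back. The file name demo.dat and its two lines are made up for illustration (they are not the contents of rebound.dat), and the sketch assumes SAS can write to its current working directory.

```sas
/* Setup only: write a small external file (hypothetical name and values) */
data _null_;
  file 'demo.dat';
  put '27 118 63';
  put '24 170 70';
run;

/* INFILE comes before INPUT and names the external file */
data rebound;
  infile 'demo.dat';
  input age weight height;
run;
```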