We often need to execute a command, or perform a similar action, repeatedly for a large number of variables. For example, we might want to see the table of frequencies for 10 variables in our data set, or we might want to change the values 98 and 99 to missing for all variables. One way to do this is to type the command tab 10 times for the first example, and to type recode once for each variable (however many there are) for the second. An easier way to handle both examples is to use a “loop” and let Stata create those 10 frequency tables, or recode all 98s and 99s to missing.
Loops are used to execute an action or a command repeatedly for many variables (as many as you like!) at once. Stata has three commands for performing loops: foreach, forvalues and while.
In this tip, I will teach you how to loop using foreach, which is perhaps the most common of the three. You will receive tips about forvalues and while loops soon.
Note: I am
using The Asia Foundation’s A Survey of the
Afghan People (2014) data in my examples. If you do not have
this data, you can download it from the link below:
Examples:
Suppose you want to see the frequency distribution (table of frequencies) of questions q27a, q27b, q27c, q27d and q27e. The long way is to type five commands, one for each of the five variables:
tab q27a
tab q27b
tab q27c
tab q27d
tab q27e
Another way to run the above 5 commands is to loop over the variables with the command foreach:
foreach x in q27a q27b q27c q27d q27e {
    display "`x'"
    tab `x'
}
The above loop repeats the command tab for each variable listed after in on the first line. More precisely, the first line creates a local macro called x that holds one element of the list at a time: q27a on the first pass, then q27b, q27c, q27d and q27e. If you recall Stata tip #31, macros are used as shorthand for a list of variables (or strings/text).
The above loop has four lines of code. The first line, foreach, creates the local macro x and lists the elements it will take in turn: q27a, q27b, q27c, q27d and q27e. The second line executes the command display, which shows the current element of local macro x; the double quotes "" make Stata display it as text. The third line executes the command tab for the current element of local macro x. The last line always closes the loop with a curly bracket.
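To see what the local macro holds on each pass, a minimal sketch is to run a loop that only displays it. Because foreach with in takes its list exactly as typed, this runs even without the survey data loaded; the message text is only for illustration.

```stata
* Display the content of local macro x on each pass of the loop
foreach x in q27a q27b q27c q27d q27e {
    display "x now holds: `x'"
}
```

Stata prints five lines, one per element, which shows that x holds a single element at a time rather than the whole list.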
Rules:
There are some general rules that apply to all loops (foreach, forvalues and while):
- Curly brackets must be used to specify the beginning and end of the loop;
- The open brace must appear on the same line as the loop command foreach, forvalues or while;
- Nothing may follow the open brace except comments; the first command to be executed must appear on a new line; and
- The close brace must appear on a line by itself at the end.
Remember that each line is typed and entered separately. As long as you follow the above rules, Stata will not execute anything when you enter a line until the loop is closed with the curly bracket on the last line of code.
The above example demonstrates one way of using foreach. Its syntax can be written as below:
1. foreach local-macro-name in list-of-variables {
Stata has other variants of the foreach syntax for looping over a list. Here are four other variants. Remember, the key words (of, local, global, numlist and varlist) must be typed exactly as shown:
2. foreach local-macro-name of local name-of-local-macro-holding-the-list {
3. foreach local-macro-name of global name-of-global-macro-holding-the-list {
4. foreach local-macro-name of numlist list-of-numbers {
5. foreach local-macro-name of varlist list-of-variables {
Syntaxes 2 and 3 take their list of variables from a local or global macro, which must already be defined. After of local or of global, type only the name of the macro, without the `' quotes or the $ prefix. If I redo the q27 example using syntaxes 2 and 3, it looks as follows:
2.
local lmacname q27a q27b q27c q27d q27e
foreach x of local lmacname {
    display "`x'"
    tab `x'
}
3.
global gmacname q27a q27b q27c q27d q27e
foreach x of global gmacname {
    display "`x'"
    tab `x'
}
Syntax 4, with numlist, is different since it loops over a list of numbers only.
For example, I want to see how many people of various ages (d2) live in rural or urban areas (m6b). Using foreach, I tabulate the rural/urban variable from the data set for each year of age from 18 to 22:
4.
foreach y of numlist 18/22 {
    display "Age = `y'"
    tab m6b if d2==`y'
}
In the above command, the first line says: create a local macro y that takes each number from 18 to 22 in turn (here the slash “/” indicates all the integers from 18 to 22, i.e. 18, 19, 20, 21 and 22), and open the foreach loop. The second line says: display “Age = ” followed by the current value of y. The third line says: tabulate variable m6b on the condition that age (d2) equals the current value of local macro y. The last line closes the loop. Since the list holds 5 numbers, the two commands display and tab run 5 times, once for each number.
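A numlist does not have to be a simple range. As a small sketch (the particular numbers are only illustrative), the loop below mixes a range that has a step with a few single values. It only displays the numbers, so it runs without any data loaded:

```stata
* 18(2)24 means 18 to 24 in steps of 2; 30 and 40 are single values
foreach y of numlist 18(2)24 30 40 {
    display "y = `y'"
}
```

The loop runs six times, for 18, 20, 22, 24, 30 and 40.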
The last variant of the foreach loop is syntax 5 from above. It is similar to the first syntax with in, with a slight difference. foreach with in accepts a general list, whose elements are taken exactly as typed, separated by spaces. foreach with of varlist is different in the sense that Stata interprets the list as a list of variables: it checks that the variables exist in the data set and expands any abbreviations. Therefore, syntax 5 allows for variable abbreviations.
Some common variable abbreviations include:
- q27*: all variables whose names start with q27, i.e. q27 followed by anything (or nothing).
- q27a-q27e: all variables from q27a through q27e, in the order in which the variables are stored in the data set.
- _all or *: all variables.
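The difference between in and of varlist is easiest to see by looping over the same abbreviation both ways. This is only a sketch, and it assumes the survey data is loaded and contains variables whose names start with q27:

```stata
* With in, the list is taken literally: the loop runs once and x is the text q27*
foreach x in q27* {
    display "`x'"
}

* With of varlist, Stata first expands q27* into the matching variable names
foreach x of varlist q27* {
    display "`x'"
}
```

The first loop displays q27* once; the second displays one variable name per pass.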
So, let’s go back to the question of recoding all 98s and 99s to missing in our data set. Remember, Stata does not have an undo option per se. To be able to undo an action, you need to execute the command preserve before the action, and then execute restore to undo any changes made after preserve. See tip #27 about restore and preserve. So, I recommend that you execute preserve before performing this action and restore afterwards, or simply do not save your data set after your work, because otherwise the original values 98 and 99 will be lost.
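As a minimal sketch of that safety net, assuming the survey data is loaded, the lines below recode a single variable (q27a, chosen only as an example), look at the result, and then bring the original data back:

```stata
* Keep a copy of the data as it is now
preserve
* Change 98 and 99 to missing in q27a
recode q27a (98 99 = .)
* Check the result, showing missing values too
tab q27a, missing
* Bring back the data as it was when preserve was executed
restore
```

After restore, q27a contains its original 98 and 99 values again.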
foreach w of varlist _all {
    capture recode `w' (98 99=.)
}
The first line says: create a local macro w that takes each variable in the data set in turn, and open a foreach loop. The second line recodes all 98s and 99s to missing in the current variable (the current element of local macro w). Notice that I have put capture before the command recode. capture executes recode (or any other command for that matter), suppressing all error messages (if any). Without capture, if recode hit a problem with one of the variables (maybe one variable is not numeric, and thus does not have any 98 or 99, which is the case with some variables in our data set), Stata would stop and leave the loop unfinished. With capture, Stata suppresses that error and carries on with the remaining elements of local macro w. The last line has only the closing curly bracket and closes the loop.
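Because capture hides every error, it can be useful to know which variables were skipped. One possible sketch, assuming the same data set is loaded, checks the return code _rc that capture stores (zero means the command ran without error). The wording of the message is only illustrative:

```stata
foreach w of varlist _all {
    capture recode `w' (98 99 = .)
    * _rc is nonzero if recode failed, for example on a string variable
    if _rc != 0 {
        display "`w' was skipped (return code " _rc ")"
    }
}
```

Numeric variables are recoded as before, while each variable that recode could not handle is listed by name.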
There are other ways to recode all 98s and 99s to missing. For instance, using an asterisk * instead of _all:
foreach w of varlist * {
    capture recode `w' (98 99=.)
}