Thursday, July 23, 2015

Stata tip #33: foreach Loops


We often need to execute a command or perform a similar action repeatedly for a large number of variables. For example, we might want to see the table of frequencies for 10 variables in our data set, or we might want to change the values 98 and 99 to missing for all variables. One way is to type the tab command 10 times for the first example, and to type recode once for each variable (however many there are) for the second. An easier way to do both is to use a loop and let Stata create those 10 frequency tables, or recode all 98s and 99s to missing.

Loops are used to execute an action or command repeatedly, for example over many variables (as many as you like!) at once. Stata has three commands for performing loops: foreach, forvalues and while.

In this tip, I will teach you how to use the foreach loop, which is perhaps the most common of the three. Tips about forvalues and while loops will follow soon.

Note: I am using The Asia Foundation’s A Survey of the Afghan People (2014) data in my examples. If you do not have this data, you can download it from the link below:

Examples:

Suppose you want to see the frequency distribution (table of frequencies) of questions q27a, q27b, q27c, q27d and q27e. The long way is to type five commands, one for each variable:
tab q27a
tab q27b
tab q27c
tab q27d
tab q27e

A shorter way to run the above five commands is to use a foreach loop:
foreach x in q27a q27b q27c q27d q27e {
    display "`x'"
    tab `x'
}
The above loop repeats the tab command for each variable listed after the keyword in on the first line. Technically, the first line defines a local macro called x, which takes the values q27a, q27b, q27c, q27d and q27e in turn, one per pass through the loop. If you recall Stata tip #31, macros are used as a shorthand for a list of variables (or strings/text).

The above loop has four lines of code. The first line, foreach, opens the loop and names the local macro x. The second line executes display, which shows the current element of x; the reference `x' is wrapped in double quotes "" so that display shows it as text. The third line executes tab for the current element of x. The last line always closes the loop with a curly bracket.
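A quick way to check that x holds one element per pass, rather than the whole list, is to swap tab for a counter. This is a minimal sketch: the counter macro n is added only for illustration, and because the loop only displays names it runs without any data in memory.

```stata
* x holds a single element on each pass, not the whole list
local n = 0
foreach x in q27a q27b q27c q27d q27e {
    local n = `n' + 1                  // count the passes
    display "Pass `n': x = `x'"
}
display "The loop body ran `n' times"
```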

Rules:

Some general rules apply to all three loop commands, foreach, forvalues and while:
  1. Curly brackets must be used to specify the beginning and end of the loop;
  2. The open brace must appear on the same line as the loop command foreach, forvalues or while;
  3. Nothing may follow the open brace except comments; the first command to be executed must appear on a new line; and
  4. The close brace must appear on a line by itself at the end.
Remember that each line is typed and entered separately. As long as you follow the above rules, Stata will not execute anything you enter until the loop is closed with the curly bracket on the last line.
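The rules above can be seen in a small sketch meant to run from a do-file (an assumption: // comments are recognized in do-files, while in the Command window you would use a * comment at the start of a line instead). The loop list a b c and the macro count are arbitrary names chosen for illustration; the commented-out lines show layouts that break the rules.

```stata
* Run from a do-file
local count = 0
foreach v in a b c {    // only a comment may follow the open brace
    local count = `count' + 1
}
display "The loop ran `count' times"

* Each pattern below breaks one of the rules and is left commented out:
*   foreach v in a b c                       (open brace not on this line)
*   {
*   foreach v in a b c { local count = 0     (command after the open brace)
*   local count = 0 }                        (close brace not on its own line)
```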

The above example demonstrates one form of the foreach syntax, whose structure can be written as follows:
1. foreach local-macro-name in list-of-items {

Stata has other variants of the foreach syntax. Here are four more (the keywords in, of, local, global, numlist and varlist are typed exactly as shown):
2. foreach local-macro-name of local name-of-local-macro {
3. foreach local-macro-name of global name-of-global-macro {
4. foreach local-macro-name of numlist list-of-numbers {
5. foreach local-macro-name of varlist list-of-variables {

Syntaxes 2 and 3 take their list from a local or a global macro, which must already be defined. Note that the macro is given by its name alone, without the `' quotes or the $ sign. If I perform the q27 example using syntaxes 2 and 3, it looks as follows:
2.
local lmacname q27a q27b q27c q27d q27e
foreach x of local lmacname {
    display "`x'"
    tab `x'
}

3.
global gmacname q27a q27b q27c q27d q27e
foreach x of global gmacname {
    display "`x'"
    tab `x'
}
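Syntaxes 2 and 3 can also be compared with syntax 1 fed an expanded macro, foreach x in `lmacname'. The sketch below needs no data, because it only collects names, and builds the same list three ways; the collector macros fromlocal, fromglobal and fromin are hypothetical names added for illustration. Note that of local and of global take the bare macro name, whereas in needs the macro already expanded with `'.

```stata
local lmacname q27a q27b q27c q27d q27e
global gmacname q27a q27b q27c q27d q27e

* of local / of global: give the macro NAME, with no quotes and no $
local fromlocal ""
foreach x of local lmacname {
    local fromlocal `fromlocal' `x'
}
local fromglobal ""
foreach x of global gmacname {
    local fromglobal `fromglobal' `x'
}

* in: expand the macro yourself, so the list is typed out in place
local fromin ""
foreach x in `lmacname' {
    local fromin `fromin' `x'
}

display "of local:  `fromlocal'"
display "of global: `fromglobal'"
display "in:        `fromin'"
```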

Syntax 4, with numlist, is different because it loops over a list of numbers only.
For example, suppose I want to see how many people of various ages (d2) live in rural or urban areas (m6b). Using foreach, I tabulate the rural/urban variable for each year of age from 18 to 22:
4.
foreach y of numlist 18/22 {
    display "Age = `y'"
    tab m6b if d2==`y'
}

In the above command, the first line says: open a foreach loop in which the local macro y takes each of the numbers from 18 to 22 in turn (here the slash "/" indicates all the integers from 18 to 22, i.e. 18, 19, 20, 21 and 22). The second line says: display "Age = " followed by the current value of y. The third line says: tabulate variable m6b conditional on age being equal to the current value of y. The last line closes the loop. Since there are 5 numbers in the list, the two commands display and tab run 5 times, once for each value of y.
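What a numlist expands to can be checked without any data. This sketch collects the values into a hypothetical macro seen; it also tries the step form 18(2)24, which numlist accepts as well and which gives every second integer, 18, 20, 22 and 24 (collected in the equally hypothetical macro evens).

```stata
* Collect the values that numlist 18/22 produces, one per pass
local seen ""
foreach y of numlist 18/22 {
    local seen `seen' `y'
}
display "18/22 gives: `seen'"

* numlist also accepts a step size: 18(2)24 means 18, 20, 22, 24
local evens ""
foreach y of numlist 18(2)24 {
    local evens `evens' `y'
}
display "18(2)24 gives: `evens'"
```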

The last variant of foreach loops is syntax 5 above. It is similar to syntax 1 with in, with one difference. foreach with in accepts a general list, whose elements are typed out and separated by spaces, and Stata uses them exactly as typed. foreach with of varlist tells Stata to interpret the elements as a variable list: Stata checks that the variables exist and expands any abbreviations. Therefore, syntax 5 allows variable abbreviations.

Some common variable abbreviations include:
- q27*: all variables whose names start with (have the prefix) q27, followed by anything.
- q27a-q27e: all variables from q27a through q27e, in the order in which the variables are stored in the data set.
- _all or *: all variables.
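The difference between in and of varlist is easiest to see on a small made-up data set. The sketch below assumes nothing from the survey: it creates hypothetical variables q27a through q27e and d2 with generate, then counts how many times each kind of loop runs (the counters n1 to n4 are also illustrative names).

```stata
clear
set obs 3
* Hypothetical variables, named to mimic the survey questions
foreach v in q27a q27b q27c q27d q27e d2 {
    generate `v' = _n
}

* of varlist expands the wildcard to the matching variable names
local n1 = 0
foreach v of varlist q27* {
    local n1 = `n1' + 1
    display "`v'"
}

* in treats q27* as a plain word, so the loop runs once with v = q27*
local n2 = 0
foreach v in q27* {
    local n2 = `n2' + 1
    display "`v'"
}

* A range follows the variable order in the data set
local n3 = 0
foreach v of varlist q27b-q27d {
    local n3 = `n3' + 1
}

* _all covers every variable, including d2
local n4 = 0
foreach v of varlist _all {
    local n4 = `n4' + 1
}
display "q27*: `n1'  in q27*: `n2'  q27b-q27d: `n3'  _all: `n4'"
```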

So, let’s go back to the question of recoding all 98s and 99s to missing in our data set. Remember, Stata does not have an undo option per se. To be able to undo an action, you need to execute preserve before the action; executing restore afterward then undoes any changes made since preserve. See tip #27 about preserve and restore. So I recommend you execute preserve before performing this action (and restore if you want the original values back), or simply do not save your data set afterward: once you save, the original values 98 and 99 are gone.
foreach w of varlist _all {
    capture recode `w' (98 99=.)
}

The first line says: open a foreach loop in which the local macro w takes the name of each variable in the data set in turn. The second line recodes 98 and 99 to missing in the current variable (the current element of w). Notice that I have put capture before recode. capture executes recode (or any other command, for that matter) while suppressing its output, including any error message. This matters because, without capture, if recode fails on one of the variables (for example, a string variable, which recode cannot handle and which has no numeric 98 or 99; our data set contains some such variables), Stata stops with an error and leaves the loop unfinished. With capture, Stata ignores that error and continues with the remaining elements of w. The last line has only the closing curly bracket and closes the loop.
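The effect of capture, and of preserve and restore from the paragraph above, can be tried on a tiny made-up data set that stands in for the survey. This is a sketch, not the author's code: the variables q1, q2 and the string variable name are hypothetical, and the check on _rc (the return code that capture leaves behind, 0 on success) is added only to report which variables were skipped.

```stata
clear
* Hypothetical data: two numeric items and one string item
input q1 q2 str5 name
 1 98 "a"
99  2 "b"
 3  4 "c"
end

preserve                          // keep a copy of the data in memory
foreach w of varlist _all {
    capture recode `w' (98 99 = .)
    * capture leaves the return code in _rc; 0 means recode succeeded
    if _rc != 0 {
        display "Skipped `w': recode returned error " _rc
    }
}
quietly count if missing(q1) | missing(q2)
local nmiss = r(N)
display "Rows with a recoded value: `nmiss'"
restore                           // bring back the original 98 and 99 values
```

Because restore runs at the end, the data in memory afterward again contain 98 and 99; drop the preserve and restore lines to keep the recoded values.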
There are other ways to recode all 98s and 99s to missing. For instance, using an asterisk *:
foreach w of varlist * {
    capture recode `w' (98 99=.)
}
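Another route, not used in the original post, avoids capture altogether: list only the numeric variables with ds, has(type numeric), which leaves them in r(varlist), and hand that list to a single recode, since recode accepts several variables at once. The sketch below uses a hypothetical three-variable data set in place of the survey.

```stata
clear
* Hypothetical stand-in data: two numeric items and one string item
input q1 q2 str5 name
 1 98 "a"
99  2 "b"
 3  4 "c"
end

* ds lists the numeric variables and leaves them in r(varlist)
quietly ds, has(type numeric)
local numvars `r(varlist)'
display "Numeric variables: `numvars'"

* recode accepts a whole varlist, so no loop or capture is needed
recode `numvars' (98 99 = .)
```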
