# Control Flow and loops in R¶

## Control Flow¶

### The standard if else¶

p.test <- function(p) {
if (p <= 0.05)
print("yeah!!!!") else if (p >= 0.9)
print("high!!!!") else print("somewhere in the middle")
}


Now pick a number and put it in p.test

p.test(0.5)

##  "somewhere in the middle"


## ifelse()¶

A better and vectorized way of doing this is ifelse(test, yes, no) function. ifelse() is far more useful as it is vectorized.

p.test.2 <- function(p) {
ifelse(p <= 0.05, print("yippee"), print("bummer, man"))
}


Test this with the following sequence. See what happens if you use if vs. ifelse().

x <- runif(10, 0, 1)
x

##   0.27332 0.14155 0.89000 0.07041 0.79419 0.25013 0.02324 0.86766
##   0.41114 0.56165


Now try it with p.test() (uses if).

p.test(x)

## Warning: the condition has length > 1 and only the first element will be used
## Warning: the condition has length > 1 and only the first element will be used

##  "somewhere in the middle"


Now try it with p.test.2()

p.test.2(x)

##  "yippee"
##  "bummer, man"

##   "bummer, man" "bummer, man" "bummer, man" "bummer, man" "bummer, man"
##   "bummer, man" "yippee"      "bummer, man" "bummer, man" "bummer, man"


## Other vectorized ways of control flow.¶

There are many times that you may think you need to use an if with (iterating with a for loop... see below), or ifelse, but there may be far better ways.

For instance, say you are doing some simulations for a power analysis, and you want to know how often your simulation gives you a p-value less than 0.05.

p.1000 <- runif(n = 1000, min = 0, max = 1)


The line above generates 1000 random values between 0-1, which we will pretend are our p-values for differential expression from our simulation.

You may try and count how often it less than 0.05

p.ifelse <- ifelse(p.1000 < 0.05, 1, 0)  # If it is less than 0.05, then you get a 1, otherwise 0.


Our approximate false positives. Should be close to 0.05

sum(p.ifelse)/length(p.1000)

##  0.059


However the best and fastest way to accomplish this is to use the index, by setting up the Boolean (TRUE/FALSE) in the index of the vector.

length(p.1000[p.1000 < 0.05])/length(p.1000)

##  0.059


Same number, faster and simpler computation.

## Simple loops¶

### while() function..¶

I tend to avoid these, so you will not see them much here

i <- 1
while (i <= 10) {
print(i)
i <- i + 0.5
}

##  1
##  1.5
##  2
##  2.5
##  3
##  3.5
##  4
##  4.5
##  5
##  5.5
##  6
##  6.5
##  7
##  7.5
##  8
##  8.5
##  9
##  9.5
##  10


## for loop¶

If I run a loop I most often use for(){} automatically iterates across a list (in this case the sequence from 1:10).

for (i in 1:10) {
print(i)
}

##  1
##  2
##  3
##  4
##  5
##  6
##  7
##  8
##  9
##  10


If you do not want to use integers, how might you do it using the for()?

for (i in seq(from = 1, to = 5, by = 0.5)) {
print(i)
}

##  1
##  1.5
##  2
##  2.5
##  3
##  3.5
##  4
##  4.5
##  5


Using strings is a bit more involved in R, compared to other languages. For instance the following does not do what you want:

.. code:: r

for (letter in “word”) {
print(letter)

}

##  "word"


(try letters for a hoot.)

Instead in R, we have to split the word “word” into single characters using strsplit(), i.e:

.. code:: r

strsplit(“word”, split = “”)
## []
##  "w" "o" "r" "d"


## So for the for loop we would do the following:¶

for (letter in strsplit("word", split = "")) {
print(letter)
}

##  "w" "o" "r" "d"


## More avoiding loops¶

Many would generate random numbers like so.

for (i in 1:100) {
print(rnorm(n = 1, mean = 0, sd = 1))
}

##  -0.1837
##  -0.9313
##  1.648
##  -0.6964
##  0.2112
##  0.3441
##  1.036
##  0.7439
##  0.5859
##  -0.6087
##  -0.4014
##  1.44
##  -0.3906
##  -1.861
##  -0.739
##  -1.204
##  0.07794
##  -1.65
##  1.261
##  0.6753
##  0.6736
##  0.3238
##  -1.316
##  0.2965
##  1.499
##  0.4326
##  0.4488
##  0.8873
##  -1.304
##  -0.347
##  0.3491
##  0.24
##  0.1425
##  -0.2785
##  -0.5072
##  -1.775
##  -0.04051
##  0.9452
##  0.3322
##  -0.01994
##  -0.2308
##  -0.4053
##  -0.5685
##  -1.631
##  -0.1484
##  0.434
##  1.653
##  1.57
##  0.1308
##  -1.059
##  -0.7157
##  -0.8316
##  0.06561
##  0.8243
##  0.1841
##  1.048
##  0.1612
##  -0.9553
##  -0.7569
##  -0.288
##  -1.837
##  0.7301
##  -2.103
##  -1.869
##  -1.298
##  -1.077
##  -0.2139
##  -0.9419
##  0.4694
##  -1.344
##  -0.08514
##  -2.055
##  -0.803
##  -0.7281
##  1.778
##  -1.116
##  1.33
##  0.1535
##  -2.897
##  0.7305
##  1.228
##  1.697
##  -0.8183
##  -1.013
##  -0.634
##  -0.942
##  -0.3395
##  0.1396
##  1.022
##  0.9868
##  -0.7778
##  1.075
##  -0.1029
##  0.2644
##  0.01165
##  0.8025
##  -1.24
##  -0.8865
##  0.981
##  0.5333


We are cycling through and generating one random number at each iteration. Look at the indices, and you can see we keep generating vectors of length 1.

better/cleaner/faster to generate them all at one time

rnorm(n = 100, mean = 0, sd = 1)

##    -0.08683 -1.55262 -1.16909  0.30451 -1.14555  0.76682  0.12643
##    -0.61174 -0.29103 -0.10707 -0.03397 -0.05926  0.27294  1.32693
##   -0.53284  1.83234  0.43959 -0.88991  0.25383  0.96709 -0.23210
##   -1.00190 -1.32289  1.80030  1.15272 -1.82907  0.75989  1.35966
##    0.53943  0.01429 -0.58707 -0.11886 -0.70367 -2.38988  0.08033
##   -0.22795 -0.62166 -0.19832 -1.95990 -0.85127  0.94236  0.37771
##    0.32617 -0.08393 -0.54506 -2.58781 -0.58433  0.20985 -0.41613
##    0.60527  0.51713  1.57950 -0.61079 -0.28564 -0.16444  0.55007
##    0.57258  0.58513 -0.86728 -0.81185 -0.29333 -1.23935  0.46169
##   -1.53586 -0.32583  0.17629 -0.85579  1.04989  1.22120  1.53359
##   -2.37276  1.44393  1.47506  0.40110 -0.10157  0.35485 -0.72068
##   -1.27910  0.63152 -0.65216  1.60160  0.27109  0.50904 -1.00531
##    0.76743 -0.78954 -0.01159  1.06944  1.15661 -0.91031  1.54919
##   -0.84334  2.19994  0.26716  0.02081  0.53577  0.07840 -0.79387
##   -1.18941  1.24745


The not advisable approach

First we initialize a vector to store all of the numbers. Why do we initialize this vector first?

n <- 1e+05
x <- rep(NA, n)


## The step above creates a vector of n NA’s. They will be replaced sequentially with the random numbers as we generate them (using a function like the above one).¶

head(x)

##  NA NA NA NA NA NA


Now we run the for loop.

for (i in 1:n) {
x[i] <- rnorm(n = 1, mean = 0, sd = 1)
}


for each i in the index, one number is generated, and placed in x

head(x)

##   0.2848 -0.5432  1.1391 -1.0901  0.8515  0.5490


However this is computationally inefficient in R. Which has vectorized operations.

system.time(

for (i in 1:n){
x[i] <- rnorm(n=1, mean=0, sd=1)})

##    user  system elapsed
##   0.562   0.023   0.584


We can also use the replicate function to do the same thing. Easier syntax to write.

system.time(z <- replicate(n, rnorm(n = 1, mean = 0, sd = 1)))

##    user  system elapsed
##   0.561   0.035   0.841


This is ~20% faster.

However, since R is vectorized, both of the will be far slower than:

system.time(y <- rnorm(n, 0, 1))

##    user  system elapsed
##   0.010   0.000   0.011


About 65 times faster than the for loop

The general rule in R is that loops are slower than the apply family of functions (for small to medium data sets, not true for very large data) which are slower than vectorized computations.

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.