8 Random Numbers

In this lesson we learn about generating “random” numbers in R.

8.1 Generating “Random” Numbers

The random word is between quotes above because R - or any software - cannot really generate truly random numbers. It gets really close though. This is because softwares use an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. This is termed a pseudorandom number generator, which is not truly random, because it is completely determined by a relatively small set of initial values, called the seed.

8.1.1 Sampling from a vector

Below we will sample from several vectors, and emulate tossing a coin, rolling a die, playing the lottery, etc.

# Toss a coin

# Toss 10 coins

# Roll a die

# Roll 10 dice
sample(1:6,10, replace=T)

# Roll a Icosahedron (20 sides die)
sample(1:20, 10, replace=T)

# Lottery 
sample(1:49, 6, replace=F)

# Pick a card, any card...
sample(1:54, 6, replace=F)


Challenge: with what we learned thus far, can you think of a way to use R to return a more veritable deck card (i.e., suits and numbers, and with a face cards -> kings, queens, and jacks)?

  rep(c(1:10, "Jacks", "Queen", "King"), 4), 
  rep(c("of Hearts", "of Diamonds", "of Spades", "of Clubs"), 13),
  sep = " "),
## [1] "10 of Spades"

8.1.2 Concept: Randomness & Distributions

In computational statistics, and in R, random numbers are described by a distributions, which is a function specifying the probability that a random number is within some range. If the random number is continuous this is called “probability density function”, if the random number is discrete then the term is “probability mass function”.

8.1.3 Sampling from a Distribution

How to choose a random number in R? As a language for statistical analysis, R has a comprehensive library of functions for generating random numbers from various statistical distributions.

Distribution R Function
Uniform runif
Normal rnorm
Student’s t rt
F rf
chi-squared rchisq
Exponential rexp
Log normal rlnorm
Beta rbeta
Binomial rbinom
Negative Binomial rnbinom
Poisson rpois
Gamma rgamma
Weibull rweibull
Cauchy rcauchy
Multinomial rmultinom
Geometric rgeom
?Distributions full list

The Uniform Distribution

If you want to generate a decimal number where any value (including fractional values) between the stated minimum and maximum is equally likely, the runif() function is what you are looking for.This function generates values from the Uniform distribution. Here’s how to generate one random number between 0 and 1:

runif(1, min=0, max=1)
## [1] 0.9060579

Of course, when you run this, you’ll get a different number, but it will definitely be between 0 and 1. You won’t get the values 0 or 1 exactly, either.

If you want to generate multiple random values, you can generate several values at once by specifying the number of values you want as the first argument to runif. Here’s how to generate 10 values between 0 and 1.

runif(10, 0, 1)
##  [1] 0.3919717 0.5232624 0.3228991 0.9871804 0.7990972 0.0477729 0.6952032
##  [8] 0.4867851 0.7367453 0.5725431

The (standard) Normal Distribution

\[ y = \frac{1}{{\sqrt {2\pi } }}e^{ - \frac{{z^2 }}{2}} \]

To generate numbers from a normal distribution, use rnorm().

# By default the mean is 0 and the standard deviation is 1

# Define mean and standard deviation
rnorm(1, mean=5, sd=1)
rnorm(5, mean=5, sd=3)
rnorm(100, mean=10, sd=10)

# Check that the distribution looks right, make a histogram of the numbers
hist(rnorm(100, mean=10, sd=10), breaks = 10)

There are helpful online editors to help you learn code for various equations you might want to include. I have found the one at: http://visualmatheditor.equatheque.net/VisualMathEditor.html to be very useful. You can work out the code there and then copy it over to your RMarkdown document in between dollar signs (1 or 2 on either end depending on whether you want the equation in line or in display mode).

The Poisson Distribution

To generate numbers from a poisson distribution, use rpois(). The Poisson distribution is popular for modeling the number of times an event occurs in an interval of time or space. The Poisson distribution may be useful to model events such as:

  • The number of meteors greater than 1 meter diameter that strike earth in a year
  • The number of occurrences of the DNA sequence “ACGT” in a gene
  • The number of patients arriving in an emergency room between 11 and 12 pm

In probability theory, a Poisson process is a stochastic process that counts the number of independent events in a given time interval. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

p ( x ; λ ) = e λ λ x x !  for  x = 0 , 1 , 2 ,

Try to use the function from the above table that is for the poisson distribution rpois(). If you want to know more about how it works type ?rpois in the console.

rpois(10, 1)
rpois(10, 4)

# In the poisson ditribution mean and variance are the same: lambda
rpois(10, lambda=1)
rpois(10, lambda=5)

# Check that the distribution looks right, make a histogram of the numbers
hist(rpois(1000, lambda=1), breaks = 5)

8.2 Wrap-up Exercise

This is an excercise in which you can get very creative with everything you learned thus far.

Question: Let’s say I have a categorical variable which can take the values A, B, C and D. How can I generate 10000 random data points and control for the frequency of each? For example: A = 10% B = 20% C = 65% D = 5%. Any ideas how to do this? Don’t fret if you have no ideas. I didn’t when I first tried to solve it. But it helped me a great deal to practice the skills I learned and how they can be useful.

8.2.1 Solution 1: Elegant and quickest

sample(LETTERS[1:4], 10000, 
       prob=c(0.1, 0.2, 0.65, 0.05))
# But how do I know it is right? Save it to a 'vector' named x
x <- sample(LETTERS[1:4], 10000, 
            prob=c(0.1, 0.2, 0.65, 0.05))

# And then table its results

# If you want the check the exact proportions 

8.2.2 Solution 2: Clever, but dirty

x <- sample(c(rep("A", 0.1*10000), 
              rep("B", 0.2*10000),
              rep("C", 0.65*10000), 
              rep("D", 0.05*10000)),

# Equally good, but perhaps less straightforward
x <- rep(c("A","B","C","D"), 10000*c(0.1,0.2,0.65,0.05))
# But how do I know it is right? table its results and check the exact proportions

8.2.3 Solution 3: Brute force (reversed thinking?)

x <- runif(10000) # Why use runif()? Hint: what it is the function's default?

# Now we rename x values
x[x <= 0.1]              <- "A"
x[x >  0.1  & x <= 0.3]  <- "B"
x[x >  0.3  & x <= 0.95] <- "C"
x[x >  0.95 & x <= 1]    <- "D"

# Check