5 Vectorized Operations

In this lesson we will learn about vectorized operations, specifically element-wise operations and vector variables.

While R would make a great calculator, it is designed to help statisticians and data scientists deal with data. And data rarely comes as a single data point. Instead, data usually comes with several observations at a time for a given variable. So we are going to learn how to deal with the R objects that hold such sets of observations.

Say I am interested in the relationship between height, weight, and gender of students with respect to the number of R workshops each student has taken. So I take a sample of size 10 from the student population, measure these students’ height and weight, and ask about their age, their gender, and how many R courses/workshops they have attended. Then our data would look like this:

Table 1: Our Workshop Sample
Names     Age  Height  Weight  Gender  Courses
Alan       23   157.9    76.9  Male    1
Brian      31   172.8    59.6  Male    2
Carlos     31   180.8    48.3  Male    0
Dalton     25   146.5    78.6  Male    2
Ethan      32   174.3    54.6  Male    0
Flora      26   161.6    69.8  Female  4
Gaia       35   194.2    56.0  Female  0
Helen      26   171.3    86.5  Female  3
Ingrid     27   165.1    62.9  Female  0
Jennifer   20   165.6    59.4  Female  2

When analyzing a data set, we often want to perform an operation on the whole set of values of a given variable, which R stores as a vector. A vector can hold numbers, character strings, logical values, or other types, but all elements of one vector share the same type. For example, our participants’ Height is a numeric vector, and our participants’ Gender is a character (string) vector. In this way, one can build a data set out of several vectors of different types.
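To make the idea of vector types concrete, here is a minimal sketch using three rows of Table 1 (Flora, Gaia, and Helen). The object names heights, genders, and mixed are chosen only for this illustration.

```r
# Three values taken from Table 1 (Flora, Gaia, Helen)
heights <- c(161.6, 194.2, 171.3)
genders <- c("Female", "Female", "Female")

class(heights)  # "numeric"
class(genders)  # "character"

# A vector holds a single type, so mixing types converts every value
# to the most general type present (here, character)
mixed <- c(161.6, "Female", TRUE)
class(mixed)    # "character"
```

The c() function used here to build the vectors is the one this lesson relies on throughout.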

Let’s focus on male students’ heights as our example. Suppose we are interested in their arithmetic mean. How can we calculate it in R? Here’s the formula: \[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\]

The first thing we need to do, according to the formula, is to add all the heights. So let’s…

157.9 + 172.8 + 180.8 + 146.5 + 174.3
## [1] 832.3

… then we need to divide this sum by the number of observations, which is 5. So, what is the mean height of male students in our sample?

(157.9 + 172.8 + 180.8 + 146.5 + 174.3)/5
## [1] 166.46

Now, let’s do the same operation using a vector. In the previous section we learned how to store the result of a mathematical expression in an object. Now we are going to store more than one piece of data in a single object, a vector. First we need to name our vector; let’s call it height, and it will hold all five values. In R, we do this by “combining” or “concatenating” several values with the function c(): write c followed by parentheses, with the values separated by commas.

height <- c(157.9, 172.8, 180.8, 146.5, 174.3)

Let’s check if we created our vector correctly. Type height in your console. It should return the following:

height
## [1] 157.9 172.8 180.8 146.5 174.3
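As a small aside, c() also accepts existing vectors, so a vector can be extended with more values. The sketch below assumes two extra heights taken from Table 1 (Flora and Gaia); the name height_plus is hypothetical.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)
length(height)        # 5

# Combine the existing vector with two more values
height_plus <- c(height, 161.6, 194.2)
height_plus           # 157.9 172.8 180.8 146.5 174.3 161.6 194.2
length(height_plus)   # 7
```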

Now that we have created our vector, we can apply to it the same kinds of operations we applied to single numbers. This is one of the main advantages of R over other statistical software.

So, type in your Console or Source panel the following expressions:

Multiplication

height * 2
## [1] 315.8 345.6 361.6 293.0 348.6

Division

height / 2
## [1] 78.95 86.40 90.40 73.25 87.15

Exponentiation

height ^ 2
## [1] 24932.41 29859.84 32688.64 21462.25 30380.49
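To see what “element-wise” means, it may help to compare a vectorized expression with the loop it replaces. The sketch below is only for comparison; the names doubled_loop and doubled_vec are hypothetical, and the for loop is not something this lesson requires.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)

# Loop version: double one element at a time
doubled_loop <- numeric(length(height))
for (i in seq_along(height)) {
  doubled_loop[i] <- height[i] * 2
}

# Vectorized version: one expression handles every element
doubled_vec <- height * 2

identical(doubled_loop, doubled_vec)  # TRUE
```

Both versions give the same result, but the vectorized one is shorter and easier to read.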

Vectorized operations are one of the most important strengths of R because they greatly simplify working with data. For example, if we want to calculate the mean height for males, all we need to know is the R function that calculates it, mean(), and put our vector inside it.

Mean

mean(height)
## [1] 166.46

Variance

var(height)
## [1] 194.743

Standard Deviation

sd(height)
## [1] 13.95503

Median

median(height)
## [1] 172.8

Range

range(height)
## [1] 146.5 180.8

Summary

summary(height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   146.5   157.9   172.8   166.5   174.3   180.8
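These built-in functions can be reproduced with the element-wise operations from earlier in this lesson. The following is a minimal sketch, assuming the usual sample formulas (var() and sd() divide by n - 1); the names manual_mean, manual_var, and manual_sd are hypothetical.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)
n <- length(height)

# Mean: sum of the values divided by the number of values
manual_mean <- sum(height) / n

# Sample variance: squared deviations from the mean, summed, divided by n - 1
manual_var <- sum((height - manual_mean)^2) / (n - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_mean, mean(height))  # TRUE
all.equal(manual_var, var(height))    # TRUE
all.equal(manual_sd, sd(height))      # TRUE
```

Note that height - manual_mean subtracts the mean from every element at once, and ^2 then squares every deviation.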

We can also do regular operations with vectors without naming them:

c(1, 2, 3, 4, 5) * 2
## [1]  2  4  6  8 10

Can you guess which mathematical operation the code below performs?

c(1, 2, 3, 4, 5) * c(1, 2, 3, 4, 5)
## [1]  1  4  9 16 25

How about this last one? Can you guess?

height - c(1, 2, 3, 4, 5) * c(5, 4, 3, 2, 1)
## [1] 152.9 164.8 171.8 138.5 169.3
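One detail behind these results is worth spelling out: when two vectors have different lengths, R reuses (“recycles”) the shorter one until it matches the longer one. A minimal sketch, with hypothetical object names:

```r
# The single value 2 is recycled across all five elements
doubled <- c(1, 2, 3, 4, 5) * 2
doubled                     # 2 4 6 8 10

# Same result as writing the 2 out five times
same <- c(1, 2, 3, 4, 5) * c(2, 2, 2, 2, 2)
identical(doubled, same)    # TRUE

# A length-2 vector is recycled twice across a length-4 vector
paired <- c(1, 2, 3, 4) * c(10, 100)
paired                      # 10 200 30 400
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign of a mistake.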

5.1 Practicals

5.1.1 Vector and R functions

Create a vector called height.female composed of the heights of all women in our sample, and another vector called n.R.courses containing the number of R courses/workshops taken by each participant. Then, for these two vectors, calculate the mean, median, variance, standard deviation, and range. Now choose at least 5 (five) other functions from the list below and apply them to both vectors. See if you understand what each function returns.

R function            Description
max(x)                Largest value
min(x)                Smallest value
mean(x)               Arithmetic mean
sum(x)                Sum
median(x)             Median
var(x)                Variance
sd(x)                 Standard deviation
abs(x)                Absolute value
range(x)              Range
length(x)             Length
diff(x, lag=1)        Lagged differences
scale(x)              Column center or standardize a matrix
sqrt(x)               Square root
ceiling(x)            ceiling(3.475) is 4
floor(x)              floor(3.475) is 3
round(x, digits=n)    round(3.475, digits=2) is 3.48
log(x)                Natural logarithm
log10(x)              Common logarithm
exp(x)                e^x
summary(x)            Min, 1st Quan, Median, Mean, 3rd Quan, and Max
quantile(x)           Sample quantiles
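Before applying these functions to your own vectors, it may help to try a few of them on a small vector whose results are easy to check by hand. The vector x below is hypothetical and is not part of the workshop sample.

```r
x <- c(4, 9, 16, 25)   # hypothetical values, chosen so results are easy to verify

length(x)              # 4
sum(x)                 # 54
sqrt(x)                # 2 3 4 5
diff(x)                # 5 7 9  (each value minus the previous one)
range(x)               # 4 25   (smallest and largest value)
log10(c(1, 10, 100))   # 0 1 2
ceiling(3.475)         # 4
floor(3.475)           # 3
```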

5.1.2 Boolean expressions in R

Boolean expressions evaluate to either TRUE or FALSE. A crucial part of computing involves conditional statements: Is this value bigger than another? Are two vectors the same size? Such questions can be joined together using words like ‘and’, ‘or’, and ‘not’. In R, < means ‘less than’, > means ‘greater than’, and ! means ‘not’ (see the table below).

Operator or function   Description
!                      logical NOT
&                      logical AND
|                      logical OR
<                      less than
<=                     less than or equal to
>                      greater than
>=                     greater than or equal to
==                     logical equals (double =)
!=                     not equal
&&                     AND with IF
||                     OR with IF
xor(x,y)               exclusive OR
isTRUE(x)              an abbreviation of identical(TRUE,x)
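The difference between the single and double forms is easiest to see side by side. In this minimal sketch, a, b, x, and in_range are hypothetical names: & and | compare vectors element by element, while && and || work on single values, which is why they are typically used inside if conditions.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

# Element-wise operators return one result per position
a & b       # TRUE FALSE FALSE
a | b       # TRUE  TRUE FALSE
xor(a, b)   # FALSE  TRUE FALSE
!a          # FALSE FALSE  TRUE

# && combines two single values into one TRUE or FALSE
x <- 7
in_range <- x > 5 && x < 10
in_range            # TRUE
isTRUE(in_range)    # TRUE
```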

For each of these logical statements, try to figure out the result before running the code. Then run the code by pressing Ctrl + Enter on the desired line.

```
# Is true equal to false?
TRUE == FALSE

# T and F are shorthand for TRUE and FALSE. Try this:
T == TRUE
T == F
T != F

# Is 3 smaller than 4
3 < 4

# Is 2 + 2 equal to 5
2 + 2 == 5

# Is 2 smaller than 5
2 < 5

# Is 7 smaller than or equal to minus 2
7 <= -2

# Try to figure these out
3 > (3+1)
4 >= 4
(3/4) == (9/12)

# The symbol '!' is a negation of a logical statement
!TRUE
!F
2^4 != 4^2
!(2 < 1)
!(3 < 6)

# The ampersand symbol & means "and". A statement is TRUE only if the expressions on both sides of the operator are TRUE. One can also think of "intersection" as in set operations 
3*4==12 & 6/8<1
(3 < 5) & (2 > 0)
(2 < 3) & (5 > 5)

# The symbol | means "or". The | operator is TRUE if at least one of the expressions surrounding it is TRUE. You can also think in terms of set operations as in "union" of sets.
(3 < 5) | (2 > 3)
(2 < 1) | (5 > 5)
TRUE | FALSE
FALSE | TRUE
FALSE | 2+2==4

# Can you guess?
c(1, 2, 3, 4, 5) <= 3
((5>4) & !(3<2)) | (6>7)

# Most Boolean operators act element-wise.
# %in% is a matching operator: it tests whether each element on the left appears in the vector on the right
c(1, 2, 3, 4, 5) %in% c(1, 2, 3)
height %in% c(157.9, 172.8, 180.8, 146.5, 174.3)
height.female %in% c(161.6, 194.2, 171.3, 165.1, 165.6)

# %% is the modulo operator: it returns the remainder after dividing one number by another
5 %% 2
9 %% 3
V <- c(3,2,8,6,5,6,11,0)
# Flag the odd elements of V (those that leave a remainder of 1 when divided by 2)
I <- (V %% 2 == 1)

# Let's try to use what we have learned so far with our vectors
height == height.female
height > height.female
height < height.female

```
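Logical vectors become especially useful once they are combined with other functions. The sketch below assumes the height and V vectors defined above; the name tall and the 170 cm cut-off are hypothetical. Because TRUE counts as 1 and FALSE as 0, sum() counts matches and mean() gives their proportion. Square brackets, which appear again in the advanced exercises, keep only the elements where the condition is TRUE.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)

tall <- height > 170   # FALSE TRUE TRUE FALSE TRUE
sum(tall)              # 3   (how many are taller than 170)
mean(tall)             # 0.6 (the proportion taller than 170)
height[tall]           # 172.8 180.8 174.3

# Keep only the odd elements of V
V <- c(3, 2, 8, 6, 5, 6, 11, 0)
V[V %% 2 == 1]         # 3 5 11
```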

5.2 Exercises

These exercises are slightly adapted (shamelessly copied with minor adjustments) from R-exercises, a website that offers many exercises to test your R skills. They also offer an R Course Finder, which catalogs several R courses on MOOCs (Massive Open Online Courses) such as Coursera and Khan Academy and on other online learning platforms (e.g. Udemy, edX, Lynda.com).

5.2.1 Exercise 1

There are two main types of interest: simple and compound. To start, let’s create four variables: the initial investment (S = 100), the annual simple interest rate (i1 = 0.1), the annual compound interest rate (i2 = 0.09), and the number of years the investment will last (n = 2).

Simple Interest: define a variable called simple equal to \(S * (1 + i1 * n)\)

Compound Interest: define a variable called compound equal to \(S * (1 + i2) ^ n\)

S  <- 100
i1 <- 0.1
i2 <- 0.09
n  <- 2
simple <- S*(1 + i1*n)
compound <- S*(1 + i2)^n

5.2.2 Exercise 2

It’s natural to ask which type of interest yields more money after 2 years (n = 2) for these values. Using the logical operators <, > and ==, check which of simple and compound is bigger.

simple>compound
## [1] TRUE

5.2.3 Exercise 3

Using the logical operators ==, | and &, find out whether simple or compound is equal to 120.

Then find out whether simple and compound are both equal to 120.

simple == 120 | compound == 120
simple == 120 & compound == 120
## [1] TRUE
## [1] FALSE

5.2.4 Exercise 4

Formulas can deal with vectors, so let’s define a vector and use it in one of the formulas we defined earlier. Let’s define S as a vector with the values 100 and 95. Remember that c() is the function that allows us to define vectors.

Apply the simple interest formula to S and store the resulting vector in simple.

S <- c(100,95)
simple <- S*(1 + i1*n)

5.2.5 Exercise 5

Using the logical operators (<, >, <=, ==), check whether any of the simple values is smaller than or equal to compound.

simple<=compound
## [1] FALSE  TRUE
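The same idea works for the number of years: passing a vector of years to both formulas shows how the two types of interest evolve over time. This is a minimal sketch using the rates from Exercise 1; the names principal, rate_simple, rate_compound, years, simple_by_year, and compound_by_year are hypothetical, and they are kept separate from the exercise variables so that running the sketch does not overwrite them.

```r
principal     <- 100
rate_simple   <- 0.1    # annual simple interest rate (i1 in Exercise 1)
rate_compound <- 0.09   # annual compound interest rate (i2 in Exercise 1)
years         <- 1:6    # hypothetical investment horizons

simple_by_year   <- principal * (1 + rate_simple * years)
compound_by_year <- principal * (1 + rate_compound)^years

round(simple_by_year, 2)    # 110 120 130 140 150 160
round(compound_by_year, 2)  # 109.00 118.81 129.50 141.16 153.86 167.71

# From which year on does compound interest pay more?
compound_by_year > simple_by_year  # FALSE FALSE FALSE TRUE TRUE TRUE
```

With these rates, simple interest is ahead for the first three years and compound interest overtakes it from the fourth year on.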

5.2.6 Exercise 6

Using the integer-division operator %/%, find out how many $20 candies you can buy with the money stored in compound.

compound%/%20
## [1] 5

5.2.7 Exercise 7

Using the modulo operator %%, find out how much money is left after buying the candies.

compound%%20
## [1] 18.81
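The two operators are complementary: the number of candies times the price, plus the change, gives back the original amount. A minimal sketch, recomputing compound from the Exercise 1 values; the names candies and change are hypothetical.

```r
compound <- 100 * (1 + 0.09)^2   # 118.81, as in Exercise 1

candies <- compound %/% 20       # whole candies that can be bought: 5
change  <- compound %% 20        # money left over: about 18.81

# Candies bought plus change adds back up to the starting amount
all.equal(candies * 20 + change, compound)  # TRUE
```

all.equal() is used instead of == because the amounts are decimal numbers and may differ by a tiny rounding error.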

5.2.8 Exercise 8

Let’s create two new variables, one defined as rational = 1/3 and the other as decimal = 0.33. Using the logical operator !=, verify whether these two values are different.

rational <- 1/3
decimal <- 0.33
rational!=decimal
## [1] TRUE

5.2.9 Exercise 9

There are other functions that can help us compare two variables.

Use the logical operator == to verify whether rational and decimal are the same.

Use the function isTRUE() to verify whether rational and decimal are the same.

Use the function identical() to verify whether rational and decimal are the same.

rational==decimal
isTRUE(rational==decimal)
identical(rational,decimal)
## [1] FALSE
## [1] FALSE
## [1] FALSE

5.2.10 Exercise 10

With the help of the logical functions from the previous exercise, find the approximation that R uses for 1/3. Hint: it is not the value that R prints when you type 1/3.

1/3==0.3333333333333333
## [1] TRUE
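This exercise hints at a general point: R stores decimal numbers with finite precision, so exact comparisons with == can be surprising. The sketch below is one way to look at the stored digits and to compare with a tolerance; the name digits is hypothetical.

```r
rational <- 1/3

# Ask for more digits than R prints by default
digits <- sprintf("%.20f", rational)
digits                              # "0.33333333333333331483"

# A classic surprise: this sum is not exactly 0.3
0.1 + 0.2 == 0.3                    # FALSE

# all.equal() compares within a small tolerance instead
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```

For decimal results, isTRUE(all.equal(x, y)) is usually a safer test than x == y.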

5.3 Advanced Exercises

  1. Calculate square root of 729
  2. Create a new variable ‘b’ with value 5124
  3. Create a vector of the numbers from 1 to 21 and find out its class
  4. Create a vector containing following mixed elements {2131, 24, ‘j’, 2, ‘b’} and find out its class
  5. Initialise a character vector of length 26
  6. Assign the character ‘a’ to the first element in above vector
  7. Create a vector with the names of some of your classmates (at least 5)
  8. Get the length of above vector
  9. Get the first two friends from above vector
  10. Get the 2nd and 3rd friends
  11. Sort your friends by names
  12. Reverse direction of the above sort
  13. Create with rep() and seq() R functions the following vector: ‘a’,‘a’,‘a’, 1,2,3,4,5,11,13,15,17,19,21
  14. Sample 50 random numbers between 1 and 100
  15. Sample 50 random numbers between 1 and 500, with replacement
  16. Find the class of ‘iris’ dataframe, find the class of all the columns of ‘iris’, get the summary of ‘iris’, get the top 6 rows, view it in a spreadsheet format, get row names, get column names, get number of rows and get number of columns.
  17. Apply the above functions and inspect results on ‘iris’ (a base R dataframe)
  18. Get the last 2 rows in last 2 columns from iris dataset
  19. Get rows with Sepal.Width > 3.5 from iris
  20. Get the rows with ‘versicolor’ species using subset() from iris

The answers are below.

sqrt(729)
b <- 5124
one_to_21 <- 1:21
class(one_to_21)
my.vector <- c(2131, 24, 'j', 2, 'b')
class(my.vector)
charVector <- character(26)
charVector
charVector[1] <- "a"
myFriends <- c("alan", "bala", "amir", "tsong", "chan")
length(myFriends)
myFriends[1:2]
myFriends[c(2,3)]
sort(myFriends) 
myFriends[order(myFriends)]
sort(myFriends, decreasing=TRUE)
myFriends[rev(order(myFriends))]
out <- c(rep('a', 3), seq(1, 5), seq(11, 21, by=2))
mySample <-sample(1:100, 50)
mySample <- sample(1:500, 50, replace=TRUE)
class(iris)  # get class
sapply(iris, class)  # get class of all columns
str(iris)  # structure
summary(iris)  # summary of iris
head(iris)  # view the first 6 obs
View(iris)  # view spreadsheet like grid
rownames(iris)  # row names
colnames(iris)  # columns names
nrow(iris)  # number of rows
ncol(iris)  # number of columns
numRows <- nrow(iris)
numCols <- ncol(iris)
iris[(numRows-1):numRows, (numCols-1):numCols]
iris[iris$Sepal.Width > 3.5, ]
iris[which(iris$Sepal.Width > 3.5), ]
subset(iris, Species == "versicolor")