R in Eighty Days

How much R can I learn from scratch in eleven-and-a-half weeks?


Day 11: Sunday 25th August 2019

I want to get clearer on the basics of R so I'm going to start again with Chapter Three of R in a Nutshell. I started easy: I've just done the basic arithmetic examples (17 + 3) on page 19. So far, so pleasingly simple.

But then I come across this sentence: "In R, any number that you enter in the console is interpreted as a vector. A vector is an ordered collection of numbers." As I said on Day Three, I think it's time I started taking vectors seriously. That sentence about vectors was part of the explanation of why you get the [1] displayed in the console before the answers to the various bits of arithmetic. For example:

> 17 + 3
[1] 20

But I'm not entirely sure I get it. I mean why [1]? Why not [2]? Or [5]? Or [478]? The book is not making this clear. But I'll continue anyway. Longer vectors (longer than one number, I assume) can be created using the c(…) function.

OK, let's give this a go. So c(0,1,1,2,3,5,8) gives us the first seven elements of the Fibonacci Series. But when you get the 'answer' it's still a [1] at the start of the line. Even though there are seven numbers in the vector I've just created.

OK, hang on a minute, I might be getting an inkling now. If you type c(1:50) you get a sequence of numbers from 1 to 50 inclusive. And the numbers wrap depending on the size of the console window, and when they wrap you get a new line number [23] or [45], so the [45] number is basically a reminder of where the sequence is up to, I guess…

This number in square brackets at the beginning of the line, it's now being described—I think—as 'the index of the first element on each row.'
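
A small sketch of that idea, for anyone following along: the variable name x is just an illustration, not something from the book. It suggests why a single answer shows [1] (it is a vector of length one, so the row starts at position 1) and how the bracketed number lines up with positions in a longer vector.

```r
# A single number is a vector of length 1, so its only row starts at index 1
length(17 + 3)

# A vector of 50 numbers; printing it usually wraps over several console lines
x <- 1:50
print(x)

# The bracketed number at the start of each printed row is the position
# (index) of the first element shown on that row
length(x)   # how many elements there are in total
x[23]       # the element sitting at position 23
```

Where the rows break depends on the width of the console, so the bracketed numbers you see may differ from [23] or [45].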

Next, it's getting more complicated. The book is telling me you can perform an operation on two vectors. R 'matches the elements of two vectors pairwise and returns a vector.' OK, I sort of get this (I'm now halfway down page 20), but surely this will only work if the two vectors are of equal length. Dear Book, please anticipate the problem this poses….

Oh, phew! 'If the two vectors aren't the same size, R will repeat the smaller sequence multiple times.' Hmmm, OK, I'm not totally sure I understand that sentence, but let's just see if the book's examples make things clearer...

OK, I get this first example:

> c(1,2,3,4)+1
[1] 2 3 4 5


…but what if one vector has five elements and the other vector only has two elements?

> c(1,2,3,4,5)+c(10,100)
[1]  11 102  13 104  15
Warning message:
In c(1, 2, 3, 4, 5) + c(10, 100) :
longer object length is not a multiple of shorter object length

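So, OK, you get a warning. R does a1 + b1, then a2 + b2, and then it runs out of b's, so it goes back to the start of the shorter vector (a3 + b1, a4 + b2, a5 + b1). You still get all five answers; the warning is just there because 5 isn't a multiple of 2.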
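
Here is a rough sketch of what 'repeat the smaller sequence' seems to mean in practice. The names a, b, recycled and by_hand are made up for illustration. The check compares R's own result with a version where the repeated values are typed out by hand.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(10, 100)

# R reuses b from the start once it runs out: 10, 100, 10, 100, 10.
# suppressWarnings() only hides the length-mismatch warning for this demo.
recycled <- suppressWarnings(a + b)

# Writing the repeated values out by hand gives the same answer, no warning
by_hand <- a + c(10, 100, 10, 100, 10)

print(recycled)
print(by_hand)
identical(recycled, by_hand)
```

The earlier example, c(1,2,3,4)+1, is the same mechanism: the single 1 is reused four times, and since 4 is a multiple of 1 there is no warning.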

Next (bottom of page 20), text characters are being introduced!

> "Hello world"
[1] "Hello world"

This is a character vector. Of length 1, apparently. Even though there are a lot more than 1 character in it!

Oh right! Characters are the things contained within the double quotes.

> c("Hello world","Hello R interpreter")
[1] "Hello world" "Hello R interpreter"

So this is a character vector of length 2. (Is that how you say it? Not 'a two-character vector' but 'a character vector of length 2'..?)

The textbook (top of page 21) is now telling me that what R calls a character, other languages call a string. Yes, I think I understand that now.
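
To make the difference between 'how many strings' and 'how many letters' concrete, here is a minimal sketch using length() and nchar(), both base R functions. The variable names are just illustrative.

```r
greeting <- "Hello world"
length(greeting)   # number of strings in the vector
nchar(greeting)    # number of letters (and spaces) inside that one string

two_strings <- c("Hello world", "Hello R interpreter")
length(two_strings)   # two strings in this vector
nchar(two_strings)    # one count per string
```

So 'a character vector of length 2' counts the strings, not the letters inside them.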

You can add comments to R code. Anything after a pound sign (#) on a line is ignored. (Why does the book call it a 'pound sign'? That is not a pound sign; it's a hashtag.)

> #Here is an example of a comment at the beginning of a line
> 1+2+ #and here is an example in the middle of a line
+ +3
[1] 6

(Slightly puzzled by the fact there appear to be two plus signs in that last example but I'll let that go for now...)
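
A possible explanation for the two plus signs, sketched as a script rather than as console input: when a line ends with an unfinished expression, the console swaps its usual > prompt for a + prompt on the next line. So the first + is the console's continuation prompt and only the second one was typed. The variable name total is just for illustration.

```r
# The first line ends with a dangling +, so R keeps reading on the next line.
# The comment after # is ignored even though it sits in the middle.
total <- 1 + 2 + # and here is a comment in the middle of an expression
  +3
print(total)
```

In a script file there is no prompt at all, which is why only one + appears at the start of the second line here.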

FUNCTIONS are next. (I'm now a quarter of the way down page 21.) Functions are mainly in the following form:

f(argument1, argument2,…)

Kinda like they are in Excel!

Some examples without the whole f(argument1,argument2,…) thing:

> 17+2
[1] 19
> 2^10
[1] 1024
> 3==4
[1] FALSE

So these are just operators, basically. Not functions. But what is the double == ..? What does that mean? Does it mean actually 'equals'..? Why two equals signs? Don't know, but basically if you want to say 'equals' in R, you need two equals signs!
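
A short sketch of what == does, with made-up variable names. It asks a question and answers TRUE or FALSE, whereas a single = is one of the ways of assigning a value. As a side note, R also lets you call an operator in the f(argument1, argument2) form by wrapping it in backticks.

```r
# == asks "are these equal?" and answers TRUE or FALSE
3 == 4
4 == 4

# It works element by element on longer vectors too
matches <- c(1, 2, 3, 2) == 2
print(matches)

# Operators can also be written in function form using backticks
sum_result <- `+`(17, 2)
print(sum_result)
```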

Anyway, next item on the agenda is VARIABLES. You can assign values to variables and refer to them by name. You use the symbol <-, which is basically a kind of clunkily-typed left-pointing arrow. And you pronounce this clunkily-typed left-pointing arrow as 'gets' for some reason. But for all my cynicism, I can see that this is going to be useful. But you have to be careful about when the substitution gets done. It is at the time the value is assigned to z, not at the time z is evaluated.

> x<-1
> y<-2
> z<-c(x,y)
> #evaluate z to see what's stored as z
> z
[1] 1 2
> y<-4
> z
[1] 1 2
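
The same point as a script, with one extra step that is not in the book's example: running the assignment to z again after y has changed. The names z_before and z_after are only there to keep both results around for comparison.

```r
x <- 1
y <- 2
z <- c(x, y)   # z stores the values 1 and 2, not a link to x and y

y <- 4         # changing y afterwards leaves z alone
z_before <- z
print(z_before)

z <- c(x, y)   # running the assignment again picks up the new y
z_after <- z
print(z_after)
```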

The next bit (I'm now two-thirds of the way down page 22) looks difficult. It's to do with different ways of referring to a member or set of members of a vector.

> b<-c(1,2,3,4,5,6,7,8,9,10,11,12)
> b
[1]  1  2  3  4  5  6  7  8  9 10 11 12
> #let's fetch the 7th item in b
> b[7]
[1] 7

No, actually, I do get this. First, I was confused by the c(1,2,3,4,5,6,7,8,9,10,11,12) bit. I'd forgotten that c() is a function that just creates a vector from scratch, so to speak. And now I see the point of the square brackets and how they contain the item number within the vector. Actually, it's probably best to refer to it as the index.

As for the last example on page 22, this is the whole maths bias, the horrible assumptions made, this is the bit about R tutorials that I dread! Here's what it says: 'fetch only members of b that are congruent to zero (mod 3). In non-math speak, members that are multiples of 3.' Does the author not realise that 'multiple' is also a math word..? And therefore also a complete turn-off to those of us who aren't comfortable with math? This empathy deficit makes me cross. (Can you tell?)

> b[b %% 3 ==0]
[1] 3 6 9 12
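
It may help to take that one-liner apart. The sketch below does the same thing in three steps; the intermediate names remainders, is_multiple and selected are invented for illustration. %% gives the remainder after division, == 0 turns those remainders into TRUE or FALSE, and the square brackets keep only the TRUE positions.

```r
b <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

# Step 1: the remainder left over when each number is divided by 3
remainders <- b %% 3
print(remainders)

# Step 2: TRUE wherever the remainder is zero (the number divides by 3 exactly)
is_multiple <- remainders == 0
print(is_multiple)

# Step 3: keep only the elements of b where is_multiple is TRUE
selected <- b[is_multiple]
print(selected)
```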

Next: two additional operators that we can use to assign values to symbols. One is the = sign. I mean, isn't this the most obvious one..?

> one <- 1
> two <- 2
> #This means: assign the value of "two" to the variable "one"
> one = two
> one
[1] 2

This example is useful as a reminder that the #comment text usually refers to what follows, as opposed to what's gone before.

Finally, the weird arrow thing also seems to work the other way:

> 3-> three
> three
[1] 3

I guess the thing being pointed at is the alias, whilst the thing that's doing the pointing is the thing you want to be aliased.
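
For comparison, a minimal sketch of the three assignment forms side by side. The variable names are hypothetical. In each case the name ends up holding the value 3, and the arrow points from the value towards the name.

```r
three_a <- 3    # left-pointing arrow: name on the left, value on the right
three_b = 3     # equals sign: same layout as the left-pointing arrow
3 -> three_c    # right-pointing arrow: value on the left, name on the right

print(c(three_a, three_b, three_c))
```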

The last thing in this section (I'm now at the top of page 24) I decidedly do not understand!

> f <- function(x,y) {c(x=1, y+1)}
> f(1,2)
x
1 3
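
A tentative reading of that output, sketched below. Inside c(), writing x=1 does not use the argument x at all; it creates an element with the value 1 and the label 'x', which is why an x appears above the 1. The second element is y+1, which is 2+1, and it has no label. The function g is a hypothetical variant, assuming the aim was to add 1 to both arguments; it is not taken from the source.

```r
# The function as typed above: the first element is the value 1, labelled "x"
f <- function(x, y) { c(x = 1, y + 1) }
result <- f(1, 2)
print(result)
names(result)    # the labels: "x" for the first element, empty for the second
unname(result)   # the bare values without labels

# Hypothetical variant (an assumption): add 1 to each argument instead
g <- function(x, y) { c(x + 1, y + 1) }
print(g(1, 2))
```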
