R in Eighty Days

How much R can I learn from scratch in eleven-and-a-half weeks?


Day 3: Saturday 17th August 2019

I just want to go back to what I was saying about vectors yesterday. Perhaps even retract what I said about them. I was quite rude about vectors. I was dismissive. I was a bit "What's the point of vectors?" So I've now had a chance to sleep on that jumped-to conclusion, and I want to re-think things for a moment.

I think that all data analysts have some sort of 'mental model' in their heads when they think about 'data'. My guess is that the mental model we use (and we'll all use different mental models) will be heavily influenced by the one that was formed when we first encountered data as a thing to be properly thought about. My mental model of data, for example, was formed in the Spring of 1987 when I had my first proper professional encounter with data. Because the data in question was being stored in an Oracle database, the way I learned to think about data was as data tables where the rows were individual records (attendances, admissions, patients, whatever) and the columns were the things we recorded about those attendances, admissions or patients. And the things we recorded were variables like firstname, date of birth, GP practice, and so on.

The important thing, though, is that I learned to think about data—raw data, that is—as rows and columns organised in a table structure. It's as if my mental model of data is that data already comes pre-arranged into a grid or a matrix or a table. And I can then use a query language to select which rows I want, which columns I want, how I want to order it, whether I want to crosstab it, and so on. The important thing is: I always think of data as organised into some kind of table structure, even in its raw state.

And this mental model of what data looks like has served me pretty well over the last 32-and-a-half years. It has never really let me down.

But last night, as I lay awake in my bed, pondering whether or not I'd been a bit harsh in my snap judgement of vectors when I was working my way through Chapter 3 of R in a Nutshell, it occurred to me that what I might be encountering with vectors was—for me at least—a different way of thinking about raw data. It was a challenge to my mental model. And that was why I was reacting to it so ill-manneredly.

The vectors I was seeing on page 23 of the textbook were data displayed as just rows of numbers, with the numbers separated by commas. (Yes, I know it doesn't always have to be numbers, but let's just stick with numbers for the moment.) This function in R, which I see written down as, for example, c(1,2,3,4,5,6,7,8,9,10), is creating raw data. (Oh, and by the way, I had to Google to see what the 'c' actually stands for, and I think it stands for 'combine'.) And it's raw data organised as just a row of numbers, separated by commas.
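To make that concrete for myself, here is a minimal sketch of the sort of thing I mean. The variable name x is my own choice, not something from the textbook, and the handful of functions I call on it are just the first ones I reached for, so treat this as a beginner's poke at a vector rather than the proper way to do anything.

```r
# Build a numeric vector by combining ten values with c()
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# A few basic questions to ask of it
length(x)     # how many elements it holds
is.vector(x)  # whether R considers it a vector
x[3]          # the third element (R counts from 1, not 0)
mean(x)       # a function applied to the whole vector at once
```

What strikes me is that there is no table anywhere in sight: no rows, no column names, just the values themselves held under one name.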

So it's a challenge to my pre-existing way of thinking about data. It's as if it's taking me one layer further down. Not just raw data, but raw raw data. And it's not as if it's a totally alien thing, either. I've spent a large amount of my time over the last 32-and-a-half years importing data into various applications, and one way of doing it is by importing .csv files. I've looked at .csv files in a text editor, I know what they look like, and (let's face it) they are not a million miles away from vectors in terms of how they look.
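Here is a rough sketch of how the two pictures might connect, assuming I've understood read.csv() correctly. The CSV text, the column names patient_id and age, and the values are all made up by me for illustration. The idea is that once a .csv file has been read into a table, each column of that table can be pulled back out as a vector.

```r
# A tiny CSV, inlined as text (hypothetical data, invented for this example)
csv_text <- "patient_id,age
1,34
2,57
3,41"

# read.csv() parses the text into a data frame: my familiar rows-and-columns table
patients <- read.csv(text = csv_text)

# Pull one column out of the table; what comes back is a vector
ages <- patients$age
is.vector(ages)  # check that the column really is a vector
ages

# Going the other way: vectors built with c() can be assembled into a table
rebuilt <- data.frame(patient_id = c(1, 2, 3), age = c(34, 57, 41))
rebuilt
```

If that reading is right, then the table I've always thought of as the raw state of data is, in R at least, something built out of vectors, which is exactly the "one layer further down" feeling.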

So that's what I wanted to say about vectors, as a sort of apology for yesterday's mini-outburst. And this sort of thinking is interesting to me because I am intrigued to find out how learning about R will affect the way I think about data, and how it is structured, and whether different structures enable new possibilities.
