R in Eighty Days

How much R can I learn from scratch in eleven-and-a-half weeks?


Day 22: Thursday 5th September 2019

A strong sense this morning that I've basically been pussyfooting around the whole R thing so far. I've been spending too much time just following textbooks and tutorials without actually doing anything in R, and it is about time I tried doing something for myself. So I thought: how about I find a .csv file somewhere, one that contains (for example) raw patient flow data, import it into R Studio, and then try running (say) the skim function on it?

So, first of all, let's find a .csv file. OK, got one. It's a sample of a Flow_ology course file from a couple of years ago. It's called F_Ward_Data_Example and I've saved it as a .csv file in my Project R folder. It's got 25 columns and 100 rows. The idea now is that I open up R Studio, create a new project, and then import the file. Here goes...

Right, I just opened R Studio. The first thing I notice is that it opens up where it left off the last time I used it. So if I didn't close my project(s) yesterday, it just opens up again where I left off. I don't know if this is a good thing or a bad thing. But let me close the existing project anyway and then open up a new one.

One annoying thing I've just noticed is that R Studio is not making clear to me whether I am naming the project or simply specifying the location (i.e. which folder) of the project. So as a result of this confusion I now have a new project called Project R that is located within the Project R folder, even though I didn't actually ask R Studio to name it Project R. FFS. Anyway, maybe I can rename the project later. Whatever. It doesn't matter. The key thing now is to see if I can import that .csv file.

Ah! Interesting! I don't really have to do anything! I just have to glance at the bottom right pane in R Studio. The Files tab is the one that is visible, and of course I can see my file just sitting there, waiting to have things done to it. So all I need to do now is somehow get it to the top left pane, do the library(skimr) function and then do the skim("F_Ward_Data_Example.csv") thing to it. Or whatever it is. Can I remember how to do that..? OK, here is my first attempt...

library(tidyverse)
library(skimr)
ward_data <- read_csv("F_Ward_Data_Example.csv")
skim(ward_data)

Here is what I think I am doing with that. First I am assigning an 'alias' (ward_data) to my longwinded file name (F_Ward_Data_Example.csv). Second, I am reading a .csv file into R so that R can display it and do whatever it needs to do to it. Third, I am using the skim function to get some useful info about what my dataset looks like.
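One way to check that reading is a minimal sketch like the one below. It assumes a tiny made-up stand-in file (tiny_ward.csv, with invented Age and Sex columns) rather than the real dataset. Strictly speaking, read_csv() reads the contents of the file into a data frame (a tibble), and the <- arrow then binds that data frame to the name ward_data, so the name refers to the data held in memory rather than to the file name.

```r
library(readr)

# Hypothetical stand-in for F_Ward_Data_Example.csv: three made-up rows.
csv_path <- file.path(tempdir(), "tiny_ward.csv")
writeLines(c("Age,Sex", "67,Female", "54,Male", "81,Female"), csv_path)

# read_csv() parses the file into a tibble; `<-` binds that tibble to a name.
ward_data <- read_csv(csv_path)

# The name holds the data itself, not the file name.
is.data.frame(ward_data)  # TRUE
dim(ward_data)            # 3 rows, 2 columns
```

Passing the object to skim() afterwards, as in the attempt above, then summarises the data frame rather than the file on disk.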

Reader, it worked! Or, at least it appeared to work. I am happy with it anyway. I got a result. I managed to do something in R that was to do with my own data. That was my objective for today, and I will take that as a significant victory. You could argue that I should be more advanced than this after 22 days of exploring R but I—for now—am happy!

I am so happy, in fact, that I am going to copy and paste the output I got from the console pane so that everyone can see what I got! (It may look like gibberish, but I don't care.)

> library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.2.1     v purrr   0.3.2
v tibble  2.1.3     v dplyr   0.8.3
v tidyr   0.8.3     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> library(skimr)

Attaching package: ‘skimr’

The following object is masked from ‘package:stats’:

    filter

> ward_data <- read_csv("F_Ward_Data_Example.csv")
Parsed with column specification:
cols(
  .default = col_character(),
  Anon_Patient_ID = col_double(),
  Age = col_double(),
  HospStayNoOverall = col_double(),
  WardStayNoOverall = col_double(),
  WardStayLOSHours = col_double(),
  WardStayLOSDays = col_double(),
  F_WardStayLOSDays = col_double(),
  `Admission Type` = col_double(),
  `Admission Transfer From On Admission` = col_double(),
  `Discharge Type` = col_double(),
  `Final Discharge Type` = col_double()
)
See spec(...) for full column specifications.
> skim(ward_data)
Skim summary statistics
 n obs: 100
 n variables: 25

-- Variable type:character -----------------------------------------------------
variable                 missing complete n   min max empty n_unique
Date of Death            83      17       100 10  10  0     14
Day_Quartile             60      40       100 3   3   0     1
F_WardStayEndDateTime    0       100      100 16  16  0     73
F_WardStayStartDateTime  0       100      100 16  16  0     35
HospStayEndDateTime      0       100      100 16  16  0     86
HospStayStartDateTime    0       100      100 16  16  0     89
IPDC                     0       100      100 1   1   0     2
Sex                      0       100      100 4   6   0     2
Specialty                0       100      100 10  28  0     9
Ward                     0       100      100 3   14  0     18
WardStayEndDateTime      0       100      100 16  16  0     73
WardStayStartDate        0       100      100 10  10  0     14
WardStayStartDateTime    0       100      100 16  16  0     94
WardStayStartYear        0       100      100 7   7   0     2

-- Variable type:numeric -------------------------------------------------------
variable                              missing complete n   mean     sd       p0   p25      p50     p75      p100  hist
Admission Transfer From On Admission  0       100      100 27.42    15.4     10   10       41      41       41    ▆▁▁▁▁▁▁▇
Admission Type                        0       100      100 24.1     9.16     10   11       30      30       36    ▃▁▁▁▁▁▇▁
Age                                   0       100      100 63.94    17.36    14   53       67.5    78       94    ▁▁▂▃▃▇▇▂
Anon_Patient_ID                       0       100      100 48918.86 22983.54 1167 27891.75 57794   66531.75 76320 ▂▃▂▂▁▅▇▆
Discharge Type                        0       100      100 11.53    1.18     10   10       12      12       19    ▃▇▁▁▁▁▁▁
F_WardStayLOSDays                     0       100      100 0.39     0.21     0    0.2      0.4     0.6      0.7   ▁▃▅▃▂▃▇▂
Final Discharge Type                  0       100      100 11.8     6.24     10   10       10      10       42    ▇▁▁▁▁▁▁▁
HospStayNoOverall                     0       100      100 28255.65 12549.9  796  18276    32017   37814.5  44319 ▂▃▃▂▃▅▇▆
WardStayLOSDays                       0       100      100 3.33     7.19     0    0.2      0.7     3.02     38.9  ▇▁▁▁▁▁▁▁
WardStayLOSHours                      0       100      100 79.65    172.38   0.1  4.97     16.25   73       933.5 ▇▁▁▁▁▁▁▁
WardStayNoOverall                     0       100      100 45005.04 18749.14 1386 31054    51879.5 58832.25 66759 ▂▂▂▂▁▅▇▇
>
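
The column specification in that output shows read_csv() guessing col_double() for coded fields such as `Admission Type`, which is why skim reports a mean and a histogram for them. A hedged sketch of overriding such a guess with the col_types argument follows. It uses a made-up stand-in file (tiny_codes.csv, invented values), and treating the codes as categories rather than numbers is an assumption on my part, not something the output settles.

```r
library(readr)

# Hypothetical stand-in file with one coded column; values are invented.
csv_path <- file.path(tempdir(), "tiny_codes.csv")
writeLines(c("Age,Admission Type", "67,10", "54,30", "81,36"), csv_path)

# col_types replaces readr's guess for the named columns.
ward_codes <- read_csv(
  csv_path,
  col_types = cols(
    Age = col_double(),
    `Admission Type` = col_character()  # keep the codes as labels
  )
)

# The coded column is now character, so it can be counted per code.
class(ward_codes$`Admission Type`)
table(ward_codes$`Admission Type`)
```

With the codes read as character, skim() would list that column under the character variables instead of averaging it.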
