# May the force be with the units


When working with data, it is important to know the unit of each variable. The units package is a convenient way to make those units explicit.

> Support for measurement units in R vectors, matrices and arrays: automatic propagation, conversion, derivation and simplification of units; raising errors in case of unit incompatibility.
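
The behaviours listed in the package description can be tried on a few standalone values. This is only a minimal sketch: the numbers (a 100 m distance, a 9.58 s duration) are illustrative assumptions and do not come from the starwars dataset.

```r
library(units)

# Explicit conversion: a length in centimeters expressed in meters
height <- set_units(172, cm)
height_m <- set_units(height, m)

# Propagation: dividing a length by a duration yields a speed
distance <- set_units(100, m)   # illustrative value
duration <- set_units(9.58, s)  # illustrative value
speed <- distance / duration

# Compatible units are converted automatically (the result is in meters)
total <- set_units(1, m) + set_units(50, cm)

# Incompatible units raise an error instead of silently giving a number
result <- tryCatch(
  set_units(1, m) + set_units(1, kg),
  error = function(e) "incompatible units"
)

print(height_m)
print(speed)
print(total)
print(result)
```

The `tryCatch()` wrapper is only there so the script keeps running; in interactive use the error message itself is the useful part.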

## A simple example

By reading the help of the starwars dataset, we learn that

• height: Height (cm)
• mass: Weight (kg)

So let’s test it.

```r
library(dplyr)
library(magrittr)
library(units)

starwars %>%
  mutate(height = set_units(height, cm)) %>%
  mutate(mass = set_units(mass, kg)) %>%
  select(name, height, mass, species) %>%
  head()

# A tibble: 6 x 4
#   name           height mass species
#                    [cm] [kg]
# 1 Luke Skywalker    172   77 Human
# 2 C-3PO             167   75 Droid
# 3 R2-D2              96   32 Droid
# 4 Darth Vader       202  136 Human
# 5 Leia Organa       150   49 Human
# 6 Owen Lars         178  120 Human
```

Note that units are nicely printed in the output.

## Converting units

Imagine I want to draw a bar chart of the first five characters in the dataset, with the height axis in meters instead of centimeters because that makes the scale clearer to read. However, plotting the units column does not work as is. To make it work, the ggforce package is needed, and more specifically its scale_unit functions.

```r
library(ggforce)

starwars %>%
  mutate(height = set_units(height, cm)) %>%
  mutate(mass = set_units(mass, kg)) %>%
  select(name, height, mass, species) %>%
  head(5) %>%
  ggplot() +
  geom_bar(aes(x = reorder(name, -height), y = height,
               colour = species, fill = after_scale(alpha(colour, 0.5))),
           stat = "identity", size = 1.5) + # Semi-transparent fill, solid outline
  xlab("name") +
  scale_y_unit(unit = "m") + # Doing the conversion
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) # To avoid overlap in names
```

Thanks to a call to scale_y_unit(unit = "m") I’m able to display the scale in meters.

Note: x = reorder(name, -height) is a trick to sort the bars by value (height in this case).
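
To see what the reorder trick does outside of ggplot, here is a small sketch on plain vectors. The names and heights are copied from the table above; the units are dropped here only to keep the example in base R.

```r
# First five characters and their heights in cm (from the table above)
name <- c("Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa")
height <- c(172, 167, 96, 202, 150)

# reorder() returns a factor whose levels are sorted by the second argument.
# Negating the height sorts the levels from tallest to shortest.
ordered_names <- reorder(name, -height)

print(levels(ordered_names))
```

ggplot draws a discrete axis in the order of the factor levels, which is why the bars come out sorted by decreasing height.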

## Mixing units

To check whether height is correlated with mass, we can draw a scatter plot.

```r
library(tidyr) # Provides drop_na()

starwars %>%
  mutate(height = set_units(height, cm)) %>%
  mutate(mass = set_units(mass, kg)) %>%
  select(name, height, mass, species) %>%
  drop_na() %>%
  ggplot() +
  geom_point(aes(x = mass, y = height))
```

There is a big outlier in this plot. Let’s check who it is.

```r
starwars %>%
  mutate(height = set_units(height, cm)) %>%
  mutate(mass = set_units(mass, kg)) %>%
  select(name, height, mass, species) %>%
  drop_na() %>%
  filter(mass == max(mass))

# A tibble: 1 x 4
#   name                  height mass species
#                           [cm] [kg]
# 1 Jabba Desilijic Tiure    175 1358 Hutt
```

It’s Jabba the Hutt. For him, it would also be useful to use tons instead of kg! If we remove him, we obtain something more conventional.

```r
starwars %>%
  mutate(height = set_units(height, cm)) %>%
  mutate(mass = set_units(mass, kg)) %>%
  select(name, height, mass, species) %>%
  drop_na() %>%
  filter(mass != max(mass)) %>% # Here is the removal
  ggplot() +
  geom_point(aes(x = mass, y = height))
```

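
The remark about using tons for Jabba can be tried directly. This is a small sketch, assuming the unit name `tonne` (the metric ton) is the one wanted; the mass value is the one shown in the output above.

```r
library(units)

# Jabba's mass as reported in the dataset
jabba_mass <- set_units(1358, kg)

# Convert to metric tons; "tonne" is assumed to be the intended unit
jabba_tonnes <- set_units(jabba_mass, tonne)

print(jabba_tonnes)
```

The numeric value is divided by 1000 and the unit label changes with it, so there is no risk of forgetting which scale a column is in.
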
His Body Mass Index (BMI) should be very bad, so let’s check it. Thanks to the units package, the computation is straightforward and explicit.

```r
starwars %>%
  mutate(height = set_units(height, cm)) %>%
  mutate(mass = set_units(mass, kg)) %>%
  # Converting in meters and raising to power of 2
  mutate(bmi = mass / set_units(height, m) ^ 2) %>%
  select(name, height, mass, bmi, species) %>%
  arrange(desc(bmi)) %>%
  head()

# A tibble: 6 x 5
#   name                  height mass       bmi species
#                           [cm] [kg]  [kg/m^2]
# 1 Jabba Desilijic Tiure    175 1358 443.42857 Hutt
# 2 Dud Bolt                  94   45  50.92802 Vulptereen
# 3 Yoda                      66   17  39.02663 Yoda's species
# 4 Owen Lars                178  120  37.87401 Human
# 5 IG-88                    200  140  35.00000 Droid
# 6 R2-D2                     96   32  34.72222 Droid
```

Note that the unit of the bmi column is correct: kg/m^2.
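
To make the role of the conversion visible, here is a minimal sketch of the same BMI arithmetic for a single character, using Luke Skywalker's values from the first table. The variable names are illustrative only.

```r
library(units)

mass <- set_units(77, kg)
height <- set_units(172, cm)

# Converting the height to meters first gives the usual kg/m^2
bmi <- mass / set_units(height, m)^2

# Without the conversion the result is still valid, but expressed in kg/cm^2
bmi_cm <- mass / height^2

# Converting afterwards leads to the same value as converting first
bmi_converted <- set_units(bmi_cm, kg/m^2)

print(bmi)
print(bmi_cm)
print(bmi_converted)
```

Either order works because the unit travels with the value. Forgetting the conversion produces a number with a visibly different unit, not a silently wrong BMI.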