Feather

Articles asking “Python vs. R: which language should I pick?” are everywhere. My opinion is that each language, along with its ecosystem, has its pros and cons, and that each can be used efficiently to solve different problems.

Wes McKinney and Hadley Wickham seem to agree on this point: they recently collaborated closely to develop the Feather packages (one for Python and one for R at this time, though support could, and likely will, be extended to other languages).

Feather is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy.

Enough talk; it’s time to experiment. Here is a “hello world” that writes a data frame in R and reads it back in Python (pandas).

library(feather)

data("mtcars")
path <- "/tmp/mtcars.feather"
write_feather(mtcars, path)

Now we can read it in Python and start playing with mtcars data ;-)

import feather

path = '/tmp/mtcars.feather'
df = feather.read_dataframe(path)

df.head(3)
##     mpg  cyl   disp     hp  drat     wt   qsec   vs   am  gear  carb
## 0  21.0  6.0  160.0  110.0  3.90  2.620  16.46  0.0  1.0   4.0   4.0
## 1  21.0  6.0  160.0  110.0  3.90  2.875  17.02  0.0  1.0   4.0   4.0
## 2  22.8  4.0  108.0   93.0  3.85  2.320  18.61  1.0  1.0   4.0   1.0

One last word: Feather uses the Apache Arrow columnar memory specification, but for now it should not be used for long-term storage, since the file format is likely to change.

Note to users: Feather should be treated as alpha software. In particular, the file format is likely to evolve over the coming year. Do not use Feather for long-term data storage.
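
For readers wondering what “columnar” means in practice, here is a toy illustration in plain Python. It is only a sketch of the general idea (one contiguous, typed buffer per column instead of one record per row) and not the actual Arrow or Feather layout. The values are the `mpg` and `cyl` entries of the three mtcars rows shown above.

```python
from array import array

# Row-oriented layout: one record per row, fields interleaved.
rows = [
    {"mpg": 21.0, "cyl": 6.0},
    {"mpg": 21.0, "cyl": 6.0},
    {"mpg": 22.8, "cyl": 4.0},
]

# Column-oriented layout: one contiguous buffer of doubles per column.
columns = {
    name: array("d", (row[name] for row in rows))
    for name in rows[0]
}

# A per-column computation only touches that column's buffer.
mean_mpg = sum(columns["mpg"]) / len(columns["mpg"])
print(columns["cyl"].tolist())
print(mean_mpg)
```

Data frames in both R and pandas are organized by column as well, which is one reason a columnar file format maps naturally onto them.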

References / Further reading