
Molecule

Molecule is designed to aid in the development and testing of Ansible roles. Molecule provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios. Molecule encourages an approach that results in consistently developed roles that are well-written, easily understood and maintained.

Installation

```bash
$ conda create -n molecule python=3.7
$ source activate molecule
$ conda install -c conda-forge ansible docker-py docker-compose molecule
# docker-py seems to be called docker in PyPI
$ pip install ansible docker docker-compose molecule
```

Main features

- Cookiecutter to create roles from a standardized template.
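As a quick sketch of the workflow (the command syntax differs between Molecule releases, and `my-role` is just a placeholder name):

```bash
# Scaffold a new role from the standard template
# (older releases use: molecule init role --role-name my-role)
$ molecule init role my-role
$ cd my-role
# Run the full test sequence (create, converge, verify, destroy, among other steps)
$ molecule test
```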

Spark on Kubernetes Client Mode

This is the third article in the Spark on Kubernetes (K8S) series after: Spark on Kubernetes First Run and Spark on Kubernetes Python and R bindings. This one is dedicated to the client mode, a feature that has been introduced in Spark 2.4. In client mode the driver runs locally (or on an external pod), which makes interactive use possible; unlike cluster mode, it can be used to run REPLs like the Spark shell or Jupyter notebooks.
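A minimal sketch of what launching an interactive shell against a K8S cluster looks like in client mode (the API server address, image name and driver host below are placeholders for your own setup):

```bash
# spark-shell always runs in client mode: the driver lives in this local process,
# while the executors are started as pods on the cluster.
$ ./bin/spark-shell \
    --master k8s://https://<api-server-host>:6443 \
    --conf spark.kubernetes.container.image=<repo>/spark:2.4.0 \
    --conf spark.executor.instances=2 \
    --conf spark.driver.host=<address-reachable-from-the-executor-pods>
```

Setting spark.driver.host matters here because the executor pods must be able to connect back to the locally running driver.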

Spark on Kubernetes Python and R bindings

Version 2.4 of Spark for Kubernetes introduces Python and R bindings:

- spark-py: the Spark image with Python bindings (including Python 2 and 3 executables)
- spark-r: the Spark image with R bindings (including the R executable)

Databricks has published an article dedicated to the Spark 2.4 features for Kubernetes. The principle is exactly the same as the one explained in my previous article, but this time we are using a different image, spark-py, and another example, local:///opt/spark/examples/src/main/python/pi.
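Concretely, the submission only differs in the image and the application path. A sketch, with <api-server-host> and <repo> as placeholders, and assuming the bundled Pi example lives at the usual examples/src/main/python/pi.py location inside the image:

```bash
# Submit the bundled Python Pi example in cluster mode using the spark-py image
$ ./bin/spark-submit \
    --master k8s://https://<api-server-host>:6443 \
    --deploy-mode cluster \
    --name spark-pi-py \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=<repo>/spark-py:2.4.0 \
    local:///opt/spark/examples/src/main/python/pi.py
```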

Spark on Kubernetes First Run

Since version 2.3, Spark can run on a Kubernetes cluster. Let's see how to do it. In this example I will use version 2.4. The prerequisite is to download and install (unzip) the corresponding Spark distribution. For more information, there is a section on the Spark site dedicated to this use case.

Spark images

I will build and push Spark images to make them available to the K8S cluster.
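The Spark distribution ships a helper script for exactly this. A minimal sketch, with <repo> standing in for your own registry:

```bash
# From the root of the unpacked Spark distribution:
# build the Spark images and tag them for the registry
$ ./bin/docker-image-tool.sh -r <repo> -t 2.4.0 build
# push them so the K8S cluster can pull them
$ ./bin/docker-image-tool.sh -r <repo> -t 2.4.0 push
```

In the 2.4 release the build step also produces the spark-py and spark-r variants alongside the base JVM image.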

Dplyr & Sparklyr usage

In this example, I want to show that the same syntax can be used for local as well as distributed computing, thanks to the sparklyr package. To do that I will use the nycflights13 dataset (one of the datasets used in the sparklyr demo) in order to check whether the number of flights per day varies with the period of the year (the month). Spoiler: it varies, but not by much.