Categories

Back 2 Code

Rack Awareness

Hadoop is a distributed system, so its core services1 have to know the network topology — in short where data nodes reside in the data center — either to ensure reliability (try to write replicas of a block at different location for fault tolerance) and performance (try2 to read the closest blocks). HDFS (the NameNode) uses this information during its writing process in order to choose the destination of each block replica3.

Be Smart and Fun

I’ve heard about a book standing on top of computer science book lists (for example in the Essential Books of Computer Science list). And what is this book about? Haskell. What? Haskell, the functional programming language, a descendant of ML. Haskell is not widely used — I’ve never seen this competency in any CV. So why this book is so popular? Because it’s well written and also because it’s smart and fun.

Pandas pipes

I love the ability of using pipes (with the dedicated operator %>%) in R introduced by the magrittr package – we are using them for many years in *nix systems, the old good |. They let write data wrangling sequences in a very readable way – very close to a natural language. See them in action. 1 2 3 4 5 babynames %>% filter(name == "Eva") %>% group_by(year) %>% summarise(n = sum(n)) %>% ggplot(aes(x = year, y = n)) + geom_line() Reading that and knowing the content of the babynames package, we can guess without effort that we are trying to plot the evolution of the number of babies (born in the USA) named “Eva” along the years.

Feather

We often read articles on the topic Python vs. R, which language should I pick? My opinion is that each language, and its corresponding ecosystem, has its pros and cons and can be used efficiently to solve different problems. Wes McKinney and Hadley Wickham seem to agree on this point and have recently developed in strong collaboration the Feather packages (one in Python and one in R at this time, but it could / will be extended to other languages).

Lazy

The laziness of engineers has always been one of the biggest driver in computer science. Here is a quote1 by John Backus the father of FORTRAN — the daddy of modern programming languages . Much of my work has come from being lazy. I didn’t like writing programs, and so, when I was working on the IBM 701 (an early computer), writing programs for computing missile trajectories, I started work on a programming system to make it easier to write programs.

Netscape, the rewrite big mistake

Back in the mid-1990s Netscape was one of the first successful startup of the Internet era. The company had made with success it’s IPO (Initial Public Offering) and trusted more dans 90% of browser usage. In 2008 the company had lost the browser war and its owner AOL has announced the end of the support of Netscape products. What happened? In these cases there is often more than one reason, but one of them is certainly “the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch”.