I use Markdown as much as I can but mainly to write blog posts and documentation. There are many great editors – at least on Mac – for Markdown but most of them are not free to use. Moreover, Ulysses just began a move, that others could follow, to a subscription model (~$5/month). Finally, I don’t want to avoid vendor lock-in and charging $40/year for plain text editing in a format designed to be usable from anywhere by John Gruber and Aaron Swartz.
This article explains how to put in place quickly a basic monitoring of the Hadoop YARN resource allocation system through the ResourceManager REST API’s1. The same information is also available in metrics collectors like Ambari metrics (AMS), but YARN API is really easy to use and is available everywhere — and most of the time without requiring authentication. A last word, triggers (thresholds) on these metrics make efficient source to integrate into a global monitoring and alerting system, like Nagios or Zabbix, since they are not available out of the box in the standard Ambari alerting system2.
A Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them. It solves the “Project X depends on version 1.x but, Project Y needs 4.x” dilemma, and keeps your global site-packages directory clean and manageable. – source1 They can also be used to supply very different things like different versions of Python. Virtual Environments can be managed by the Python package manager pip, but when you are using the popular Anaconda distribution it is necessary to use the conda package manager coming with it.
While reading Data Analysis with Open Source Tools1–one of my favorite book on this topic–, I was surprised to discover that rounding numbers to display was described by a rule bearing the name of Andrew S. C. Ehrenberg. It is obviously pointless to report or quote results to more digits than is warranted. In fact, it is misleading or at the very least unhelpful, because it fails to communicate to the reader another important aspect of the result–namely its reliability!
Definition An outlier is a data point or observation whose value is quite different from the others in the dataset being analyzed.1 It is an important part of the analysis to identify outliers and to use appropriate techniques to take them into account. But unfortunately 😔 There is no absolute agreement among statisticians about how to define outliers […]1 So what can be done? Fortunately 😏
The book Think like a programmer1 is interesting because it does not only focus on coding. One of the most interesting chapter is related to general problem-solving techniques. I found it so great, that I wanted to write down the rules, in order to remind me later these useful advices–cherry on the cake, they can be applied to almost every problem. Always have a plan Restate the problem Divide the problem Start with what you know Reduce the problem Look for analogies Experiment Don’t get frustrated V.