At one of the last sessions of Hadoop Summit 2009, Arun Murthy (Yahoo) went over the changes Hadoop needed in order to sort a terabyte of data in less than 60 seconds. The talk had plenty of good wisdom, but what I liked most was his use of charts to find where Hadoop could use some optimization. He described one of the charts (see image on the right) as the "ideal Hadoop job". I don't remember every detail, but in that chart you see smooth waves of both mappers and reducers, a quick startup time, little wasted work, and so on. It left me wondering: what would the graph look like for my jobs? That question is the reason for Hadoop Timelines.
Hadoop Timelines consists of a Web service built on Google App Engine and a Python script built with Dumbo, which together take care of everything needed to replicate Arun's task timelines for your own Hadoop jobs. My goal with this project is to raise Hadoop developers' awareness of how their jobs execute and perform and, maybe even crazier, to get us collaborating on the analysis of individual jobs through comments on specific graphs.
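At its core, a task timeline just counts how many map and reduce tasks are running at each moment of a job. The sketch below is not the project's script and does not use Dumbo. It assumes you have already extracted each task's kind and its start and finish times, in seconds since the job started (for example, from the job history logs), and the function name `task_timeline` is hypothetical.

```python
import math


def task_timeline(tasks):
    """Count running tasks of each kind for every second of a job.

    tasks: iterable of (kind, start, finish) tuples, where kind is a label
    such as "map" or "reduce" and start/finish are seconds since job start
    (an assumed input format, not the project's own). A task counts as
    running in every whole second from floor(start) up to, but not
    including, ceil(finish).
    """
    tasks = list(tasks)
    if not tasks:
        return {}
    # Length of the timeline in whole seconds, set by the last task to finish.
    end = max(math.ceil(finish) for _, _, finish in tasks)
    series = {}
    for kind, start, finish in tasks:
        # One list of per-second counts per task kind.
        counts = series.setdefault(kind, [0] * end)
        for t in range(math.floor(start), math.ceil(finish)):
            counts[t] += 1
    return series


if __name__ == "__main__":
    # Two overlapping map tasks, then one reduce task.
    example = [("map", 0, 3), ("map", 1, 4), ("reduce", 3, 6)]
    for kind, counts in task_timeline(example).items():
        print(kind, counts)
```

Plotting each list against time gives one line per task kind. In an "ideal" job those lines rise quickly, stay flat while the cluster's slots are full, and hand off smoothly from maps to reduces.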
If you're a Hadoop developer, I'd recommend following the instructions to submit your Hadoop timelines to the site and start figuring out how to improve your jobs' execution time. Enjoy.
Created by Elias Torres. (Source)