Review of Enterprise Data Workflows with Cascading

Enterprise Data Workflows with Cascading by Paco Nathan (O'Reilly Media) is a great summary of how to use the Cascading API. Paco spends enough time to give a solid overview of Cascading, along with an explanation of related extensions such as Pattern and Lingual. The examples provided let a novice user quickly grasp the basics of Cascading, though some of them follow the same flows as the examples on the Cascading online documentation site.

Enterprise Data Workflows with Cascading is a great resource for beginning users who need to come up to speed on Cascading quickly. The book walks the reader through a progression of exercises: from setting up Hadoop and loading files into it, to using different types of joins, and finally to integration with other languages and a larger case study based on the City of Palo Alto Open Data.

I'd recommend Enterprise Data Workflows with Cascading as a good entry point and a foundation to build upon as the reader gains more experience.