Review: MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems

I picked up MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems by Donald Miner and Adam Shook (O’Reilly) to explore the deeper analytics that are possible with Hadoop and MapReduce. The book did not disappoint: it covers many of the more advanced challenges that engineers working with Hadoop datasets encounter once they move beyond the simple cases. MapReduce Design Patterns is not for the faint of heart, nor for the true novice in Hadoop or MapReduce. A solid understanding of the fundamentals of analytics is also a valuable prerequisite for this title.

I had already explored Pig and Hive as ways to abstract the underlying MapReduce implementation, and MapReduce Design Patterns helped me understand what was going on “under the hood.” This was important to me because I have learned the hard way that the easy approach is not always the most efficient or effective one. After reading this title, I better understand how I can use Pig and Hive for straightforward analytics and native MapReduce for my more specialized needs. In other words, use the right tool for the job.

To the bold adventurer new to Hadoop and MapReduce: I’d suggest that you treat this book as a follow-on study guide, to be used after learning the basics and working with those frameworks for a little while. In that light, I have no hesitation in recommending MapReduce Design Patterns to engineers who are looking for something to help them move from an entry-level to an advanced understanding of these technologies.

Disclaimer: I received a free electronic copy of this book as part of the O’Reilly Blogger Program.

Review: Hadoop Operations by Eric Sammer

Hadoop Operations by Eric Sammer (O’Reilly Media) is a thoughtfully organized book that guides the operational and architectural reader toward a viable Hadoop-centric solution. Sammer spends a reasonable amount of time giving the reader enough Hadoop background to move on to the more complex considerations and actions needed to implement high-quality Hadoop clusters in an operations environment. He also provides some very specific information that puts the book in my “must have” collection for Hadoop.

First, instead of trying to cover what Hadoop can do in all flavors and colors, Sammer describes configurations that will meet the needs of a general operational implementation. This allows the reader to focus on the key concepts of installing, configuring, and operating a Hadoop cluster instead of learning the many Hadoop features that most shops will never use. Second, Sammer spends an appropriate amount of time discussing ways that an operational team can monitor and troubleshoot Hadoop clusters. Very few authors cover the areas needed for a solution to move from “proof of concept” to a “production-level” implementation. Third, Sammer looks at products that work around Hadoop to either add features or allow for better maintainability and management of the system. This lets the reader see how Hadoop fits into the larger operational model. Finally, Sammer approaches the chapters from the view of someone who has actually implemented Hadoop clusters, providing suggestions, tips, and tricks that allow the reader to bypass many of the more common challenges that Hadoop adopters can face.

I highly recommend Hadoop Operations by Eric Sammer to operational and architectural readers who want to reach a viable solution as soon as possible.

Disclaimer: I received a free electronic copy of this book as part of the O’Reilly Blogger Program.