Review: MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems

I picked up MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems by Donald Miner and Adam Shook (O’Reilly) to explore the deeper analytics that are possible with Hadoop and MapReduce.  This book definitely did not disappoint in covering many of the more advanced challenges that engineers working with Hadoop datasets will encounter once they move from the simple into the advanced.  MapReduce Design Patterns is not for the faint of heart, nor for the true novice in Hadoop and/or MapReduce frameworks.  A solid understanding of the fundamentals of analytics is also a valuable prerequisite to this title.

Having explored the use of Pig and Hive as a way to abstract the underlying implementations of MapReduce, I found that MapReduce Design Patterns helped me understand what was going on “under the hood.”  This was important to me as I have learned the hard lesson that the easy way is not always the most efficient and/or effective way.  By reading through this title, I now better understand how I can use Pig and Hive for the straightforward analytics and native MapReduce for my more specialized needs – or in other words – use the right tool for the job.

To the bold adventurer new to Hadoop and MapReduce – I’d suggest that you look at this book as your follow-on study guide, to be used after learning the basics and working with those frameworks for a little while.  In that view, I have no hesitation in recommending MapReduce Design Patterns to engineers who are looking for something to help them move from entry-level into advanced levels of understanding in these technologies.

Disclaimer: I received a free electronic copy of this book as part of the O’Reilly Blogger Program

Review: Hadoop Operations by Eric Sammer

Hadoop Operations by Eric Sammer (O’Reilly Media) is a thoughtfully organized book that guides the operational and architectural reader toward a viable Hadoop-centric solution.  In his book, Sammer spends a reasonable amount of time providing the reader with enough Hadoop background to be able to move on to the more complex considerations and actions needed to implement high-quality Hadoop clusters in an operations environment.  Sammer provides some very specific information in his book that puts it into my “must have” collection for Hadoop.

First, instead of trying to cover what Hadoop can do in all flavors and colors, Sammer describes configurations that will meet the needs of a general operational implementation.  This allows the reader to focus on the key concepts of installing, configuring, and operating a Hadoop cluster instead of learning the many Hadoop features that most shops will never use.  Second, Sammer spends an appropriate amount of time discussing ways that an operational team can monitor and troubleshoot Hadoop clusters.  Very few authors cover the areas needed so that a solution can move from “proof of concept” into a “production-level” implementation.  Third, Sammer looks at products that work around Hadoop to either add features or allow for better maintainability/management of the system.  This gives the reader the ability to see how Hadoop fits into the larger operational model.  Finally, Sammer approaches the chapters in the book from the view of someone who has actually implemented Hadoop clusters, providing suggestions, tips, and tricks that allow the reader to bypass many of the more common challenges that Hadoop adopters can face.

I highly recommend Hadoop Operations by Eric Sammer for the operational and architectural readers that want to get a highly viable solution as soon as possible.

Disclaimer: I received a free electronic copy of this book as part of the O’Reilly Blogger Program

Review: Learning Rails 3

Review by Jason Armstrong of Learning Rails 3

Learning Rails 3 by Simon St. Laurent, Edd Dumbill, and Eric J Gruber (O’Reilly Media) is a great opening guide for developers who are new to Ruby on Rails development.  The book does assume some basic background from the reader (as stated in the preface).  The reader should know HTML development (not just HTML via WYSIWYG tools) along with Ruby in order to truly understand the concepts that are being presented in this book.  The authors provide an appendix to help with the Ruby ramp-up.  Finally, a general programming background will help the reader understand the concepts being presented.

As with all technology books, the authors had to write the title to the version of Rails that was available at the time.  However, I feel that the authors have provided a solid foundation that can support readers’ independent advancement as they move on to newer versions of the technology.  The authors also provide warnings about potential problems and confusion that readers may experience.  Too few authors are willing to commit to these types of warnings and I appreciate those that do provide them.  After all, no technology is perfect in all ways.

While the Model View Controller (MVC) specialist in me kept screaming about some of the early conversations in the book, the authors actually found ways to meet the fundamentals of MVC while making sure that the concepts provided were maintainable and manageable.  I don’t fault them for their approach since the flow of the book actually results in the developer meeting those fundamentals as they progress through the book.  In fact, it was actually refreshing to see the MVC concepts being explained in a way that would reach all developers – not just the purists.

Overall, I recommend this book to the type of reader described above.  As the authors state in their preface, you will not be a Rails guru after reading it, but you will be a lot closer to it than you were before you read this book.

Disclaimer: I received a free electronic copy of this book as part of the O’Reilly Blogger Program

Review: Version Control with Git

Already having a background in advanced usage of ClearCase, CVS, and SVN, I picked up Version Control with Git by Jon Loeliger and Matthew McCullough (O’Reilly) to understand how Git could help me solve some of the feature challenges I have been working through with other VCSs.  This book certainly met my expectations.

The authors work through the processes to set up and configure Git step by step.  In addition, they spend a great deal of time delving into the more important topics required to work with Git as a power user.  The examples were useful and the diagrams were adequate to convey the points needed.  There is no doubt that the authors understand Git.  The time they take in explaining why to “do something” is important in moving the reader from a simple user of Git into becoming a power user.  The “Submodule Best Practices” chapter was helpful in solving some of my current challenges while the “Tips, Tricks, and Techniques” chapter gave me some quick wins.

While there are many ways to solve the same problem when using any VCS, I felt like the authors worked hard to provide an honest and open view of their approaches.  I highly recommend Version Control with Git to the reader who wants to understand more about a VCS like Git than simply a small number of quick commands via an IDE.  While this book is not a definitive reference guide to all things Git, it provides a solid foundation that allows the reader to head down the right path as they learn more about Git’s inner workings.

Disclaimer: I received a free electronic copy of this book as part of the O’Reilly Blogger Program

SOA Governance Control the Chaos

With the growing number of implementations in the development community based on the SOA (Services Oriented Architecture) paradigm, I see many different governance mechanisms stated in terms of “must”, “shall”, and “only” when someone has begun an implementation of SOA. In this post, I will discuss some of the observations I have seen in my 8+ years of working in the SOA paradigm and provide some advice on how to better manage one of the stickiest areas of SOA implementation – governance.

SOA Governance is probably the most misunderstood area of a SOA implementation in our organizations or projects. Even the community contributors to Wikipedia struggle to provide a single and concise definition for what SOA governance “is.” There are many academic reasons for this struggle, but the overriding one is that governance is a term that means different things to different levels in an IT organization. To the technical manager, governance is controlling what tools, resources, and processes their development team will utilize. To the developer, governance is controlling how they create a service, how they integrate it with other services, and generally how services should be built. To the architect, governance is about controlling what specific services are built, their interactions, the domain of responsibility they satisfy, and how well they can be reused.

With all of these competing pulls on a SOA governance process, a person can quickly see why there are so many flavors of governance implementations around the community. When the development of SOA implementations is outsourced to remote organizations, the puzzle of governance becomes even more complicated, since motivations such as financial, domain control, and experience-to-cost situations arise. I will not be able to fold all of these complicated issues into a single model everyone can follow in every implementation (which is why this is an issue in the first place), but based on my experiences with SOA Governance, I will try to provide some simple recommendations that might make the implementation of governance easier for all of these different types of teams.

Understand SOA Governance’s level of control.

The first mistake an organization can make in implementing SOA Governance is to try to control the lowest level of decisions being made in the implementation of SOA services. For example, defining every development tool that a team should use in the implementation of the service is a recipe for disaster. I am not saying that there should not be guidelines around this tooling – only that there has to be a sense of reasonableness and flexibility to prevent teams from needing to “force” a tool to do something it is not really designed to do well. For example, in a presentation at JUG, an architecture member presented their “Toolset Recommendation for SOA Implementations.” It looked something like this.

It was explained to the room that the developers were advised that if they did not use these (and only these) tools in their service development, their implementation would not be approved by the governance board and hence would not be eligible for implementation. I can’t fault them for attempting to resolve a major issue in most development communities: toolset control/toolset domain knowledge. After all, how many times have we all been on a maintenance effort where we find that the previous team used an obscure library that takes us days to figure out how to use (much less find a download or instructions for in the community)? However, while their approach was a valiant attempt at resolving these common issues, I believe that it causes a condition I like to term “over-control.” In my experience, any approach that causes over-control will generally fail.

As an example of a condition of over-control, what if a service implementation contains a requirement to provide session support? In Apache CXF, session support is not available out of the box due to thread-safety risks. While this condition can be mitigated through the use of some of the tricks of the trade, a different service framework may have made this implementation easier to develop successfully. However, in the case above, the SOA Governance “standard” required the use of a single service framework. Therefore, the team either falls out of compliance with governance by picking a different service framework, fails to implement the requirement, or implements potentially buggy workarounds in order to conform to the governance standard.

How do you resolve this conflict? Create a toolset board made up of your brightest and most reasonably vocal senior developers. That board should provide multiple recommended toolset options for each framework area. For example, the board should use their past experiences and combined knowledge to define recommended toolsets, including documenting each tool’s “sweet spots” and “limitations.” This information should be readily available to the project teams as they begin lower-level design. However, the most important feature of this process is to allow the project teams to challenge the standard, or add new toolsets to it, through the board. If a team’s case is accepted, the organization as a whole will benefit from the experiences of its project teams, the governance standards stay current and flexible, and project teams don’t implement a toolset that has known issues identified by the other project teams. With this approach the governance document for approved toolsets would look like the below (simplified view).
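
As a rough illustration only (this is not the governance document from the presentation, nor one from any real organization), the board's toolset standard could be modeled as a small registry. Everything named in this Python sketch is an assumption made for illustration: the Tool and ToolsetStandard classes, the "service framework" area, and the placeholder tool "ExampleFramework" are all hypothetical. The only detail taken from this post is the Apache CXF session-support limitation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Tool:
    """One recommended tool, documented the way the board would document it."""
    name: str
    sweet_spots: List[str]
    limitations: List[str]


@dataclass
class ToolsetStandard:
    """Hypothetical registry: several recommended tools per framework area."""
    recommended: Dict[str, List[Tool]] = field(default_factory=dict)
    pending: List[Tuple[str, Tool]] = field(default_factory=list)

    def recommend(self, area: str, tool: Tool) -> None:
        # The board adds a tool to the list of options for an area.
        self.recommended.setdefault(area, []).append(tool)

    def propose(self, area: str, tool: Tool) -> None:
        # A project team challenges the standard by proposing a new tool.
        self.pending.append((area, tool))

    def review(self, area: str, name: str, accepted: bool) -> bool:
        # The board rules on a pending proposal; accepted tools join the standard.
        for i, (pending_area, tool) in enumerate(self.pending):
            if pending_area == area and tool.name == name:
                del self.pending[i]
                if accepted:
                    self.recommend(area, tool)
                return accepted
        raise KeyError(f"no pending proposal for {name!r} in {area!r}")

    def options(self, area: str) -> List[str]:
        # Names of every tool a team may pick from for this area.
        return [tool.name for tool in self.recommended.get(area, [])]


standard = ToolsetStandard()
standard.recommend(
    "service framework",
    Tool(
        name="Apache CXF",
        sweet_spots=["(to be documented by the board)"],
        limitations=["session support is not available out of the box"],
    ),
)

# A team that needs session support proposes an alternative (placeholder name).
standard.propose(
    "service framework",
    Tool(
        name="ExampleFramework",
        sweet_spots=["(the team's case for the tool goes here)"],
        limitations=["(known issues go here)"],
    ),
)
standard.review("service framework", "ExampleFramework", accepted=True)
```

The design point is that each area maps to a list of options rather than to a single mandated tool, and that a proposal path exists so the standard can grow from project experience.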

With these types of toolset control mechanisms implemented in SOA governance, we get the best tools chosen for each area of our implementations while also growing our organizations and development communities. In the end, we have met the needs of each of our organizational areas: the development managers get a list of tools to ensure core competency in their teams, the developers get their flexibility and some experience-based advice, and architecture gets its reuse and standardization.