Version Control of API and Other Documentation

In his post “Version Control Your API Documentation with Github”, Kin Lane discusses an approach for using Github to version API documentation. The post describes using a single public repository to store and version the API documents alongside the APIs. I agree with his thoughts and position. However, I’d like to extend Kin’s thoughts to a practical problem. That problem appears when his solution is misunderstood and is implemented at the root source-control level of a project.

The problem: Assume that your repository uses the default Maven directory layout, with code under src/main/java and non-code files under src/main/resources. You then ask your novice developer to “check in” the API documents, database scripts, and other files related to this project. In short, this means anything that is not really needed to “run” the code.

The concern: In this project structure (a Maven archetype default), there is a strong risk that developers will place their API documentation into the resources folder. Anything placed in this location is generally packaged with the deployable unit. As a result, your repositories (source control, Maven repositories, deployment folders, etc.) could suddenly need much more storage to support the effort.
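To see how much documentation actually ships inside a deployable unit, you can run a small report over the built JAR. This is a sketch, not a standard tool. The class name PackagedDocsReport and the list of “documentation” file extensions are assumptions that you should adapt to your team. The sketch uses only the JDK’s java.util.jar.JarFile and adds up the uncompressed size of every matching entry.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class PackagedDocsReport {

    // Hypothetical list of "documentation" extensions; adjust for your own project.
    private static final List<String> DOC_EXTENSIONS =
            List.of(".pdf", ".doc", ".docx", ".vsd", ".vsdx", ".ppt", ".pptx", ".xls", ".xlsx");

    // True if the entry name ends with one of the documentation extensions (case-insensitive).
    static boolean looksLikeDocumentation(String entryName) {
        String lower = entryName.toLowerCase(Locale.ROOT);
        for (String ext : DOC_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Prints each documentation-like entry in the JAR and returns their total uncompressed size.
    static long documentationBytes(Path jarPath) throws IOException {
        long total = 0;
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && looksLikeDocumentation(entry.getName())) {
                    long size = entry.getSize(); // -1 if the archive does not record the size
                    if (size > 0) {
                        total += size;
                    }
                    System.out.println(entry.getName() + " (" + size + " bytes)");
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PackagedDocsReport <path-to-jar>");
            return;
        }
        Path jar = Paths.get(args[0]);
        System.out.println("Documentation bytes in deployable: " + documentationBytes(jar));
    }
}
```

Run it against the artifact that `mvn package` produces, for example `java PackagedDocsReport target/your-artifact.jar`. The artifact path is a placeholder. A non-zero total means documents are travelling with every deployment.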

This storage concern is not a critical issue if your project is small and your documentation is minor. However, bad habits that are tolerated on small projects tend to become the habits of the whole development organization on larger projects, unless someone watches them closely. After all, who else will you staff your projects with, other than developers who have “already done it”, just on smaller efforts?

In a large project, you might suddenly see gigabytes of documents, diagrams, database schemas, etc. showing up in this resources folder. If the project then goes through hundreds of releases, branches, tags, and forks, this storage quickly becomes an issue.

Imagine that your developers placed 2 gigabytes of “documentation stuff” into the project’s resources folder. Imagine also that your build system is required to check out a fresh copy of the project once every hour, so that it can run a build and package after each checkout (an insane requirement, but I digress). That is at least 48 GB of data flowing from the source control system to your build servers each day. Extending that assumption further, if more than one project at your company has that requirement, you might get a call from your LAN team.
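As a rough worked estimate, this counts only the documentation payload. The source code and build outputs come on top of it, and $N$ is the number of projects with the same hourly-checkout requirement:

```latex
\begin{aligned}
\text{per project:}\quad & 2\ \text{GB/checkout} \times 24\ \text{checkouts/day} = 48\ \text{GB/day} \\
N \text{ projects:}\quad & N \times 48\ \text{GB/day}
\end{aligned}
```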

The recommendation: Use strategies such as those Kin mentions, along with the options below, to move your documents into a documentation repository or a repository folder that is not automatically packaged with the release.

Option #1 (Recommended): Use a dedicated repository for documentation and link it to the code repository through a service such as Github.

Option #2: Place the documents into a folder that the packaging process does not touch. For example, place your documents into a folder at the same level as the project’s root folder. You will still face the checkout issues, but at least you are not deploying your compiled package with a large set of documents.
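Option #2 only helps if documents stop landing in the packaged folder. A team could run a lightweight guard in its build pipeline to catch them. The sketch below assumes the default Maven resources path, src/main/resources. The extension list, the 5 MB size ceiling, and the ResourcesFolderGuard name are hypothetical policy choices, not Maven features.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ResourcesFolderGuard {

    // Hypothetical policy: these file types belong in a documentation repository or folder.
    private static final List<String> DOC_EXTENSIONS =
            List.of(".pdf", ".doc", ".docx", ".vsd", ".vsdx", ".ppt", ".pptx");

    // Hypothetical ceiling for any single resource file (5 MB).
    private static final long MAX_RESOURCE_BYTES = 5L * 1024 * 1024;

    // Returns every regular file under resourcesDir that breaks the policy; empty if the folder is absent.
    static List<Path> violations(Path resourcesDir) throws IOException {
        if (!Files.isDirectory(resourcesDir)) {
            return List.of();
        }
        try (Stream<Path> files = Files.walk(resourcesDir)) {
            return files.filter(Files::isRegularFile)
                        .filter(ResourcesFolderGuard::isViolation)
                        .collect(Collectors.toList());
        }
    }

    // A file violates the policy if it looks like documentation or is simply too large.
    private static boolean isViolation(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        if (DOC_EXTENSIONS.stream().anyMatch(name::endsWith)) {
            return true;
        }
        try {
            return Files.size(file) > MAX_RESOURCE_BYTES;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path resources = Paths.get(args.length > 0 ? args[0] : "src/main/resources");
        List<Path> bad = violations(resources);
        bad.forEach(p -> System.err.println("Move out of the packaged resources: " + p));
        if (!bad.isEmpty()) {
            System.exit(1); // non-zero exit lets a CI step fail the build
        }
    }
}
```

This check deliberately looks at the source tree, not the built artifact, so a developer gets the feedback before the documents are ever committed to a release.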

Option #3: If you can afford it, invest in commercial toolsets. Since I come from an SOA background, the ones I know include WebSphere Service Registry and Repository, HP SOA Systinet, and products from Software AG.

Coding Standards: A Beginning

While I was working on a proposal for a new open source incubator project, code formatting arguments came up. It was no surprise that choosing our coding standards then rose to the top of my task list. In a flash of inspiration, I immediately provided the standard quick and concise answer: “Let’s use the Oracle Java Coding Conventions standard.” Suddenly, the sun burned brighter and the birds took up in song as the brilliance of my efficient answer was delivered. Later in the day, I had more time to consider the ramifications of that answer. I wondered whether I had taken too simple a view of what coding standards mean to me, my project, and the information technology industry as a whole.

So…let’s be honest with ourselves here. When push comes to shove, we do what we need to do to get the product out to market. How often do we tell ourselves, “Who really cares if I used 5 or 10 spaces of indent?” or “Why does anyone care that all of my variables are a single letter?” We know that true “format” issues don’t really matter to anyone except the most critical of code reviewers. We also always tell ourselves that we will come back and fix all those little shortcuts we took (missing comments and Javadoc, commented-out code, etc.) as soon as we have a little more time. Besides, we all know that badly “formatted” code runs in production just as well as “formatted” code…right?

However, once I found some free time to myself (aka the holiday period), I wondered whether high-quality coding standards define some things that are a little more complicated than pure formatting. One of those items is shown below.

STRUCTURE GUIDELINE: “Avoid nested method calls at all costs unless they are completely guaranteed not to throw a NullPointerException.”

Example #1

     return this.theInstance.theMethod().getMap().get("key");

In the above example, there is a good chance that this efficiently written single line of code will throw a NullPointerException back to the caller. Code reviewers often see this exception-prone code wrapped afterward (usually later), as the example below shows.

Example #2

  1. try {
  2.      return this.theInstance.theMethod().getMap().get("key");
  3. } catch (NullPointerException npe) {
  4.      log.error(npe.getMessage(), npe);
  5.      throw npe;
  6. }

When you inspect the NullPointerException from the code above, the stack trace tells you which line caused the exception (line 2). It cannot tell you whether the null reference on that line was theInstance, the value returned by theMethod(), or the value returned by getMap(). JDK 15 and later enable “helpful” NullPointerException messages by default, and these can name the null expression. Even so, a line number alone cannot. Suddenly, we begin to realize that high-quality coding standards may help us write more “reliable” code.
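One way to follow the guideline is to split the chain into one call per statement and check each intermediate result. The sketch below is hypothetical. Instance, Holder, and lookup() stand in for the unnamed types and code behind theInstance, theMethod(), and getMap(). The sketch uses java.util.Objects.requireNonNull(obj, message), which throws a NullPointerException carrying the given message, so the log names the exact link that failed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class UnnestedCalls {

    // Hypothetical stand-in for whatever theMethod() returns; it owns the map.
    static class Holder {
        private final Map<String, String> map;
        Holder(Map<String, String> map) { this.map = map; }
        Map<String, String> getMap() { return map; }
    }

    // Hypothetical stand-in for the type of theInstance.
    static class Instance {
        private final Holder holder;
        Instance(Holder holder) { this.holder = holder; }
        Holder theMethod() { return holder; }
    }

    private final Instance theInstance;

    UnnestedCalls(Instance theInstance) { this.theInstance = theInstance; }

    // One call per statement: a failure message identifies exactly one link in the chain.
    String lookup(String key) {
        Instance instance = Objects.requireNonNull(theInstance, "theInstance is null");
        Holder holder = Objects.requireNonNull(instance.theMethod(), "theMethod() returned null");
        Map<String, String> map = Objects.requireNonNull(holder.getMap(), "getMap() returned null");
        return map.get(key); // may still be null if the key is simply absent
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("key", "value");
        System.out.println(new UnnestedCalls(new Instance(new Holder(data))).lookup("key"));

        try {
            new UnnestedCalls(new Instance(new Holder(null))).lookup("key");
        } catch (NullPointerException npe) {
            System.out.println(npe.getMessage()); // names the failing call, not just the line
        }
    }
}
```

Running main prints `value` and then `getMap() returned null`. The design choice here is to fail fast with a descriptive message instead of catching NullPointerException after the fact. Whether a null map should instead be treated as “no value” is a business decision the coding standard leaves to you.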

In summary, delve deeper into the coding standards available in the community. Also consider whether your projects should use a standards-checking tool such as Checkstyle (my current preference) to enforce them. It worked for me, and hopefully it will work for you too.