Version Control of API and Other Documentation

In his post “Version Control Your API Documentation with Github”, Kin Lane discusses an approach for using GitHub to version API documentation. The post describes using a single public repository to store and version the API documents alongside the APIs. I agree with his thoughts and position. However, I’d like to extend Kin’s thoughts to a practical problem that appears when that solution is misunderstood and implemented at the level of the project’s own source repository.

The problem: Assume that you have the repository structure below and you ask a novice developer to “check in” the API documents, database scripts, and so on related to this project (basically anything that is not needed to “run” the code).

The concern: In this project structure (a Maven Archetype default), there is a strong risk that developers will place their API documentation into the resources folder (src/main/resources in the standard Maven layout). However, since anything placed in this location is generally packaged with the deployable unit, every place that stores a copy of the project or its artifacts (source control, Maven repositories, deployment folders, etc.) could suddenly need far more storage.

This storage concern is not a critical issue if your project is small and your documentation very minor. However, bad habits tolerated on small projects tend to become bad habits for the development organization as a whole on larger projects if they are not watched closely. After all, who do you place on your large projects other than a team of developers who have “already done it”, just on smaller efforts?

In a large project, you might suddenly see gigabytes of documents, diagrams, database schemas, etc. showing up in this resources folder. Assuming that the project might undergo hundreds of releases, branches, tags, and forks, each of which can carry its own copy of those files, this storage quickly becomes an issue.

Imagine if your developers placed 2 gigabytes of “documentation stuff” into the project’s resources folder and requested that your build system check out a “fresh” copy of the project once every hour so that it can execute a build and package after checkout (an insane requirement, but I digress). That is about 48 GB of data (2 GB × 24 checkouts) flowing from the source control system to your build servers each day. Extending that assumption further, if you have more than one project at your company with that requirement, you might get a call from your LAN team.
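The arithmetic behind that estimate is simple enough to sketch. The function name and the five-project figure below are made up for illustration; the 2 GB checkout and the hourly build come from the scenario above.

```python
def daily_transfer_gb(checkout_size_gb, builds_per_day, projects=1):
    """Data pulled from source control per day, in GB, assuming every build
    performs a full fresh checkout of the whole project."""
    return checkout_size_gb * builds_per_day * projects


# One project, 2 GB checkout, one build per hour.
print(daily_transfer_gb(2, 24))      # 48

# Five projects with the same requirement (hypothetical count).
print(daily_transfer_gb(2, 24, 5))   # 240
```

The figure only counts checkouts. Any packaged artifacts copied on to a Maven repository or deployment folder would add to it.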

The recommendation: Use a strategy such as the one Kin describes (along with the options below) to move your documents into a documentation repository and/or a repository folder that is not automatically packaged with the release.

Option #1 (Recommended): Use a dedicated repository for documentation and link it to the code through technologies such as GitHub.

Option #2: Place the documents into a folder outside of the packaging process. For example, place your documents into a folder at the same level as the root folder of the project. You will still face the checkout issues, but at least you are not deploying your compiled package with a large set of documents.

Option #3: If you can afford it, invest in commercial toolsets. Since I come from a SOA background, the examples I know are things such as IBM WebSphere Service Registry and Repository, HP SOA Systinet, and the products from Software AG.
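Whichever option you choose, it may help to check that documents do not drift back into the packaged folder. The following is a minimal sketch of such a check, not a finished tool. The function name and the list of extensions are assumptions made for illustration, and the default path assumes the standard Maven layout.

```python
from pathlib import Path

# Hypothetical list of file types treated as "documentation"; adjust per team.
DOC_EXTENSIONS = {".pdf", ".docx", ".pptx", ".vsdx"}


def find_packaged_docs(project_root, resources_dir="src/main/resources"):
    """Return (relative path, size in bytes) for each document-like file
    found under the resources folder, sorted by path."""
    resources = Path(project_root) / resources_dir
    if not resources.is_dir():
        # No resources folder, so nothing can be packaged from it.
        return []
    return sorted(
        (p.relative_to(project_root).as_posix(), p.stat().st_size)
        for p in resources.rglob("*")
        if p.is_file() and p.suffix.lower() in DOC_EXTENSIONS
    )
```

A build server could call find_packaged_docs(".") from the project root before packaging and fail the build, or simply warn, when the returned list is not empty.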
