Intro to the SVN Importer
I’ve previously posted about the SVN Importer tool here and hoped at some point to follow up on my experiences converting from specific version control tools. Well, after the OpenMake team did a StarTeam conversion project last year that was easily an order of magnitude larger than any other conversion project I’ve ever done, I think I’m fairly well qualified to write on the topic. I had previously done some small conversions using StarTeam 2005 (aka version 11), but for this project the customer was using StarTeam 2009 (aka version 12.5). Oh, and when I say this effort was big, I mean REALLY big: the largest project had almost 20 million file revisions, and the whole system had around 50 million file revisions.
The first thing I noticed while doing other, smaller conversions with the SVN Importer is that StarTeam’s command line interface (CLI) lacks certain functions that are critical for these sorts of conversions. Because of this, the SVN Importer developers chose, out of necessity I believe, to use the StarTeam API to perform the conversion to SVN. This requires that you have the StarTeam SDK installed on your conversion machine. Also, if you are converting very large projects (more than 1 million file revisions), as I was, you’ll need a 64-bit version of the SDK. I was able to track one down for StarTeam 2009, but I don’t believe one exists for earlier versions. You’ll also need to make sure that the correct version of the StarTeam API jar file is on the importer’s classpath and that the Lib directory of the StarTeam SDK is included in your PATH environment variable.
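Because a missing jar or PATH entry tends to show up only after the importer has already started, a quick preflight check can save some time. The sketch below is a hypothetical helper (not part of SVN Importer) that checks the three requirements from the paragraph above: that a StarTeam SDK class can be loaded from the classpath, that some PATH entry is a directory named `Lib`, and that the JVM reports a 64-bit architecture. The class name `com.starbase.starteam.Server` is an assumption about the SDK’s Java package; substitute a class your SDK jar actually contains.

```java
import java.io.File;

/**
 * Hypothetical preflight check for a StarTeam-to-SVN conversion machine.
 * Not part of SVN Importer; the SDK class name below is an assumption.
 */
public final class StarTeamPreflight {

    // Assumed StarTeam SDK class; replace with a class your SDK jar really ships.
    public static final String SDK_CLASS = "com.starbase.starteam.Server";

    private StarTeamPreflight() {}

    /** True if the class can be found on the current classpath (loaded without initializing it). */
    public static boolean isClassOnClasspath(String className) {
        try {
            Class.forName(className, false, StarTeamPreflight.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if any entry of a PATH-style string is a directory with the given name. */
    public static boolean pathHasDirNamed(String pathEnv, String dirName) {
        if (pathEnv == null) {
            return false;
        }
        for (String entry : pathEnv.split(File.pathSeparator)) {
            // Compare only the last path component, ignoring case (Windows paths are case-insensitive).
            if (!entry.isEmpty() && new File(entry).getName().equalsIgnoreCase(dirName)) {
                return true;
            }
        }
        return false;
    }

    /** Rough 64-bit check based on the JVM's reported architecture (e.g. "amd64", "x86_64"). */
    public static boolean is64BitJvm() {
        return System.getProperty("os.arch", "").contains("64");
    }

    public static void main(String[] args) {
        System.out.println("StarTeam SDK class on classpath: " + isClassOnClasspath(SDK_CLASS));
        System.out.println("StarTeam SDK Lib dir on PATH:    " + pathHasDirNamed(System.getenv("PATH"), "Lib"));
        System.out.println("64-bit JVM:                      " + is64BitJvm());
    }
}
```

Run it with the same classpath, PATH, and JVM the importer will use; otherwise it is checking a different environment than the one that matters.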
Once I actually got my conversions running with SVN Importer, converting the trunks of projects went well, but I encountered the following error any time I tried to convert a branch (a derived view, in StarTeam terms):
INFO historyLogger:84 - EXCEPTION CAUGHT: org.polarion.svnimporter.svnprovider.SvnException: Unknown branch:
Since I was familiar with the inner workings of SVN Importer and the source was freely available, I debugged this issue and found a simple coding error that was easily corrected. As I recall, the code in question was using the wrong method, with the wrong return type, to get the branch name.
Later on, I encountered another problem: the same file would be added twice in the same SVN revision in the output dump files. When attempting to load these dumps into an SVN repository, I would see the error message ‘Invalid change ordering: new node revision ID without delete.’ After some detective work, I determined that the same file was being added to a revision multiple times when there were multiple StarTeam labels (equivalent to SVN tags) for the same set of changes. I made a small adjustment to the StarTeam model so that it checks whether a file already exists in a revision before adding it, and this resolved the issue.
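The idea behind that fix can be illustrated with a hypothetical `RevisionSketch` class; the real SVN Importer model classes are different and are not shown here. A revision keeps its changes keyed by path, and an add is recorded only if that path is not already present, so a second StarTeam label pointing at the same change set cannot produce a duplicate add node in the dump.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for one SVN revision in a converter's model (not the real SVN Importer class). */
public class RevisionSketch {

    // Path -> action; LinkedHashMap keeps the order in which paths were first recorded.
    private final Map<String, String> actionsByPath = new LinkedHashMap<>();

    /** Records an "add" for path unless this revision already touches it. Returns true if recorded. */
    public boolean addFileIfAbsent(String path) {
        return actionsByPath.putIfAbsent(path, "add") == null;
    }

    public boolean containsPath(String path) {
        return actionsByPath.containsKey(path);
    }

    public int size() {
        return actionsByPath.size();
    }

    public static void main(String[] args) {
        RevisionSketch rev = new RevisionSketch();
        // Two labels covering the same change set both try to add the same file.
        rev.addFileIfAbsent("trunk/src/Main.java");
        rev.addFileIfAbsent("trunk/src/Main.java");
        System.out.println(rev.size()); // prints 1
    }
}
```

Keeping insertion order is a deliberate choice in this sketch, in case whatever writes the dump emits nodes in the order they were recorded.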
Besides these more significant problems, there were a few things I wanted to improve about how the conversion process worked. To start, the converter was performing duplicate checkouts of each file revision, which added a good deal of extra time to the conversion. In addition, because the repositories I was converting were very large, certain StarTeam operations could fail over the course of a long conversion for various reasons (network or server flakiness, for example), and the converter was written in such a way that a failure in any StarTeam operation would cause the whole conversion to fail. To mitigate this, I wrapped each call to StarTeam in logic that retries the operation if there is an error. Once all these changes were made, I was ready to tear through these projects … or perhaps crawl is a better way to describe it!
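A retry wrapper along these lines is one way to do it. This is a minimal sketch, not the code from the modified importer: `withRetry` takes any StarTeam call wrapped in a `Callable`, tries it up to `maxAttempts` times, and waits between attempts with a doubling delay. The attempt count and the exponential backoff are assumptions made here; tune them to how flaky your network and server actually are.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal retry helper (illustrative sketch, not SVN Importer code). */
public final class Retry {

    private Retry() {}

    /**
     * Runs op, retrying on any exception, for at most maxAttempts attempts in total.
     * Waits initialDelayMillis before the first retry and doubles the wait each time.
     * Rethrows the last exception if every attempt fails.
     */
    public static <T> T withRetry(Callable<T> op, int maxAttempts, long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // don't swallow an interrupt as a "flaky" failure
                throw e;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                System.err.println("Attempt " + attempt + " failed (" + e + "), retrying in " + delay + " ms");
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky StarTeam call: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException("simulated network error");
            }
            return "checked out";
        }, 5, 100);
        System.out.println(result + " after " + calls.get() + " attempts"); // prints "checked out after 3 attempts"
    }
}
```

Only transient failures should be retried this way; an error that will fail identically every time (bad credentials, a missing view) just burns through the attempts before surfacing.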
Make it go
If you have ever done a version control history migration, you know that it can take a long time to run, because the process checks out every version of every file and constructs the new repository. When we ran smaller tests, we found the performance to be a bit slow, but nothing prepared us for the projects with millions of file revisions.
As we moved to larger and larger projects, not only did the time requirements swell, but so did the hardware requirements. While projects with tens (or even hundreds) of thousands of revisions were achievable with 8 GB of RAM, that was not enough for projects with millions of file revisions. This could be very frustrating, because a conversion could sometimes run for over a day before erroring out, and when it did there was no way to recover; you had to start all over from the beginning. When even 16 GB was not enough for the very largest project (roughly 18 million file revisions), I had doubts that increasing to 32 GB would be sufficient. Fortunately, once we were at 32 GB of RAM, we never had to worry about memory again.
In all, the conversion of this largest project took almost 2 weeks (!) to complete, and almost as long again to validate. Validation is probably the most often overlooked part of a conversion; it is mostly simple to do, but it is still necessary. Loading a very large SVN repository takes nearly as long as the conversion itself. One issue we encountered on this project was actually running up against the ext3 filesystem’s inode limit. While this was simple enough to handle, I’m glad we did a validation load to test everything before moving on to loading the production SVN system.
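A validation load is easy to script so that it can be repeated. The sketch below is one possible shape (Java 9+), not the tooling used on this project: it creates a scratch repository with `svnadmin create`, feeds each dump file to `svnadmin load` on standard input, and then prints the youngest revision reported by `svnlook youngest` so it can be compared with what the conversion produced. The class name, argument layout, and paths are hypothetical. It does not check inode headroom; on ext3 it is worth running `df -i` against the target filesystem before starting a multi-day load.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical validation-load driver: load dumps into a scratch repo before touching production. */
public final class ValidationLoad {

    private ValidationLoad() {}

    public static List<String> createCommand(String repoPath) {
        return List.of("svnadmin", "create", repoPath);
    }

    public static List<String> loadCommand(String repoPath) {
        return List.of("svnadmin", "load", repoPath); // dump data is read from stdin
    }

    public static List<String> youngestCommand(String repoPath) {
        return List.of("svnlook", "youngest", repoPath);
    }

    /** svnlook youngest prints a single revision number followed by a newline. */
    public static long parseYoungest(String output) {
        return Long.parseLong(output.trim());
    }

    /** Runs cmd with output streamed to this console (loads are long and chatty); throws on non-zero exit. */
    static void runStreaming(List<String> cmd, File stdin) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd).inheritIO();
        if (stdin != null) {
            pb.redirectInput(stdin); // overrides the inherited stdin
        }
        int code = pb.start().waitFor();
        if (code != 0) {
            throw new IOException(String.join(" ", cmd) + " exited with " + code);
        }
    }

    /** Runs cmd and returns its combined stdout/stderr; throws on non-zero exit. */
    static String runCapture(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int code = p.waitFor();
        if (code != 0) {
            throw new IOException(String.join(" ", cmd) + " exited with " + code + ": " + out);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ValidationLoad <scratch-repo-path> <dump-file>...");
            System.exit(2);
        }
        String repo = args[0];
        runStreaming(createCommand(repo), null);
        for (int i = 1; i < args.length; i++) {
            // Load dumps in order; a failure here is exactly what the validation pass should catch.
            runStreaming(loadCommand(repo), new File(args[i]));
            System.out.println("Loaded " + args[i]);
        }
        long youngest = parseYoungest(runCapture(youngestCommand(repo)));
        System.out.println("Youngest revision in " + repo + ": " + youngest);
    }
}
```

For example, `java ValidationLoad /srv/svn/validate part1.dump part2.dump` (hypothetical paths) would rehearse the production load on a throwaway repository.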
All in all, this StarTeam to SVN conversion effort took roughly 3 months and was not without its share of challenges, but it was ultimately worth the effort for the customer. There really is no substitute for this sort of migration. Without one, companies that need this history to remain available will, in most cases, keep an older VCS running for years, with all the associated costs, in order to stay in compliance with their internal policies or external regulations.
If you’d like to know more about the code changes I made to SVN Importer, here’s the situation. I have made all of these updates available to Polarion, but I don’t yet know when these changes will be made publicly available through their SVN repository. If you have questions about StarTeam conversions or the code changes I made, leave a comment and I can give more detail and possibly find another way to share my changes.