Developers have been versioning their source files since the early days of Unix. Does anyone else remember SCCS? Back in those days, storage was at a premium: disks were small and very expensive and it was too inefficient to store a complete copy of each version of the source file. So version control tools would simply store deltas – the changes made from one version of the file to another. This was more efficient since if you only changed one line in a 3,000 line file then the system would simply store the new line and some meta-data about where it fitted in the file and who had made the change.
Some tools would save the latest version of the file in full and use reverse-deltas to reconstruct earlier versions on demand. This was usually the better trade-off: developers generally want the latest version of the file, and applying a few reverse-deltas to reach an earlier version is cheaper than starting with a 3-year-old file and rolling forward through 472 changes to get the file you wanted.
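To make the idea concrete, here is a minimal sketch of a line-based delta in Python, using the standard library's difflib. The names make_delta and apply_delta are made up for illustration, and this is not how SCCS or any other particular tool stores its deltas. It only shows that a one-line change in a 3,000 line file needs very little storage, and that the same mechanism works forwards or in reverse.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Keep only the regions of `old` that must be replaced to get `new`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])  # meaning: replace old[i1:i2] with these lines
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base, delta):
    """Rebuild the other version from `base` plus a stored delta."""
    result = list(base)
    # Work backwards so earlier indexes are not shifted by later edits.
    for i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result


old = [f"line {n}" for n in range(3000)]
new = list(old)
new[1499] = "the one line that changed"

forward = make_delta(old, new)  # store the old file, roll forward
reverse = make_delta(new, old)  # store the latest file, step back on demand
```

Here forward holds a single entry describing the changed line, and apply_delta(new, reverse) gives back the older file without it ever having been stored in full.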
These days, disk storage costs a tiny fraction of what it did in 1981 (the storage in my phone would have cost 38 million dollars back then!). Because of this, storing deltas no longer pays off the way it used to. In addition, CPUs can now crunch numbers a little faster than they used to – so using maths to compress a complete copy of the new version of the file and storing that is practical. Broadly speaking, that’s what modern revision control tools such as Git do.
Now that’s all very interesting, but the title of this post is “Using Application Deltas in Deployments”. So why am I rattling on about Source Code versioning?
Well, here’s the rub – whilst deltas may no longer have much of a part to play in tracking changes to source files, they most certainly do when it comes to deploying applications, and for reasons similar to those that plagued us (or our parents!) 30 years ago.
Applications are made of Components
Let’s be frank – an application that consists of a single binary on a single server is not going to need deltas – that would be akin to a change to a single-line source file. All you need to do is to replace the old binary with the new and you’ve deployed. But applications are getting more complex:
- The application may be split across multiple servers. An n-tier application may have different tiers running on different servers (and even different operating systems).
- The application could be wholly or partially containerised. You can have multiple containers each running different micro-services.
- There could be database changes associated with an application change. Indeed, the application change could – potentially – consist solely of a database change. Database updates are pretty much always deployed as deltas – alter scripts are used to add and remove columns, tables, and indexes.
In the same way that a single binary can be built from lots of different source files (each of which is under its own version control), so an application can be made up of lots of different binary components. Clearly, if the application consists of lots of different components and those components are resident on different servers or containers then deploying every component every time a new application version is deployed would be wasteful, time-consuming, and error-prone.
An application version “delta” would therefore track which components have changed between the version of the application being deployed and the versions of the components on the target environment. This allows the deployment tool to only deploy the components that have changed. So what do we need to track to make this work?
- Component Versions. Obviously, we need to version each individual component so that we know which version of which component is included in any particular application version.
- We need to track which component version is present on any particular end-point within a target environment. That way, if the application we’re about to deploy has version 5 of a component (a DLL or a WAR file, say) and version 5 is already present in the target environment then we don’t deploy that particular component.
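As a rough illustration of those two pieces of bookkeeping, the comparison can be sketched in Python like this. The component names and the components_to_deploy function are hypothetical, and a real deployment tool will store far more than a name and a number.

```python
def components_to_deploy(app_version, endpoint):
    """Return the components whose version differs from what the end-point holds."""
    return {
        name: version
        for name, version in app_version.items()
        if endpoint.get(name) != version
    }


# Hypothetical data: what the application version contains...
app_version = {"orders.war": 5, "reports.dll": 3}
# ...and what is recorded as present on the end-point.
endpoint = {"orders.war": 5, "reports.dll": 2}

to_deploy = components_to_deploy(app_version, endpoint)
```

Version 5 of orders.war is already on the end-point, so only reports.dll ends up in to_deploy.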
This approach has a number of advantages. You can deploy to an existing environment and only deploy the components that have changed – minimising the downtime and reducing the risk of failure. You can deploy to a newly provisioned environment and deploy all the components within the application. So if you need to fire up a new virtual machine (or a container) then the deployment pushes everything – but the next time you deploy to that environment, only the components that have changed are deployed.
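The difference between a fresh environment and an existing one falls out of the same comparison, provided the tool records what it pushed. A minimal Python sketch, assuming the end-point's state is just a dictionary and using a hypothetical deploy function and made-up component names:

```python
def deploy(app_version, endpoint):
    """Push changed components, record them on the end-point, return what was pushed."""
    pushed = [
        name for name, version in app_version.items()
        if endpoint.get(name) != version
    ]
    for name in pushed:
        # A real tool would copy the artefact here; this sketch only records it.
        endpoint[name] = app_version[name]
    return pushed


new_vm = {}  # a newly provisioned environment knows about no components
first = deploy({"web.war": 1, "api.war": 1}, new_vm)
second = deploy({"web.war": 2, "api.war": 1}, new_vm)
```

The first deployment pushes both components because nothing is recorded yet. The second pushes only web.war, since api.war is already at the right version.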
We may need to consider special cases when it comes to database components. Remember we mentioned that database changes are nearly always delivered as deltas – SQL “alter” scripts that make changes to the previous version of the database schema? Well, if all you’re doing is deploying the next version of the application then that works fine.
However, what if you’re deploying a version that’s several releases ahead of what’s in the environment? This can easily happen if the test and production environments are “pulling” deployments into their environment (as opposed to having them “pushed” via a Continuous Delivery process). Now, just applying the alter script that is associated with this application version won’t work in any environment that doesn’t contain the previous version of the application.
Picture an application (“My Application”) that is up to version 6. It has two components, a WAR file and a Database component that contains alter scripts that roll the database schema forward. Here is what the last 4 versions look like:
- Version 3 has a new version of the WAR file.
- Version 4 has a new version of the WAR file (myapp.war;4) and a database alter script (alter.sql;1) that adds a column to a table.
- Version 5 only changes the database: alter.sql;2 amends a stored procedure which uses this new column.
- Version 6 applies a new change to the WAR file (myapp.war;5).
Now, in a Continuous Delivery process the test rig being targeted will receive each new version. So when version 4 is deployed, the rig will receive version 4 of the WAR file and the alter script will be executed to add the column to the table. Then when version 5 is deployed, only the DB alter script will run (alter.sql;2) in order to amend the stored procedure. The WAR file is not deployed since the server already contains version 4 of the WAR file. When version 6 is deployed, only version 5 of the WAR file is deployed. This is exactly what we want.
But now what happens when we move to UAT? By this stage, we’ve done as much automated testing as we can get away with – in UAT real users need to log onto our test system to make sure they’re happy with everything. So we’re not going to do a “push”. In UAT, the test lead does a “pull” when they’re ready to accept the new version of the application for testing. So they pull “My Application Version 6” into their test environment and get their testers lined up with coffee and pizza.
But what happens if the test environment is currently on version 3? Well, for the WAR file, there’s no problem – there’s a difference in the component version for the WAR file so the new version is deployed. But what happens to the database? There are no alter scripts associated with version 6 of the application – but without applying alter.sql;1 and alter.sql;2 the database schema is not going to be valid for use with myapp.war;5. The testing will fail – not so much “falling at the first hurdle” as “falling in the paddock”.
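The gap is easy to see in a small Python sketch. The CHANGES table below just restates the example, and naive_pull and skipped_alter_scripts are hypothetical helpers, not part of any real tool.

```python
# What each application version changed, using the names from the example.
CHANGES = {
    4: ["myapp.war;4", "alter.sql;1"],
    5: ["alter.sql;2"],
    6: ["myapp.war;5"],
}


def naive_pull(target):
    """Deploy only what the target version itself changed."""
    return list(CHANGES[target])


def skipped_alter_scripts(current, target):
    """List alter scripts from the interim versions that a naive pull never runs."""
    deployed = set(naive_pull(target))
    return [
        component
        for version in range(current + 1, target + 1)
        for component in CHANGES[version]
        if component.startswith("alter.sql") and component not in deployed
    ]


deployed = naive_pull(6)
skipped = skipped_alter_scripts(3, 6)
```

Pulling version 6 onto an environment at version 3 deploys only myapp.war;5, while skipped lists the two alter scripts that never ran.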
Deltas allow us to cure this issue. What we need to do is identify the components that represent the database changes and get them to be applied in sequence for each interim version. In that way, successive database “deltas” are applied in order to roll the database forward to the correct schema version before the required version of the application is deployed.
So, in our case, going from version 3 to version 6 would deploy the following components, in this order:
- alter.sql;1, which is deployed and run to add the new column
- alter.sql;2, which is deployed and run to amend the stored procedure
- myapp.war;5, the WAR file for version 6
Deltas mean we do the minimum required to get the application version to the desired state. Note that myapp.war;4 is never deployed, because myapp.war;5 supersedes it.
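That roll-forward can be sketched in a few lines of Python. This is only one way to do it, under the assumption that each component is tagged as either a database delta or an ordinary file. The CHANGES table and the roll_forward_plan function are illustrative names, not a real tool's API.

```python
# Components changed by each application version, as (name, version, kind).
CHANGES = {
    4: [("myapp.war", 4, "file"), ("alter.sql", 1, "database")],
    5: [("alter.sql", 2, "database")],
    6: [("myapp.war", 5, "file")],
}


def roll_forward_plan(current, target):
    """Every interim database delta in order, then only the newest file components."""
    db_steps = []
    latest_files = {}
    for app_version in range(current + 1, target + 1):
        for name, version, kind in CHANGES.get(app_version, []):
            label = f"{name};{version}"
            if kind == "database":
                db_steps.append(label)      # deltas must all run, in sequence
            else:
                latest_files[name] = label  # a later version replaces an earlier one
    return db_steps + list(latest_files.values())


plan = roll_forward_plan(3, 6)
```

Database components are accumulated because each one builds on the last, whereas file components are overwritten so that only the final version is deployed.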
Versioning the components that make up an application means it is easy to determine the deltas (changes) between one application version and another. Recording what version(s) of the components are on the end-points in a target environment makes it easy to only deploy the components that have changed.
Only deploying the components that have changed minimises downtime and makes the deployment quicker and less error-prone. Identifying the deltas for a database gives us the ability to roll databases forward and jump between versions. This version jumping is invaluable when transitioning from an agile “push” to a waterfall-centric “pull” deployment model.