Deploying new code is not instantaneous and can bring unexpected challenges, even in seemingly simple cases. Take, for example, the deprecation of a form page that involves both front-end and back-end code changes. If the back-end change goes live before the front-end change, you end up with a major user experience failure. This problem is common for applications that run on dozens of nodes, each fielding web requests behind a load balancer.
Deprecating a Feature
My team recently decided to deprecate a feature of a tool that had two file importers, one of which had all of the other's functionality. The team decided to rely on that single importer, making it easier for our customers to get their job done and leaving us with less code to maintain.
Conceptually, the code change is straightforward:
- Delete the form that submits a file for import.
- Delete the deprecated route.
- Delete the deprecated controller action.
Is this safe to deploy?
Sometimes, engineers responsible for writing application code think that they can ignore how the application is deployed. It's not that simple, though, especially when dealing with no-downtime deployments. At Procore, when new code is deployed (which can be multiple times a day), new nodes are spun up with the new code and slowly replace the old nodes in the load balancer. This process takes around 10 minutes. We can further assume that Procore's traffic is high enough that every route is going to be used multiple times during this window.
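To get a feel for how exposed a two-request feature is during that window, here is a rough back-of-the-envelope model in Python. It is only a sketch under assumptions that do not come from our actual infrastructure: nodes are replaced at a steady, linear pace over the 10 minutes, and the load balancer picks a node uniformly at random for every request. The function names are made up for illustration.

```python
def new_code_share(minutes_elapsed: float, rollout_minutes: float = 10.0) -> float:
    """Fraction of live nodes running the new code (assumed linear rollout)."""
    return min(max(minutes_elapsed / rollout_minutes, 0.0), 1.0)


def mixed_pair_probability(
    t_first: float, t_second: float, rollout_minutes: float = 10.0
) -> float:
    """Chance that the first request hits old code and the second hits new code.

    Assumes each request is routed independently and uniformly at random.
    """
    old_share_at_first = 1.0 - new_code_share(t_first, rollout_minutes)
    new_share_at_second = new_code_share(t_second, rollout_minutes)
    return old_share_at_first * new_share_at_second


if __name__ == "__main__":
    # Form loaded 4 minutes into the rollout, submitted 2 minutes later.
    print(mixed_pair_probability(t_first=4, t_second=6))
```

Under these assumptions, a user who loads the form four minutes into the rollout and submits it two minutes later has about a 0.6 × 0.6 = 0.36 chance of seeing old code first and new code second. The exact number matters less than the fact that it is far from zero.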
If we take a close look at the feature we are deprecating, we can observe that it works across two different requests. On the first request, the upload form is rendered in a view (we can think of this as the front-end code). On a second request, the file is submitted for import (the back-end code).
During a deployment, both the first and the second request can be served by a node running the old code (v0) or the new code (v1). A timing diagram shows which code is running during the deployment:
The diagram implies that four combinations of front-end (FE) and back-end (BE) versions can occur: FEv0/BEv0, FEv0/BEv1, FEv1/BEv0, and FEv1/BEv1.
The FEv0/BEv0 and FEv1/BEv1 combinations are clearly compatible: They are our starting and ending states. FEv1/BEv0 is not really possible, since a browser that doesn't show the import form can't submit it.
The incompatibility arises in the remaining combination, FEv0/BEv1. A user's first request is served by a node running v0. They see the form and use it to submit a file. That second request is routed to a node running v1, which doesn't know how to handle it. The server returns a 500 error. Sadness for the customer. Sadness for the developer team.
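The four combinations can also be enumerated mechanically. The following Python sketch is a toy model, not our application code: the two dictionaries and the `outcome` function are hypothetical names that simply record whether each version renders the upload form and whether it can still handle the submission.

```python
from itertools import product

# Hypothetical model of the two versions (names invented for illustration).
FE_SHOWS_FORM = {"v0": True, "v1": False}      # does the view render the form?
BE_HANDLES_IMPORT = {"v0": True, "v1": False}  # do the route and action exist?


def outcome(fe: str, be: str) -> str:
    """Classify a (first request, second request) pair of code versions."""
    if not FE_SHOWS_FORM[fe]:
        # Without the form, the browser never sends the second request.
        return "no submit"
    return "ok" if BE_HANDLES_IMPORT[be] else "error"


combinations = list(product(["v0", "v1"], repeat=2))
broken = [(fe, be) for fe, be in combinations if outcome(fe, be) == "error"]

if __name__ == "__main__":
    for fe, be in combinations:
        print(f"FE{fe}/BE{be}: {outcome(fe, be)}")
```

Only the pair where the form is rendered by v0 and the submission lands on v1 ends up in `broken`.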
Notice that BEv0 is compatible with both front-end versions. If we split the code changes into FE and BE, we can sequence the deployments. The first deployment keeps the back-end code, but the front end no longer shows the upload form. The second deployment makes the routing and controller changes.
With this configuration, the incompatible combination of requests can't happen anymore because FEv0 and BEv1 never run concurrently.
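The same reasoning can be checked with a small Python sketch that compares a single-step plan against the two-step plan. Everything here is illustrative: `Release` and `broken_pairs` are hypothetical names, and the model assumes that each deployment fully finishes before the next one starts and that nobody still has the old form open in a browser tab from before the first deployment.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Release(NamedTuple):
    name: str
    shows_form: bool      # front end renders the upload form
    handles_import: bool  # back end still has the route and controller action


def broken_pairs(plan: List[Release]) -> List[Tuple[str, str]]:
    """Return (first request, second request) release pairs that would fail."""
    failures = []
    # During each rolling deployment, only two adjacent releases are live.
    for old, new in zip(plan, plan[1:]):
        for first, second in product((old, new), repeat=2):
            if first.shows_form and not second.handles_import:
                failures.append((first.name, second.name))
    return failures


single_step = [
    Release("FEv0+BEv0", True, True),
    Release("FEv1+BEv1", False, False),
]
two_step = [
    Release("FEv0+BEv0", True, True),
    Release("FEv1+BEv0", False, True),   # hide the form, keep the back end
    Release("FEv1+BEv1", False, False),  # then remove the route and action
]

if __name__ == "__main__":
    print("single step:", broken_pairs(single_step))
    print("two steps:  ", broken_pairs(two_step))
```

In this model the single-step plan reports one failing pair, while the two-step plan reports none, because the release that removes the back-end code only ever overlaps with a release that no longer renders the form.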
Writing and deploying code are sometimes seen as unrelated activities, handled by different engineering teams. However, since deployments are not instantaneous, engineers writing application code need to consider how that code is deployed. At a certain scale, even trivial code changes should be evaluated for their potential effects when multiple code versions are running simultaneously.
If solving tricky deployment issues appeals to you, come join us!