In hundreds of conversations with developers, cloud architects, and SREs, I hear the same thing: they feel they are ‘flying blind’ and lack confidence when moving microservice updates to clusters. So it was not surprising when I began hearing the term ‘microservice blast radius’ used to describe the impact a microservice update may have on end users.
The truth is, microservices are intended to be reused across multiple teams, but the implementation of a service-oriented architecture requires a shift in the way we organize and release microservices across all clusters. More than anything else, it requires a commitment to facilitating and leveraging the reuse factor across teams and across environments.
As we move away from monolithic methods, where libraries are statically linked into binaries, we enter a practice where services are linked via APIs at runtime. This is a big shift. We must rethink how services are organized, tracked, and visualized in order to simplify the use of microservices and fully realize their core benefits.
At its core, not knowing your microservice blast radius is a configuration management issue. To address this challenge head-on, we need to go back to our roots and rethink software configuration management (SCM). SCM may sound like an old and tired concept, but it is exactly the discipline needed to expose your microservice blast radius before you deploy.
SCM spans many concepts, including version control, build management (compiling and linking the code), and configuration auditing (bill of materials, or BOM, reports). In a monolithic architecture, these basic constructs have served as our ‘north star.’ In a microservice architecture we will still use version control, but how much branching and merging will occur across thousands of smaller pieces of code? What does build management look like with microservices? In many cases, code is not compiled at all, and the build is simply the process of creating a container image. So what is a BOM report in a microservices application? Is it only a list of the artifacts that went into the container for one service, with no view of the larger application? And what is an application? Is it just a single service?
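One way to make the BOM question concrete is to treat an application as a named set of specific service versions, and to roll each service’s container BOM up into an application-level report. The sketch below is a hypothetical model, not DeployHub’s data model: the `ServiceVersion` and `Application` classes, the service names, and the artifact names are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceVersion:
    """A released microservice: one container image plus the artifacts built into it (hypothetical model)."""
    name: str
    version: str
    artifacts: tuple[str, ...]  # hypothetical artifact names recorded at image build time


@dataclass
class Application:
    """An application is a named collection of the specific service versions it consumes."""
    name: str
    version: str
    services: list[ServiceVersion] = field(default_factory=list)

    def bom(self) -> dict[str, list[str]]:
        """Roll each service's artifact list up into a single application-level BOM."""
        return {f"{s.name}@{s.version}": sorted(s.artifacts) for s in self.services}


# Illustrative data only: service, version, and artifact names are made up.
shipping = ServiceVersion("shipping", "1.4.0", ("libfoo-1.2", "libbar-0.9"))
cart = ServiceVersion("cart", "2.1.0", ("libbaz-3.0",))
candy_store = Application("candy-store", "1.0", [shipping, cart])

print(candy_store.bom())
# {'shipping@1.4.0': ['libbar-0.9', 'libfoo-1.2'], 'cart@2.1.0': ['libbaz-3.0']}
```

In this model, the answer to “what is an application?” is simply the list of service versions it references, which is also what makes the reverse question (which applications use this service?) answerable.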
When we begin to pose these questions, it becomes clear why a microservice blast radius is difficult to predict. We are no longer performing the kind of SCM that became a habit in monolithic development. Static linking meant we could reuse a library yet make it our own simply by renaming it, a common Microsoft development practice. If we take a similar approach with microservices, we create microservice sprawl, a condition that further complicates the problem.
Microservice Sprawl – Don’t Do That
A lack of visibility into how a single microservice is used across all of your clusters is a dangerous condition. When you don’t know your microservice blast radius and want to avoid an adverse impact on end users, you are left with some undesirable options. One option is to run back to monolithic practices, even in a cloud-native environment. What I’m seeing is the practice of using Kubernetes namespaces to silo ‘application’ configurations. In this scenario, all microservices used by a particular application (product) team run in that team’s own namespace and cluster. This is a monolithic practice that leads to both cluster sprawl and service sprawl, and it almost guarantees that your services will drift across clusters.
The more problematic issue is that common services are deployed in multiple locations. While this method protects you from the impact of a new microservice version across multiple applications, it forces you back to a monolithic approach. It does not facilitate a service-oriented architecture or take advantage of the scaling features of Kubernetes.
In the example below, we see cluster sprawl and microservice sprawl with drift. In this configuration, we are managing 3 clusters, 3 namespaces, and 12 Pods, and that is for just one environment. If you have a dev, test, and prod pipeline, multiply this by 3: we quickly go from 12 to 36 running services. Notice that in the example, the Shipping service has drifted in the Clothing Store cluster. This can easily happen if no one knows that the Clothing Store application uses the Shipping service, or if the Clothing Store team chooses not to update their cluster with the new version. Both scenarios are common in this configuration.
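To see why drift is so easy to miss, here is a small sketch that compares deployed versions across clusters and flags any service running at more than one version. It assumes one namespace per cluster and one Pod per service (so 12 Pods means 4 services per store cluster); the cluster names, the `cart`, `payment`, and `catalog` services, and all version numbers are hypothetical. In practice the inventory would come from querying each cluster, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical inventory: cluster -> {service: deployed version}.
# Assumes one namespace per cluster and one Pod per service (3 clusters x 4 services = 12 Pods).
inventory = {
    "candy-store-cluster":    {"shipping": "1.5.0", "cart": "2.1.0", "payment": "3.0.0", "catalog": "1.2.0"},
    "toy-store-cluster":      {"shipping": "1.5.0", "cart": "2.1.0", "payment": "3.0.0", "catalog": "1.2.0"},
    "clothing-store-cluster": {"shipping": "1.4.0", "cart": "2.1.0", "payment": "3.0.0", "catalog": "1.2.0"},
}


def find_drift(inv: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return each service deployed at more than one version, mapped to {version: [clusters]}."""
    versions: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for cluster, services in inv.items():
        for service, version in services.items():
            versions[service][version].append(cluster)
    # A service has drifted when its copies across clusters disagree on version.
    return {svc: dict(by_ver) for svc, by_ver in versions.items() if len(by_ver) > 1}


pod_count = sum(len(services) for services in inventory.values())
print(pod_count, pod_count * 3)  # one environment vs. a dev/test/prod pipeline
print(find_drift(inventory))
```

With this data the script prints `12 36`, and the drift report shows `shipping` at 1.5.0 in the Candy and Toy clusters but 1.4.0 in the Clothing Store cluster. Note that detecting drift after the fact is not the same as knowing the blast radius before a release; it only tells you the problem has already happened.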
Implementing a Shared Service-Oriented Architecture
To minimize the resources required, and to reduce microservice sprawl and drift, we need to understand which teams are using which services. This requires knowing your blast radius and communicating updates before a release. If you have a way of tracking which teams are using shared services, you can confidently build out a shared microservice architecture that includes notifying all teams that a new release is on its way. A shared services architecture reduces the complexity of microservices by eliminating the confusion caused by drift. Most importantly, it ensures that critical updates to shared services are consumed by all teams. A single release fixes the problem for every team, even before they are aware that a problem exists.
In the example below, all three stores (Candy, Toy, and Clothing) receive the Shipping update at the same time. We are now building out a true service-oriented architecture in which business agility can be achieved.
But to gain this level of coordination, there must be a way to automate tracking of each application’s service usage. Tracking these relationships manually is an option for smaller organizations with relatively few shared microservices. But as your shared services grow, and more application teams consume them, automated configuration management and tracking become a requirement. What is needed is a re-imagining of software configuration management for microservice architectures.
Let’s Start Re-Imagining – DeployHub Microservice Blast Radius Map
A microservice architecture is vastly different from monolithic practices, and if we are to solve the problem of the microservice blast radius, it requires a re-imagining of the SCM process. We need new methods that can predict microservice usage, impact, dependencies, and value across all clusters. This is one of the core benefits of DeployHub’s microservice catalog tool.
When you publish your shared services to the DeployHub Domain catalog, application teams can package their application base version by identifying the shared services they use. As shared services are updated, DeployHub automatically builds an Impact Analysis map that shows the blast radius for that service, listing all dependent applications before you deploy. With this information, you can automatically trigger testing of all impacted applications and determine whether the new service is ready for prime time. You can also automate notifications to all impacted teams so they know a new service is on its way.
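The core idea behind an impact map can be sketched in a few lines: once each application base version declares the shared services it consumes, the blast radius of a service is a reverse lookup over those declarations, and the follow-up actions (tests to run, teams to notify) fall out of it. This is a minimal illustration under stated assumptions, not DeployHub’s implementation or API; `AppBaseVersion`, `blast_radius`, `plan_release`, the team names, and the `gift-cards` application are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppBaseVersion:
    """An application base version and the shared services it declares (hypothetical model)."""
    app: str
    team: str
    services: frozenset[str]


# Hypothetical catalog: each application team has declared which shared services it uses.
catalog = [
    AppBaseVersion("candy-store", "candy-team", frozenset({"shipping", "cart"})),
    AppBaseVersion("toy-store", "toy-team", frozenset({"shipping", "payment"})),
    AppBaseVersion("clothing-store", "clothing-team", frozenset({"shipping", "catalog"})),
    AppBaseVersion("gift-cards", "payments-team", frozenset({"payment"})),
]


def blast_radius(service: str, apps: list[AppBaseVersion]) -> list[AppBaseVersion]:
    """Every application whose declared base version consumes `service`."""
    return [a for a in apps if service in a.services]


def plan_release(service: str, new_version: str, apps: list[AppBaseVersion]) -> dict[str, list[str]]:
    """Turn the blast radius into follow-up actions: test steps to run and teams to notify."""
    impacted = blast_radius(service, apps)
    return {
        "test": [f"run regression tests for {a.app} against {service}@{new_version}" for a in impacted],
        "notify": sorted({a.team for a in impacted}),  # deduplicated, stable order
    }


plan = plan_release("shipping", "1.5.0", catalog)
print([a.app for a in blast_radius("shipping", catalog)])
# ['candy-store', 'toy-store', 'clothing-store']
print(plan["notify"])
# ['candy-team', 'clothing-team', 'toy-team']
```

The key design point is that the relationship is declared once, by each application team, rather than rediscovered at release time; the blast radius is then available before anything is deployed.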
Whichever approach your organization chooses, siloed clusters or a shared service strategy, you still need to understand the blast radius. But once you understand microservice impact, you can confidently move to a shared service architecture and finally retire a monolithic practice.
Building a shared service-oriented architecture has always been a challenge. In a Kubernetes environment with microservices, we have an opportunity to realize this goal, but not without some re-imagining. Our instinct will be to return to old ways of thinking and rely on siloed environments, reinforcing old ideas of ownership that create inflexible architectures, even in a cloud-native environment.
To maximize service sharing and facilitate reuse, visibility into application dependencies and microservice usage becomes critical. DeployHub is designed to automate software configuration management: it tracks application-to-service relationships, including the blast radius, and removes much of the complexity of managing microservices in a shared service environment.
We can achieve a true service-oriented architecture. We simply need to re-imagine software configuration management in a microservice architecture and automate the process as soon as a new service release candidate is staged to go.