The Emergence of the Software Bill of Materials
The Software Bill of Materials (SBOM) has finally been recognized as an essential tool in the security toolbox. While we have been generating SBOMs for a very long time, only in recent months have we started talking about using them as part of our security arsenal. This article reviews SBOMs:
- What is an SBOM?
- SBOM Formats
- Challenges with SBOMs
- Understanding the Different Levels of SBOMs
- Tooling to make SBOMs Accessible
What is an SBOM?
A Software Bill of Materials (SBOM) report documents the source code, the library and package dependencies, and the licenses consumed by the compiler or packager that creates a distributable artifact. SBOMs are typically JSON, XML, or YAML files that live side by side with the built artifact.
An SBOM is required to take the next step in supply chain security: the vulnerability scan. To know your vulnerabilities, you need to know which code, library, and package dependencies to scan. SBOMs therefore become the first line of defense, required for all downstream reporting.
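As a minimal sketch of why the SBOM comes first, the snippet below pulls the component list out of a CycloneDX-style JSON document so it could be handed to a vulnerability scanner. The sample document and the list_components helper are illustrative assumptions; the field names (components, name, version, purl) follow the CycloneDX JSON format.

```python
import json

# Hypothetical, hand-written CycloneDX-style SBOM used only for illustration.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "version": 1,
  "components": [
    {"type": "library", "name": "lodash", "version": "4.17.21",
     "purl": "pkg:npm/lodash@4.17.21"},
    {"type": "library", "name": "express", "version": "4.18.2",
     "purl": "pkg:npm/express@4.18.2"}
  ]
}
"""


def list_components(sbom_text):
    """Return (name, version, purl) tuples for every component in the SBOM."""
    sbom = json.loads(sbom_text)
    return [
        (c.get("name"), c.get("version"), c.get("purl"))
        for c in sbom.get("components", [])
    ]


# The resulting list is the input a scanner needs: what to look up, and at which version.
to_scan = list_components(SAMPLE_SBOM)
```

Without this list, a scanner has nothing to match against a vulnerability database, which is why the SBOM has to exist before any downstream report.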
SBOM Formats
CycloneDX and SPDX are the two popular formats for SBOMs.
- CycloneDX has a viewpoint of SBOMs from the OWASP world.
- SPDX has a viewpoint originating from a license consumption model.
Both formats achieve the same result: documenting what is used to create an artifact by digging into otherwise hidden transitive open-source dependencies.
CycloneDX and SPDX overlap in terminology but differ in the overall structure of their file formats. See https://cyclonedx.org/specification/overview/ and https://spdx.dev/specifications/ for the specifications.
In relational database (RDBMS) terms, these schema definitions equate to a series of tables and the relationships between them. The CycloneDX JSON structure and documentation are easier to understand and follow. SPDX goes into much more detail, but the examples are XML snippets. With SPDX, there is no single view of the whole spec represented as a single YAML or JSON file.
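To make the overlap concrete, here is a rough sketch that describes the same library in both formats. The two helper functions and the sample library are hypothetical, and both documents are deliberately incomplete (for example, the SPDX document omits required fields such as documentNamespace and creationInfo). The field names shown are taken from the CycloneDX and SPDX 2.x JSON representations.

```python
def cyclonedx_doc(name, version, license_id):
    """Build a stripped-down CycloneDX-style document for one library."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": name,
                "version": version,
                "licenses": [{"license": {"id": license_id}}],
            }
        ],
    }


def spdx_doc(name, version, license_id):
    """Build a stripped-down SPDX-style document for the same library."""
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": name + "-sbom",
        "packages": [
            {
                "SPDXID": "SPDXRef-Package-" + name,
                "name": name,
                "versionInfo": version,
                "licenseConcluded": license_id,
                "downloadLocation": "NOASSERTION",
            }
        ],
    }


# Same facts (name, version, license), different structure and field names.
cdx = cyclonedx_doc("left-pad", "1.3.0", "MIT")
spdx = spdx_doc("left-pad", "1.3.0", "MIT")
```

CycloneDX calls the unit a component with a version, while SPDX calls it a package with a versionInfo; the information carried is largely the same.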
Challenges with SBOMs
While the Software Bill of Materials is critical to our overall security practices, SBOMs are not perfect. Here are the top challenges with SBOMs.
Documenting Supply Chain
There is no complete set of SBOM tools that will document the entire supply chain that makes up a single software solution. In addition, there are many gaps that hackers can take advantage of to introduce nefarious artifacts into your software system. For example, wildcards in configuration files passed to your packaging step can easily pull problematic objects into your package. Think of the SolarWinds hack.
Overall, every artifact must be reproducible, have explicitly defined inputs, be signed and verifiable, and have an immutable SBOM. However, we still have a long way to go to build this level of security into our SBOM process. For example, there is no standard SBOM that reports the source code used to compile an artifact. OpenMake Software included a build audit with OpenMake Meister, and IBM Rational offered ClearMake, which also provided an audit. However, most hardcoded build scripts do not have this level of SBOM generation.
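One way to reduce the wildcard risk is to flag any packaging entry that is not an explicit path. The sketch below assumes a hypothetical manifest expressed as a simple list of path strings; real packagers each have their own configuration format, so this only illustrates the check itself.

```python
import re

# Characters that make a path entry a glob pattern rather than an explicit file.
WILDCARD = re.compile(r"[*?\[]")


def find_wildcards(entries):
    """Return the manifest entries that contain glob-style wildcards."""
    return [entry for entry in entries if WILDCARD.search(entry)]


# Hypothetical packaging manifest: two explicit paths and two wildcard entries.
manifest = ["bin/app", "lib/*.so", "config/settings.yaml", "plugins/**"]
flagged = find_wildcards(manifest)
```

Entries that come back flagged are the ones whose contents cannot be known by reading the configuration alone.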
Vulnerabilities of Tools
Another issue is that the tools used to generate SBOMs have vulnerabilities themselves. The weakness lies in when the tools run: post-build or post-packaging. SBOM generators read files that the compiler or packaging program creates, such as Rust’s Cargo.lock, NPM’s package-lock.json, or an apt-cache. The gap between the build and SBOM creation provides a hacking opportunity where these inputs can be manipulated, resulting in untrue SBOMs.
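A simple way to narrow this gap is to record a digest of the dependency file at build time and check it again immediately before the SBOM is generated. The following is a minimal sketch using SHA-256; the function names and sample lockfile bytes are illustrative, and it assumes the recorded digest is stored somewhere an attacker cannot also modify.

```python
import hashlib


def digest(data):
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def lockfile_unchanged(recorded_digest, current_bytes):
    """True if the file content still matches the digest taken at build time."""
    return digest(current_bytes) == recorded_digest


# Hypothetical lockfile content captured at build time.
lock_at_build = b'{"packages": {"node_modules/lodash": {"version": "4.17.21"}}}'
recorded = digest(lock_at_build)

# Simulate an edit made between the build and SBOM generation.
lock_tampered = lock_at_build.replace(b"4.17.21", b"4.17.20")

untouched_ok = lockfile_unchanged(recorded, lock_at_build)
tampered_ok = lockfile_unchanged(recorded, lock_tampered)
```

If the check fails, the SBOM generator should refuse to run rather than document a file that no longer reflects the build.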
Lack of Signing of the SBOM
Another challenge is that SBOMs are rarely signed, so we cannot verify who created the artifact and the corresponding SBOM report.
The biggest challenge is the mutability of the SBOM files. SBOMs can be edited without any trace to manipulate the data to deceive the reader. For example, hackers can easily update an NPM package-lock.json prior to your NPM SBOM generation. By doing so, they can disguise any hack in the SBOM. Immutability becomes an issue at all levels of SBOM generation.
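To illustrate how a signature exposes edits, the sketch below signs an SBOM with an HMAC and verifies it later. This is a stand-in chosen because it needs only the Python standard library: an HMAC uses a shared secret, so it detects tampering but does not prove who signed. Real SBOM signing would use asymmetric keys through a dedicated signing tool. The key, helper names, and sample SBOM are all hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(sbom):
    """Serialize the SBOM deterministically so equal content gives equal bytes."""
    return json.dumps(sbom, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_sbom(sbom, key):
    """Return a hex HMAC-SHA256 signature over the canonical SBOM bytes."""
    return hmac.new(key, canonical_bytes(sbom), hashlib.sha256).hexdigest()


def verify_sbom(sbom, key, signature):
    """True if the signature matches the SBOM content."""
    return hmac.compare_digest(sign_sbom(sbom, key), signature)


key = b"demo-key-not-for-production"  # hypothetical shared secret
sbom = {"components": [{"name": "lodash", "version": "4.17.21"}]}
signature = sign_sbom(sbom, key)

# An edited copy of the SBOM no longer matches the original signature.
edited = {"components": [{"name": "lodash", "version": "4.17.20"}]}
original_valid = verify_sbom(sbom, key, signature)
edited_valid = verify_sbom(edited, key, signature)
```

The point of the exercise is that any silent edit to the SBOM becomes detectable once a signature travels with it.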
A further question is what the SBOM was generated for: a microservice or an entire application? In traditional development, an SBOM is created at the point in time when the application version’s artifacts are created, giving us an SBOM based on that application version. In microservices, each service is built and deployed independently. An application is a logical collection of microservices. Application versions are created when an underlying microservice version is updated. So how do we track application-level SBOMs when the application is only a logical representation and never built as a complete unit?
Understanding the Different Levels of Software Bill of Material
Multiple SBOMs are produced at different levels of the software stack. The following SBOMs are needed to have a complete audit trail of the entire supply chain used to create a single artifact (microservice, binary, library, etc.):
At the lowest level, the supply chain starts at the underlying hardware used, for example, the processor type (amd64 vs arm64). Yes, this level of the stack can also have vulnerabilities. However, this information is not currently gathered by default, even though the SBOM schema can store it. In CycloneDX this data is stored as Components.
Knowing the hardware used by your build is essential in recreating an artifact. For example, tools in the future may use ‘consensus build networks’ where all machines must be identical, starting at the lowest level. The hardware SBOM is the way to confirm this level of parity. Check out JFrog’s Pyrsia open source project for a consensus build network.
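As a rough sketch of what capturing the hardware level could look like, the snippet below records the build machine's processor architecture as a CycloneDX-style component. The helper name and the description text are assumptions for illustration; "device" is one of the component types defined by CycloneDX, and platform.machine() is from the Python standard library.

```python
import platform


def hardware_component():
    """Describe the build machine's processor architecture as a component."""
    # platform.machine() returns values such as "x86_64" or "arm64",
    # or an empty string when the architecture cannot be determined.
    arch = platform.machine() or "unknown"
    return {
        "type": "device",
        "name": arch,
        "description": "Processor architecture of the build machine",
    }


component = hardware_component()
```

Two build machines that report different values here cannot be assumed to produce identical artifacts.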
Operating System SBOM
The next level of the supply chain is the OS. We need to know the underlying operating system packages installed on the build machine. These operating system packages affect how the compiler and packagers produce their artifacts and impact the ability to reproduce a build. They can also introduce vulnerabilities when old OS packages are still being consumed. OS SBOMs are not currently gathered by default, even though the SBOM schema can store them. In CycloneDX this data is stored as Components.
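On a Debian-based build machine, the installed packages can be listed with dpkg-query -W, which prints one package per line as a name and a version separated by a tab. The sketch below turns such output into CycloneDX-style components. The sample output is hand-written, and mapping OS packages to the "library" component type is an assumption made for this example.

```python
# Hand-written sample in the "name<TAB>version" layout printed by `dpkg-query -W`.
SAMPLE_DPKG_OUTPUT = "libc6\t2.35-0ubuntu3\nopenssl\t3.0.2-0ubuntu1\n"


def os_package_components(dpkg_output):
    """Convert tab-separated package listings into component dictionaries."""
    components = []
    for line in dpkg_output.splitlines():
        name, separator, version = line.partition("\t")
        if not separator:
            continue  # skip blank or malformed lines
        components.append({"type": "library", "name": name, "version": version})
    return components


os_components = os_package_components(SAMPLE_DPKG_OUTPUT)
```

In a real pipeline the text would come from running the package manager on the build machine at build time rather than from a constant.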
The software compiler translates source code into objects. To establish source-to-object parity, we must know what the compiler consumed as input and produced as output. Unfortunately, SBOMs at this level do not exist outside of commercial build tools such as OpenMake Meister and IBM Rational ClearCase (ClearMake).
Most build scripts that produce a ‘build audit’ report on the files found in the ‘local’ build directory rather than on what the compiler actually used. These files are pulled from a source repository (git directory) and listed as the source SBOM. However, the ‘local’ build directory is not the only place where the compiler will find files. If we are to be accurate in our SBOMs, we cannot miss files. An SBOM must include all files, even those managed outside your versioning tool. In other words, it must be generated at compile/link time to be accurate.
In the future, compilers need to be updated to produce SBOMs as standard output, with the ability to control the inputs and outputs carefully. OpenMake Meister achieved this control using a Search Path and generated Build Control Files. Another approach is to monitor the file system for reads and writes by the compiler, like ClearMake. The GNU Make VPATH concept, with its ‘first found’ reference, is as relevant today as it was 30 years ago.
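The ‘first found’ idea can be sketched in a few lines: walk an ordered list of directories and record where each input file was actually located. This is only an illustration of the concept, not how OpenMake Meister, ClearMake, or GNU Make implement it; the function names and the temporary directory layout are made up for the example.

```python
import os
import tempfile


def resolve_first_found(filename, search_path):
    """Return the first existing path for filename along the ordered search path."""
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def audit_inputs(filenames, search_path):
    """Map each requested input to the location it resolved to (or None)."""
    return {name: resolve_first_found(name, search_path) for name in filenames}


with tempfile.TemporaryDirectory() as root:
    local = os.path.join(root, "local")
    shared = os.path.join(root, "shared")
    os.makedirs(local)
    os.makedirs(shared)
    # util.h exists in both directories; config.h exists only outside the local one.
    for directory, name in [(local, "util.h"), (shared, "util.h"), (shared, "config.h")]:
        with open(os.path.join(directory, name), "w") as handle:
            handle.write("// placeholder\n")
    audit = audit_inputs(["util.h", "config.h", "missing.h"], [local, shared])
```

An audit that only listed the local directory would report util.h and miss config.h entirely, which is exactly the inaccuracy described above.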
Transitive Dependencies SBOM
Transitive dependencies occur when A depends upon B and B depends upon C: C is then a transitive dependency of A. SBOM tools either rely on the compiler or packager to output the transitive dependencies or determine them using a recursive lookup. Most SBOMs at this level are accurate, and this is where most teams have focused their attention. Transitive Dependency SBOMs are critical as they expose open-source packages.
Packagers, such as NPM, docker build, and dpkg-deb, pull together files and artifacts into an installable package based on a scripted configuration file. They produce two outputs: the package and the package dependency file. Files like package-lock.json or Cargo.lock are examples of package dependency files.
Some Package SBOMs at this level may contain source code dependencies, depending on the programming language, packager, and SBOM tools used. Like the compiler SBOMs, the package SBOM may only reference files from the ‘local’ build directory, ignoring all other file locations.
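The lookup itself is a graph traversal. Below is a minimal sketch, assuming the direct dependencies are already available as a dictionary; it uses an explicit stack instead of recursion so that dependency cycles cannot cause unbounded recursion. The function name and sample graph are illustrative.

```python
def transitive_dependencies(package, direct_deps):
    """Return every package reachable from `package`, excluding itself."""
    seen = set()
    stack = list(direct_deps.get(package, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also protects against cycles
        seen.add(dep)
        stack.extend(direct_deps.get(dep, []))
    seen.discard(package)
    return seen


# A depends on B, and B depends on C, so C is a transitive dependency of A.
graph = {"A": ["B"], "B": ["C"], "C": []}
deps_of_a = transitive_dependencies("A", graph)
```

The result for A contains both B (direct) and C (transitive); subtracting the direct list leaves only the packages a developer never asked for explicitly.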
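As an example of reading a package dependency file, the sketch below extracts package names and versions from a package-lock.json. It assumes the newer lockfile layout (lockfileVersion 2 or 3), where a top-level packages object is keyed by install path and the empty key is the root project. The sample content and helper name are made up for illustration.

```python
import json

# Hand-written sample in the lockfileVersion 3 layout.
SAMPLE_LOCK = """
{
  "name": "demo-app",
  "version": "1.0.0",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo-app", "version": "1.0.0"},
    "node_modules/express": {"version": "4.18.2"},
    "node_modules/express/node_modules/debug": {"version": "2.6.9"}
  }
}
"""


def locked_packages(lock_text):
    """Return sorted (name, version) pairs for every locked dependency."""
    lock = json.loads(lock_text)
    result = []
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # the empty key describes the root project itself
        # The package name is whatever follows the last "node_modules/" segment.
        name = path.rpartition("node_modules/")[2]
        result.append((name, meta.get("version")))
    return sorted(result)


locked = locked_packages(SAMPLE_LOCK)
```

Nested paths such as node_modules/express/node_modules/debug are how the lockfile records a transitive dependency pinned to its own version.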
Application SBOMs aggregate all of the lower-level SBOMs into a single view. Tools such as DeployHub and Ortelius snapshot all of the available SBOMs for each artifact in the supply chain. When an artifact is consumed by an application, its SBOM data is aggregated to the logical application level, creating an Application SBOM.
Without this aggregation ability, we lose the application-level SBOM.
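Conceptually, the aggregation is a merge of per-service component lists that also remembers which service contributed each component. The following is a simplified sketch under that assumption; it is not how DeployHub or Ortelius implement it, and the service names and SBOM contents are hypothetical.

```python
def aggregate_application_sbom(service_sboms):
    """Map each (name, version) component to the sorted services that use it."""
    usage = {}
    for service, sbom in service_sboms.items():
        for component in sbom.get("components", []):
            key = (component["name"], component["version"])
            usage.setdefault(key, set()).add(service)
    return {key: sorted(services) for key, services in usage.items()}


# Hypothetical per-microservice SBOMs for one logical application.
service_sboms = {
    "cart": {"components": [
        {"name": "lodash", "version": "4.17.21"},
        {"name": "express", "version": "4.18.2"},
    ]},
    "checkout": {"components": [
        {"name": "lodash", "version": "4.17.21"},
    ]},
}
application_sbom = aggregate_application_sbom(service_sboms)
```

Keeping the service names alongside each component means a vulnerable package can be traced back to every microservice that ships it.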
Tooling to make SBOMs Accessible
While your team may be doing a great job of generating the different levels of SBOMs for each build, we are far from finished. We need the ability to see the information so that it can be used and acted upon by multiple team members. And the SBOM information must have immediate relevance. After all, what good is a microservice’s SBOM if we have no insight into which applications are using the service?
Maturing in the use of SBOMs will drive the need to aggregate the data and display the information in a way that is easy to read and useful to all. The first step is finding an easy way to associate the generated SBOMs based on the artifact in a simple dashboard view. Unfortunately, we cannot expect everyone to know where the last build was executed and where the SBOM data lies.
Secondly, in a microservice architecture, we need a method of aggregating SBOM data up to the ‘logical’ application level. This aggregation requires tooling that can define a ‘logical’ application and build relationships between the microservices (or any component) and each ‘logical’ application version.
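One way to picture a ‘logical’ application is as a versioned mapping from microservice names to microservice versions, where any change to a service produces a new application version. The sketch below models only that relationship; the data shape and function name are assumptions for illustration and do not describe any particular tool.

```python
def next_application_version(app_version, service, new_service_version):
    """Return a new application version if the service version changed."""
    services = dict(app_version["services"])  # copy so the old version stays intact
    if services.get(service) == new_service_version:
        return app_version  # nothing changed, so no new application version
    services[service] = new_service_version
    return {"version": app_version["version"] + 1, "services": services}


# Hypothetical logical application made of two microservices.
app_v1 = {"version": 1, "services": {"cart": "1.0.0", "checkout": "2.3.0"}}
app_v2 = next_application_version(app_v1, "cart", "1.1.0")
app_same = next_application_version(app_v2, "cart", "1.1.0")
```

Because each application version pins exact microservice versions, the SBOMs of those versions can be looked up and combined without ever building the application as a single unit.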
As you can see, there are many places where hackers can manipulate artifacts and SBOMs. OpenSSF is working to secure this process by implementing signature tools like Sigstore and Notary v2 for signed artifacts, packages, and SBOMs. JFrog is working on an open-source solution called Pyrsia for implementing consensus-based build networks that reduce the risk of hacked build machines.
DeployHub is working on Ortelius, a unified governance catalog that tracks microservice SBOM data by version and aggregates all of the SBOMs up to the logical application. It will be the task of all of us to begin building this level of audit control into our DevOps pipelines and to act on the DevOps intelligence continuously. In other words, much investment is now going into solving the security problem.
Centralizing all SBOM Data with DeployHub
With microservices at scale, Developers, DevOps, and Security teams struggle to understand dependencies, SBOMs, vulnerabilities, impact, and microservice ownership. They grapple with microservice versions drifting across clusters or the sprawl of redundant services.
DeployHub is a microservice catalog that provides governance around the software supply chain. It centralizes SBOM and CVE information for each microservice and aggregates this data up to the application level. DeployHub’s microservice catalog centralizes this level of information, making what’s hard about microservices easy.
Unique to DeployHub is its method of versioning microservice updates. DeployHub automatically creates a new version of the ‘logical’ application when an underlying microservice changes.
DeployHub integrates into the CI/CD pipeline to continually monitor microservice updates. It tracks ownership, SBOM, CVE, consumers, and inventory across the enterprise. DeployHub simplifies cloud-native architecture by governing your microservice supply chain in one place.
Aggregated Application SBOM and CVE
DeployHub is based on the Ortelius open-source project incubating at the Continuous Delivery Foundation.