Understanding Software Bill of Materials
The Software Bill of Materials (SBOM) has finally been recognized as an essential tool in the security toolbox. While we have been generating SBOMs for a very long time, it is only in recent months that we have started talking about using them as part of our security arsenal. In this article, we review what you need to know about the Software Bill of Materials.
Defining Software Bill of Materials
What is an SBOM?
Software Bill of Materials (SBOMs) are used to document the source code, library and package dependencies, and the licenses consumed by the compiler or packager used to create a distributable artifact. The SBOMs are JSON or YAML files that live side-by-side with the built artifact.
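As a rough illustration of what such a file holds, the sketch below builds a minimal CycloneDX-style JSON document in Python. The component name, version, and license are made-up values, and the `make_sbom` helper is hypothetical. A real SBOM generator would fill these fields in from the build.

```python
import json

def make_sbom(components):
    """Return a minimal CycloneDX-style BOM as a plain dict."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": name,
                "version": version,
                # Package URL: one common way to identify a package
                "purl": f"pkg:npm/{name}@{version}",
                "licenses": [{"license": {"id": license_id}}],
            }
            for name, version, license_id in components
        ],
    }

# Hypothetical dependency, for illustration only
sbom = make_sbom([("example-lib", "1.2.3", "MIT")])
print(json.dumps(sbom, indent=2))
```

The printed JSON is the kind of file that would sit next to the built artifact.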
A Software Bill of Materials is required to take the next step in supply chain security: the vulnerability scan. To know your vulnerabilities, you need to know which code, library, and package dependencies to scan. SBOMs therefore become the first line of defense, required for all downstream reporting.
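To make the link between an SBOM and a vulnerability scan concrete, here is a minimal sketch that matches SBOM components against a list of advisories. The advisory table and its identifier are invented for the example. A real scanner would query a vulnerability database and handle version ranges rather than exact versions.

```python
def find_vulnerable(components, advisories):
    """Return (name, version, advisory_id) for each component with a known advisory."""
    hits = []
    for comp in components:
        key = (comp["name"], comp["version"])
        for advisory_id in advisories.get(key, []):
            hits.append((comp["name"], comp["version"], advisory_id))
    return hits

# Hypothetical SBOM components and advisory data
components = [
    {"name": "example-lib", "version": "1.2.3"},
    {"name": "other-lib", "version": "2.0.0"},
]
advisories = {("example-lib", "1.2.3"): ["EXAMPLE-2024-0001"]}

hits = find_vulnerable(components, advisories)
for name, version, advisory_id in hits:
    print(f"{name} {version} is affected by {advisory_id}")
```

Without the component list, there is nothing to look up, which is why the SBOM has to come first.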
As software grows more complex, the adoption of SBOMs is on the rise. Despite this growth, many organizations are still unfamiliar with the current SBOM formats.
What Are SBOM Formats?
SBOM formats are standards that define a unified structure for generating SBOMs and sharing them with end users. These formats describe the composition of software in a common way that other tools can understand.
CycloneDX and SPDX are the two popular formats for Software Bill of Materials.
- CycloneDX has a viewpoint of SBOMs from the OWASP world.
- SPDX has a viewpoint originating from a license consumption model.
In terms of SBOM formats, CycloneDX and SPDX achieve the same result. Both document what is used to create an artifact by digging into otherwise hidden transitive open-source dependencies.
CycloneDX is a modern standard for SBOM with a viewpoint of SBOMs from the OWASP world. It is a lightweight Software Bill of Materials (SBOM) standard for application security contexts and supply chain component analysis.
Using CycloneDX, you can record the supplier, manufacturer, and target component, the tools used to create the BOM, and the license information for the BOM.
SPDX is another SBOM format with a viewpoint originating from a license consumption model. It is an open standard for communicating SBOM information. This information includes components, licenses, copyrights, and security references. It is designed to streamline work and improve compliance by providing a common format to share data.
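For comparison, the sketch below shows roughly what a minimal SPDX document might look like when expressed as JSON. The field names follow the SPDX 2.x JSON layout as best I understand it, and every value (package name, namespace URL, tool name, timestamp) is a placeholder, so check the SPDX specification before relying on it.

```python
import json

# All values are placeholders for illustration
spdx_doc = {
    "spdxVersion": "SPDX-2.3",
    "dataLicense": "CC0-1.0",
    "SPDXID": "SPDXRef-DOCUMENT",
    "name": "example-app",
    "documentNamespace": "https://example.com/spdx/example-app-1.0.0",
    "creationInfo": {
        "created": "2024-01-01T00:00:00Z",
        "creators": ["Tool: example-sbom-generator"],
    },
    "packages": [
        {
            "SPDXID": "SPDXRef-Package-example-lib",
            "name": "example-lib",
            "versionInfo": "1.2.3",
            "downloadLocation": "NOASSERTION",
            "licenseConcluded": "MIT",
            "copyrightText": "NOASSERTION",
        }
    ],
    # Relationships tie the document to the packages it describes
    "relationships": [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": "SPDXRef-Package-example-lib",
        }
    ],
}

print(json.dumps(spdx_doc, indent=2))
```

Note how license and copyright fields sit directly on each package, which reflects the format's licensing origins.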
Differences Between CycloneDX vs. SPDX
How do these two formats stack up? Understanding the difference between these common SBOM formats is important. Here is a breakdown of each format and how they are different.
| | CycloneDX | SPDX |
| Description | CycloneDX is a lightweight SBOM standard for application security contexts and supply chain component analysis. | SPDX was formed with the intent of creating a common data exchange format so that information about software packages can be shared and collected. |
| Supported formats | CycloneDX supports XML, JSON, and Protocol Buffers; the source code is available on GitHub. | SPDX supports RDFa, .xlsx, and .spdx, and expands into other formats such as .xml, .json, and .yaml. There’s an online tool and a GitHub repository. |
| Support | Supports referencing components, services, and vulnerabilities in other systems and BOMs as well. | |
| What it contains | BOM Metadata, Components, Coordinates (group, name, version), Package URL, Common Platform Enumeration (CPE), SWID (ISO/IEC 19770-2:2015), cryptographic hash functions (SHA-1, SHA-2, SHA-3, BLAKE2b, BLAKE3), and Extensions (extension points to support future use cases and functionality). | SPDX Document Creation Information, Package Information, File Information, Snippet Information, Other Licensing Information Detected, Relationships Between SPDX Elements, Annotations. |
| Features | CycloneDX can only be used to track the software components in a project. | SPDX tracks the software components in a product and shares information about the software components with other developers, buyers, and sellers. |
Between CycloneDX and SPDX formats, you will recognize an overlap in terminology with a different overall structure in the file formats. See https://cyclonedx.org/specification/overview/ and https://spdx.dev/specifications/
In the relational database (RDBMS) world, these schema definitions would equate to a series of tables and the relationships between them. The CycloneDX JSON structure and documentation are easier to understand and follow. SPDX goes into much more detail, but the examples are XML snippets. With SPDX, there is no single view of the whole spec represented as a single YAML or JSON file.
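The overlap in terminology can be seen in a small sketch that pulls the same (name, version) pairs out of either format. The two sample documents are cut down to the fields the function reads and use invented values. The `list_components` helper is hypothetical, and it assumes the JSON layouts of CycloneDX and SPDX 2.x.

```python
def list_components(doc):
    """Return sorted (name, version) pairs from a CycloneDX or SPDX 2.x JSON dict."""
    if doc.get("bomFormat") == "CycloneDX":
        # CycloneDX calls them components and stores the version as "version"
        return sorted((c["name"], c.get("version", "")) for c in doc.get("components", []))
    if "spdxVersion" in doc:
        # SPDX calls them packages and stores the version as "versionInfo"
        return sorted((p["name"], p.get("versionInfo", "")) for p in doc.get("packages", []))
    raise ValueError("unrecognized SBOM format")

# Trimmed, hypothetical documents describing the same dependency
cyclonedx = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "components": [{"type": "library", "name": "example-lib", "version": "1.2.3"}],
}
spdx = {
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"SPDXID": "SPDXRef-Package-example-lib", "name": "example-lib", "versionInfo": "1.2.3"}
    ],
}

print(list_components(cyclonedx))
print(list_components(spdx))
```

The same information is present in both, but it lives under different keys and in a different overall structure.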
Challenges with SBOMs
While the Software Bill of Materials is critical to our overall security practices, SBOMs are not perfect. Here are the top challenges with SBOMs.
- Documenting supply chain
- Vulnerabilities of tools
- Lack of signing of the SBOM
Documenting Supply Chain
There is no complete set of SBOM tools that will document the entire supply chain that makes up a single software solution. In addition, there are many gaps that hackers can take advantage of to introduce nefarious artifacts into your software system. For example, using wildcards in configuration files passed to your packaging step can easily pull objects into your package that could be problematic. Think of the SolarWinds hack.
Overall, every artifact must be reproducible, have explicitly defined inputs, be signed and verifiable, and have an immutable SBOM. However, we still have a long way to go to build this level of security into our SBOM process. For example, no standard SBOM reports the source code used to compile an artifact. OpenMake Software included a build audit with OpenMake Meister, and IBM Rational offered ClearMake, which provided an audit. However, most hardcoded build scripts do not have this level of SBOM generation.
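One simple guard against the wildcard problem is to flag any packaging input that is a pattern rather than an explicit path. The sketch below assumes a hypothetical list of include patterns taken from a packaging configuration. It only detects glob characters and does not inspect what the patterns would match.

```python
# Characters that make a path a glob pattern instead of an explicit file
WILDCARD_CHARS = set("*?[")

def find_wildcards(include_patterns):
    """Return the patterns that contain glob wildcards rather than explicit paths."""
    return [p for p in include_patterns if WILDCARD_CHARS & set(p)]

# Hypothetical include list from a packaging configuration file
includes = ["bin/app", "lib/*.so", "config/settings.yaml", "plugins/**"]

flagged = find_wildcards(includes)
for pattern in flagged:
    print(f"wildcard input, contents not explicitly defined: {pattern}")
```

A check like this could run before the packaging step so that every input to the package is named explicitly.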
Vulnerabilities of Tools
Another issue is that the tools used to generate SBOMs have weaknesses of their own. The weakness lies in when the tools run: post-build or post-packaging. SBOM generators use files that the compiler or packaging program creates, such as Rust’s Cargo.lock, NPM’s package-lock.json, or an apt-cache. The gap between the build and SBOM creation provides a hacking opportunity where the SBOM report can be manipulated, resulting in untrue SBOMs.
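One way to narrow this gap, sketched below, is to record a digest of the dependency file during the build and check it again just before the SBOM is generated. The file contents here are stand-ins written to a temporary directory, and the helper names are hypothetical. The approach also assumes the recorded digest itself is stored somewhere an attacker cannot change.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def lockfile_unchanged(path, digest_at_build):
    """True if the file still matches the digest recorded during the build."""
    return sha256_of(path) == digest_at_build

with tempfile.TemporaryDirectory() as tmp:
    lockfile = Path(tmp) / "package-lock.json"
    lockfile.write_text('{"lockfileVersion": 3}')

    recorded = sha256_of(lockfile)  # captured at build time
    ok_before = lockfile_unchanged(lockfile, recorded)

    # Simulate tampering between the build and SBOM generation
    lockfile.write_text('{"lockfileVersion": 3, "tampered": true}')
    ok_after = lockfile_unchanged(lockfile, recorded)

print(ok_before, ok_after)
```

If the second check fails, the SBOM generator should refuse to run rather than report on a file that no longer reflects the build.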
Lack of Signing of the SBOM
Another challenge is the lack of signing of the SBOM. Without a signature, we cannot know who created the artifact and the corresponding SBOM report.
The biggest challenge is the mutability of the SBOM files. Software Bill of Materials can be edited without any trace to manipulate the data to deceive the reader. For example, hackers can easily update an NPM package-lock.json prior to your NPM SBOM generation. By doing so, they can disguise any hack in the SBOM. Immutability becomes an issue at all levels of SBOM generation.
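The sketch below shows the basic idea of making edits detectable: compute a signature over the SBOM content and verify it before trusting the file. It uses an HMAC with a shared demo key from the Python standard library as a stand-in. Production signing would normally use asymmetric signatures and dedicated tooling, so treat this as an illustration of tamper detection only.

```python
import hashlib
import hmac
import json

def canonical_bytes(sbom):
    """Serialize the SBOM deterministically so equal content gives equal bytes."""
    return json.dumps(sbom, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_sbom(sbom, key):
    """Return an HMAC-SHA256 signature over the canonical SBOM bytes."""
    return hmac.new(key, canonical_bytes(sbom), hashlib.sha256).hexdigest()

def verify_sbom(sbom, key, signature):
    """True if the signature matches the SBOM content."""
    return hmac.compare_digest(sign_sbom(sbom, key), signature)

key = b"demo-key-not-for-production"  # hypothetical key for the example
sbom = {
    "bomFormat": "CycloneDX",
    "components": [{"name": "example-lib", "version": "1.2.3"}],
}
signature = sign_sbom(sbom, key)

# An attacker removes a component to hide it from the report
tampered = dict(sbom, components=[])

print(verify_sbom(sbom, key, signature))      # original content
print(verify_sbom(tampered, key, signature))  # edited content
```

Any edit to the SBOM changes the canonical bytes, so the stored signature no longer verifies.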
What was the SBOM generated for, a microservice or an entire application? In traditional development, an SBOM is created at the point in time when the application version’s artifacts are created, giving us an SBOM based on that application version. In microservices, each service is built and deployed independently. An application is a logical collection of microservices. Application versions are created when an underlying microservice version is updated. So how do we track application-level SBOMs when the application is only a logical representation and never built as a complete unit?
Understanding Levels of Software Bill of Materials
Multiple Software Bill of Materials are produced at different levels of the software stack. The following levels are needed to have a complete audit trail of the entire supply chain used to create a single artifact (microservice, binary, library, etc.):
Hardware SBOM
At the lowest level, the supply chain starts at the underlying hardware used, for example, the processor type (amd64 vs arm64). Yes, this level of the stack can also have vulnerabilities. However, this information is not currently gathered by default, even though the SBOM schema can store it. In CycloneDX, this data is stored as Components.
Knowing the hardware used by your build is essential in recreating an artifact. For example, tools in the future may use a ‘consensus build network’ where all machines must be identical, starting at the lowest level. The hardware SBOM is the way to confirm this level of parity. Check out JFrog’s Pyrsia open source project for a consensus build network.
Operating System SBOM
The next level of the supply chain is the OS. We need to know the underlying operating system packages installed on the build machine. These operating system packages affect how the compiler and packagers produce their artifacts and impact the ability to reproduce a build. They could also introduce vulnerabilities from old OS packages still being consumed. OS SBOMs are not currently gathered by default, even though the SBOM schema can store them. In CycloneDX format, this data is stored as Components.
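As a sketch of how build-host details could be captured as components, the code below uses Python's standard `platform` module to record the processor architecture and operating system. The `device` and `operating-system` values are component types in the CycloneDX schema as I understand it, and the `build_host_components` helper is hypothetical. It does not enumerate installed OS packages, which would require querying the system's package manager.

```python
import platform

def build_host_components():
    """Describe the build machine's hardware and OS as CycloneDX-style components."""
    return [
        {
            "type": "device",
            # e.g. x86_64 or arm64, as reported by the host
            "name": platform.machine() or "unknown",
            "description": "build host processor architecture",
        },
        {
            "type": "operating-system",
            "name": platform.system() or "unknown",
            "version": platform.release(),
        },
    ]

for component in build_host_components():
    print(component)
```

These entries could be appended to the components list of the SBOM produced for the build.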
Compiler SBOM
The compiler translates source code into objects. To establish source-to-object parity, we must know what the compiler consumed as input and produced as output. Unfortunately, Software Bills of Materials at this level do not exist outside of commercial build tools such as OpenMake Meister and IBM Rational ClearCase (ClearMake).
Most build scripts that produce a ‘build audit’ report on the files found in the ‘local’ build directory, not on what the compiler actually used. These files are pulled from a source repository (git directory) and listed as the source SBOM. However, the ‘local’ build directory is not the only place where the compiler will find files. If we are to be accurate in our SBOMs, we cannot miss files. A Software Bill of Materials must include all files, even those managed outside your versioning tool. In other words, it must be generated at compile/link time to be accurate.
In the future, compilers must be updated to produce SBOMs as standard output, with the ability to control the inputs and outputs carefully. OpenMake Meister achieved this control using a Search Path and generated Build Control Files. Another approach is to monitor the file system for reads and writes by the compiler, as ClearMake does. The concept of the GNU Make VPATH, with its ‘first found’ reference, is as relevant today as it was 30 years ago.
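The ‘first found’ idea can be sketched in a few lines: given an ordered list of directories, the first directory that contains the file wins, and that resolved location is what an accurate SBOM would need to record. The directory and file names below are invented for the example and created in a temporary folder. This is an illustration of the lookup rule, not of how any particular build tool implements it.

```python
import tempfile
from pathlib import Path

def resolve_first_found(filename, search_path):
    """Return the first existing match for filename along an ordered search path."""
    for directory in search_path:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None

with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "local"
    shared = Path(tmp) / "shared"
    local.mkdir()
    shared.mkdir()

    # Only the shared directory has the header, so it is the one consumed
    (shared / "util.h").write_text("// shared copy\n")
    found_shared_dir = resolve_first_found("util.h", [local, shared]).parent.name

    # Once a local copy exists, it shadows the shared one
    (local / "util.h").write_text("// local copy\n")
    found_local_dir = resolve_first_found("util.h", [local, shared]).parent.name

    missing = resolve_first_found("nope.h", [local, shared])

print(found_shared_dir, found_local_dir, missing)
```

A report that lists only the ‘local’ directory would miss the first case entirely, which is why the resolution has to be observed at compile time.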
Transitive Dependencies SBOM
Transitive dependencies happen when A depends upon B and B depends upon C; C is then a transitive dependency of A. SBOM tools either rely on the compiler or packager to output the transitive dependencies or determine them using a recursive lookup. Most Software Bills of Materials at this level are accurate, and this is where most teams have focused their attention. Transitive Dependency SBOMs are critical as they expose open-source packages.
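The lookup described above can be sketched as a graph walk over a map of direct dependencies. The version below uses an explicit stack instead of recursion, which gives the same result and also copes with dependency cycles. The package names are the A, B, and C from the example.

```python
def transitive_dependencies(package, direct_deps):
    """Return every package reachable from `package` through its dependencies."""
    seen = set()
    stack = list(direct_deps.get(package, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited, also guards against cycles
        seen.add(dep)
        stack.extend(direct_deps.get(dep, []))
    seen.discard(package)  # a package is not its own dependency
    return seen

# A depends on B, and B depends on C
direct_deps = {"A": ["B"], "B": ["C"], "C": []}

print(sorted(transitive_dependencies("A", direct_deps)))
```

Here C shows up in the result for A even though A never declares it directly.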
Package SBOM
Packagers, such as NPM, docker build, and dpkg-deb, pull together files and artifacts into an installable package based on a scripted configuration file. They produce two outputs: the package and the package dependency file. Files such as package-lock.json or Cargo.lock are examples of a package dependency file.
Some Package SBOMs at this level may contain source code dependencies, depending on the programming language, packager, and SBOM tools used. Like the compiler SBOMs, the package SBOM may only reference files from the ‘local’ build directory, ignoring all other file locations.
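To show how a package dependency file can feed an SBOM, the sketch below reads a package-lock.json and turns each entry into a component record. It assumes the newer lockfile layout (lockfileVersion 2 or 3) with a top-level `packages` map keyed by install path. The sample lockfile content and the `components_from_package_lock` helper are made up for the example.

```python
import json

def components_from_package_lock(lock_text):
    """Build CycloneDX-style component dicts from package-lock.json text."""
    lock = json.loads(lock_text)
    components = []
    for path, info in lock.get("packages", {}).items():
        if not path:
            continue  # the "" key describes the root project itself
        # Keys look like node_modules/a or node_modules/a/node_modules/b
        name = path.split("node_modules/")[-1]
        components.append(
            {"type": "library", "name": name, "version": info.get("version", "")}
        )
    return components

# Hypothetical lockfile content
lock_text = json.dumps({
    "name": "example-app",
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "example-app", "version": "1.0.0"},
        "node_modules/example-lib": {"version": "1.2.3"},
        "node_modules/example-lib/node_modules/nested-lib": {"version": "0.4.0"},
    },
})

for component in components_from_package_lock(lock_text):
    print(component)
```

Because the result depends entirely on the lockfile, the output is only as trustworthy as that file.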
Application SBOM
Application SBOMs aggregate all of the lower-level Software Bills of Materials into a single view. Tools such as DeployHub and Ortelius snapshot all available SBOMs for each artifact in the supply chain. When an artifact is consumed by an application, its Software Bill of Materials data is aggregated to the logical application level, creating an Application SBOM.
Without this aggregation ability, we lose the application level SBOM.
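A minimal sketch of this kind of aggregation is shown below: it merges the component lists of several microservice SBOMs into one application-level view and records which services use each component. The service names, the data, and the `used_by` field are assumptions made for the example. This is not a description of how DeployHub or Ortelius implement it.

```python
def aggregate_application_sbom(app_name, app_version, service_sboms):
    """Merge per-service component lists into one application-level view."""
    merged = {}
    for service, sbom in service_sboms.items():
        for comp in sbom.get("components", []):
            key = (comp["name"], comp["version"])
            entry = merged.setdefault(
                key, {"name": comp["name"], "version": comp["version"], "used_by": []}
            )
            entry["used_by"].append(service)  # hypothetical field, not part of CycloneDX
    return {
        "application": app_name,
        "version": app_version,
        "components": [merged[key] for key in sorted(merged)],
    }

# Hypothetical microservice SBOMs, trimmed to their component lists
service_sboms = {
    "cart": {"components": [{"name": "example-lib", "version": "1.2.3"}]},
    "auth": {"components": [
        {"name": "example-lib", "version": "1.2.3"},
        {"name": "other-lib", "version": "2.0.0"},
    ]},
}

app_sbom = aggregate_application_sbom("storefront", "12", service_sboms)
for component in app_sbom["components"]:
    print(component)
```

Each shared component appears once in the application view, with the list of services that pull it in.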
Tooling for Software Bill of Materials
Tooling makes Software Bill of Materials accessible. While your team may be doing a great job of generating the different levels of SBOMs for each build, we are far from finished. We need the ability to see the information so that it can be used and acted upon by multiple team members. And the SBOM information must have immediate relevance. After all, what good is a microservice’s SBOM if we have no insight into which applications are using the service?
Maturing in the use of SBOMs will drive the need to aggregate the data and display the information in a way that is easy to read and useful to all. The first step is finding an easy way to associate the generated SBOMs based on the artifact in a simple dashboard view. Unfortunately, we cannot expect everyone to know where the last build was executed and where the SBOM data lies.
Secondly, in a microservice architecture, we need a method of aggregating SBOM data up to the ‘logical’ application level. This aggregation requires tooling that can define a ‘logical’ application and build relationships between the microservices (or any component) and each ‘logical’ application version.
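The relationship between microservice versions and ‘logical’ application versions can be modeled as a simple mapping, as in the sketch below. Given a service and version, it answers which application versions consume it. The application names, service names, and version numbers are invented for illustration.

```python
def applications_using(service, version, applications):
    """Return the logical application versions that include the given service version."""
    return sorted(
        app for app, services in applications.items()
        if services.get(service) == version
    )

# Hypothetical logical application versions and the service versions they consume
applications = {
    "storefront v12": {"cart": "1.4.0", "auth": "2.1.0"},
    "storefront v13": {"cart": "1.5.0", "auth": "2.1.0"},
    "admin v3": {"auth": "2.1.0"},
}

print(applications_using("cart", "1.4.0", applications))
print(applications_using("auth", "2.1.0", applications))
```

With this relationship in place, a finding against one microservice version can be traced to every application version that ships it.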
As you can see, there are many places where hackers can manipulate artifacts and Software Bills of Materials. OpenSSF is working to secure this process by implementing signature tools like Sigstore and Notary v2 for signed artifacts, packages, and SBOMs. JFrog is working on an open-source solution called Pyrsia for implementing consensus-based build networks that reduce the risk of compromised build machines.
DeployHub is working on Ortelius, a unified governance catalog for tracking microservice SBOM data based on versions and aggregating all of the SBOMs based on a logical application. It will be the task of all to begin building this level of audit control into our DevOps pipelines and acting on the DevOps intelligence continuously. In other words, much investment is now going into solving the security problem.
Centralizing all SBOM Data with DeployHub
In a cloud-native microservices architecture, your SBOMs are generated and managed at the microservice level. Microservices are pushed across your continuous delivery pipeline independently and frequently. Every time a microservice is updated, all of the consuming ‘logical applications’ have a new version with a new SBOM and CVE report. Developers, DevOps engineers, and security teams struggle to keep up with the changes and cannot easily provide SBOM and CVE reporting for all impacted applications. The result is an absence of governance and of a historical audit trail of the changes pushed to end users.
DeployHub’s SBOM automation tool solves this problem by centralizing the ‘evidence store’ data and continuously aggregating the information to the critical level, the ‘logical application.’ DeployHub provides the insights needed to harden the security of the software your end users consume.
Aggregated Application SBOM and CVE
DeployHub is based on the Ortelius open-source project incubating at the Continuous Delivery Foundation.
DeployHub Key Features
- Microservice Supply Chain Risks Managed
- Microservice Version Management
- Simplifying Microservice Applications
- Domain-Driven Design for Microservices
- Microservice Ownership
- SBOM Automation
- SBOM Management