Thoughts on securing the software supply chain for development organizations

Andy Micone
9 min read · Apr 25, 2022

Ever since the SolarWinds incident, the security of the software supply chain has been a lingering topic in Information Security. Organizations generally organize security operations around security frameworks that have proven useful for aligning organizational processes with security best practices. However, these frameworks, whether highly prescriptive or simply intended to map security control families to the processes that implement them, haven't done much to secure the supply chain. For example, one major public repository, npmjs, suffered several supply-chain attacks in 2022, with attackers using obfuscation techniques more than a decade old. Clearly, risk management practices around supply chains are not keeping pace with advancements in software development, because the software supply chain is not being treated as an area of risk.

Security frameworks don’t seem to be helping

Security frameworks like NIST CSF and ISO 27001 help organizations understand the iterative nature of Security Operations: continuously improving how they identify the resources to protect, protect them, detect vulnerabilities and security incidents, respond to those incidents, and recover from them. As a baseline for security operations, a process of continuous improvement should be core to a viable security program. However, various industries, government entities, and suppliers require ever more prescriptive frameworks and controls for how security operations conducts its business. Though all these frameworks embody useful practices and controls, too often organizations buckle under the weight of bureaucratic baggage, checkbox compliance, and documentation fire drills. Further, frameworks like ISO 27001 and CMMI have become sales qualification tools in their respective industries, making them more valuable as marketing collateral than as a roadmap to greater security.

Despite the universe of critical thinking that has gone into these comprehensive frameworks for organizing security operations, none of them, however prescriptive or non-prescriptive, seems to have helped with the software supply-chain issue. If the issue isn't the framework an organization uses, then what is it?

We don’t talk enough about risk management

Software supply-chain risk is one of the preeminent risks that most information security professionals must deal with today. When an out-of-date log4j class library packaged in an otherwise unassuming jar file can compromise an entire enterprise, we need to discuss why our current frameworks for addressing risk aren't addressing that risk. When I surveyed the overall focus of integrated risk management practices across 170 highly cited academic papers, I discovered that 20% of the papers discuss integrated enterprise risk management without discussing information security at all, 10% concern themselves solely with information security risk management, and the other 70% concern themselves solely with vertical industry risks.

In general, organizations simply aren't having enough conversations about integrating risk management beyond industry-vertical concerns and specific practice areas. For example, project management is one practice area that is deeply concerned with risk management, but not in a way that easily integrates information security risks. Only a minority of companies have robust risk management practices that span organizational concerns and include information security. It should then be wholly unsurprising that supply-chain risk is a tertiary concern in enterprises, gaining attention only when high-profile, critical-severity exploits become public. If we look at the more prescriptive information security frameworks that support tiered adoption models and ask what a more mature risk practice would look like, they have very little to say on the topic.

Though this is clearly a more expansive discussion, if we consider the narrower case of what an optimized level of practice in software supply-chain risk management would look like, we can see a few general practices across other practice-area frameworks, along with several technical safeguards widely adopted as best practices within the industry.

Technical Safeguards

Even the most effective means of mitigating supply-chain risk in software distribution have their own issues that must be dealt with. The common industry safeguards include:

  • Security-qualified software repositories
  • Checksum and signature validation
  • Block lists of security-disqualified packages
  • Transitive dependency isolation and qualification
  • Package SBOM origin tracking and validation
  • Open standards for supply-chain risk management

These safeguards also interact. For example, in a typical continuous integration pipeline, transitive dependencies may be pulled into a build, and into the software's SBOM, without the developer being aware of their origin or of modification by intermediaries. Even with a subscription to a security-qualified software repository, developers must know enough to change their build systems not to pull in transitive dependencies. That, in turn, goes back to the lack of open standards for SBOMs: without standardization across repositories, developers have no way to determine whether those transitive dependencies are contained inside larger packages.
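
To make the checksum-and-signature safeguard concrete, here is a minimal sketch in Python of validating an artifact against a digest pinned when the component was vetted. The file path and the pinned digest are placeholders; a real pipeline would pull the expected digest from a qualified repository's metadata or a signed lockfile.

```python
# Minimal sketch: refuse to build with an artifact whose SHA-256 digest
# does not match the digest pinned when the component was vetted.
# Paths and digests below are illustrative placeholders.
import hashlib


def sha256_of(path: str) -> str:
    """Stream the file in chunks so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, pinned_digest: str) -> None:
    actual = sha256_of(path)
    if actual != pinned_digest.lower():
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")


# Hypothetical usage in a CI step:
# verify_artifact("libs/some-component-1.2.3.jar",
#                 "<digest published by the qualified repository>")
```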

What’s going to fix the supply-chain?

The real question is: what's going to fix the supply chain? From a security operations perspective, the "shopping list security" solution is to adopt a security-qualified repository for both your operating system and your software components. For example, a security-qualified distribution of Linux is one of Red Hat's larger contributions to the industry from an Information Security perspective, and several solutions like Artifactory, MyGet, and Nexus claim to solve the secure repository problem. However, these have not solved the issue, because a misconfiguration in a build file can easily pull in transitive dependencies from a public repository. Developers in a time crunch will often pull in software components from public repositories that seem like a quick fix but come with hidden strings attached. Indeed, several security researchers I interacted with while writing this article told me they've been seeding Docker images out in the wild to see how many developers they can catch unaware, and the results have been depressingly predictable. So, what will solve the problem?

  • A Real Open Standard for SBOMs — The primary issue is having the scaffolding to characterize what is in a repository package: the "Software Bill of Materials." Every large company leveraging OSS components has proposed its own standard or working group. Organizations like the Mozilla Foundation and the Free Software Foundation have offered standards and reference components that solve parts of the problem. Traditional OSS methods like publishing hash codes for validating code and software components still work, but they don't provide mechanisms for handling validation in an automated fashion. Traditional package managers only examine packages at a surface level, and even these have lately been shown to be vulnerable to manipulation through credential compromise. Until someone comes up with a real open standard to characterize what is in a package, with reference code and appropriate tooling, that works across languages and architectures, the problem looks insurmountable without significant advancement in current prevention technologies (with the attendant price tag). Packages contain too many things, from too many places, some of them hard-to-validate proprietary components mixed with OSS components. Ultimately, the problem isn't a lack of technology; there are too many technologies trying to solve the problem and no standard that works for everyone. (A minimal sketch of automated SBOM inspection follows this list.)
  • A Problem in Search of a Business Owner — As discussed, the software supply-chain issue is ultimately a risk issue. Though it is easy to create a list of best practices, this is a risk that needs a business owner to drive putting those practices into place. Development would seem to be the obvious owner, but expediency often takes precedence over policy in development. If information security is the owner, it pits the business's rollout schedule against security operations, a battle that information security always loses. An integrated risk management model is needed in which information security risks are understood in parallel with business risks, but as discussed above, there are precious few enterprises where such a model exists.
  • Better Build Management — Build systems have become complex to the point where DevOps is a separate workstream from development. Though a lot of IDE tools attempt to solve this problem, they often do so in a way where security isn't considered, and in some cases they even clobber security-specific configurations. The complexity is particularly vexing. After every major, widely publicized supply-chain incident, you can see numerous questions on StackExchange to the effect of "how can you turn off transitive dependency inclusion in <build system>?", with a multitude of answers, very few of which seem to work. DevOps needs an expert on the build system and a method for maintaining the integrity of build files so that security-specific tweaks are not overwritten by unwitting developers fighting deadlines. (A sketch of one lightweight guardrail follows this list.)
  • Code Staging Still Needs to be Considered — The promise of DevOps was the elimination of strict code staging environments and methods through implementation of a robust CI/CD pipeline. The pipeline was supposed to perform quality checks ensuring that, as code was promoted from development to production, the entirety of regression testing and security qualification was built into the pipeline. In a mature DevOps environment this approach generally works very well. However, enterprises whose CI/CD pipelines perform these robust validation tasks are far outnumbered by enterprises that merely aspire to be like them. The tendency in development over the past few years has been to treat the as-is CI/CD pipeline as if it were the to-be pipeline; essentially, a "hope-to-be" pipeline. Blame for this doesn't wholly lie with developers. Many developers' education has not included secure development practices, and they follow a pattern of development they've been told is secure. Those patterns generally include some rudimentary security checks, but for the most part they only validate that the application or software component operates as expected from an interface standpoint. For many environments, the old model of a properly configured staging or pre-production environment, with controls that prevent developers from promoting code directly to production, still needs to be considered. At a minimum, availability is still an information security concern. One of the most preventable impacts on security operations is when something gets rolled to production, bugs prevent it from operating correctly, and the downtime is blamed on either hacking or security tools.
  • Entropy Management and Chaos Engineering are Still Underutilized — Embedded hardware manufacturers have long had a practice of entropy management: ensuring that churn in the code base is minimized before product rollout. When the costs of recalling products were astronomical, this practice made sense. As the cost of pushing out a new release through a CI/CD pipeline became trivial, "ship it and fix it" became an increasingly commonplace practice. That practice, unfortunately, is also responsible for multiple security incidents. Though old practices like code change committees are now uncommon and will probably never return, chaos engineering is a modern practice that can have a similar effect. Chaos engineering, however, remains the realm of a few AppSec practitioners, with few if any developers being educated in its practice. Using these techniques, again, still requires an appropriate business owner to ensure that developers are properly educated and that chaos engineering becomes part of DevOps. For example, consider Netflix's well-known Chaos Monkey, a chaos engineering tool that randomly brings down production servers to test the robustness of Netflix's streaming infrastructure. For that tool to be viable, a business owner had to decide that having an infrastructure that resilient was an important, measurable goal, and SRE and development had to agree to engineer systems that could be randomly toppled over at any time. Not only was a business owner required to own the risk, but several cross-functional teams across the enterprise needed to understand both the necessity and the operation of the tool and to develop secure development techniques that made their infrastructure survivable enough to compensate for its use in production. (A toy sketch of the idea follows this list.)
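
To make the SBOM point above concrete, here is a minimal sketch that reads a CycloneDX-style JSON SBOM and flags components whose package URL falls outside an approved internal namespace. The file name, the approved prefix, and the policy itself are assumptions for illustration, not part of any particular standard's tooling.

```python
# Minimal sketch: flag SBOM components that don't come from an approved
# internal namespace. Assumes a CycloneDX-style JSON SBOM; the file name
# and the approved package-URL prefix are illustrative assumptions.
import json

APPROVED_PURL_PREFIX = "pkg:maven/com.example"  # hypothetical internal namespace


def unvetted_components(sbom_path: str) -> list[str]:
    with open(sbom_path) as f:
        bom = json.load(f)
    findings = []
    for component in bom.get("components", []):
        purl = component.get("purl", "")
        if not purl.startswith(APPROVED_PURL_PREFIX):
            findings.append(f"{component.get('name')} {component.get('version')} ({purl})")
    return findings


if __name__ == "__main__":
    for finding in unvetted_components("bom.json"):
        print("unvetted component:", finding)
```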
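
For the build-management point above, one lightweight guardrail is a CI step that compares what the build actually resolved against a reviewed allowlist, so transitive dependencies that crept in are caught before release. This sketch uses `pip freeze` purely as an example resolver; the allowlist file name and format are assumptions.

```python
# Sketch: fail the build when the resolved environment contains packages
# that were never reviewed, including transitive dependencies pulled in
# silently. The allowlist file and its format are illustrative assumptions.
import subprocess


def resolved_packages() -> set[str]:
    # `pip freeze` lists everything actually installed, transitive
    # dependencies included, in name==version form.
    out = subprocess.run(["pip", "freeze"], capture_output=True, text=True, check=True)
    return {line.split("==")[0].lower() for line in out.stdout.splitlines() if "==" in line}


def allowlisted_packages(path: str = "approved-packages.txt") -> set[str]:
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip() and not line.startswith("#")}


if __name__ == "__main__":
    unreviewed = sorted(resolved_packages() - allowlisted_packages())
    if unreviewed:
        raise SystemExit(f"unreviewed dependencies in the build: {unreviewed}")
```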
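
And to illustrate the chaos-engineering point above in miniature: the essence of a tool like Chaos Monkey is selecting a healthy component at random and taking it down to prove the system survives. The service names and the kill command below are placeholders for illustration, not Netflix's actual implementation.

```python
# Toy chaos experiment: periodically kill one service at random and rely on
# monitoring to confirm the system degrades gracefully. Service names and
# the kill mechanism are placeholders for illustration only.
import random
import subprocess
import time

SERVICES = ["api-gateway", "recommendations", "billing"]  # hypothetical units


def kill_random_service() -> str:
    victim = random.choice(SERVICES)
    # systemctl is used only as a stand-in; real tools terminate cloud
    # instances or containers through their management APIs.
    subprocess.run(["systemctl", "kill", victim], check=False)
    return victim


if __name__ == "__main__":
    while True:
        print("chaos target:", kill_random_service())
        time.sleep(3600)  # one experiment per hour
```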
