Often forgotten and taken for granted, an artifact repository is an indispensable part of the software development process. Artifacts are created throughout the build process, and as the Continuous Integration process completes, new artifacts are produced and stored, allowing the Continuous Delivery process to pick them up and promote them through the different environments leading into production. Artifacts are the lifeblood of the delivery process, where performant, consistent, and sustainable results depend on well-thought-out artifact management. Versioning, security, and dependency management only begin to scratch the surface of the key benefits an artifact repository provides in the middle of the CI/CD process.
The core functionality of an artifact repository is to securely store the artifacts a build generates. These artifacts are used to optimize builds and, ultimately, for distribution. Container images, binary artifacts, Java artifacts, and software packages are just a few examples of the types of artifacts that need to be managed. This function differs from source code management in that artifacts are treated and versioned as distinct objects.
For the sake of simplicity (and your time, dear reader), we will be focusing on the two most common high-level use cases you’ll encounter in modern DevOps. The tools used may overlap, or be entirely distinct.
In most complex software systems, a build involves multiple stages, many dependencies, and multiple components on the way to producing artifacts intended for deployment. Build tools provide a framework for managing these dependencies and for optimizing the process of turning source code into compiled binary artifacts. Build tools of note include Maven, Gradle, Make, and Bazel, as well as a number of language-specific and use-case-specific artifact management tools.
Distinct from intermediate artifacts, deployment artifacts are versioned, secured, and centralized as the single source of truth. Typically configured as remote repositories, these are used as the handoff point from CI into CD. Container images are quickly becoming one of the most popular formats, though binaries, configurations like Helm charts, and more are quite common as well. The most popular repositories today include Docker Hub, Amazon ECR, JFrog Artifactory, and more.
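To make the dependency-management idea concrete, here is a minimal sketch of how a build tool might work out a safe build order from declared dependencies. The target names are hypothetical, and real tools like Bazel or Gradle do far more (caching, parallelism, incremental builds); this only shows the ordering step, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical build targets, each mapped to the targets it depends on.
DEPENDENCIES = {
    "app": {"core-lib", "http-client"},
    "http-client": {"core-lib"},
    "core-lib": set(),
}


def build_order(dependencies):
    """Return targets in an order where every dependency is built first."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(DEPENDENCIES)
print(order)
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which is roughly the point at which a real build tool would refuse to build.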
Let’s run down some of the most important qualities you should look for when deciding which repository fulfills your needs.
For security reasons, as well as efficiency, where you set up your repository matters. There are centralized models, models that utilize local caching, and decentralized/public models. For governance and simplicity, a centralized remote repository enables closer tracking of what software is being used and streamlines release processes. The trade-off in centralization is the potential for poor performance, as pushing and pulling artifacts may take longer across different regions.
A locally managed repository may be more performant, but it also comes with overhead in managing servers, software, and storage. As cloud providers become ubiquitous, one benefit they provide is the ability to centralize management while hosting data in different regions to help with performance.
Beyond naming conventions, properly storing metadata such as date, version, and other common attributes provides documentation and traceability. Compared with ad-hoc practices, which are falling out of favor, discrete, versioned artifacts reduce the risk of deploying the wrong code or losing the ability to audit what was deployed and when.
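As a rough illustration of what that metadata might look like, the sketch below builds a record for an artifact, including a SHA-256 checksum so the exact bytes can be verified later. The field names, the `describe_artifact` helper, and the sample values are all assumptions for the example, not the schema of any particular repository.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArtifactRecord:
    """Hypothetical metadata stored alongside an artifact."""
    name: str
    version: str
    sha256: str
    created_at: str
    source_commit: str


def describe_artifact(name, version, content, source_commit, created_at=None):
    """Build a metadata record; the checksum ties it to the exact bytes."""
    created = created_at or datetime.now(timezone.utc)
    return ArtifactRecord(
        name=name,
        version=version,
        sha256=hashlib.sha256(content).hexdigest(),
        created_at=created.isoformat(),
        source_commit=source_commit,
    )


record = describe_artifact("payments-service", "1.4.2", b"example bytes", "abc1234")
print(record)
```

Recording the source commit next to the checksum is one simple way to answer "what code is this, and is it the file we built?" during an audit.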
Security in layers is best practice, and one of many ways to secure artifacts is to control how they are stored and distributed. Role-based access control (RBAC) should dictate who can store artifacts in which repositories, and who can pull those same artifacts.
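Here is a minimal sketch of what such an RBAC check could look like. The roles, repository names, and the `is_allowed` helper are hypothetical; real repositories each have their own permission models, so treat this as the shape of the idea, not a product's API.

```python
# Hypothetical role definitions: role -> repository -> allowed actions.
ROLE_PERMISSIONS = {
    "ci-builder": {"snapshots": {"push", "pull"}, "releases": {"pull"}},
    "release-manager": {"snapshots": {"pull"}, "releases": {"push", "pull"}},
    "developer": {"snapshots": {"pull"}, "releases": {"pull"}},
}


def is_allowed(role, repository, action):
    """Deny by default: unknown roles or repositories get no access."""
    return action in ROLE_PERMISSIONS.get(role, {}).get(repository, set())


print(is_allowed("ci-builder", "snapshots", "push"))
print(is_allowed("ci-builder", "releases", "push"))
```

Note the deny-by-default design: anything not explicitly granted is refused, which tends to be the safer starting point.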
Versioning meets permissions. Auditability means having a full paper trail for access, artifact history, and any other events that affect your artifacts.
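A paper trail can be as simple as an append-only list of events that you can query per artifact. The sketch below assumes a hypothetical `AuditLog` class and made-up actors and artifact names; a production system would persist these events somewhere tamper-resistant.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    actor: str
    action: str
    artifact: str


class AuditLog:
    """Append-only, in-memory event log (illustrative only)."""

    def __init__(self):
        self._events = []

    def record(self, actor, action, artifact):
        """Append an event stamped with the current UTC time."""
        event = AuditEvent(
            datetime.now(timezone.utc).isoformat(), actor, action, artifact
        )
        self._events.append(event)
        return event

    def history(self, artifact):
        """Return every event for one artifact, oldest first."""
        return [e for e in self._events if e.artifact == artifact]


log = AuditLog()
log.record("ci-builder", "push", "api:2.0.1")
log.record("deploy-bot", "pull", "api:2.0.1")
log.record("ci-builder", "push", "web:1.3.0")
for event in log.history("api:2.0.1"):
    print(event.actor, event.action)
```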
As a part of a well-designed software delivery process, your artifact repository provides a convenient bottleneck, right before code hits production. If everything that goes onto a server must pass through a repository (no ad-hoc hotfixing), operations has the ability to scrutinize every aspect of the deliverable before it can be shipped. This can include security scans that look for vulnerable libraries and malicious code, license validation, and more.
The bill of materials (BOM) has long been an integral part of modern manufacturing of physical products. As software becomes more complex, with more dependencies on third-party libraries, software teams don’t have time to compile and understand every single component that goes into the building of a complex system. The process of pulling all the information together spans software development, and the repository provides the final gate for taking a full inventory of every component that went into building the software destined for production.
There’s no way to create an exhaustive list here, as there is an endless number of possible artifacts. There are best-in-class tools for specific artifacts, and there are universal package repository managers. Here’s a short list of the most common categories you’ll see.
In an increasingly dockerized world, this category is quickly becoming top of mind for development teams. The key benefit of containers is the ability to ship all your code and dependencies as one, while the downside is the potential for large artifacts and for shipping vulnerabilities hidden beneath several layers of abstraction. The best tools in this category handle the proper storage of layers and provide the ability to scan for potential vulnerabilities.
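As a sketch of what such a gate might check, the example below flags components that match a blocklist of vulnerable versions or carry a license outside an allowlist. The component names, the blocklist, and the `gate_violations` helper are hypothetical; a real scanner would pull this data from a vulnerability database and your organization's license policy.

```python
# Hypothetical policy data for illustration.
BLOCKED_VERSIONS = {("log-lib", "2.14.0")}
ALLOWED_LICENSES = {"Apache-2.0", "MIT", "BSD-3-Clause"}


def gate_violations(components):
    """Return a list of human-readable policy violations (empty means pass)."""
    violations = []
    for c in components:
        label = f"{c['name']} {c['version']}"
        if (c["name"], c["version"]) in BLOCKED_VERSIONS:
            violations.append(f"{label}: known vulnerable version")
        if c["license"] not in ALLOWED_LICENSES:
            violations.append(f"{label}: license {c['license']} not allowed")
    return violations


components = [
    {"name": "log-lib", "version": "2.14.0", "license": "Apache-2.0"},
    {"name": "chart-kit", "version": "1.0.0", "license": "GPL-3.0-only"},
    {"name": "json-parser", "version": "1.9.3", "license": "MIT"},
]
problems = gate_violations(components)
for problem in problems:
    print(problem)
```

A pipeline could then refuse to promote the artifact whenever the returned list is not empty.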
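To show what "taking a full inventory" can mean in practice, here is a small sketch that walks a dependency graph and collects every direct and transitive component. The graph and component names are invented for the example, and the flat list it produces is far simpler than real software BOM formats.

```python
# Hypothetical dependency graph: component -> its direct dependencies.
DIRECT_DEPENDENCIES = {
    "storefront@3.1.0": ["web-framework@5.2.0", "json-parser@1.9.3"],
    "web-framework@5.2.0": ["json-parser@1.9.3", "http-core@4.0.1"],
    "json-parser@1.9.3": [],
    "http-core@4.0.1": [],
}


def bill_of_materials(root, graph):
    """Collect every direct and transitive dependency of root, deduplicated."""
    seen = set()
    stack = [root]
    while stack:
        component = stack.pop()
        if component in seen:
            continue
        seen.add(component)
        stack.extend(graph.get(component, []))
    seen.discard(root)  # the BOM lists what went into root, not root itself
    return sorted(seen)


bom = bill_of_materials("storefront@3.1.0", DIRECT_DEPENDENCIES)
print(bom)
```

Notice that `http-core` shows up even though the application never declares it directly, which is exactly the kind of component that is easy to miss without an inventory.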
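One way to picture "proper storage of layers" is content-addressed storage: each layer is stored under the hash of its bytes, so images that share a base layer store it only once. The `LayerStore` class below is a hypothetical, in-memory sketch of that idea and not how any specific registry is implemented.

```python
import hashlib


class LayerStore:
    """Stores each unique layer once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def push_image(self, layers):
        """Store an image's layers and return its manifest (list of digests)."""
        manifest = []
        for layer in layers:
            digest = "sha256:" + hashlib.sha256(layer).hexdigest()
            self._blobs.setdefault(digest, layer)  # skip layers already stored
            manifest.append(digest)
        return manifest

    def blob_count(self):
        return len(self._blobs)


store = LayerStore()
base = b"base-os-layer"
api_manifest = store.push_image([base, b"api-code"])
web_manifest = store.push_image([base, b"web-code"])
print(store.blob_count())
```

Two images with two layers each end up as three stored blobs, because the shared base layer is only kept once.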
Maven repositories are a popular example. They are used both locally for intermediate artifacts, and remotely, including a number of publicly-available repos.
Package management tools like pip, npm, Ivy, etc. depend on a wide variety of language/technology-specific package repositories, such as PyPI and the npm registry.
Here’s a sampling of different tools you’ll see in use by teams today. Given how wide this topic is, these are not all competing tools, as much as tools that touch on different aspects of artifact management.
Maven is most commonly used for Java, though it can also be used to support other languages. Maven is primarily responsible for building software and managing dependencies. Within a Maven project, dependencies and other build information are described in an XML file, the Project Object Model (pom.xml). As a part of the build process, Maven will download libraries from a remote Maven repository.
Released by Google in 2015, Bazel has quickly become popular for blazing speed and excellent dependency management through dependency graph analysis. At Harness, we recently made the switch from Maven to Bazel to accelerate our development process. In short, a lot of the dependency management provided by Maven did not suit our needs as a fast-moving organization. Bazel also allows for the use of external dependencies, both public and private, with the help of Bzlmod.
Hard to pick just one here. Each major cloud provider offers its own: Amazon’s ECR, Google’s GCR, and Azure’s ACR. If these are not the answer for you, there’s a plethora of great tools out there. JFrog Artifactory supports a wide range of package formats, Docker Hub is the default repository of choice (it’s in the name!), and VMware's Harbor is an open-source option built with security in mind.
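For a feel of what that XML file contains, the sketch below parses a tiny, made-up POM and lists its dependency coordinates. The project and library names are hypothetical; the element names and namespace follow the standard Maven POM layout, and the parsing uses only Python's standard library.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical pom.xml for illustration.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>example-lib</artifactId>
      <version>2.3.1</version>
    </dependency>
  </dependencies>
</project>
"""

# POM elements live in this XML namespace, so lookups must be qualified.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def list_dependencies(pom_text):
    """Return dependencies as 'groupId:artifactId:version' strings."""
    root = ET.fromstring(pom_text)
    coordinates = []
    for dep in root.findall("m:dependencies/m:dependency", NS):
        group = dep.findtext("m:groupId", namespaces=NS)
        artifact = dep.findtext("m:artifactId", namespaces=NS)
        version = dep.findtext("m:version", namespaces=NS)
        coordinates.append(f"{group}:{artifact}:{version}")
    return coordinates


deps = list_dependencies(POM_XML)
print(deps)
```

Those three-part coordinates are what Maven uses to locate and download each library from a repository.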
A few simple, but important best practices.
CI/CD was recently identified in a report as one of the prime targets for malicious actors, given the high levels of access to systems and end artifacts. Design your pipelines with security in mind; don’t add it after the fact!
Automation allows for repeatability and reduces manual toil. Small inconsistencies in manual processes put you at risk of process breakdowns as well as failed audits.
Artifact management in CI/CD has been solved many times over. Don’t write custom integrations you’ll have to maintain later.
Build processes today are increasingly complex, with many considerations around securing your software bill of materials, proper testing, speed, and efficient use of resources, all without adding toil to teams that are already being asked to deliver faster than ever.
Does your build tool support all these ambitions? Harness CD integrates with all the best-in-class artifact repositories, and Harness CI automates the pains of managing your build tools. All this while letting you focus on satisfying the needs of your customers. Try it out today.