The Challenge of ML Systems

Many of the well-known research papers on software engineering for ML focus on building maintainable applications that serve ever-changing end-user requirements. Continuous delivery enables software changes of all types to reach production environments in a safe, quick, and sustainable way.

To help data scientists, data engineers, and ML engineers scale their processes around data management, model training, deployment, and operationalization, we’ll break down the components needed to do continuous delivery for ML.

Production ML systems are large ecosystems that contain three main components: the data, the model, and the code that serves the output of the model to a consumer. Each of these components requires care and maintenance to reach production.

Changes to data sets often affect both the data scientists who create the ML models and the way ML engineers integrate those changes. Below is an illustration of the common functional silos that tend to emerge.

Common functional silos in large organizations can create barriers, stifling the ability to automate the end-to-end process of deploying ML applications to production.

Defining your CD Pipeline

Start by defining a deployment pipeline for your ML service, with a workflow for each of the data, the model, and the code. A pipeline built from modular workflows can be reused and modified for future deployments. Here are the steps of a typical deployment pipeline:

  1. Write code
  2. Commit code
  3. Build artifacts
  4. Test artifacts
  5. Deploy artifacts
  6. Verify artifacts
  7. Rollback artifacts
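
As a rough illustration of how such steps can be chained, here is a minimal sketch of a pipeline runner. It assumes each step is a plain Python callable that raises an exception on failure; `run_pipeline` and the step names are hypothetical and do not come from any particular CD tool.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action raises on failure.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], rollback: Callable[[], None]) -> List[str]:
    """Run steps in order; on the first failure, roll back and stop."""
    log: List[str] = []
    for name, action in steps:
        try:
            action()
            log.append(f"ok: {name}")
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            rollback()
            log.append("rolled back")
            break
    return log
```

Because the steps are passed in as a list, the same runner can be reused for a data workflow, a model workflow, or a code workflow by swapping the list.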

Let’s discuss the steps for an ML systems pipeline. 

Step 1: Building Discoverable and Accessible Data

The end state for this step is a versioned, discoverable, and accessible data artifact. Create a workflow that runs data cleaning and preprocessing on a data set to produce an artifact stored in a repository. You can leverage your Jenkins pipeline to trigger this process. Pull the artifact into your environment for the next stage. 
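
One possible way to get a versioned data artifact is to name it after a hash of its cleaned content. The sketch below assumes rows arrive as dictionaries of strings and uses a local directory as a stand-in for the artifact repository; `clean_rows` and `publish_data_artifact` are hypothetical names, and the cleaning rule (drop rows with empty values) is only an example.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Optional


def clean_rows(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Strip whitespace and drop rows that have any missing value."""
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def publish_data_artifact(rows: List[Dict[str, Optional[str]]], repo_dir: Path) -> Path:
    """Write the cleaned rows to repo_dir under a content-derived version."""
    payload = json.dumps(clean_rows(rows), sort_keys=True).encode("utf-8")
    # Assumption: the first 12 hex digits of SHA-256 are enough as a version tag.
    version = hashlib.sha256(payload).hexdigest()[:12]
    repo_dir.mkdir(parents=True, exist_ok=True)
    path = repo_dir / f"dataset-{version}.json"
    path.write_bytes(payload)
    return path
```

With content-based naming, rerunning the workflow on unchanged data yields the same file name, so downstream stages can tell whether the data actually changed.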

Step 2: Model Training

Your pipeline should include a workflow for model training and testing. Recall that a workflow automates three things: deploying the service, testing and verifying it, and rolling it back if necessary. Consider the following workflow for model training:

  1. Pull the data artifact
  2. Run the function that splits the data into the training and validation sets
  3. Produce the model given the training data

Generate the model artifact for the next step.
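
The training workflow above can be sketched with the standard library alone. This example assumes the data artifact is a JSON file containing a list of `[x, y]` pairs and swaps in a one-variable least-squares line as a stand-in for a real model; all function names and the JSON model format are hypothetical.

```python
import json
import random
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, float]


def split_rows(rows: Sequence[Row], validation_fraction: float = 0.2,
               seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle deterministically, then split into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def fit_line(train: Sequence[Row]) -> Dict[str, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in train)
    if var_x == 0:
        raise ValueError("training data needs at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def mean_squared_error(model: Dict[str, float], rows: Sequence[Row]) -> float:
    return sum((model["slope"] * x + model["intercept"] - y) ** 2
               for x, y in rows) / len(rows)


def train_workflow(data_path: Path, model_path: Path) -> Dict[str, float]:
    # 1. Pull the data artifact.
    rows = [tuple(pair) for pair in json.loads(data_path.read_text())]
    # 2. Split into training and validation sets.
    train, validation = split_rows(rows)
    # 3. Produce the model from the training data and score it.
    model = fit_line(train)
    model["validation_mse"] = mean_squared_error(model, validation)
    # Write the model artifact for the next step.
    model_path.write_text(json.dumps(model))
    return model
```

Fixing the shuffle seed keeps the split reproducible, which makes it easier to compare models trained on different versions of the data.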

Step 3: Real-World Exposure

We’ve produced a model artifact to be used by consumers; now, it’s time to deploy the model. There are many strategies for serving the model, including embedding the model into an existing application service, deploying the model as a separate service, or publishing the model as data that a service loads at runtime. Whichever pattern you choose, create a deployable artifact containing the model for the following workflow:

  1. Define a release strategy and deploy the artifact to an environment
  2. Test, validate, and verify the latest model
  3. Roll back, if necessary

Add further steps and stages to promote or deploy the artifact to a higher environment.
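
The deploy, verify, and roll back workflow can be sketched as follows. This is an in-memory illustration only: `Environment` and `release` are hypothetical names, the environment simply remembers which model versions were deployed, and the `verify` callable stands in for whatever tests and validation you run against the live model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Environment:
    """Tracks the model versions deployed to one environment."""
    name: str
    history: List[str] = field(default_factory=list)

    @property
    def live(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> None:
        # Drop the latest version so the previous one becomes live again.
        if self.history:
            self.history.pop()


def release(env: Environment, version: str,
            verify: Callable[[str], bool]) -> bool:
    """Deploy a version, verify it, and roll back if verification fails."""
    env.deploy(version)
    if verify(version):
        return True
    env.rollback()
    return False
```

Promoting to a higher environment then amounts to calling `release` again with a different `Environment` once the lower one has passed verification.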

Creating workflows that separate data, model, and code gives you the building blocks for robust and effective ML pipelines. Add more steps to any of the workflows to further validate and test your ML system.

Other Considerations

We described a barebones template for an end-to-end production pipeline for ML systems. Additional workflows and pipelines are useful when different roles need separation of concerns, for example to perform a deployment or verification step without triggering the entire pipeline to production. Here are some additional scenarios to consider for your continuous delivery pipeline.

Data Freshness: 

For use cases where data freshness is critical to the accuracy of the model, consider continuous data collection. Create a separate pipeline that deploys a service to collect, organize, and produce the final data sets. Define verification steps and tests for proper data handoff and build them into your pipeline. This gives your data engineering teams ownership of their data changes and revisions.
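
A data handoff check might look something like the sketch below. It assumes each record is a dictionary carrying a timezone-aware `collected_at` timestamp; that field name, the `verify_handoff` function, and the specific checks are all hypothetical and would need to match your own data contract.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Sequence


def verify_handoff(records: Sequence[Dict[str, object]],
                   required_fields: Sequence[str],
                   max_age: timedelta,
                   now: Optional[datetime] = None) -> List[str]:
    """Return a list of problems; an empty list means the handoff passes."""
    now = now or datetime.now(timezone.utc)
    if not records:
        return ["data set is empty"]
    problems: List[str] = []
    for index, record in enumerate(records):
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            problems.append(f"record {index} is missing {', '.join(missing)}")
    # Assumption: timestamps are timezone-aware datetimes in UTC.
    timestamps = [r["collected_at"] for r in records
                  if isinstance(r.get("collected_at"), datetime)]
    if not timestamps:
        problems.append("no record has a collected_at timestamp")
    elif now - max(timestamps) > max_age:
        problems.append("newest record is older than the allowed maximum age")
    return problems
```

A pipeline step can fail the handoff whenever the returned list is not empty and report the problems to the team that owns the data.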


You should have criteria or thresholds for the performance of a model. Integrate your monitoring tools with your CD pipeline so that verification is automated. Many monitoring tools and dashboards expose API endpoints or plugins that can be used for continuous verification. This way, you can check the quality of a model trained on your latest data set.
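
As a minimal sketch of threshold-based verification, the function below compares reported metrics against minimum acceptable values. It assumes higher is better for every metric and that the metrics have already been fetched from your monitoring tool; `verify_model` and the metric names in the usage example are hypothetical.

```python
from typing import Dict, List


def verify_model(metrics: Dict[str, float],
                 minimums: Dict[str, float]) -> List[str]:
    """Return one failure message per metric that is missing or too low."""
    failures: List[str] = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name} was not reported")
        elif value < minimum:
            failures.append(f"{name} is {value}, below the minimum {minimum}")
    return failures


# Hypothetical usage: an empty list means the model passes verification.
failures = verify_model({"accuracy": 0.93, "recall": 0.71},
                        {"accuracy": 0.90, "recall": 0.80})
```

Keeping the thresholds in a dictionary means they can live in configuration alongside the pipeline definition rather than in the verification code.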


Use your failure criteria to trigger rollbacks. A rollback can in turn trigger the pipeline that retrains your model with the latest data.


Continuous delivery (CD) enables software changes of all types to reach production environments in a safe, quick, and sustainable way. The goal is to make deployments, in whatever architecture, predictable and routine so that developers can perform them on demand. This gives teams the freedom to innovate and move fast without breaking things.

We discussed a start-to-finish template for the continuous delivery of machine learning systems. Tools like TensorFlow can help augment the workflows needed for the continuous delivery of models. When automating an end-to-end process, it’s essential to consider the future cost and flexibility of cloud-based, code-based, or framework-based deployments. We also discussed additional scenarios to consider when automating this process. I hope this blog post has given you some insights for your own continuous delivery pipelines for ML. Learn more about Harness and our software delivery platform.