Machine Learning Operations - MLOps explained

MLOps is an extension of DevOps to Machine Learning. It allows data scientists, data engineers, application developers, and the operations team to collaborate, reducing the time from model creation to first production deployment.

In its 2020 Hype Cycle for Data Science and Machine Learning report, Gartner placed MLOps at the "Peak of Inflated Expectations", with the plateau expected to be reached in 2 to 5 years. This illustrates how high the expectations currently are. Like DevOps, MLOps is not a technology but a methodology that should be followed to increase efficiency and stability.

Now you may wonder: "Why would I adopt MLOps in my organization?" To answer that question, let's start by looking at the typical lifecycle in more detail.

MLOps Lifecycle

In the illustration above, we can see that a typical lifecycle starts with a data scientist experimenting and eventually creating a machine learning model. This, however, is not the bulk of the work: creating such a model typically takes about 1 week.

Even 1 week is an underestimate, since it excludes the time needed for data gathering, extensive data processing, …

Once a model is created, collaboration is needed to promote it to production. This step accounts for the biggest chunk of the work, as several different parties have to be involved.

It starts with an application developer, whose responsibility is typically to define the parameters, define the startup procedure, and finally package the model for deployment. A model is typically exposed through a REST API, so that any application can easily send a request to it and receive a response (often a prediction or classification) that it can then consume.
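To make the REST integration concrete, here is a minimal sketch using only Python's standard library. It assumes a hypothetical `predict` function standing in for a real trained model (a fixed linear scorer with made-up weights), a `/predict` endpoint, and a JSON body of the form `{"features": [...]}`; none of these names come from a specific framework. The request handling is split into `handle_request` so it can be exercised without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: list[float]) -> dict:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.4, -0.2, 0.1]  # illustrative weights, not learned from data
    score = sum(w * x for w, x in zip(weights, features))
    return {"label": "positive" if score > 0 else "negative", "score": score}


def handle_request(body: bytes) -> tuple[int, bytes]:
    """Decode a JSON request, run the model, and encode the JSON response."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'features' list of numbers"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps(predict(features)).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /predict route is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


if __name__ == "__main__":
    # Serve on localhost:8000; stop with Ctrl+C.
    HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

A client would then POST `{"features": [1, 0, 0]}` to `http://127.0.0.1:8000/predict` and receive a JSON prediction. In practice a production setup would more likely use a dedicated serving framework, but the contract (request in, prediction out) stays the same.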

Afterward, the model is handed over to the operations team, who take the packaged model and put it in production. Creating a production deployment is then similar to any other deployment, taking into account aspects such as scaling, tuning, caching, versioning, …
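Two of those aspects, versioning and caching, can be illustrated with a small in-memory sketch. The `ModelRegistry` class and its `register`, `promote`, and `predict` methods are hypothetical names chosen for this example, not the API of any particular MLOps tool; models are assumed to be plain callables that take a hashable tuple of features.

```python
from collections import OrderedDict
from typing import Callable

# Assumption: a "model" is any callable mapping a feature tuple to a score.
Model = Callable[[tuple], float]


class ModelRegistry:
    """Hypothetical in-memory registry: stores model versions and caches predictions."""

    def __init__(self, cache_size: int = 1024):
        self._versions: dict = {}  # name -> {version number -> model}
        self._live: dict = {}      # name -> version currently serving traffic
        self._cache: OrderedDict = OrderedDict()  # LRU cache of predictions
        self._cache_size = cache_size

    def register(self, name: str, model: Model) -> int:
        """Store a new version of `name` and return its version number."""
        versions = self._versions.setdefault(name, {})
        version = len(versions) + 1
        versions[version] = model
        return version

    def promote(self, name: str, version: int) -> None:
        """Route traffic for `name` to `version` (also used for rollbacks)."""
        if version not in self._versions.get(name, {}):
            raise KeyError(f"{name} v{version} is not registered")
        self._live[name] = version

    def predict(self, name: str, features: tuple) -> float:
        version = self._live[name]
        # The version is part of the key, so a promotion never serves stale results.
        key = (name, version, features)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        result = self._versions[name][version](features)
        self._cache[key] = result
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result
```

Usage might look like `v2 = registry.register("churn", new_model)` followed by `registry.promote("churn", v2)`; rolling back is simply promoting the previous version number again. Keeping every version addressable is what makes such rollbacks cheap.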

This last block is where MLOps comes in: we create a pipeline that allows us to deploy models to production and manage them. The biggest nuance lies in the type of system being deployed. In Machine Learning, we often work with "closed-loop" systems (ref. Closed-Loop Control Systems), where feedback has to be gathered and taken into account in a future iteration.
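A minimal sketch of such a feedback loop is shown below, assuming that the true outcome of each prediction eventually becomes known (for example, whether a customer actually churned). The `FeedbackLoop` class, its methods, the window size, and the accuracy threshold are all hypothetical choices for illustration; real systems would persist this data and use richer metrics than plain accuracy.

```python
from collections import deque


class FeedbackLoop:
    """Hypothetical monitor that closes the loop between predictions and real outcomes."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9):
        self.pending = {}                    # request_id -> predicted label
        self.results = deque(maxlen=window)  # rolling window of correct/incorrect flags
        self.min_accuracy = min_accuracy

    def record_prediction(self, request_id: str, predicted: str) -> None:
        """Remember what the model predicted for a given request."""
        self.pending[request_id] = predicted

    def record_outcome(self, request_id: str, actual: str) -> None:
        """Called when ground truth arrives; unknown request ids are ignored."""
        predicted = self.pending.pop(request_id, None)
        if predicted is not None:
            self.results.append(predicted == actual)

    def accuracy(self) -> float:
        """Accuracy over the rolling window (1.0 while no feedback has arrived)."""
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_retraining(self) -> bool:
        """Only decide once the window is full, to avoid reacting to noise."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.min_accuracy
```

When `needs_retraining()` returns `True`, a pipeline could trigger a new training run on the freshly collected outcomes, which is exactly the "future iteration" of the closed loop.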

MLOps Value

But where does the value of MLOps lie? Just as in DevOps, MLOps focuses on streamlining the production process, allowing for a continuous cycle. When this cycle is implemented well, MLOps will allow your organization to gain several benefits (including but not limited to):

  1. Reduction in model publishing cost
  2. Improved monitoring
  3. Correct feedback loop implementation
  4. Cross-team collaboration

MLOps at Scale

MLOps is still in its infancy, with Microsoft stating: "Putting Machine Learning models in production is easy when you have just a few, however when you have thousands of them, it becomes a hurdle."

"Putting Machine Learning models in production is easy when you have just a few, however when you work at scale and have to manage thousands of them, such technologies do not exist yet." - Xavier Geerinck

Because of this, companies such as Microsoft, Uber, Airbnb, and others build these platforms themselves, often sharing this knowledge with the public through their cloud platforms, technology sessions, or open-source code on GitHub.

Conclusion

If your company uses Machine Learning today and has multiple models in production but does not yet follow the MLOps methodology, it might be worth looking into it! In the long term, the benefits MLOps brings will outweigh the disadvantages.