A Multi-Language Reinforcement Learning Digital Twin Environment

One of the ideas I have been playing around with over the last couple of months is the combination of Digital Twins and Reinforcement Learning. This is an experimental idea that will be refined over the coming months, and I would love to hear your opinions about it (feel free to comment below, send me an email, or reach out to me on social media such as Twitter / LinkedIn).

But before I explain in more depth what this means, let's quickly go through the concepts of Digital Twins and Reinforcement Learning again:

  • Reinforcement Learning: "Reinforcement Learning is a Machine Learning paradigm that automatically teaches agents to take actions on an environment, such that the future reward is maximized". (More: Xavier Geerinck - RL Intro)
  • Digital Twins: "A Digital Twin is a virtual representation of a physical object" (More: Xavier Geerinck - Digital Twin)

After this small refresher, let's dive into the concept!


Reinforcement Learning Challenges

Reinforcement Learning - just like any new concept - comes with its own unique set of challenges. Some of these challenges make it extremely hard to implement these algorithms, which is what I would like to tackle with this new concept. To name some of the issues in the Reinforcement Learning ecosystem today:

  • Integration with existing simulators is hard and requires the most time
  • Deployment of the model in an edge environment
  • Shaping of the reward (Reward Engineering)
  • Hierarchical State Overview
  • Subject matter expertise is required
  • Hiring of Reinforcement Learning specialists

Hypothesis Creation - Digital Twin and RL

Now when we look at the concept of Digital Twins, we can actually view a Digital Twin as the "World" or "Environment" in Reinforcement Learning, which describes how the different sensors are coupled together. From this we can define a hypothesis that will help us understand the concept: "If we shape the environment in our simulation and physical environment in the same way (as a Digital Twin), would this provide benefits such as improved integration speed and separated concerns?"
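As a minimal sketch of what "shaping both environments in the same way" could look like in code, the snippet below defines one contract that a simulator and a physical turbine would both satisfy. The names `TurbineSource`, `SimulatedTurbine` and `step`, as well as the toy physics, are assumptions made up for illustration and are not part of any existing system.

```python
from typing import Dict, Protocol


class TurbineSource(Protocol):
    """Hypothetical contract shared by a simulator and a physical turbine."""

    def read(self) -> Dict[str, float]: ...

    def apply(self, yaw_delta: float) -> None: ...


class SimulatedTurbine:
    """Toy simulator (made-up physics): RPM drops as the yaw drifts off the wind."""

    def __init__(self, wind_direction: float = 90.0) -> None:
        self.wind_direction = wind_direction
        self.yaw = 0.0

    def read(self) -> Dict[str, float]:
        misalignment = abs(self.wind_direction - self.yaw)
        return {"yaw": self.yaw, "rpm": max(0.0, 20.0 - 0.2 * misalignment)}

    def apply(self, yaw_delta: float) -> None:
        self.yaw += yaw_delta


def step(source: TurbineSource, yaw_delta: float) -> Dict[str, float]:
    """One agent step; the code is identical for a simulated or physical source."""
    source.apply(yaw_delta)
    return source.read()


sim = SimulatedTurbine()
before = sim.read()
after = step(sim, 30.0)  # turning towards the wind should raise the RPM
```

A physical turbine would only need its own class with the same `read` and `apply` methods; the agent-facing `step` function would not change.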

I personally believe that the answer to the hypothesis above is "Yes", for the following reasons:

1. Reward shaping can be integrated into the "Digital Twin" definition language

Looking at the example of our Windmill in the previous post, an extension can be added that allows us to describe the reward required in Reinforcement Learning. For example, when we want to train our algorithm to maximize the blade rotation while constraining the oilLevel so that it stays OK, we could imagine the following being created:

twin windTurbine {
  windTurbineSensors wt
  weather apiWeather
  reward {
    (wt.gearbox.g1.isOk ? 0 : -100)
    + (wt.blade.b3.RPM)
  }
}

This is actually very interesting for us, since no prior knowledge of the environment physics is required to tune these parameters, and it can be done without a technical background (a Subject Matter Expert is thus able to write this as well). In our example, the algorithm will have to find out which action to take such that the blade rotation is maximized (e.g. do we have to engage our Yaw Motor to turn the blade around the tower? If yes, how much should we move?).
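To make the reward definition above concrete, here is a small sketch of how it might evaluate once translated to code. The `TwinState` class and its field names are hypothetical stand-ins for the twin's `wt.gearbox.g1.isOk` and `wt.blade.b3.RPM` values; the definition language itself is still an idea, so this is not its actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TwinState:
    """Hypothetical flat snapshot of the twin values used by the reward."""

    gearbox_g1_is_ok: bool
    blade_b3_rpm: float


def reward(state: TwinState) -> float:
    """Mirror of the reward block: penalize a bad gearbox, reward blade speed."""
    penalty = 0.0 if state.gearbox_g1_is_ok else -100.0
    return penalty + state.blade_b3_rpm


healthy = TwinState(gearbox_g1_is_ok=True, blade_b3_rpm=12.5)
faulty = TwinState(gearbox_g1_is_ok=False, blade_b3_rpm=12.5)

print(reward(healthy))  # 12.5
print(reward(faulty))   # -87.5
```

Note that nothing in the reward refers to how the turbine physically behaves; it only combines values that the twin already exposes.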

An extra advantage is that we can make this model more complex by adding state information previously unknown. This way we can for example add weather information, which could have a benefit for our algorithm.

2. Subject Matter expertise is split off

As partly discussed in point 1, because the integration between the ingestion system (events of the physical and simulation environment) and the processing system (reinforcement learning engine) is split off, we allow a new role to focus purely on the Digital Twin definition. This is something a Subject Matter Expert should definitely be able to do, using their knowledge of the existing system to focus on the reward engineering (which was previously known to be hard). A technology company can then focus on creating the processing system or the ingestion system.

3. Extra: Cross Language compatibility potential

Since a layer is created that focuses on translating our events to the Digital Twin concept, we have the possibility to ease the integration between the processing system and the integration layer, such that it becomes more accessible to other developers out there (it is currently mainly constrained to Python developers).

Therefore I suggest utilizing a high-throughput technology such as gRPC to become language independent.


Translating all of the above into an architecture, we need an intermediate layer between the ingestion system (where events from the physical and virtual environment are coming in) and the Reinforcement Learning system. This intermediate layer should have a metastore where the state representation can be created in a unique language, such that a Subject Matter Expert is able to create a state mapping without having knowledge of the underlying technical system.
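A rough sketch of such an intermediate layer, under heavy assumptions, could look as follows. The `TwinLayer` class, the raw event keys (e.g. `sensor/b3/rpm`) and the plain dictionary used as a stand-in for the metastore are all hypothetical; the goal is only to show events being mapped onto a twin state that the Reinforcement Learning system can observe.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]


class TwinLayer:
    """Hypothetical layer between the ingestion system and the RL system."""

    def __init__(self, mapping: Dict[str, str], reward_fn: Callable[[State], float]) -> None:
        # Metastore stand-in: maps raw event keys to twin state paths.
        self.mapping = mapping
        self.reward_fn = reward_fn
        self.state: State = {}

    def ingest(self, event: Dict[str, Any]) -> None:
        """Fold one event (physical or simulated) into the twin state."""
        for raw_key, value in event.items():
            path = self.mapping.get(raw_key)
            if path is not None:  # ignore keys the twin does not describe
                self.state[path] = value

    def observe(self) -> Tuple[State, float]:
        """What the RL system sees: a copy of the state and its reward."""
        return dict(self.state), self.reward_fn(self.state)


# State mapping and reward as a Subject Matter Expert might define them.
mapping = {
    "sensor/gb1/ok": "wt.gearbox.g1.isOk",
    "sensor/b3/rpm": "wt.blade.b3.RPM",
}


def reward_fn(s: State) -> float:
    penalty = 0.0 if s.get("wt.gearbox.g1.isOk", True) else -100.0
    return penalty + s.get("wt.blade.b3.RPM", 0.0)


layer = TwinLayer(mapping, reward_fn)
layer.ingest({"sensor/gb1/ok": True, "sensor/b3/rpm": 14.0, "sensor/unmapped": 1})
state, r = layer.observe()
print(state, r)
```

The ingestion side only needs to know the raw keys, and the learning side only sees twin paths and a reward, which is the separation of concerns the hypothesis is after.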

Taking a shot at creating such an architecture, I arrived at the following:



To illustrate the advantages of such a system, let's take our Windmill example again. Suppose the business now comes to us with the following questions:

  • "When should we perform maintenance on the WindTurbine, such that downtime is minimal and thus the economic impact is the least?"
  • "How and when should we turn the Turbine to catch the most wind?"

We are now able to answer these through a self-learning algorithm that can steer itself without manual intervention. This achieves the real promise of AI in a way that is understandable and easy to integrate, and it leverages in-house knowledge from Subject Matter Experts without having to onboard new roles (e.g. data scientists).


In this blog post I explained the benefits of combining the concepts of Digital Twins and Reinforcement Learning to solve core issues in the ecosystem. To quickly summarize, the concept defines the following layers, each with its respective persona:

  • 1 - Ingestion System (events of physical and simulation environment)
      • Responsible: Tech Company
      • Role: IoT Engineer
  • 2 - Digital Twin Definition System
      • Responsible: Your Company
      • Role: Subject Matter Expert
  • 3 - Processing System
      • Responsible: Tech Company
      • Role: Data Scientist

Currently this is an idea in development which I wanted to share for further refinement. A technical proof-of-concept system is also in development and will be covered in a future blog post. Looking forward to hearing your comments on this idea!