Besides spinning up a Ray cluster on Kubernetes, it's also possible to deploy one directly on Azure (it's actually even easier)!
Make sure you have Ray installed and are connected to Azure.
# Install ray + azure
pip install ray azure-cli azure-core
# Authenticate on Azure + set subscription id
az login
az account set -s <SUBSCRIPTION_ID>
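Before moving on, it can help to confirm that the tools installed above are actually visible from the environment you will run `ray up` in. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper name, and it only checks that the `ray` package and the `az` executable can be found, not that you are logged in to Azure.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("ray",), executables=("az",)):
    """Return a list of human-readable problems; an empty list means everything was found."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not found: {name}")
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(exe) is None:
            problems.append(f"Executable not on PATH: {exe}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the file prints one line per missing prerequisite and nothing when both are found.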
Installing Ray on Azure
To get started, we can use the Ray autoscaler configuration from the Ray repository. Download the example-full.yaml file, edit it to your needs, and then start the cluster with the command further below.
⚠️ You can edit the example-full.yaml file to use a different subscription than the one you are logged in with (the default).
In this example, I added the following extra commands:
head_setup_commands:
  - pip install azure-cli-core==2.20.0 azure-mgmt-compute==19.0.0 azure-mgmt-msi==1.0.0 azure-mgmt-network==18.0.0
  - sudo apt-get install libglib2.0-0
worker_setup_commands:
setup_commands:
Finally, start the cluster with:
ray up example-full.yaml
⚠️ At the time of writing this article, there was a small issue (PR #14750) in the upstream version. To deploy correctly, make sure to include setup_commands: in your deploy file, otherwise your setup commands will be overwritten.
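Since a missing key is easy to overlook, a quick check of the deploy file before running `ray up` can save a failed deployment. The sketch below is a plain text-level check, not a YAML parser; `has_top_level_key` and `missing_setup_sections` are hypothetical helper names, and the list of keys is simply the three that appear in the snippet above.

```python
import re


def has_top_level_key(yaml_text, key):
    """True if `key:` starts at column 0 of some line, i.e. looks like a top-level mapping key."""
    pattern = re.compile(rf"^{re.escape(key)}\s*:", re.MULTILINE)
    return pattern.search(yaml_text) is not None


def missing_setup_sections(
    yaml_text,
    required=("setup_commands", "head_setup_commands", "worker_setup_commands"),
):
    """Return the keys from `required` that do not appear as top-level keys in the text."""
    return [key for key in required if not has_top_level_key(yaml_text, key)]


if __name__ == "__main__":
    # Assumption: the deploy file sits in the current directory under this name.
    with open("example-full.yaml", encoding="utf-8") as handle:
        print(missing_setup_sections(handle.read()))
```

Because the pattern is anchored to the start of a line, `head_setup_commands:` is not mistaken for `setup_commands:`.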
Once this setup finishes, we will see something like this as output:
Acquiring an up-to-date head node
Launched a new head node
Fetching the new head node
<1/1> Setting up head node
Prepared bootstrap config
New status: waiting-for-ssh
[1/7] Waiting for SSH to become available
Running `uptime` as a test.
Fetched IP: <MASKED_IP>
ssh: connect to host <MASKED_IP> port 22: Connection refused
SSH still not available (SSH command failed.), retrying in 5 seconds.
# -- snipped
Warning: Permanently added '<MASKED_IP>' (ECDSA) to the list of known hosts.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
09:45:57 up 1 min, 1 user, load average: 2.38, 0.77, 0.28
Shared connection to <MASKED_IP> closed.
Success.
Updating cluster configuration. [hash=60c1cc4dff2c06f8a558dd628bc149cd3fad461d]
New status: syncing-files
[2/7] Processing file mounts
Shared connection to <MASKED_IP> closed.
/home/ubuntu/.ssh/id_rsa.pub from /home/xavier/.ssh/id_rsa.pub
Shared connection to <MASKED_IP> closed.
Shared connection to <MASKED_IP> closed.
[3/7] No worker file mounts to sync
New status: setting-up
[4/7] Running initialization commands
Warning: Permanently added '<MASKED_IP>' (ECDSA) to the list of known hosts.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
Connection to <MASKED_IP> closed.
Warning: Permanently added '<MASKED_IP>' (ECDSA) to the list of known hosts.
Connection to <MASKED_IP> closed.
[5/7] Initalizing command runner
Warning: Permanently added '<MASKED_IP>' (ECDSA) to the list of known hosts.
Shared connection to <MASKED_IP> closed.
nightly-py37: Pulling from rayproject/ray
5d3b2c2d21bb: Pull complete
3fc2062ea667: Pull complete
75adf526d75b: Pull complete
cb9cc0ffd7d7: Pull complete
20e6bba2821c: Pull complete
5f94c257d7a8: Pull complete
8d2d31defa88: Pull complete
0dc6a7b56a50: Pull complete
96fa1d3e5cdd: Pull complete
Digest: sha256:f3f7961c9b2fba6f870027b279fada0f5f53bdd02b23c95310c57bf6ab4c154c
Status: Downloaded newer image for rayproject/ray:nightly-py37
docker.io/rayproject/ray:nightly-py37
Shared connection to <MASKED_IP> closed.
NVIDIA-SMI has failed because it could not communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Shared connection to <MASKED_IP> closed.
2021-03-19 10:47:53,745 WARNING command_runner.py:901 -- Nvidia Container Runtime is present, but no GPUs found.
Shared connection to <MASKED_IP> closed.
bc7d51d7d412b7286c45dc1c1ac5ba256f31b322619d7102c55d6dd6815c0d32
Shared connection to <MASKED_IP> closed.
# -- snipped python packages installation
Successfully installed PyJWT-1.7.1 azure-cli-core-2.20.0 azure-mgmt-compute-19.0.0 azure-mgmt-core-1.2.2 azure-mgmt-network-18.0.0 cryptography-3.3.2 knack-0.8.0rc2 msal-1.10.0
Shared connection to <MASKED_IP> closed.
[7/7] Starting the Ray runtime
Did not find any active Ray processes.
Shared connection to <MASKED_IP> closed.
Local node IP: <LOCAL_NODE_IP>
2021-03-19 02:48:49,516 INFO services.py:1256 -- View the Ray dashboard at http://127.0.0.1:8265
--------------------
Ray runtime started.
--------------------
Next steps
To connect to this Ray runtime from another node, run
ray start --address='<LOCAL_NODE_IP>:6379' --redis-password='<REDIS_PW>'
Alternatively, use the following Python code:
import ray
ray.init(address='auto', _redis_password='<REDIS_PW>')
If connection fails, check your firewall settings and network configuration.
To terminate the Ray runtime, run
ray stop
Shared connection to <MASKED_IP> closed.
New status: up-to-date
Useful commands
Monitor autoscaling with
ray exec /home/xavier/Projects/azure-rllib/rw-train/azure/deploy.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
Connect to a terminal on the cluster head:
ray attach /home/xavier/Projects/azure-rllib/rw-train/azure/deploy.yaml
Get a remote shell to the cluster manually:
ssh -tt -o IdentitiesOnly=yes -i ~/.ssh/id_rsa [email protected]<MASKED_IP> docker exec -it ray_container /bin/bash
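If you script your deployments, it can be handy to pull the status transitions and numbered steps out of output like the above instead of reading it by eye. The following is a rough sketch that assumes the output has been captured into a string; `summarize_ray_up` is a hypothetical helper, and the `New status:` and `[n/7]` line formats are taken from this one sample run, so treat them as an assumption rather than a stable interface.

```python
import re

# Line shapes assumed from the sample output above, e.g.
# "[1/7] Waiting for SSH to become available" and "New status: up-to-date".
STEP_RE = re.compile(r"^\[(\d+)/(\d+)\]\s+(.*)$")
STATUS_RE = re.compile(r"^New status:\s+(\S+)$")


def summarize_ray_up(output):
    """Collect status transitions and numbered steps from captured `ray up` output."""
    statuses = []
    steps = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        status_match = STATUS_RE.match(line)
        if status_match:
            statuses.append(status_match.group(1))
            continue
        step_match = STEP_RE.match(line)
        if step_match:
            current, total, description = step_match.groups()
            steps.append((int(current), int(total), description))
    return {
        "statuses": statuses,
        "steps": steps,
        # Treat the run as finished only if the last reported status is "up-to-date".
        "up_to_date": bool(statuses) and statuses[-1] == "up-to-date",
    }
```

For the run shown above, the last reported status is `up-to-date`, which is the signal that the head node is ready.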
Running a Test
Now that our cluster is up, it's useful to check out what it can do, so let's start by running a test.
Create a Python file named cartpole.py with the following content:
Once that is created, we can submit it to the cluster with:
ray submit deploy.yaml cartpole.py --start --tmux
ray attach deploy.yaml --tmux
The submit command takes care of starting the cluster when needed and runs our script in a tmux session; attaching to that session then shows the script's output.
The Ray library is simply amazing in what it does and how it does it. Running distributed compute clusters in the cloud has been made super easy. Together with Spot instances, Ray is a clear choice whenever we are working with, for example, Reinforcement Learning!