Being able to leverage remote machines with more compute power or different accelerators than your local machine is a very nice capability to have - especially for the large C++ codebases I’m used to in the self-driving car world.

Background

There are many different ways to lean on remote machines in your workflows. A non-exhaustive list of examples:

You don’t need to use these methods individually; for example, you may get the best results from combining all three:

My Specific Problem

Back in the day I did almost all my development in neovim over SSH, but over the past 4 years or so I’ve been seduced by VS Code. Usually I’ll alternate at random between sitting on my couch using VS Code Remote from a laptop and sitting at my desk developing locally on my desktop machine. Recently the problem I’ve wanted to solve is being able to waste my life playing video games without having to kill my ML model training runs.

I spend most of my time in Ubuntu and flip over into Windows when I feel the itch to play a video game that involves crabs constantly leaping through the air spraying bullets. The obvious problem here (aside from my addiction to pretending to be a crab) is that rebooting into Windows means that even if my RTX 3090 has spare capacity to render crab graphics, I still have to stop my training jobs.

Requirements

Based on the problem and my preferences I’ve drawn up some loose requirements for a remote editing solution.

Potential Solutions

The Dell R730 that runs a lot of my Kubernetes workload, including this blog, has reasonable compute capacity: 128GB DDR4 RAM, 72 logical CPU cores, a large amount of RAID’d storage, and a GTX 980 Ti. The GPU is nothing to write home about but I plan to replace it at some point in the near future with something more powerful, likely the 3090 out of my desktop when I upgrade to a 4090. Given that I’ve got a cluster with the hardware I need, the question is just what exactly to run on it.

A Quick Market Survey

A quick glance shows that there are some good-looking open source options floating around. The ones that caught my attention are:

If I were trying to support a large organization of developers I’d probably evaluate Eclipse Che, but since it’s just me and VS Code supports attaching to Kubernetes pods all I really need is a quick way to deploy, track, and connect to pods.

The Chosen Solution

I’ve taken a long road to an unimpressive solution for my use case: just use kubectl and label things well.

Since VS Code can handle attaching to pods, forwarding ports, and forwarding SSH credentials, the only thing I need a solution for is provisioning. Unsurprisingly, it turns out kubectl does that quite well out of the box.

commands

# Create a development environment
kubectl create -f example.yaml
kubectl label -f example.yaml domain.example/dev-environment=ml-1

# Find all dev environments
kubectl get all --selector "domain.example/dev-environment"

# Find resources in a particular dev environment
kubectl get all --selector "domain.example/dev-environment=ml-1"

# Delete a particular dev environment
kubectl delete all --selector "domain.example/dev-environment=ml-1"

example.yaml

apiVersion: v1
kind: Pod
metadata:
  name: dev-pod
spec:
  containers:
    - name: primary
      image: ubuntu:lunar
      # Keep the container alive so there's something to attach to
      command: [sleep, infinity]
      resources:
        # If your development is really bursty you may want to set lower requests
        limits:
          cpu: "32000m"
          memory: "64Gi"
          # Assumes the NVIDIA device plugin is exposing shared (time-sliced) GPUs
          nvidia.com/gpu.shared: "1"

Improving on kubectl

Using kubectl directly works fine for my small use case, but to support multiple users this approach needs work:

To solve automatic label management you could either make some shell functions and aliases or create a small Python (or your language of choice) program. I strongly favor an anything-but-shell approach if only because unit testing becomes a lot easier - but even in Python I’d probably still subprocess kubectl for convenience rather than use the Kubernetes API.
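To make that concrete, here's a minimal sketch of what the Python-wrapping-kubectl approach could look like. The file name and function names are placeholders of my own invention; the label key is the same one used in the commands above.

dev_env.py

#!/usr/bin/env python3
"""Thin wrapper around kubectl for managing labeled dev environments (sketch)."""
import subprocess

LABEL = "domain.example/dev-environment"


def kubectl(*args: str) -> str:
    """Run kubectl and return stdout, raising if the command fails."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def create(manifest: str, name: str) -> None:
    """Create the resources in a manifest and label them as one dev environment."""
    kubectl("create", "-f", manifest)
    kubectl("label", "-f", manifest, f"{LABEL}={name}")


def list_environments() -> str:
    """List every resource that belongs to any dev environment."""
    return kubectl("get", "all", "--selector", LABEL)


def delete(name: str) -> None:
    """Tear down everything labeled with a particular dev environment."""
    kubectl("delete", "all", "--selector", f"{LABEL}={name}")


if __name__ == "__main__":
    create("example.yaml", "ml-1")
    print(list_environments())

Nothing here is clever; the point is just that once the label is applied consistently by code instead of by hand, the rest of the lifecycle is a couple of selector queries.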

Providing the ability to customize the development environment really depends on what your existing deployment system looks like. For my purposes I use Jsonnet for most things, but you could also use Kustomize, Jinja templates, or generate your manifest from Python.
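As an illustration of that last option, here's a rough sketch of generating the manifest from Python. It just reproduces example.yaml with the resource limits pulled out as parameters; it assumes PyYAML is installed, and the function name and defaults are arbitrary.

render_manifest.py

import yaml  # PyYAML


def dev_pod(name: str, cpu: str = "32000m", memory: str = "64Gi", gpus: int = 1) -> str:
    """Render a dev-environment pod manifest with customizable resource limits."""
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "primary",
                    "image": "ubuntu:lunar",
                    "command": ["sleep", "infinity"],
                    "resources": {
                        "limits": {
                            "cpu": cpu,
                            "memory": memory,
                            "nvidia.com/gpu.shared": str(gpus),
                        }
                    },
                }
            ]
        },
    }
    return yaml.safe_dump(pod, sort_keys=False)


if __name__ == "__main__":
    # Write out a manifest equivalent to example.yaml, ready for kubectl create -f.
    print(dev_pod("dev-pod"))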

Stephan Wolski is a robot engineer, founder, angel investor, penguin enthusiast, and all-around cliché.