Get started

o‑o is a command line interface for running jobs on ephemeral cloud instances with tracked inputs and outputs. You build data or MLOps pipelines by stringing commands together, much as you would when running them locally, but with the power of cloud compute and storage.

Highlights

  • Support for Scaleway and Google Cloud (more to come...)
  • Easily run any command in any language in cloud compute environments
  • Flexible. Define your own run environments with Docker images and machine types
  • Traceable. Trace all inputs and outputs to the commands that produced them
  • You control your data. Data and source code are stored in buckets that you manage

Installation and Setup

First, install o‑o. Installing with pipx is recommended, but pip also works. Run one of the following:

$ pipx install o-o
$ pip install o-o

Then create a test directory and log in. You will need a password-less SSH key, which you can generate with ssh-keygen if you don't already have one.

$ mkdir test-o-o
$ cd test-o-o
$ o-o login
Visit https://o-o.tools/activate
Paste token:
SSH key location (~/.ssh/id_ed25519):

A cloud provider, machine type, and Docker image define an environment in which your commands are run. Datastores define where the output files of your commands are stored. To define environments and datastores, follow the instructions for one of the currently supported cloud providers: Scaleway or Google Cloud.

Scaleway

Start by generating an API key. On step 8, after clicking "Generate API key", you will see a "Credentials Usage" page. On this page, open the "add API keys to your environment" drop-down and select the .env tab. Copy the contents and paste them into a .env file in your current working directory:

.env
SCW_ACCESS_KEY=####################
SCW_SECRET_KEY=########-####-####-####-############
SCW_DEFAULT_ORGANIZATION_ID=########-####-####-####-############
SCW_DEFAULT_PROJECT_ID=########-####-####-####-############
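
A quick local check can confirm that the .env file contains all four variables before you continue. The following is a minimal sketch, not part of o‑o: the helper name `missing_env_keys` is hypothetical, and the parsing assumes simple KEY=value lines as in the example above.

```python
from pathlib import Path

# Variable names as listed in the .env example above.
REQUIRED_KEYS = (
    "SCW_ACCESS_KEY",
    "SCW_SECRET_KEY",
    "SCW_DEFAULT_ORGANIZATION_ID",
    "SCW_DEFAULT_PROJECT_ID",
)


def missing_env_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate an optional leading "export " (an assumption about the file).
        key = key.strip().removeprefix("export ").strip()
        found[key] = value.strip()
    return [k for k in REQUIRED_KEYS if not found.get(k)]


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env file in the current directory")
    else:
        missing = missing_env_keys(env_file.read_text())
        print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the directory that holds the .env file. It only checks that the keys are present and non-empty, not that the credentials are valid.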

Next, upload your SSH key to the same project as the API key.

Create a new private bucket that is used to define a datastore. For this example, name it o-o-data and choose the Paris region (fr-par).

Finally, create an .ooconfig file that defines an environment to run our commands and a datastore where outputs of these commands are stored:

.ooconfig
project: test-project
environments:
  - name: my-simple-env
    provider: Scaleway
    image: docker.io/debian:stable-slim
    machinetype: STARDUST1-S
    region: fr-par-1
datastores:
  - name: my-datastore
    provider: Scaleway
    bucket: o-o-data
    region: fr-par

You should now have two files, .env and .ooconfig, in your current directory.

Google Cloud

We will use a service account to authenticate with Google Cloud. In a new or existing project, create a service account. Add the "Compute Admin", "Storage Admin" and "Service Account User" roles to the account's permissions. Once it is created, select "Manage Keys" from the new service account, then "Add Key" -> "Create new Key" and create a JSON key. Download the generated key file and set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to its location:

$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
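
To confirm that the variable is set and points to a usable key file, a small check such as the one below may help. It is a sketch under stated assumptions, not part of o‑o: the function name `check_credentials` is hypothetical, and it relies only on the `"type": "service_account"` field that downloaded service account key files contain.

```python
import json
import os
from pathlib import Path


def check_credentials(env: dict[str, str]) -> str:
    """Return "ok" or a short description of what looks wrong."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    key_file = Path(path).expanduser()
    if not key_file.is_file():
        return f"{key_file} does not exist"
    try:
        key = json.loads(key_file.read_text())
    except json.JSONDecodeError:
        return f"{key_file} is not valid JSON"
    # Service account key files carry a "type" field with this value.
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return f"{key_file} is not a service account key"
    return "ok"


if __name__ == "__main__":
    print(check_credentials(dict(os.environ)))
```

The check does not contact Google Cloud, so it cannot tell whether the key is still active or has the required roles.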

Create a new private bucket in the same project as the service account. This bucket will be used to define a datastore. For this example, we will name it o-o-data.

Create an .ooconfig file that defines an environment to run our commands and a datastore where outputs of these commands are stored:

.ooconfig
project: test-project
environments:
  - name: my-simple-env
    provider: gcp
    image: docker.io/debian:stable-slim
    machinetype: e2-highcpu-2
    region: northamerica-northeast1-b
datastores:
  - name: my-datastore
    provider: gcp
    bucket: o-o-data

You should now have an .ooconfig file in your current directory, and the GOOGLE_APPLICATION_CREDENTIALS environment variable set.

Configuring environments and datastores on the same provider is recommended but not required. You are free to configure multiple environments and datastores across all supported providers.

Hello World Example

We are now able to run a simple hello world example in our configured environment:

$ o-o run --message "example run" -- echo "Hello World"
Hello World

And that's it. This is an extremely inefficient hello world, and it will likely take a couple of minutes to complete, but it demonstrates how to easily run commands in a cloud environment.

Multistep Example

The real power of o‑o comes when stringing commands together. So let's try a multistep example to demonstrate connecting outputs to inputs of another step. The following steps create files in the special o://output/ directory:

$ o-o run --message "create hello" -- 'echo "Hello" > o://output/hello.txt'
$ o-o run --message "create world" -- 'echo "World" > o://output/world.txt'

We can see the history of our runs with o-o run --list:

$ o-o run --list
cn7gnwiapo example run
ntus965ryy create hello
cxdbx8am38 create world

Note

cn7gnwiapo, ntus965ryy and cxdbx8am38 are unique identifiers that will be different for your runs. As with Git commits, a shortened version of the full 45-character identifier is printed here. When using o‑o commands, you only need to supply enough characters to uniquely identify a run.
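
The idea of identifying a run by a unique prefix can be illustrated with a few lines of Python. This is only a sketch of the general technique, not o‑o's implementation: the function `resolve_run` is hypothetical, and the full identifiers are the short IDs from this guide padded with made-up filler characters to 45 characters.

```python
def resolve_run(prefix: str, run_ids: list[str]) -> str:
    """Return the single run ID that starts with prefix, or raise ValueError."""
    matches = [run_id for run_id in run_ids if run_id.startswith(prefix)]
    if not matches:
        raise ValueError(f"no run matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} runs match")
    return matches[0]


# Made-up full IDs: the short IDs above plus filler to reach 45 characters.
runs = [
    "cn7gnwiapo" + "a" * 35,
    "ntus965ryy" + "b" * 35,
    "cxdbx8am38" + "c" * 35,
]

# "nt" and "cx" each match exactly one run; "c" alone matches two
# (cn7gnwiapo... and cxdbx8am38...), so it would raise ValueError.
print(resolve_run("nt", runs)[:10])
print(resolve_run("cx", runs)[:10])
```

With these three runs, two characters are enough to pick out any of them, while the single character "c" is ambiguous.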

o://output/ is the output space for the current step. To use files placed in this directory as inputs to subsequent steps, we reference them by run ID (ntus965ryy and cxdbx8am38). For example, a third step prints out the contents of the output files:

$ o-o run --message "print files" -- \
    'cat o://ntus965ryy/hello.txt && cat o://cxdbx8am38/world.txt'
Hello
World

Again, we can show our run history:

$ o-o run --list
cn7gnwiapo example run
ntus965ryy create hello
cxdbx8am38 create world
7it9dmgncy print files

and get more detailed information about our "print files" step with o-o show:

$ o-o show 7it9dmgncy --inputs
Run 7it9dmgncynuu7j54njgizi5s1ai3b3ajg4auwbw5oggc
Creator: Jon Doe <mail@example.com>
Started: Sat, 1 Feb 10:00:00 2025 -0500
Ended:   Sat, 1 Feb 10:02:00 2025 -0500
Command: cat o://ntus965ryy/hello.txt && cat o://cxdbx8am38/world.txt

    print files

Inputs:
  |\
  o | cxdbx8am38 create world
   /
  o ntus965ryy create hello
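
Because inputs are written in the command as o://<run ID>/<file>, the runs a command depends on can be read directly from the command text. The sketch below illustrates this; it does not describe how o‑o records inputs internally. The regular expression and the `referenced_runs` helper are assumptions based on the lowercase alphanumeric IDs shown in this guide.

```python
import re

# Matches o://<run-id>/<path>. The allowed ID characters are an assumption
# based on the identifiers shown in this guide.
O_URI = re.compile(r"o://([a-z0-9]+)/(\S+)")


def referenced_runs(command: str) -> list[str]:
    """Return run IDs referenced in command, in order, without duplicates."""
    seen: dict[str, None] = {}
    for run_id, _path in O_URI.findall(command):
        # o://output/ is the current step's output space, not an input.
        if run_id != "output":
            seen.setdefault(run_id, None)
    return list(seen)


command = "cat o://ntus965ryy/hello.txt && cat o://cxdbx8am38/world.txt"
print(referenced_runs(command))
```

For the "print files" command this yields the two run IDs that o-o show lists under Inputs.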

Working with Git

Often you are working within a Git repository and need its source code in the run environment. To demonstrate how to use o‑o with a Git repository, create the following print.py source file:

print.py
from pathlib import Path
import sys

print(Path(sys.argv[1]).read_text())
print(Path(sys.argv[2]).read_text())

and add it to a new Git repository:

$ git init ./
$ git add print.py
$ git commit --message "Add print.py"

Now we need to configure o‑o to use the source code in this repository. This is done by simply adding sourcecode: true to our .ooconfig file. Since this is Python code, we also need to add a new environment (my-python-env) with Python installed:

.ooconfig (Scaleway)
project: test-project
sourcecode: true
environments:
  - name: my-simple-env
    provider: Scaleway
    image: docker.io/debian:stable-slim
    machinetype: STARDUST1-S
    region: fr-par-1
  - name: my-python-env
    provider: Scaleway
    image: docker.io/python:3-slim
    machinetype: STARDUST1-S
    region: fr-par-1
datastores:
  - name: my-datastore
    provider: Scaleway
    bucket: o-o-data
    region: fr-par

.ooconfig (Google Cloud)
project: test-project
sourcecode: true
environments:
  - name: my-simple-env
    provider: gcp
    image: docker.io/debian:stable-slim
    machinetype: e2-highcpu-2
    region: northamerica-northeast1-b
  - name: my-python-env
    provider: gcp
    image: docker.io/python:3-slim
    machinetype: e2-highcpu-2
    region: northamerica-northeast1-b
datastores:
  - name: my-datastore
    provider: gcp
    bucket: o-o-data

Finally, we are all set to run our checked-in code in the new Python environment:

$ o-o run --environment my-python-env --message "print files with Python" -- \
    python print.py o://ntus965ryy/hello.txt o://cxdbx8am38/world.txt
Hello
World