Running Python Trainings

Introduction

Prerequisites

End Goal

  • Submit Python trainings using the runai-bgu CLI.

Submitting a Python Training Job

  1. Constructing your command

    At the very least, your command should have two steps in it:

  1. Activating your environment.

    As mentioned before, you must have already configured a Conda environment with a given name. For example, if your Conda environment is named torchvision, the first part of your command should be conda activate torchvision.

  2. Calling your script.

    As explained in the setup guide, your Python script should run the training process based on a CLI call. For example, if your Python file is named main.py, the second part of your command should be python main.py.

    These parts, and any others you add before, between, or after them, should all be joined with a double ampersand (&&).
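The joining rule above can be sketched in bash. This is a minimal illustration, not part of runai-bgu: join_steps is a hypothetical helper name, and the environment and script names are the ones used as examples in this guide. Because && runs each step only if the previous one succeeded, the script is never called if activating the environment fails.

```shell
#!/usr/bin/env bash
# join_steps is a hypothetical helper (not part of runai-bgu).
# It joins its arguments with " && " and prints the resulting command string.
join_steps() {
  local joined="$1"
  shift
  local step
  for step in "$@"; do
    joined="${joined} && ${step}"
  done
  printf '%s\n' "$joined"
}

# Step 1: activate the environment. Step 2: call the script.
CMD="$(join_steps "conda activate torchvision" "python main.py")"
echo "$CMD"
```

Additional steps, such as a cd into the directory holding the script, can be passed as extra arguments in the order they should run.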

Example 1. Explicit resources
$ ssh bhn20 (1)
$ runai-bgu submit python \ (2)
  -n conv-2-64 \ (3)
  -c 2 \ (4)
  -m 4Gi \ (5)
  -g 1 \ (6)
  --conda torchvision \ (7)
  -- "python main.py" (8)
1 Connects to bhn20 over SSH.
2 Specifies that this is a Python workload.
3 Specifies the name of the job.
4 Allocates 2 CPU cores.
5 Allocates 4GiB of memory*.
6 Specifies the GPU allocation (whole or fractional). If you do not need a GPU, omit this flag.
7 Tells the job to use the torchvision Conda environment.
8 The command to run, here python main.py.
The space ( ) between the two dashes (--) and the command is intentional, as are the quotes (") surrounding the command.
When running the command, you can either change directory to the location of your Python file (cd ~/path/to/file && python main.py) or give the full path (python ~/path/to/file/main.py).
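As a rough sketch of how the cd variant fits into Example 1, the snippet below assembles the same submit command as a bash array and prints it instead of running it. It uses only the flags shown in Example 1; the path is the placeholder from this guide, and the variable names JOB_CMD and SUBMIT are made up for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: nothing is submitted, the arguments are only printed.
# The whole job command is one quoted string, so && is passed to the job
# rather than being interpreted by the local shell.
JOB_CMD="cd ~/path/to/file && python main.py"

SUBMIT=(runai-bgu submit python
  -n conv-2-64
  -c 2
  -m 4Gi
  -g 1
  --conda torchvision
  -- "$JOB_CMD")

# One argument per line: the quoted command stays a single argument.
printf '%s\n' "${SUBMIT[@]}"
```

Running "${SUBMIT[@]}" on bhn20 would then be equivalent to typing the command out by hand, assuming runai-bgu is on the PATH.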

You can also use a predefined resource template instead of specifying resources explicitly. See the template CLI Introduction guide.

Example 2. Using User Templates:
$ runai-bgu submit python \ (1)
  -n conv-2-64 \ (2)
  --ut train-over-quota-user \ (3)
  --conda torchvision \ (4)
  -- "python main.py" (5)
1 Submits a Python workload.
2 Specifies the job name.
3 Uses --ut to specify the user template.
4 Tells the job to use the torchvision Conda environment.
5 The command to run, here python main.py.
The space ( ) between the two dashes (--) and the command is intentional, as are the quotes (") surrounding the command.
When running the command, you can either change directory to the location of your Python file (cd ~/path/to/file && python main.py) or give the full path (python ~/path/to/file/main.py).

Example 3. Using Group Templates:
$ runai-bgu submit python \ (1)
  -n conv-2-64 \ (2)
  --ug train-over-quota-group \ (3)
  --conda torchvision \ (4)
  -- "python main.py" (5)
1 Submits a Python workload.
2 Specifies the job name.
3 Uses --ug to specify the group template.
4 Tells the job to use the torchvision Conda environment.
5 The command to run, here python main.py.
The space ( ) between the two dashes (--) and the command is intentional, as are the quotes (") surrounding the command.
When running the command, you can either change directory to the location of your Python file (cd ~/path/to/file && python main.py) or give the full path (python ~/path/to/file/main.py).
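Examples 2 and 3 differ only in the template flag and the template name. A small sketch of that mapping follows, assuming nothing beyond the two flags shown above; template_flag is a hypothetical helper, not a runai-bgu feature, and the final line only prints a command rather than submitting it.

```shell
#!/usr/bin/env bash
# template_flag is a hypothetical helper (not part of runai-bgu).
# It maps a template scope to the flag used in Examples 2 and 3.
template_flag() {
  case "$1" in
    user)  printf '%s\n' "--ut" ;;
    group) printf '%s\n' "--ug" ;;
    *)     echo "unknown template scope: $1" >&2; return 1 ;;
  esac
}

FLAG="$(template_flag group)"
# Print the resulting command for inspection.
echo "runai-bgu submit python -n conv-2-64 ${FLAG} train-over-quota-group --conda torchvision -- \"python main.py\""
```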

Submitting Job

The CLI will display messages about the job creation and status.

Example 4. Synopsis
Start job for training
$ ssh bhn20 (1)
$ runai-bgu submit python -n conv-2-64 --ut train-over-quota -- "python main.py" (2)
Waiting for the job to be created...

Job conv-2-64 submitted successfully.
You can check the status of the job by running:
        runai describe job conv-2-64 -p myproj
1 Connects to bhn20 over SSH.
2 Submits the job with runai-bgu (see the runai-bgu manual).