Running OneAPI Trainings

Introduction

End Goal

  • Submit OneAPI trainings using the runai-bgu CLI.

Submitting a OneAPI Training Job

  1. Constructing your command

    At the very least, your command should have two steps in it:

  1. Activating your environment.

    As mentioned before, you must have already configured a Conda environment with a given name. For example, if your Conda environment is named torchvision, the first part of your command should be conda activate torchvision.

  2. Calling your script.

    As explained in the setup guide, your OneAPI script should run the training process from a CLI call. For example, if your OneAPI file is named <FILE>, the second part of your command should be <COMMAND> <FILE>.

These parts, and any others you add between or around them, should all be joined with a double ampersand (&&).
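
As a minimal sketch of the joining rule above, the snippet below assembles the two required steps, plus an optional cd step, into a single &&-joined string and prints it. The variable names (ENV_NAME, WORK_DIR, JOB_CMD) are illustrative assumptions, and <COMMAND> <FILE> remains a placeholder for your own training call.

```shell
# Illustrative values only; replace them with your own.
ENV_NAME="torchvision"        # name of your Conda environment
WORK_DIR="~/path/to/file"     # directory containing your script

# Join the steps with && so each step runs only if the previous one succeeded.
JOB_CMD="conda activate ${ENV_NAME} && cd ${WORK_DIR} && <COMMAND> <FILE>"

# Print the assembled command for inspection; nothing is executed here.
printf '%s\n' "$JOB_CMD"
```

Because && stops at the first failing step, the training call never starts if the environment cannot be activated or the directory does not exist.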

Example 1. Explicit resources
$ ssh bhn20 (1)
$ runai-bgu submit oneapi \ (2)
  -n train \ (3)
  -c 2 \ (4)
  -m 4Gi \ (5)
  -g 1 \ (6)
  --conda torchvision \ (7)
  -- "<COMMAND> <FILE>" (8)
1 Configure SSH connection to bhn20
2 Specifies that this is a OneAPI workload.
3 Specifies the name of the job.
4 Allocates 2 CPU cores.
5 Allocates 4 GiB of memory.
6 Specifies the GPU allocation (whole or fractional). If you do not need a GPU, omit this flag.
7 Tells the job to use the torchvision Conda environment.
8 The command to run, here <COMMAND> <FILE>.
The space between the two dashes (--) and the command is intentional, as are the quotes (") surrounding the command.
When running the command, you can either change directory to the location of your file (cd ~/path/to/file && <COMMAND> <FILE>) or give the full path (<COMMAND> ~/path/to/file/<FILE>).
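
To illustrate why the quotes in the note above matter, here is a small sketch using a hypothetical helper, count_args, that only reports how many arguments it receives. It is not part of runai-bgu; it shows general shell behavior. With quotes, everything after -- arrives as one argument, including the &&. Without quotes, the words are split into separate arguments, and an unquoted && would be interpreted by your local shell instead of being passed to the job.

```shell
# Hypothetical helper: reports how many arguments it received.
count_args() {
  printf '%s\n' "$#"
}

# Quoted: the whole command, including &&, is passed as ONE argument after --.
QUOTED_COUNT=$(count_args -- "cd /tmp && echo training")

# Unquoted: the shell splits the words into separate arguments.
UNQUOTED_COUNT=$(count_args -- cd /tmp echo training)

printf 'quoted=%s unquoted=%s\n' "$QUOTED_COUNT" "$UNQUOTED_COUNT"
```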

You can also use a predefined resource template. See the template CLI Introduction guide for details.

Example 2. Using User Templates:
$ runai-bgu submit oneapi \ (1)
  -n train \ (2)
  --ut train-over-quota-user \ (3)
  --conda torchvision \ (4)
  -- "<COMMAND> <FILE>" (5)
1 Submits a OneAPI workload.
2 Specifies the job name.
3 Uses --ut to specify the user template.
4 Tells the job to use the torchvision Conda environment.
5 The command to run, here <COMMAND> <FILE>.
The space between the two dashes (--) and the command is intentional, as are the quotes (") surrounding the command.
When running the command, you can either change directory to the location of your file (cd ~/path/to/file && <COMMAND> <FILE>) or give the full path (<COMMAND> ~/path/to/file/<FILE>).

Example 3. Using Group Templates:
$ runai-bgu submit oneapi \ (1)
  -n train \ (2)
  --ug train-over-quota-group \ (3)
  --conda torchvision \ (4)
  -- "<COMMAND> <FILE>" (5)
1 Submits a OneAPI workload.
2 Specifies the job name.
3 Uses --ug to specify the group template.
4 Tells the job to use the torchvision Conda environment.
5 The command to run, here <COMMAND> <FILE>.
The space between the two dashes (--) and the command is intentional, as are the quotes (") surrounding the command.
When running the command, you can either change directory to the location of your file (cd ~/path/to/file && <COMMAND> <FILE>) or give the full path (<COMMAND> ~/path/to/file/<FILE>).
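
If you submit template-based jobs often, a dry-run helper can make the pieces of the command easier to check before running it. The sketch below is an assumption-laden illustration: print_submit is a hypothetical function, not part of runai-bgu, and it only prints the command line using the flags shown in the examples above (-n, --ut, --conda). Swap --ut for --ug to target a group template.

```shell
# Hypothetical dry-run helper: prints the submit command instead of running it.
# Arguments: job name, user template, Conda environment, command to run.
print_submit() {
  job_name=$1
  template=$2
  conda_env=$3
  job_cmd=$4
  # The command is wrapped in quotes and separated from -- by a space.
  printf 'runai-bgu submit oneapi -n %s --ut %s --conda %s -- "%s"\n' \
    "$job_name" "$template" "$conda_env" "$job_cmd"
}

SUBMIT_LINE=$(print_submit train train-over-quota-user torchvision "<COMMAND> <FILE>")
printf '%s\n' "$SUBMIT_LINE"
```

Review the printed line, then paste it into your session on bhn20 to submit the job.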

Submitting Job

The CLI will display messages about the job creation and status.

Example 4. Synopsis
Start job for training
$ ssh bhn20 (1)
$ runai-bgu submit oneapi -n train --ut train-over-quota -- "<COMMAND> <FILE>" (2)
Waiting for the job to be created...

Job train submitted successfully.
You can check the status of the job by running:
        runai describe job train -p myproj
1 Configure SSH connection to bhn20 (see the runai-bgu manual).