Resource Management in Run:AI

The BGU HPC cluster utilizes the Run:AI platform to manage computational resources effectively, ensuring researchers can maximize productivity while adhering to fair usage policies. This page explains how resources work in the platform, outlines resource limits, and offers guidance on choosing the right resource configuration for your workloads. It also addresses the concept of over-quota usage.

How Resources Work

The cluster’s resources are allocated and managed using Run:AI, which provides a layer of abstraction over physical compute resources. Key resource types include:

GPU: Graphics Processing Units are the primary computational units for deep learning and other high-performance workloads.
CPU: Central Processing Units handle general-purpose computation and coordination tasks.
Memory: Random Access Memory (RAM) is used to store data required for computations.

When submitting a job, you can specify the amount of each resource type your workload requires. The Run:AI scheduler matches your request to the available resources in the cluster, optimizing for efficiency and fairness.
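
As a rough illustration of what matching a request to available resources involves, the Python sketch below models a request and each node's free capacity, then returns the first node where the request fits. The `Resources` class, its field names, the node names, and the first-fit rule are all assumptions made for illustration; the actual Run:AI scheduler also weighs fairness and efficiency and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical bundle of the three resource types described above."""
    gpus: float        # number of GPUs
    cpu_cores: float   # number of CPU cores
    memory_gb: float   # RAM in gigabytes

    def fits_in(self, free: "Resources") -> bool:
        """Return True if every requested amount is available in `free`."""
        return (
            self.gpus <= free.gpus
            and self.cpu_cores <= free.cpu_cores
            and self.memory_gb <= free.memory_gb
        )


def first_fit(request: Resources, nodes: dict[str, Resources]) -> Optional[str]:
    """Return the name of the first node with enough free capacity, else None."""
    for name, free in nodes.items():
        if request.fits_in(free):
            return name
    return None


# Illustrative numbers only; these are not the real cluster's node sizes.
free_capacity = {
    "node-a": Resources(gpus=0, cpu_cores=16, memory_gb=64),
    "node-b": Resources(gpus=2, cpu_cores=8, memory_gb=128),
}
job = Resources(gpus=1, cpu_cores=4, memory_gb=32)
print(first_fit(job, free_capacity))  # node-b
```

Here `node-a` is skipped because it has no free GPU, even though it has enough CPU and memory; a request must fit on every resource type at once.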

Understanding Resource Limits

Each researcher is assigned resource quotas so that no single user can monopolize the cluster. Quotas are enforced at both the project level and the department level, and they include limits on:

GPU count: The maximum number of GPUs that can be used simultaneously.
CPU cores: The maximum number of CPU cores available for workloads.
Memory: The total memory allocation across all workloads.
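
The check implied by these limits can be sketched in Python as follows. This is a simplified model, assuming a quota is a per-resource ceiling on the sum of all running workloads; the dictionary keys, the `exceeded_limits` helper, and the sample numbers are hypothetical and are not actual BGU quota values.

```python
def exceeded_limits(current_usage: dict, request: dict, quota: dict) -> list:
    """Return the resource names whose quota would be exceeded by `request`.

    All three dicts map a resource name (e.g. "gpus") to a number.
    An empty list means the request stays within quota.
    """
    exceeded = []
    for resource, limit in quota.items():
        total = current_usage.get(resource, 0) + request.get(resource, 0)
        if total > limit:
            exceeded.append(resource)
    return exceeded


# Illustrative quota and usage; not real BGU values.
project_quota = {"gpus": 2, "cpu_cores": 16, "memory_gb": 64}
running_now = {"gpus": 1, "cpu_cores": 8, "memory_gb": 40}
new_job = {"gpus": 1, "cpu_cores": 4, "memory_gb": 32}

print(exceeded_limits(running_now, new_job, project_quota))  # ['memory_gb']
```

In this example the new job fits the GPU and CPU limits but would push total memory to 72 GB against a 64 GB limit. Since quotas apply at both the project and department level, the same check would presumably be applied against each level's limits.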

Choosing the Right Resource Configuration

Selecting the appropriate resources depends on your workload’s requirements:

Deep Learning Models: These typically benefit from high GPU usage. Request the number of GPUs and GPU memory needed for training or inference.
Data Preprocessing: CPU and memory are often sufficient. Request more CPU cores if the task involves heavy data manipulation.
Mixed Workloads: Balance GPU, CPU, and memory requests based on the workload’s specific computational needs.

Understanding Over-Quota Usage

Run:AI allows jobs to run over quota when the cluster has unallocated resources. Over-quota jobs are queued and run only when resources become available, and they have lower priority than jobs within quota.

Do not run time-sensitive jobs over quota, because an over-quota job may be paused when higher-priority jobs need its resources.
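
The priority rule described in this section can be illustrated with a small Python sketch. It assumes a simplified model in which each job is either within quota or over quota, and in which only over-quota jobs are paused when a higher-priority job needs GPUs. The `Job` class and the `pick_jobs_to_pause` function are hypothetical and do not reflect Run:AI's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int
    over_quota: bool


def pick_jobs_to_pause(running: list[Job], gpus_needed: int) -> list[Job]:
    """Choose over-quota jobs to pause until `gpus_needed` GPUs are freed.

    In-quota jobs are never selected. If the over-quota jobs together cannot
    free enough GPUs, an empty list is returned and nothing is paused.
    """
    candidates = [job for job in running if job.over_quota]
    selected, freed = [], 0
    for job in candidates:
        if freed >= gpus_needed:
            break
        selected.append(job)
        freed += job.gpus
    return selected if freed >= gpus_needed else []


running = [
    Job("train-a", gpus=2, over_quota=False),
    Job("sweep-b", gpus=1, over_quota=True),
    Job("sweep-c", gpus=2, over_quota=True),
]
paused = pick_jobs_to_pause(running, gpus_needed=2)
print([job.name for job in paused])  # ['sweep-b', 'sweep-c']
```

The in-quota job `train-a` keeps running, while both over-quota jobs are paused to free the two GPUs requested. This is why long or deadline-bound work is safer within quota.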

Allocation and Utilization

Efficient resource allocation and utilization are critical for maintaining the performance and fairness of the BGU HPC cluster. The goal is to minimize resource allocation while maximizing utilization, so that allocated resources do not sit idle. The closer a job's utilization is to its allocation, the more effectively the cluster operates.
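
One simple way to quantify how well allocation matches utilization is the ratio of what a job actually used to what it was allocated. The sketch below assumes you have already measured average usage yourself, for example from your own monitoring; the function name and the sample numbers are illustrative only.

```python
def utilization_ratio(used: float, allocated: float) -> float:
    """Fraction of an allocated resource that was actually used."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return used / allocated


# Illustrative measurements for one job: (average used, allocated).
job_stats = {
    "gpus": (0.9, 1),
    "cpu_cores": (2.0, 8),
    "memory_gb": (16.0, 32),
}
for resource, (used, allocated) in job_stats.items():
    print(f"{resource}: {utilization_ratio(used, allocated):.0%} utilized")
```

In this example the GPU is well used, but only a quarter of the allocated CPU cores and half of the memory are, which suggests the CPU and memory requests could be reduced.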

Why It Matters

Fair Usage: Keeping allocation to a minimum ensures fair access to resources for all researchers.
Cost-Effectiveness: Over-allocated but underutilized resources are wasted, leading to inefficiency and higher operational costs.

Best Practices

Estimate Needs Accurately: Before submitting a workload, evaluate its resource requirements based on past performance or benchmarks.
Iterative Optimization: Start with conservative resource requests, then scale up based on observed needs.
Avoid Over-Allocation: Requesting more resources than needed reduces the cluster’s ability to serve other workloads effectively.
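
The iterative approach above can be sketched as a small helper that proposes the next request from the observed peak usage plus a safety margin. The 20% headroom, the rounding step, and the function name are assumptions chosen for illustration, not a cluster policy.

```python
import math


def next_request(observed_peak: float, headroom: float = 0.2, step: float = 1.0) -> float:
    """Suggest a request: the observed peak plus headroom, rounded up to `step`."""
    if observed_peak < 0 or headroom < 0 or step <= 0:
        raise ValueError("observed_peak and headroom must be >= 0, step must be > 0")
    target = observed_peak * (1 + headroom)
    return math.ceil(target / step) * step


# A job that requested 8 GB of memory and peaked at 7.9 GB,
# with requests rounded up to multiples of 4 GB:
print(next_request(7.9, step=4.0))  # 12.0
```

Here the job ran close to its 8 GB request, so the next run would ask for 12 GB rather than jumping straight to a much larger allocation.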