runai-bgu logs Manual

Introduction

runai-bgu logs is a command-line tool for viewing the logs of workloads on the BGU HPC cluster. It gives access to the output and error logs of both workspace and training workloads. The command detects the workload type automatically and supports filtering and formatting options, including real-time following, time-based filtering, and container-specific log extraction.

This manual explains how to use runai-bgu logs to monitor and troubleshoot your workloads.

Quick Start

To view logs for a workload, use:

$ runai-bgu logs my-workload

Shows all available logs for the specified workload.

Basic Usage

View Workload Logs

Get logs for a specific workload:

$ runai-bgu logs research-job

Displays all logs from the workload named research-job.

Follow Logs in Real-time

Monitor logs as they are generated:

$ runai-bgu logs research-job --follow

Continuously streams new log entries as they appear, useful for monitoring running workloads.

View Recent Logs

Show only the last few lines of logs:

$ runai-bgu logs research-job --tail=100

Displays the last 100 lines of logs, helpful for quickly checking recent activity.

Show Timestamps

Include timestamps with log entries:

$ runai-bgu logs research-job --timestamps

Adds a timestamp to each log line, which makes it easier to correlate events when debugging.

Advanced Filtering

Time-based Filtering

Relative Time

Show logs from a recent time window:

$ runai-bgu logs research-job --since=5m

Shows logs from the last 5 minutes. You can use s for seconds, m for minutes, or h for hours.
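When the duration comes from a script variable or user input, it can help to check it before calling the command. The following is a minimal sketch, not part of runai-bgu: is_valid_since is a hypothetical helper that accepts only a whole number followed by one of the units listed above. Whether the CLI also accepts other forms (for example combined units) is not covered here, so the check is deliberately strict.

```shell
# Hypothetical helper, not part of runai-bgu: succeed only for a whole number
# followed by exactly one unit letter (s, m, or h).
is_valid_since() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[smh]$'
}

# Example use (requires the runai-bgu CLI, so it is left commented out):
# window=5m
# if is_valid_since "$window"; then
#     runai-bgu logs research-job --since="$window"
# else
#     echo "unsupported duration: $window" >&2
# fi
```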

Absolute Time

Show logs since a specific timestamp:

$ runai-bgu logs research-job --since-time=2023-05-30T10:00:00Z

Shows logs generated after the specified RFC3339 timestamp.
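Typing RFC3339 timestamps by hand is error-prone, so one option is to generate them with date. The sketch below assumes a date implementation that supports -d "@SECONDS" (GNU coreutils does); rfc3339_now and rfc3339_from_epoch are hypothetical helper names, not part of runai-bgu.

```shell
# Print the current time in UTC in the same form as the example above,
# e.g. 2023-05-30T10:00:00Z.
rfc3339_now() {
    date -u +%Y-%m-%dT%H:%M:%SZ
}

# Convert Unix epoch seconds to the same form.
# Assumption: date accepts -d "@SECONDS" (GNU date syntax).
rfc3339_from_epoch() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Example use (requires the runai-bgu CLI, so it is left commented out):
# start=$(rfc3339_now)
# ... launch or restart something ...
# runai-bgu logs research-job --since-time="$start"
```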

Container and Pod Selection

Specific Container

View logs from a specific container:

$ runai-bgu logs research-job --container=worker-container

Useful when your workload has multiple containers and you want logs from a specific one.

Specific Pod

View logs from a specific pod:

$ runai-bgu logs research-job --pod=research-job-worker-0

Helpful for multi-pod workloads where you need logs from a particular instance.
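To compare several pods of the same workload, the command can be run once per pod in a loop. This is only a sketch: tail_each_pod is a hypothetical helper, the pod names are supplied by you (research-job-worker-1 below is an illustrative name), and the RUNAI_BGU variable is an assumption introduced here so the command can be swapped out for a dry run.

```shell
# Hypothetical helper: print the last 20 log lines of each named pod,
# separated by a header line.
# Usage: tail_each_pod WORKLOAD POD [POD...]
tail_each_pod() {
    workload=$1
    shift
    for pod in "$@"; do
        printf '=== %s ===\n' "$pod"
        # RUNAI_BGU is an assumed override; it defaults to the real CLI.
        "${RUNAI_BGU:-runai-bgu}" logs "$workload" --pod="$pod" --tail=20
    done
}

# Example use (requires the runai-bgu CLI, so it is left commented out):
# tail_each_pod research-job research-job-worker-0 research-job-worker-1
```

Setting RUNAI_BGU=echo before calling the function prints the commands that would be run instead of running them.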

Size and Data Limits

Limit Bytes

Control the amount of log data returned:

$ runai-bgu logs research-job --limit-bytes=1024

Limits the log output to 1024 bytes (1 KiB), which is useful when logs are very large.
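Because the limit is given in bytes, larger values are easier to read when computed rather than typed. A small sketch using shell arithmetic follows; kib and mib are hypothetical helper names, not part of runai-bgu.

```shell
# Hypothetical helpers: convert KiB or MiB to a byte count.
kib() { echo "$(( $1 * 1024 ))"; }
mib() { echo "$(( $1 * 1024 * 1024 ))"; }

# Example use (requires the runai-bgu CLI, so it is left commented out):
# runai-bgu logs research-job --limit-bytes="$(kib 64)"
```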

Previous Instance

View logs from a previous run of the workload:

$ runai-bgu logs research-job --previous

Shows logs from the previous instance if the workload has been restarted.

Timeout and Waiting

Set a timeout for log availability:

$ runai-bgu logs research-job --wait-timeout=30s

Waits up to 30 seconds for the workload to be ready for log streaming.

Common Use Cases

Debugging Failed Jobs

Check why a workload failed:

$ runai-bgu logs failed-job --tail=50 --timestamps
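The output of the command above can be piped through grep to surface only the lines that look like errors. This is a sketch under assumptions: error_lines is a hypothetical helper, and the keyword list is a guess at common failure markers, so adjust it to what your application actually prints.

```shell
# Hypothetical filter: keep only lines that look like errors.
# The keyword list is an assumption; matching is case-insensitive.
error_lines() {
    grep -iE 'error|exception|traceback|killed'
}

# Example use (requires the runai-bgu CLI, so it is left commented out):
# runai-bgu logs failed-job --tail=50 --timestamps | error_lines
```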

Monitoring Training Progress

Follow training logs in real-time:

$ runai-bgu logs training-job --follow --since=1h
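When following a long training run, it is often enough to watch a few progress lines. One way is to filter the stream with grep. The sketch below assumes a grep that supports --line-buffered (GNU and BSD grep do), which keeps matches appearing promptly when the filter's output is piped onward. watch_pattern is a hypothetical helper, and the pattern 'loss|epoch' is only an example of what a training script might print.

```shell
# Hypothetical filter: pass through lines matching an extended regex,
# flushing after every line so streamed output is not held in a buffer.
watch_pattern() {
    grep --line-buffered -E "$1"
}

# Example use (requires the runai-bgu CLI, so it is left commented out):
# runai-bgu logs training-job --follow --since=1h | watch_pattern 'loss|epoch' | tee progress.log
```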

Quick Status Check

View recent activity:

$ runai-bgu logs my-workspace --tail=20

Troubleshooting Startup Issues

Check logs from the beginning with timestamps:

$ runai-bgu logs new-job --timestamps

Understanding Log Output

The logs command displays output from your workload containers:

Application Output

Standard output from your running applications and scripts.

Error Messages

Error logs and stack traces when issues occur.

System Messages

Container and system-level messages about resource allocation and status.

Timestamps

When enabled, shows when each log entry was generated.

Tips for Effective Log Usage

Real-time Monitoring

Use --follow to monitor active workloads and catch issues as they happen.

Time Filtering

Use --since to focus on recent activity and avoid overwhelming output.

Container Isolation

Use --container when debugging specific components in multi-container workloads.

Limit Output

Use --tail and --limit-bytes to manage large log volumes effectively.
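Another way to handle large log volumes is to redirect the output to a file and inspect it with your usual tools. The sketch below builds a file name from the workload name and the current UTC time so repeated captures do not overwrite each other. log_filename is a hypothetical helper, not part of runai-bgu, and the naming scheme is just one possible convention.

```shell
# Hypothetical helper: build a file name such as
# research-job-20230530T100000Z.log from a workload name and the current UTC time.
log_filename() {
    printf '%s-%s.log\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Example use (requires the runai-bgu CLI, so it is left commented out):
# runai-bgu logs research-job --timestamps > "$(log_filename research-job)"
```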