Why Did My Training Job Stop?

A training job can stop for several reasons. The most common causes are described below.

GPU Idle Timeout

GPU idle timeout automatically stops a job whose GPU has been idle for a set period of time. This prevents a job from holding GPU resources it is not actively using.

The timeout can be triggered during long data processing or extended data loading, when the job is still running but the GPU is not doing any work.
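
If the idle periods come from slow data loading, one possible mitigation is to load the next batches in a background thread while the current batch is being trained on, so the GPU waits less between steps. The sketch below uses only the Python standard library. The names prefetch, load_batches and buffer_size are hypothetical and chosen for illustration, and the sketch makes no assumption about how the platform measures idleness, so it may shorten idle gaps without guaranteeing that the timeout is avoided.

```python
import queue
import threading
import time


def prefetch(batches, buffer_size=2):
    """Yield items from `batches`, reading ahead in a background thread.

    `prefetch` and `buffer_size` are illustrative names, not part of any
    platform API. At most `buffer_size` items are held in memory at once.
    """
    q = queue.Queue(maxsize=buffer_size)
    done = object()  # marker for "the source is exhausted or has failed"

    def worker():
        try:
            for batch in batches:
                q.put((batch, None))  # blocks while the buffer is full
            q.put((done, None))
        except Exception as exc:
            # Hand the error to the consumer instead of losing it in the thread.
            q.put((done, exc))

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item, error = q.get()
        if item is done:
            if error is not None:
                raise error
            return
        yield item


def load_batches(n):
    """Hypothetical slow data source standing in for real data loading."""
    for i in range(n):
        time.sleep(0.01)  # simulate slow I/O or preprocessing
        yield i


for batch in prefetch(load_batches(3)):
    pass  # the training step for `batch` would run here
```

A larger buffer_size lets the loader run further ahead at the cost of more memory. Frameworks with their own data loaders usually offer a built-in equivalent, which is preferable when available.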

Job Was Manually Stopped

The job was stopped manually, either by you or by someone else.

A job can be stopped by the user, the department head, or a system administrator. If you believe the department head or a system administrator stopped it, contact them for more information.