Overview

The initialize_distributed_training() function sets up PyTorch’s distributed training infrastructure automatically. It configures the NCCL backend, assigns GPUs to processes, and synchronizes all workers.

Function Signature

def initialize_distributed_training() -> None

Basic Usage

Call this function at the start of your training script, before creating models or loading data:
import sf_tensor as sft

if __name__ == "__main__":
    # Initialize distributed training
    sft.initialize_distributed_training()

    # Rest of your training code...

How It Works

The initialization process follows these steps, all handled automatically when you call the function (a rough sketch in plain PyTorch follows the list):
  1. Load Cluster Configuration: Reads cluster specifications from environment variables
  2. Check GPU Count: Skips initialization if running on a single GPU (no distribution needed)
  3. Initialize Process Group: Sets up NCCL backend for GPU communication
  4. Assign Devices: Binds each process to its designated GPU using LOCAL_RANK
  5. Synchronize: Creates a barrier to ensure all processes are ready
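
For orientation only, here is a minimal sketch of roughly what these steps look like in plain PyTorch. This is an approximation, not sf_tensor's actual implementation; the environment variable names WORLD_SIZE and LOCAL_RANK follow PyTorch's torchrun conventions and are assumptions.

import os
import torch
import torch.distributed as dist

def sketch_initialize_distributed_training() -> None:
    # Illustrative approximation only; not sf_tensor's actual implementation.
    # 1. Load cluster configuration from environment variables
    #    (torchrun-style names are assumed here).
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    # 2. Skip initialization when there is nothing to distribute.
    if world_size <= 1:
        return

    # 3. Set up the NCCL backend for GPU communication.
    dist.init_process_group(backend="nccl")

    # 4. Bind this process to its designated GPU via LOCAL_RANK.
    torch.cuda.set_device(local_rank)

    # 5. Barrier so every process reaches this point before continuing.
    dist.barrier()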

Complete Example

import os
import torch
import torch.nn as nn
import sf_tensor as sft

if __name__ == "__main__":
    # Get local rank for this process
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Initialize distributed training
    sft.initialize_distributed_training()

    # Get device for this process
    device = sft.get_device()

    # Create model
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10)
    ).to(device)

    # Log initialization (only prints on rank 0)
    sft.log(f"Initialized on {torch.cuda.device_count()} GPUs")

    # Your training loop here...

Checking Initialization Status

You can check whether distributed training was initialized using PyTorch's built-in torch.distributed API:
import torch.distributed as dist

if dist.is_initialized():
    print(f"Running on rank {dist.get_rank()} of {dist.get_world_size()}")
else:
    print("Running in single-GPU mode")

Best Practices

Call initialize_distributed_training() as early as possible in your script, before creating models or loading data, so that each process is bound to its GPU before any tensors or model parameters are allocated.
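
As one illustration of that ordering, the sketch below uses a toy model and dataset; the DistributedSampler and the is_initialized() guard are standard PyTorch, shown only as a plausible way to build on the initialized process group.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler
import sf_tensor as sft

if __name__ == "__main__":
    # 1. Initialize before any model or data objects exist.
    sft.initialize_distributed_training()
    device = sft.get_device()

    # 2. Only then build the model on the assigned device.
    model = nn.Linear(784, 10).to(device)

    # 3. Load data after initialization so each rank can receive its own shard.
    dataset = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset) if dist.is_initialized() else None
    loader = DataLoader(dataset, batch_size=32, sampler=sampler, shuffle=(sampler is None))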