Overview

The initialize_distributed_training() function sets up PyTorch’s distributed training infrastructure automatically. It configures the NCCL backend, assigns GPUs to processes, and synchronizes all workers.

Function Signature

def initialize_distributed_training() -> None

Basic Usage

Call this function at the start of your training script, before creating models or loading data:
import sf_tensor as sft

if __name__ == "__main__":
    # Initialize distributed training
    sft.initialize_distributed_training()

    # Rest of your training code...

How It Works

The initialization process follows these steps, all handled automatically when you call the function (a rough sketch in plain PyTorch follows the list):
  1. Load Cluster Configuration: Reads cluster specifications from environment variables
  2. Check GPU Count: Skips initialization if running on a single GPU (no distribution needed)
  3. Initialize Process Group: Sets up NCCL backend for GPU communication
  4. Assign Devices: Binds each process to its designated GPU using LOCAL_RANK
  5. Synchronize: Creates a barrier to ensure all processes are ready
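
For orientation only, here is a minimal sketch of roughly what these steps look like in plain PyTorch. This is an approximation, not sf_tensor's actual implementation; the environment variable names WORLD_SIZE and LOCAL_RANK follow PyTorch's torchrun conventions and are assumptions.

import os
import torch
import torch.distributed as dist

def sketch_initialize_distributed_training() -> None:
    # Illustrative approximation only; not sf_tensor's actual implementation.
    # 1. Load cluster configuration from environment variables
    #    (torchrun-style names are assumed here).
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    # 2. Skip initialization when there is nothing to distribute.
    if world_size <= 1:
        return

    # 3. Set up the NCCL backend for GPU communication.
    dist.init_process_group(backend="nccl")

    # 4. Bind this process to its designated GPU via LOCAL_RANK.
    torch.cuda.set_device(local_rank)

    # 5. Barrier so every process reaches this point before continuing.
    dist.barrier()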

Complete Example

import os
import torch
import torch.nn as nn
import sf_tensor as sft

if __name__ == "__main__":
    # Get local rank for this process
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Initialize distributed training
    sft.initialize_distributed_training()

    # Get device for this process
    device = sft.get_device()

    # Create model
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10)
    ).to(device)

    # Log initialization (only prints on rank 0)
    sft.log(f"Initialized on {torch.cuda.device_count()} GPUs")

    # Your training loop here...

Checking Initialization Status

You can check whether distributed training was initialized using PyTorch's built-in torch.distributed API:
import torch.distributed as dist

if dist.is_initialized():
    print(f"Running on rank {dist.get_rank()} of {dist.get_world_size()}")
else:
    print("Running in single-GPU mode")

Best Practices

Call initialize_distributed_training() as early as possible in your script, before creating models or loading data, so that each process is bound to its GPU before any tensors or model parameters are allocated.
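
As one illustration of that ordering, the sketch below uses a toy model and dataset; the DistributedSampler and the is_initialized() guard are standard PyTorch, shown only as a plausible way to build on the initialized process group.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler
import sf_tensor as sft

if __name__ == "__main__":
    # 1. Initialize before any model or data objects exist.
    sft.initialize_distributed_training()
    device = sft.get_device()

    # 2. Only then build the model on the assigned device.
    model = nn.Linear(784, 10).to(device)

    # 3. Load data after initialization so each rank can receive its own shard.
    dataset = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset) if dist.is_initialized() else None
    loader = DataLoader(dataset, batch_size=32, sampler=sampler, shuffle=(sampler is None))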