Overview
In distributed training, each GPU runs a separate process. Without proper handling, logging statements would print multiple times (once per process), cluttering your output and making it difficult to track progress.
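For example, an unguarded print statement executes in every process, so a job with four GPUs writes the same line four times to the combined log:

# Without rank-aware logging, every process runs this print,
# so a 4-GPU job produces four copies of the same line.
print("Starting training...")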
SF Tensor provides rank-aware logging utilities that ensure each message is printed only once, from the rank 0 process, and persisted to durable storage that you can access later, including from the web interface.
Available Functions
log()
Log general messages during training.
def log(string: str) -> None
Parameters:
string: The message to log
Example:
import sf_tensor as sft
sft.log("Starting training...")
sft.log(f"Epoch 1 completed with loss: {loss:.4f}")
sft.log("Training finished!")
logAccuracy()
Log accuracy metrics during training.
def logAccuracy(accuracy: Union[int, float]) -> None
Parameters:
accuracy: Classification accuracy value (int or float)
Example:
import sf_tensor as sft
# Log accuracy after validation
val_accuracy = 0.8542
sft.logAccuracy(val_accuracy)
Both functions automatically check whether they are running on rank 0: only rank 0 prints the message, and all other ranks silently skip the logging operation.
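If you need the same behavior for other output in your own code, the rank-0 check is the standard pattern from torch.distributed. The sketch below is illustrative only; the helpers is_main_process and print_once are not part of SF Tensor:

import torch.distributed as dist

def is_main_process() -> bool:
    """Return True in non-distributed runs, or when this process is rank 0."""
    return not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0

def print_once(message: str) -> None:
    """Print only from the main process, mirroring the rank-0 behavior described above."""
    if is_main_process():
        print(message, flush=True)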
Basic Usage
import torch
import sf_tensor as sft
if __name__ == "__main__":
    # Initialize distributed training
    sft.initialize_distributed_training()

    # This message prints only once (from rank 0)
    sft.log("Training initialized")

    # Create model
    model = YourModel()
    device = sft.get_device()
    model = model.to(device)

    # Training loop
    for epoch in range(10):
        train_loss = train_one_epoch(model, train_loader)

        # Only prints once, even with multiple GPUs
        sft.log(f"Epoch {epoch + 1}: train_loss={train_loss:.4f}")

        # Validation
        val_acc = validate(model, val_loader)
        sft.logAccuracy(val_acc)

    sft.log("Training complete!")
Output:
Training initialized
Epoch 1: train_loss=2.3054
Epoch 2: train_loss=1.8923
Epoch 3: train_loss=1.5432
...
Training complete!
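If the job is launched with a standard PyTorch launcher such as torchrun --nproc_per_node=4 (how processes are actually started depends on your SF Tensor setup), four copies of the script run, but the output above still appears exactly once because only rank 0 logs.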
Structured Logging
import sf_tensor as sft
import json
def log_metrics(metrics: dict, prefix: str = ""):
    """Log dictionary of metrics in structured format"""
    sft.log(f"{prefix}{json.dumps(metrics, indent=2)}")

# Usage
metrics = {
    "epoch": 5,
    "train_loss": 1.234,
    "val_loss": 1.456,
    "train_acc": 0.876,
    "val_acc": 0.854
}
log_metrics(metrics, prefix="Metrics: ")
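Because log() only prints on rank 0, the JSON produced by json.dumps appears exactly once in the output:
Output:
Metrics: {
  "epoch": 5,
  "train_loss": 1.234,
  "val_loss": 1.456,
  "train_acc": 0.876,
  "val_acc": 0.854
}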