Features
The library currently contains logging functions that automatically persist logs to the cloud and make them visible in the Tensor Cloud web interface. It also provides the following convenience functions, which act as thin wrappers around PyTorch:
- Automatic Distributed Setup: Initializing multi-GPU training
- A decorator to ensure data downloading only happens once per node, eliminating race conditions
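To make the once-per-node idea concrete, here is a minimal sketch of how such a decorator could work. It is not the SF Tensor implementation: the name `once_per_node` and the marker-file approach are assumptions for illustration, and it relies on the `LOCAL_RANK` environment variable that launchers such as torchrun set for each process.

```python
import functools
import os
import tempfile
import time
from pathlib import Path


def once_per_node(marker_dir):
    """Run the wrapped function in one process per node; other processes wait.

    Hypothetical helper, not the SF Tensor API. Assumes LOCAL_RANK is set by
    the launcher and that marker_dir lives on node-local storage.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            marker = Path(marker_dir) / f"{fn.__name__}.done"
            local_rank = int(os.environ.get("LOCAL_RANK", "0"))
            if local_rank == 0:
                # Only the first process on the node does the work.
                if not marker.exists():
                    fn(*args, **kwargs)
                    marker.parent.mkdir(parents=True, exist_ok=True)
                    marker.touch()
            else:
                # Everyone else polls until the marker file appears.
                while not marker.exists():
                    time.sleep(0.1)
        return wrapper
    return decorator


calls = []


@once_per_node(tempfile.mkdtemp())
def download_dataset():
    # Stand-in for a real download.
    calls.append("downloaded")


download_dataset()
download_dataset()  # marker already exists, so the body is skipped
```

In this sketch the wrapper discards the wrapped function's return value, and a marker left over from an earlier run causes the download to be skipped. A real implementation would need to decide whether that caching behavior is wanted.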
Installation
Install the SF Tensor library using pip:
Requirements
- Python >= 3.8
- PyTorch with CUDA support
- NVIDIA GPUs with NCCL support
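One way to check these requirements on a machine before installing is sketched below. The function name `check_requirements` and the report keys are made up for this example; the PyTorch calls it uses (`torch.cuda.is_available`, `torch.distributed.is_available`, `torch.distributed.is_nccl_available`) are standard PyTorch functions, and they are only called if PyTorch is installed.

```python
import importlib.util
import sys


def check_requirements():
    """Return a dict of requirement -> True/False (None if it cannot be checked)."""
    report = {"python>=3.8": sys.version_info >= (3, 8)}

    # Without PyTorch installed, CUDA and NCCL support cannot be queried.
    if importlib.util.find_spec("torch") is None:
        report.update({"torch": False, "cuda": None, "nccl": None})
        return report

    import torch
    import torch.distributed as dist

    report["torch"] = True
    report["cuda"] = torch.cuda.is_available()
    # is_nccl_available() is only meaningful when the distributed package is built in.
    report["nccl"] = dist.is_available() and dist.is_nccl_available()
    return report


report = check_requirements()
for name, ok in report.items():
    print(f"{name}: {ok}")
```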
Quick Start
Here’s a minimal example to get started with distributed training:
Architecture
The SF Tensor library consists of two main modules:
- Training Module
  - Distributed training initialization
  - Device management
- Persistence Module
  - Logging utilities
  - Data download decorator
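For readers unfamiliar with what the Training Module's two responsibilities involve, the sketch below shows the equivalent steps in plain PyTorch. It does not show the SF Tensor API: `DistEnv`, `read_dist_env`, and `init_distributed` are hypothetical names, and the code assumes the processes were started by a launcher such as torchrun, which exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`.

```python
import os
from dataclasses import dataclass


@dataclass
class DistEnv:
    rank: int        # global process index
    local_rank: int  # process index on this node, used to pick the GPU
    world_size: int  # total number of processes


def read_dist_env(environ=os.environ):
    """Read the launcher's rank variables, defaulting to a single process."""
    return DistEnv(
        rank=int(environ.get("RANK", 0)),
        local_rank=int(environ.get("LOCAL_RANK", 0)),
        world_size=int(environ.get("WORLD_SIZE", 1)),
    )


def init_distributed():
    """Pin this process to its GPU and join the NCCL process group.

    Requires PyTorch with CUDA; torch is imported lazily so the
    environment parsing above works without it.
    """
    import torch
    import torch.distributed as dist

    env = read_dist_env()
    torch.cuda.set_device(env.local_rank)  # device management
    if env.world_size > 1 and not dist.is_initialized():
        # The default env:// init reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE.
        dist.init_process_group(backend="nccl")
    return torch.device("cuda", env.local_rank), env
```

A training script would call `init_distributed()` once at startup and move its model and batches to the returned device.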