Welcome to PyLO!
PyLo is a PyTorch-based learned optimizer library that enables researchers and practitioners to implement, experiment with, and share learned optimizers. It bridges the gap between research on learned optimizers and their use in practical training scenarios.
Check out our paper on arXiv (arXiv:2506.10315).
Note
New to PyLo? Check out our Usage Guide and explore complete training examples at pylo_examples.
Key Features
Pre-trained learned optimizers ready for production use
Seamless integration with PyTorch optim library and training loops
Comprehensive benchmarking utilities against standard optimizers
Model weight sharing through the Hugging Face Hub
Quick Example
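import torch
from pylo.optim import VeLO_CUDA

# Assumption: VeLO_CUDA runs on a CUDA device
device = torch.device("cuda")

# Initialize a model, a loss function, and placeholder data
model = torch.nn.Linear(10, 2).to(device)
loss_fn = torch.nn.MSELoss()
inputs = torch.randn(32, 10, device=device)
targets = torch.randn(32, 2, device=device)

# Create a learned optimizer instance
optimizer = VeLO_CUDA(model.parameters())

# Use it like any PyTorch optimizer
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step(loss)  # pass the loss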
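The loop above differs from a standard PyTorch loop in one place: step receives the loss. When comparing against a baseline such as torch.optim.SGD, whose step takes no loss argument, a small helper can hide that difference. The following is a minimal sketch, not part of PyLo: train_step and its pass_loss flag are hypothetical names introduced for illustration, and only the SGD baseline is exercised so that the snippet runs without a GPU.

```python
import torch


def train_step(model, optimizer, loss_fn, inputs, targets, pass_loss=False):
    """Run one optimization step and return the loss as a float.

    pass_loss=True forwards the loss tensor to optimizer.step(), as in the
    quick example; standard torch.optim optimizers expect pass_loss=False.
    (Hypothetical helper, not a PyLo API.)
    """
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    if pass_loss:
        optimizer.step(loss)
    else:
        optimizer.step()
    return loss.item()


# Placeholder model and data, matching the shapes of the quick example
torch.manual_seed(0)
model = torch.nn.Linear(10, 2)
inputs, targets = torch.randn(32, 10), torch.randn(32, 2)
loss_fn = torch.nn.MSELoss()

# Baseline run with a standard optimizer on a fixed batch
baseline = torch.optim.SGD(model.parameters(), lr=0.1)
losses = [train_step(model, baseline, loss_fn, inputs, targets) for _ in range(20)]
```

With a learned optimizer whose step takes the loss, as in the quick example, the same call would presumably set pass_loss=True.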
More Examples
Looking for complete, runnable examples? Check out the pylo_examples repository, which includes:
Image Classification - Training Vision Transformers (ViT) and ResNets on ImageNet and CIFAR-10
Language Modeling - Training GPT-2 models
Distributed Training - Multi-GPU examples with FSDP and DDP
Each example includes detailed setup instructions, training scripts, and configuration files to help you get started quickly.
How to Cite
If you use PyLo in your research, please cite:
@article{pylo,
title={PyLO: Towards Accessible Learned Optimizers in PyTorch},
author={Janson, Paul and Therien, Benjamin and Anthony, Quentin and Huang, Xiaolong and Moudgil, Abhinav and Belilovsky, Eugene},
journal={arXiv preprint arXiv:2506.10315},
year={2025}
}