Welcome to PyLO!

PyLo is a PyTorch-based library for learned optimizers. It lets researchers and practitioners implement, experiment with, and share learned optimizers, and it bridges the gap between learned-optimizer research and practical training workloads.

Check out our paper on arXiv (arXiv:2506.10315).

Note

New to PyLo? Check out our Usage Guide and explore complete training examples at pylo_examples.

Key Features

Pre-trained learned optimizers ready for production use
Seamless integration with PyTorch optim library and training loops
Comprehensive benchmarking utilities against standard optimizers
Model weight sharing through the Hugging Face Hub

Quick Example

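import torch
from pylo.optim import VeLO_CUDA

# Assumption: VeLO_CUDA (as its name suggests) expects parameters on a CUDA device
device = "cuda"

# Initialize a model, a loss function, and a placeholder batch
model = torch.nn.Linear(10, 2).to(device)
loss_fn = torch.nn.MSELoss()
inputs = torch.randn(32, 10, device=device)   # dummy data for illustration
targets = torch.randn(32, 2, device=device)

# Create a learned optimizer instance
optimizer = VeLO_CUDA(model.parameters())

# Use it like any PyTorch optimizer
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step(loss)  # pass the loss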
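
The one departure from a typical torch.optim loop in the quick example is that step receives the loss. As a rough, dependency-free illustration of why an optimizer might want that value, the sketch below uses a hypothetical ToyLossAwareOptimizer with a hand-written rule (halve the step size when the loss rises). It is not PyLo's VeLO, whose update rule is learned rather than hand-written; only the zero_grad / compute gradient / step(loss) calling pattern mirrors the example above.

```python
class ToyLossAwareOptimizer:
    """Hypothetical stand-in for illustration only; this is not PyLo's VeLO."""

    def __init__(self, params, lr):
        self.params = params      # list of dicts: {"value": float, "grad": float}
        self.lr = lr
        self.prev_loss = None     # loss seen at the previous step

    def zero_grad(self):
        for p in self.params:
            p["grad"] = 0.0

    def step(self, loss):
        # Hand-written rule: halve the step size whenever the loss went up.
        if self.prev_loss is not None and loss > self.prev_loss:
            self.lr *= 0.5
        self.prev_loss = loss
        for p in self.params:
            p["value"] -= self.lr * p["grad"]


# Minimize (w - 3)^2 starting from w = 0 with a deliberately large step size.
w = {"value": 0.0, "grad": 0.0}
opt = ToyLossAwareOptimizer([w], lr=1.2)

for _ in range(20):
    opt.zero_grad()
    loss = (w["value"] - 3.0) ** 2
    w["grad"] = 2.0 * (w["value"] - 3.0)   # analytic gradient in place of backward()
    opt.step(loss)                          # same calling convention: pass the loss

print(w["value"], opt.lr)
```

With a fixed step size of 1.2 this problem would diverge. Because the toy optimizer sees the loss, it notices the increase after the first update, shrinks its step size, and then converges toward w = 3.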

More Examples

Looking for complete, runnable examples? Check out the pylo_examples repository, which includes:

Image Classification - Training Vision Transformers (ViT) and ResNets on ImageNet and CIFAR-10
Language Modeling - Training GPT-2 models
Distributed Training - Multi-GPU examples with FSDP and DDP

Each example includes detailed setup instructions, training scripts, and configuration files to help you get started quickly.

How to Cite

If you use PyLo in your research, please cite:

@article{pylo,
title={PyLO: Towards Accessible Learned Optimizers in PyTorch},
author={Janson, Paul and Therien, Benjamin and Anthony, Quentin and Huang, Xiaolong and Moudgil, Abhinav and Belilovsky, Eugene},
journal={arXiv preprint arXiv:2506.10315},
year={2025}
}
