Deepspeed

What's New:

May 2023: Released models for Scaling Speech Technology to 1,000+ Languages (Pratap, et al., 2023)
June 2022: Released code for wav2vec-U 2.0 from Towards End-to-end Unsupervised Speech Recognition (Liu, et al., 2022)
May 2022: Integration with xFormers
December 2021: Released Direct speech-to-speech translation code
October 2021: Released VideoCLIP and VLM models
October 2021: Released multilingual finetuned XLSR-53 model
September 2021: master branch renamed to main
July 2021: Released DrNMT code
July 2021: Released Robust wav2vec 2.0 model
June 2021: Released XLMR-XL and XLMR-XXL models
May 2021: Released Unsupervised Speech Recognition code
March 2021: Added full parameter and optimizer state sharding + CPU offloading
February 2021: Added LASER training code
December 2020: Added Adaptive Attention Span code
December 2020: GottBERT model and code released
November 2020: Adopted the Hydra configuration framework (see the documentation explaining how to use it for new and existing projects)
November 2020: fairseq 0.10.0 released
October 2020: Added R3F/R4F (Better Fine-Tuning) code
October 2020: Deep Transformer with Latent Depth code released
October 2020: Added CRISS models and code

Features:

multi-GPU training on one machine or across multiple machines (data and model parallel)
fast generation on both CPU and GPU with multiple search algorithms implemented:

beam search
Diverse Beam Search (Vijayakumar et al., 2016)
sampling (unconstrained, top-k and top-p/nucleus)
lexically constrained decoding (Post & Vilar, 2018)

gradient accumulation enables training with large mini-batches even on a single GPU
mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
flexible configuration based on Hydra allowing a combination of code, command-line and file based configuration
full parameter and optimizer state sharding
offloading parameters to CPU
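
To make the gradient accumulation feature listed above concrete, here is a minimal sketch in plain Python. It does not use fairseq or PyTorch APIs; the toy model (`y_hat = w * x`), the data, and the helper names `grad_mse` and `accumulated_grad` are assumptions made purely for illustration. It shows that averaging gradients over equally sized micro-batches reproduces the gradient of the full batch, which is why a large effective batch can be processed on a single GPU in smaller pieces.

```python
# Toy illustration of gradient accumulation (plain Python, no fairseq APIs).
# Model: y_hat = w * x, loss: mean squared error over the batch.


def grad_mse(w, batch):
    """Gradient of the mean squared error with respect to w over `batch`."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, num_micro_batches):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is scaled by 1 / num_micro_batches so that
    the accumulated value matches the gradient of the full batch.
    """
    if len(batch) % num_micro_batches != 0:
        raise ValueError("batch size must be divisible by num_micro_batches")
    size = len(batch) // num_micro_batches
    total = 0.0
    for i in range(num_micro_batches):
        micro = batch[i * size:(i + 1) * size]
        total += grad_mse(w, micro) / num_micro_batches
    return total


# Hypothetical (x, y) pairs, chosen only for the demonstration.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

full = grad_mse(w, data)
accum = accumulated_grad(w, data, num_micro_batches=2)

# The two gradients agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

In a real training loop, the optimizer step would be taken once after all micro-batches have been processed, so only one micro-batch needs to fit in memory at a time.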

We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'

See the PyTorch Hub tutorials for translation and RoBERTa for more examples.

Requirements and Installation

PyTorch version >= 1.10.0
Python version >= 3.8
For training new models, you'll also need an NVIDIA GPU and NCCL
To install fairseq and develop locally:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.x)
# pip install fairseq
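
Before installing, it can help to confirm that the environment meets the version requirements listed above. The following is a minimal sketch, not part of fairseq: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the script relies only on the standard library plus the `torch.__version__` attribute when PyTorch is installed.

```python
import importlib.util
import re
import sys

# Minimum versions taken from the requirements list in this README.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 10, 0)


def parse_version(text):
    """Return the leading numeric components of a version string.

    Suffixes such as "+cu118" or "a0" are ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_text, minimum):
    """True if `version_text` is at least `minimum` (a tuple of ints)."""
    parsed = parse_version(version_text)
    width = max(len(parsed), len(minimum))
    # Pad with zeros so that "1.10" compares equal to (1, 10, 0).
    padded = parsed + (0,) * (width - len(parsed))
    required = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= required


def check_environment():
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch  # imported lazily so the script runs without PyTorch

        if not meets_minimum(torch.__version__, MIN_TORCH):
            problems.append("PyTorch >= 1.10.0 is required")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The check covers only the Python and PyTorch versions; it does not verify that an NVIDIA GPU or NCCL is available for training.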

For faster training install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

For large datasets install PyArrow: pip install pyarrow
If you use Docker, make sure to increase the shared memory size by passing either --ipc=host or --shm-size as a command-line option to nvidia-docker run.

Getting Started

The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.

Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.

Translation: convolutional and transformer models are available
Language Modeling: convolutional and transformer models are available

We also have more detailed READMEs to reproduce results from specific papers.

Join the fairseq community

Twitter: https://twitter.com/fairseq
Facebook page: https://www.facebook.com/groups/fairseq.users
Google group: https://groups.google.com/forum/#!forum/fairseq-users

License

fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please cite as:

@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}