# Welcome to quickmt-train
Experimenting with training Neural Machine Translation (NMT) models from scratch using PyTorch.
> **Note:** This project is a work in progress and is intended for experimentation and learning.
## Key Features
### 🚀 Performance & Optimization
- `torch.compile`: Optional model compilation for faster training
- Mixed Precision (AMP): Uses `torch.autocast` with `bfloat16` or `float16` for faster training and reduced memory usage
- Gradient Accumulation & Clipping: Support for large effective batch sizes and stable training via gradient-norm clipping
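The mixed-precision, accumulation, and clipping features above can be sketched roughly as follows (the function name and defaults are illustrative, not this project's actual API):

```python
import torch
import torch.nn as nn

def accumulate_and_step(model, optimizer, micro_batches, accum_steps=4, max_norm=1.0):
    """One optimizer step: AMP forward passes over several micro-batches,
    accumulated gradients, then gradient-norm clipping. A sketch only."""
    device_type = "cuda" if torch.cuda.is_available() else "cpu"
    loss_fn = nn.MSELoss()  # stand-in for the real training loss
    optimizer.zero_grad(set_to_none=True)
    for src, tgt in micro_batches:
        # bfloat16 autocast; older GPUs would use float16 plus a GradScaler
        with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
            loss = loss_fn(model(src), tgt)
        # Divide so the accumulated gradient matches a full-batch average
        (loss / accum_steps).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
```

The effective batch size is the micro-batch size times `accum_steps`, which lets memory-constrained hardware emulate large-batch training.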
### 📊 Data Processing
- Streaming Dataset: `IterableDataset` implementation for handling datasets larger than RAM
- Token-Based Batching: Dynamic batching with bucket sorting to minimize padding and maximize throughput
- SentencePiece Tokenization: Integrated support for training SentencePiece (unigram/BPE) models and applying them on the fly
- Multi-worker Sharding: Efficient data loading with automatic sharding across multiple CPU workers
- Multi-dataset Training: Train on multiple datasets at once, with each dataset starting and stopping at configurable steps
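The idea behind token-based batching with bucket sorting can be sketched like this (a minimal illustration, not the project's implementation; `max_tokens` caps the padded batch footprint, i.e. batch size times longest example):

```python
def token_batches(examples, max_tokens=4096, bucket_size=1024):
    """Yield batches of token-id lists capped by padded token count.

    Examples are read in buckets, each bucket is sorted by length so
    similar-length sentences land in the same batch, minimizing padding.
    """
    it = iter(examples)
    while True:
        bucket = [ex for _, ex in zip(range(bucket_size), it)]
        if not bucket:
            return
        bucket.sort(key=len)  # group similar lengths -> less padding
        batch, max_len = [], 0
        for ex in bucket:
            new_max = max(max_len, len(ex))
            # Adding this example would exceed the padded-token budget
            if batch and new_max * (len(batch) + 1) > max_tokens:
                yield batch
                batch, max_len = [], 0
                new_max = len(ex)
            batch.append(ex)
            max_len = new_max
        if batch:
            yield batch
```

A production version would also shuffle batches within each bucket so sorting does not bias the training order.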
### 📈 Evaluation & Monitoring
- Real-time Logging: Live tracking of loss, perplexity (PPL), token accuracy, and other metrics
- Translation Quality: In-training evaluation using BLEU and ChrF scores via `sacrebleu`
- Aim Tracking: Integration with `aim` for experiment tracking and visualization
- Hyperparameter Optimization: Integration with `optuna` for automated hyperparameter search
### 🛠️ Inference & Deployment
- Model Averaging: Tool for stochastic weight averaging of multiple checkpoints to improve generalization
- CTranslate2 Export: Script to convert PyTorch models to CTranslate2 format for production deployment
- quickmt compatible: Models can be used with the quickmt library for inference
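Checkpoint averaging boils down to a uniform mean over parameter tensors; a minimal sketch (not this project's exact tool) could look like:

```python
import torch

def average_checkpoints(state_dicts):
    """Return a state dict whose tensors are the uniform mean of the
    corresponding tensors across the given checkpoints (sketch)."""
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg
```

In practice the input dicts would come from `torch.load(path)["model"]` (or wherever the trainer stores weights), and the averaged dict would be loaded back into the model with `load_state_dict` before exporting to CTranslate2.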