Pre-Training and Fine-Tuning BERT for the IPU
Version: latest
1. Introduction
2. Pre-training BERT on the IPU-POD
3. Scaling BERT to the IPU-POD
3.1. Collective gradient reduction
3.2. Training BERT with large batches
3.2.1. Linear scaling rule
3.2.2. Gradual warmup strategy
3.2.3. AdamW optimizer
3.2.4. LAMB optimizer
3.2.5. Low-precision training
4. Training results
4.1. Pre-training accuracy
4.2. Fine-tuning accuracy
4.2.1. SQuAD v1.1
4.2.2. CLUE
5. Trademarks & copyright