Pre-Training and Fine-Tuning BERT for the IPU

  • 1. Introduction
  • 2. Pre-training BERT on the IPU-POD
  • 3. Scaling BERT to the IPU-POD
    • 3.1. Collective gradient reduction
    • 3.2. Training BERT with large batches
      • 3.2.1. Linear scaling rule
      • 3.2.2. Gradual warmup strategy
      • 3.2.3. AdamW optimizer
      • 3.2.4. LAMB optimizer
      • 3.2.5. Low-precision training
  • 4. Training results
    • 4.1. Pre-training accuracy
    • 4.2. Fine-tuning accuracy
      • 4.2.1. SQuAD v1.1
      • 4.2.2. CLUE
  • 5. Trademarks & copyright