Model Parallelism on the IPU with TensorFlow: Sharding and Pipelining

This technical note describes how to parallelise TensorFlow models on IPU hardware.
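As a brief preview of what the sharding approach in Section 2 looks like in practice, the following is a minimal sketch of splitting a graph across two IPUs with the Graphcore TensorFlow port. It is not taken from this document: the two dense layers, their sizes and the input shape are illustrative assumptions, and the configuration API shown (ipu.config.IPUConfig) applies to recent SDK versions only.

    import numpy as np
    import tensorflow.compat.v1 as tf
    from tensorflow.python import ipu

    tf.disable_v2_behavior()

    # Configure the IPU system with two IPUs, one per shard.
    cfg = ipu.config.IPUConfig()
    cfg.auto_select_ipus = 2
    cfg.configure_ipu_system()

    def model(x):
        # Illustrative model: place the first layer on shard 0 ...
        with ipu.scopes.ipu_shard(0):
            h = tf.layers.dense(x, 256, activation=tf.nn.relu)
        # ... and the second layer on shard 1.
        with ipu.scopes.ipu_shard(1):
            return tf.layers.dense(h, 10)

    with ipu.scopes.ipu_scope("/device:IPU:0"):
        x = tf.placeholder(tf.float32, shape=[None, 784])
        [logits] = ipu.ipu_compiler.compile(model, inputs=[x])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(logits, feed_dict={x: np.zeros([4, 784], np.float32)})

Pipelining (Section 3) builds on the same idea, but keeps all IPUs busy by streaming batches through the stages rather than executing the shards one after another.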

Contents

  • 1. Model parallelism
  • 2. Sharding
    • 2.1. Graph sharding
      • 2.1.1. API
      • 2.1.2. Code example
    • 2.2. Limitations of sharding
  • 3. Pipelining
    • 3.1. Overview
    • 3.2. Pipeline operation
    • 3.3. Pipelining API
      • 3.3.1. Inputs and outputs
      • 3.3.2. Device mapping
      • 3.3.3. Pipeline scheduling
        • Memory use
        • Execution time
        • Ramp-up and ramp-down time
        • Inter-IPU optimisations
      • 3.3.4. Keras API in TensorFlow 2
    • 3.4. Code examples
      • 3.4.1. Inference code examples
      • 3.4.2. Training code examples
    • 3.5. Optimising the pipeline
      • 3.5.1. Recomputation
      • 3.5.2. Variable offloading
      • 3.5.3. Device selection order
      • 3.5.4. Data parallelism
      • 3.5.5. Increase the gradient accumulation count
      • 3.5.6. Profiling
  • 4. PopVision™ Graph Analyser tool
  • 5. Trademarks & copyright