IPU-POD128 Reference Design: Build and Test Guide
  • 1. Overview
    • 1.1. Acronyms and abbreviations
  • 2. IPU‑POD128 design components
    • 2.1. IPU‑POD64 components
    • 2.2. IPU-M2000s
      • 2.2.1. Overview
      • 2.2.2. QR code label
      • 2.2.3. LED indicators
        • Rear side LEDs
        • Front side LEDs
    • 2.3. Server
    • 2.4. Switches
      • 2.4.1. 100GE RoCE/RDMA switch (ToR switch)
      • 2.4.2. 1GE management switch
    • 2.5. Power distribution units
    • 2.6. Rack
    • 2.7. Supplementary mounting components
    • 2.8. Cables
      • 2.8.1. RJ45 cables
      • 2.8.2. OSFP cables
      • 2.8.3. QSFP cables
    • 2.9. Connecting cables between IPU‑POD64 logical racks
  • 3. IPU-POD64 rack assembly
    • 3.1. Equipment checklist
    • 3.2. Document reproduction
    • 3.3. Required tools
    • 3.4. Preparing the rack
      • 3.4.1. Rail distance
      • 3.4.2. Unpacking the rack
      • 3.4.3. Removing the side panels and doors
      • 3.4.4. Removing the vertical accessory channels
      • 3.4.5. Adjusting the rear accessory channels
      • 3.4.6. Adjusting the rear vertical rails
      • 3.4.7. Adjusting the front vertical rails
      • 3.4.8. Installing the rack rails
      • 3.4.9. Installing PDU brackets
    • 3.5. Installing the equipment
      • 3.5.1. Installing the IPU-M2000s
      • 3.5.2. Installing the management switch
      • 3.5.3. Installing the ToR switch
      • 3.5.4. Installing the PDUs
      • 3.5.5. Installing the Dell R6525 server(s)
    • 3.6. Cabling the rack
      • 3.6.1. IPU-M2000 to IPU-M2000 IPU-Link connectivity (OSFP)
      • 3.6.2. IPU-M2000 to IPU-M2000 Sync-Link cabling
      • 3.6.3. IPU-M2000 to management switch cabling (RJ45)
      • 3.6.4. Management switch: BMC cabling
      • 3.6.5. Management switch: BMC + IPU-Gateway cabling
      • 3.6.6. IPU-M2000 to ToR switch cabling (QSFP)
      • 3.6.7. Dell R6525 server(s) cabling
      • 3.6.8. ToR switch to Dell server(s)
      • 3.6.9. Management switch to Dell server(s): iDRAC
      • 3.6.10. Management switch to Dell server(s): network connector
      • 3.6.11. Management switch to Dell server(s): switch management
      • 3.6.12. Management switch to PDUs
    • 3.7. Power cabling
      • 3.7.1. IPU-M2000 power cabling
      • 3.7.2. Server power cabling: Dell R6525
      • 3.7.3. Switch power cabling
    • 3.8. Completing the rack
      • 3.8.1. Blanking panels
      • 3.8.2. Front and rear doors
      • 3.8.3. Side panels
      • 3.8.4. PDU plugs
      • 3.8.5. Packaging
  • 4. IPU‑POD64 server and switch configuration
    • 4.1. Server configuration
      • 4.1.1. Hardware recommendations
      • 4.1.2. Storage configuration recommendations
      • 4.1.3. Memory configuration recommendations
      • 4.1.4. BIOS configuration
      • 4.1.5. Operating system installation
        • Ubuntu 18.04 packages
        • Ubuntu 20.04 packages
        • CentOS 7.6 packages
        • Python packages
      • 4.1.6. User accounts and groups
      • 4.1.7. DHCP Service (Dynamic Host Configuration Protocol)
        • DHCP file templates
      • 4.1.8. Rsyslog service
        • Rsyslog file templates
      • 4.1.9. NTP service (Network Time Protocol)
        • NTP file structure
      • 4.1.10. Other configuration files and folders
      • 4.1.11. User application memory usage
    • 4.2. Network configuration
      • 4.2.1. Overview
      • 4.2.2. IPU‑POD64 network interfaces
      • 4.2.3. Management switch configuration
      • 4.2.4. ToR switch configuration
      • 4.2.5. IPU‑POD64 VLAN assignments
      • 4.2.6. Server network configuration
        • Example Netplan configuration file
  • 5. IPU‑POD64 software installation and configuration
    • 5.1. Management server
    • 5.2. V-IPU software installation and configuration
    • 5.3. IPU-M2000 software installation and configuration
      • 5.3.1. Download IPU-M2000 software update bundle
      • 5.3.2. Software update of all IPU-M2000s
      • 5.3.3. IPU-M2000 IPU-Gateway root file system config files
    • 5.4. Rack tool
  • 6. IPU‑POD64 manual installation tests
    • 6.1. Running system tests
    • 6.2. Troubleshooting
      • 6.2.1. BMC BISTs
      • 6.2.2. V-IPU built-in self-tests
        • IPU-Link cabling test
        • Sync-Link test
        • IPU-Link training test
        • IPU-Link traffic test
  • 7. IPU-POD128 installation
  • 8. IPU‑POD128 network configuration
    • 8.1. Overview
    • 8.2. Useful resources
    • 8.3. IP addressing
    • 8.4. Merging IPU‑POD64 racks to create IPU‑POD128
      • 8.4.1. Networking pre-requisites
      • 8.4.2. Phase 1: Edit configuration files
        • Step 1: For lrack1 and lrack2
        • Step 2: DHCP config files for lrack1 and lrack2
        • Step 3: Edit DHCP config files for lrack1
        • Step 4: Netplan setup
        • Step 5: Update vlan-11.conf files
      • 8.4.3. Phase 2: Activate new configuration
        • Step 6: Inform users of downtime
        • Step 7: lrack2 DHCP server
        • Step 8: lrack1 DHCP server
        • Step 9: Restart IPU-M2000s on lrack1
        • Step 10: Netplan configuration on lrack1
        • Step 11: Update rack_config files on lrack1
        • Step 12: Restart IPU-M2000s on lrack2
        • Step 13: Verify IPU-M2000 interface access
        • Step 14: Create V-IPU cluster on lrack1
        • Step 15: Test access to V-IPU agents
        • Step 16: Create partitions on lrack1
        • Step 17: lrack1 RNIC addresses
        • Step 18: lrack2 RNIC addresses
        • Step 19: rsyslog.d
        • Step 20: chrony.conf
        • Step 21: Refresh overlay files on lrack1
        • Step 22: Check IPU-M2000s logging to lrack1
        • Step 23: Check IPU-M2000s have NTP date and time
        • Step 24: Run ML application
    • 8.5. IPU-M2000 setup files
      • 8.5.1. Syslog and chrony on the IPU-Gateway
      • 8.5.2. Syslog on BMC
    • 8.6. DHCP files
      • 8.6.1. Lrack1 and lrack2: /etc/dhcp/dhcpd.conf
      • 8.6.2. Lrack1: /etc/dhcp/dhcpd.d/ipum-dhcp.conf
      • 8.6.3. Lrack2: /etc/dhcp/dhcpd.d/ipum-dhcp.conf
      • 8.6.4. Lrack1 and lrack2: /etc/dhcp/dhcpd.d/ files
      • 8.6.5. Lrack1: /etc/dhcp/dhcpd.d/lrack1 files
        • ipum-bmc.conf
        • ipum-gw.conf
        • ipum-rnic.conf
      • 8.6.6. Lrack1: /etc/dhcp/dhcpd.d/lrack2 files
        • ipum-bmc.conf
        • ipum-gw.conf
      • 8.6.7. Lrack2: /etc/dhcp/dhcpd.d/lrack2 files
        • ipum-rnic.conf
    • 8.7. /etc/netplan files
      • 8.7.1. 1GbE management interface on lrack1 server
      • 8.7.2. RNIC interfaces on the servers
      • 8.7.3. Lrack1: rack_config.json file
  • 9. System integration testing
    • 9.1. Cluster tests
  • 10. Revision history
  • 11. Safety and compliance
  • 12. Trademarks & copyright



Revision 1963faad.