rack_tool(1)

Date

2023-10-30

Name

rack_tool - tool for doing operations related to one or more IPU-Machines in a rack

Synopsis

rack_tool [––version] [––help] <command> [<args>] [––ipum name bmc_ip gw_ip rnic_ip bmc_username bmc_password gw_username gw_password] [––config-file config_file] [––global-credentials bmc_username bmc_password gw_username gw_password] [––log-dir log_directory] [––select name]
rack_tool bist [––help]
rack_tool bmc-factory-reset [––help] [––reboot]
rack_tool clean-sel-logs [––help]
rack_tool create-user-via-ipmi [––help] [––new-user] [––new-password] [––user-id] [––priv] [––user] [––password] [––environment_passwords]
rack_tool debug-collector [––help] [list, dump, create, retrieve, delete] [––path] [––id]
rack_tool generate-rack-config [––help] ––mode mode ––size size
rack_tool hostname [––help] [––force-da]
rack_tool install-key [––help] [––force-da]
rack_tool logging-server [––help] ––address address [––force-da] ––port port ––device device
rack_tool power-cycle [––help]
rack_tool power-off [––help] [––hard]
rack_tool power-on [––help]
rack_tool run-command [––help] ––command command ––device device
rack_tool set-ethernet-switch-mode [––help] ––mode mode
rack_tool status [––help] [––no-color] [––show-only-running-version] [––show-json] [––only-ping]
rack_tool update-root-overlay [––help] [––force-da] [––overlay overlay_directory]
rack_tool upgrade [––help] [––clean-sel-logs] [––component components] [––daisy-chain] [––force] [––force-da] [––golden] [––gw-root-overlay path_to_overlay] [––sftp]
rack_tool upgrade-status [––help] [––no-color] [––show-json]
rack_tool vipu-test [––help] [––force-da] [––ipu-link-topology topology] [––vipu-path path_to_vipu_binaries]

Description

rack_tool simplifies operations that apply to one or more IPU-Machines in a rack.

Options

The options ––ipum, ––global-credentials and ––config-file must go after the command parameters.

-a, ––address address

Set the IP address of the logging server.

––clean-sel-logs

Clean SEL logs on the BMC after upgrade is finished. This flag is not available for RNIC-only mode.

-c, ––command command

Command to be executed on the device.

––component components
A list of specific components to be upgraded. If not specified, then all components are upgraded. See also Modes.
The possible values are:
  • bmc-all: Upgrade all BMC components: bmc, icu and system-fpga. This flag will not work in RNIC-only mode.

  • gw-all: Upgrade all IPU-Gateway components: gw, vipu-agents and vipu-standalone. This flag will not work in BMC-only mode.

––config-file config_file
Config file with information about IPU-Machines to connect to. The config file will be ignored if the ––ipum parameter is given. See Config file format.
Example:
rack_tool upgrade --config-file /home/ipuuser/my_config_file.json

-d, ––device bmc|gw

Specify whether this command applies to the BMC or the IPU-Gateway.

––daisy-chain

Used to upgrade systems in a daisy-chained setup. This means the system-fpga will first be sequentially upgraded and then the other components will be upgraded in parallel.

-E, ––environment_passwords

You can use this option with create-user-via-ipmi to fetch IPMI passwords from environment variables. Use IPMI_PASSWORD for an existing IPMI user and IPMI_NEW_PASSWORD for a new user’s password. These passwords will have precedence over passwords given with options ––password and ––new-password.

––force

Force upgrade of components even if component versions are up to date.

––force-da

Force command to run on Direct Attach systems. Warning: This is not a recommended option as it might break the Direct Attach system setup.

––global-credentials bmc_username bmc_password gw_username gw_password
Option to set a common set of login details for the IPU-Machines selected with ––ipum option. If this option is used, the password and username parameters for the ––ipum option can be omitted.
Example:
rack_tool upgrade --ipum machine1 10.1.1.1 10.1.1.2 --ipum machine2 10.1.2.1 10.1.2.2 --global-credentials root password itadmin password

-g, ––golden

Force upgrade of the “golden” (known good) partition.

––gw-root-overlay path_to_overlay

Specify a directory containing files to overwrite those in the IPU-Gateway file system.

-H, ––hard

Turn the power off immediately, without shutting down the IPU-Machines.

-h, ––help

Prints the synopsis and a list of all available commands. ––help can also be given after a command to show individual help for each command.

-t, ––ipu-link-topology topology

Set the IPU-Link topology to either “mesh” or “torus”. The default is “mesh”.

––id id

Specify the log dump ID to retrieve or delete when using debug-collector.

––ipum name bmc_ip gw_ip rnic_ip bmc_username bmc_password gw_username gw_password
Option to manually define which IPU-Machines to perform operations on, instead of using a config file. Several IPU-Machines can be selected by passing the ––ipum option multiple times.
To enter IPU-Machines in a BMC-only or RNIC-only setup use the ––operating-mode option.
Example:
rack_tool upgrade --ipum machine1 10.1.1.1 10.1.1.2 10.1.1.5 root password itadmin password

––licenses

Print a summary of licenses for the modules and tools used in maintenance_tools.

––log-dir log_directory
Custom directory to write log files to.
Example:
rack_tool upgrade --log-dir /home/ipuuser/logs

––mode mode
When used with set-ethernet-switch-mode: The Ethernet switch forwarding mode. Valid modes are: [Open, BmcOnly, BmcGatewaySplit] (see Ethernet switch in the BMC user guide for more information).
When used with generate-rack-config: Mode used for setting up IP addresses in a rack_config.json file. Valid modes are: [all, bmc, gw, minimal].

––new-password new_password

Password for the new user created with create-user-via-ipmi.

––new-user new_user

Username for the new user created with create-user-via-ipmi.

––node ip username password
Option to manually define which IPU-Machines to perform operations on, instead of using a config file. Unlike ––ipum this option requires only: <ip> <user> <password>. This must be used with the ––target option to specify the target device. Available target devices are: [ bmc ]. Several IPU-Machines can be selected by passing the ––node option multiple times.
Example:
rack_tool upgrade --target bmc --node <ip> <user> <password>

––no-color

Don’t use colour when printing status information.

––only-ping

Only ping components when running status. When used, the output of status is reduced to only the UP/DOWN/UNAVAILABLE status of devices in the system.

-o, ––overlay overlay_directory

Specify a directory containing files to overwrite those in the IPU-Machine file system.

––operating-mode operating_mode
Set the IPU-Machine operating mode when using the ––ipum option. The possible values are bmc-only, rnic-only or managed.
When BMC-only mode is used all “gw” and “rnic” fields must be omitted from ––ipum and ––global-credentials.
When RNIC-only mode is used all “bmc” and “gw” fields must be omitted from ––ipum and ––global-credentials.
  • Required arguments for BMC-only mode are: ––ipum <name> <bmc_ip> <bmc_user> <bmc_pass>

  • Required arguments for RNIC-only mode are: ––ipum <name> <rnic_ip> <gw_user> <gw_pass>

  • Required arguments for managed mode are: ––ipum <name> <bmc_ip> <gw_ip> <rnic_ip> <bmc_username> <bmc_password> <gw_username> <gw_password>
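
As an illustration, the argument list for one ––ipum option can be assembled per operating mode. The Python sketch below is not part of rack_tool: the helper name ipum_args and the dictionary keys are hypothetical. The field order assumes that BMC-only mode keeps only the BMC address and login, and that RNIC-only mode keeps only the RNIC address and the IPU-Gateway login, as described in Modes.

```python
# Fields passed to --ipum in each operating mode (order matters).
IPUM_FIELDS = {
    "bmc-only": ("name", "bmc_ip", "bmc_username", "bmc_password"),
    "rnic-only": ("name", "rnic_ip", "gw_username", "gw_password"),
    "managed": ("name", "bmc_ip", "gw_ip", "rnic_ip",
                "bmc_username", "bmc_password",
                "gw_username", "gw_password"),
}


def ipum_args(mode, machine):
    """Return the argument list for one --ipum option (hypothetical helper)."""
    if mode not in IPUM_FIELDS:
        raise ValueError(f"unknown operating mode: {mode!r}")
    missing = [field for field in IPUM_FIELDS[mode] if field not in machine]
    if missing:
        raise ValueError(f"missing fields for {mode}: {missing}")
    return ["--ipum"] + [machine[field] for field in IPUM_FIELDS[mode]]
```

The returned list can be appended to a command line after the command name, together with ––operating-mode.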

––path path

Local path to save log dump to when using debug-collector.

––password password

Password for the existing user account that will be used to create the new user with create-user-via-ipmi.

-p, ––port port

Set the port to use on the logging server.

––priv priv

Privilege level for new user when using create-user-via-ipmi.

––reboot

Reboot BMC after factory reset when using bmc-factory-reset.

––select name

Run command on a selected subset of machines from a rack_config.json file.

––size size

Number of IPUs in an IPU-Pod. Valid options are: [16, 32, 64, 128]

––sftp

Use sftp instead of scp for file transfers to the BMC.

––show-json

Display the current version information in JSON format.

––show-only-running-version

Display the versions of the currently running software components on the IPU-Machine.

––skip-validation

Skip any validation, done by rack_tool at startup, for status, passwords or installed programs.

––target target_component

The device targeted when using ––node to specify IPU-Machine via command line. Currently available device targets are: [ bmc ]

––user user

Username for the existing user account that will be used to create the new user with create-user-via-ipmi.

––user-id user-id

User ID for new user when using create-user-via-ipmi.

-v, ––version

Print the version of the tool.

––vipu-path path_to_vipu_binaries

Path to the binaries needed to run the V-IPU test (vipu-server and vipu-admin).

Commands

bist

Run built-in self test that checks that most components on the board are available and functional.

bmc-factory-reset

Do factory reset on the BMC on all IPU-Machines.

clean-sel-logs

Clean SEL logs on the BMCs of all IPU-Machines. This command is not available for RNIC-only mode.

create-user-via-ipmi
Create a new user in the system via IPMI. This requires an existing user on the system with the privilege to create new users; the command logs in as that user to create the new one. ––user and ––password are the login credentials for the existing user. ––new-user and ––new-password are the login credentials for the new user. The new user must also be assigned a user ID and privilege level with the options ––user-id and ––priv.
Example:
rack_tool create-user-via-ipmi --user <existing_user> --password <existing_password> --new-user <new_user> --new-password <new_password> --user-id <id> --priv <priv>

debug-collector

Manage logs from BMC debug collector. The following sub-commands are available:

  • list: list dump log entries. This is the default command.

  • create: create a new dump log entry. Warning: Creating a new log dump can take up to one hour to complete, but this command returns immediately …

  • retrieve: retrieve a dump log entry. By default retrieves the newest log.

  • dump: create a new dump log entry and retrieve it.

  • delete: delete a log dump entry. By default deletes the oldest log.

generate-rack-config

Generate rack_config.json file with a specific mode and size.
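
A wrapper script can check the ––mode and ––size values before calling generate-rack-config. The Python sketch below only builds the command line; the function name is hypothetical, and the valid values are copied from the ––mode and ––size option descriptions.

```python
# Valid values as listed under the --mode and --size options.
VALID_MODES = ("all", "bmc", "gw", "minimal")
VALID_SIZES = (16, 32, 64, 128)


def generate_rack_config_argv(mode, size):
    """Build the argv for 'rack_tool generate-rack-config' (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return ["rack_tool", "generate-rack-config",
            "--mode", mode, "--size", str(size)]
```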

hostname

Set hostname on IPU-Gateway and BMC.

install-key

Install the current user’s public SSH key to all IPU-Machines.

logging-server

Set logging server on a device (“bmc” or “gw”) on all IPU-Machines.

power-cycle

Power cycle IPU-Gateway and IPUs on all IPU-Machines.

power-off

Power off IPU-Gateway and IPUs on all IPU-Machines.

power-on

Power on IPU-Gateway and IPUs on all IPU-Machines.

run-command

Run a command on a device (“bmc” or “gw”) on all IPU-Machines.

set-ethernet-switch-mode
Set Ethernet switch mode to configure access between IPU-Gateway and BMC.
For more information about the Ethernet switch forwarding modes see Ethernet switch in the BMC user guide.

status

Show status of IPU-Machines in a rack.

update-root-overlay
Copy all files in IPU-Gateway root overlay directory to all IPU-Machines.

upgrade

Start upgrade of IPU-Machines. This will only write the new firmware to flash memory. Some systems may require further configuration and will provide an installation script to perform the upgrade and do any necessary configuration. See the documentation for your IPU system for more information.
See Modes.

upgrade-status

Show status of IPU-Machines in a rack compared to local manifest versions.

vipu-test

Run Virtual-IPU tests.

Exit status

0 Successful program execution.
1 Unsuccessful program execution.
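
Because only two exit codes are defined, a calling script can treat any non-zero status as failure. A minimal Python sketch, assuming rack_tool is on the PATH (the function name is hypothetical):

```python
import subprocess


def rack_tool_succeeded(argv):
    """Run a command line and report whether it exited with status 0."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode == 0


# Example call (requires rack_tool on PATH; "status" is taken from the synopsis):
#     ok = rack_tool_succeeded(["rack_tool", "status"])
```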

IPU-Gateway root overlay

This is a directory structure that contains files that should be copied to the IPU-Gateway after an upgrade. This makes it possible to provide site-specific files that should be persistent on the IPU-Gateway after upgrade. The contents of this directory are copied over to the IPU-Gateway automatically. You can specify the directory either as an object in the rack config file (see Config file format) or as an argument to the ––overlay command-line option. The directory structure should be the same as the root directory on the IPU-Gateway.

An example of files that can be useful to have in the IPU-Gateway root overlay are those that relate to the outside world, such as NTP and syslog configuration.

Config file format

rack_config.json is a JSON file that rack_tool uses to connect to all the IPU-Machines in a rack. This file is created when the system is initially set up; see the appropriate IPU-POD build and test guide for details: docs.graphcore.ai/hardware. It consists of one mandatory object “machines” and the optional objects “global_credentials”, “gw_root_overlay” and “log_directory”.

rack_tool will look for a config file in three different places with the following priority:

  1. The file passed with the ––config-file parameter

  2. A file named rack_config.json in the current working directory

  3. A rack_config.json file in the ~/.rack_tool/ directory
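
The lookup order above can be sketched in Python as follows. This is an illustration, not rack_tool's own code: the function name and parameters are hypothetical, and the behaviour when the ––config-file path does not exist is not documented, so the sketch simply falls through to the next location.

```python
from pathlib import Path


def find_config_file(config_file=None, cwd=None, home=None):
    """Return the first existing config file in priority order, or None."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()

    candidates = []
    if config_file is not None:
        candidates.append(Path(config_file))                 # 1. --config-file
    candidates.append(cwd / "rack_config.json")              # 2. working directory
    candidates.append(home / ".rack_tool" / "rack_config.json")  # 3. ~/.rack_tool/

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```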

global_credentials

This is an object that holds the login details of the BMC and IPU-Gateway. The object has the following key/value pairs:

"global_credentials": {
    "bmc_username": "<username>",
    "bmc_passwd": "<password>",
    "gw_username": "<username>",
    "gw_passwd": "<password>"
}

gw_root_overlay

This object is a key/value pair that points to the location of the IPU-Gateway root overlay.

"gw_root_overlay": "/home/ipuuser/.rack_tool/root-overlay",

log_directory
This object is a key/value pair that points to the directory where logs should be stored. If this object is not present in the config file, rack_tool will place logs in the first of the following directories that it has write permission to:
  1. <rack_tool_location>/logs

  2. /var/log/rack_tool

  3. <current_working_directory>/rack_tool_logs
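
The fallback order above can be sketched in Python as follows. How rack_tool tests for write permission is not documented; this sketch assumes a directory counts as writable if it, or its nearest existing parent, is a writable directory. All function names are hypothetical.

```python
import os
from pathlib import Path


def default_log_candidates(tool_location, cwd):
    """Return the three fallback log directories in priority order."""
    return [
        Path(tool_location) / "logs",
        Path("/var/log/rack_tool"),
        Path(cwd) / "rack_tool_logs",
    ]


def is_writable(path):
    """Assumed check: the nearest existing ancestor is a writable directory."""
    probe = Path(path)
    while not probe.exists():
        if probe.parent == probe:  # reached the filesystem root
            return False
        probe = probe.parent
    return probe.is_dir() and os.access(probe, os.W_OK | os.X_OK)


def first_writable(candidates):
    """Return the first candidate that passes is_writable, or None."""
    for candidate in candidates:
        if is_writable(candidate):
            return Path(candidate)
    return None
```

Calling first_writable(default_log_candidates(tool_location, cwd)) then yields the directory that would be chosen under these assumptions.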

"log_directory": "/home/ipuuser/rack_tool_logs"
machines

This is an array of machine objects that holds information about each IPU-Machine in the rack. Each machine object consists of the following key/value pairs:

"machines": [
    {
        "name": "<machine name>",
        "bmc_ip": "<ip address>",
        "gw_ip": "<ip address>",
        "rnic_ip": "<ip address>"
    }
]
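
The fragments above can be combined into one file. The Python sketch below assumes that the objects sit side by side in a single top-level JSON object and that "machines" must not be empty; neither detail is stated explicitly here, and the function name is hypothetical.

```python
import json


def build_rack_config(machines, global_credentials=None,
                      gw_root_overlay=None, log_directory=None):
    """Return rack_config.json content as text (hypothetical helper)."""
    if not machines:
        raise ValueError('"machines" is mandatory and must not be empty')
    config = {}
    # Optional objects are written only when a value is supplied.
    if global_credentials is not None:
        config["global_credentials"] = global_credentials
    if gw_root_overlay is not None:
        config["gw_root_overlay"] = gw_root_overlay
    if log_directory is not None:
        config["log_directory"] = log_directory
    config["machines"] = machines
    return json.dumps(config, indent=4)
```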

Modes

rack_tool modes determine what functions rack_tool can perform on an IPU-Machine, based on information in the rack_config.json file.

BMC-only mode
If the rack_config.json file only contains BMC IP addresses for the IPU-Machine then rack_tool will run in BMC-only mode. The config file must also contain BMC login information used for the BMC IP address. In this mode the only components that can be updated are bmc, icu and system-fpga. In BMC-only mode, rack_tool can only communicate with the BMC. For example, attempting to upgrade IPU-Gateway components in BMC-only mode will fail.
Example:
"global_credentials": {
    "bmc_username": "<username>",
    "bmc_passwd": "<password>"
},
"machines": [
    {
        "name": "<machine name>",
        "bmc_ip": "<ip address>"
    }
]

RNIC-only mode
If the rack_config.json file only contains RNIC IP addresses for the IPU-Machine then rack_tool will run in RNIC-only mode. The config file must also contain IPU-Gateway login information used for the RNIC IP address. In this mode the only components that can be updated are gw and vipu-agents. In RNIC-only mode, rack_tool can only communicate with the IPU-Gateway. For example, attempting to upgrade BMC components in RNIC-only mode will fail.
Example:
"global_credentials": {
    "gw_username": "<username>",
    "gw_passwd": "<password>"
},
"machines": [
    {
        "name": "<machine name>",
        "rnic_ip": "<ip address>"
    }
]

Managed mode
If the rack_config.json file contains both BMC and IPU-Gateway IP addresses for the IPU-Machine, then rack_tool will run in managed mode. The config file must also contain the BMC and IPU-Gateway login credentials. In managed mode, all rack_tool functionality is available because rack_tool can communicate with both the IPU-Gateway and the BMC.
Example:
"global_credentials": {
    "bmc_username": "<username>",
    "bmc_passwd": "<password>",
    "gw_username": "<username>",
    "gw_passwd": "<password>"
},
"machines": [
    {
        "name": "<machine name>",
        "bmc_ip": "<ip address>",
        "gw_ip": "<ip address>",
        "rnic_ip": "<ip address>"
    }
]
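
The rules for the three modes can be summarised as a small Python function over one machine entry. This is a sketch of the rules as described in this section, not rack_tool's own logic; the function name is hypothetical, and combinations that the text does not cover are rejected rather than guessed.

```python
def detect_mode(machine):
    """Infer the operating mode from the IP keys present in a machine entry."""
    present = {key for key in ("bmc_ip", "gw_ip", "rnic_ip") if machine.get(key)}
    if present == {"bmc_ip"}:
        return "bmc-only"
    if present == {"rnic_ip"}:
        return "rnic-only"
    if {"bmc_ip", "gw_ip"} <= present:
        return "managed"
    # Anything else is not described in the Modes section.
    raise ValueError(f"cannot infer mode from keys: {sorted(present)}")
```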

Files

~/.rack_tool/

This directory is the default location for configuration files.

~/.rack_tool/rack_config.json

This is the rack configuration file for the rack.

~/.rack_tool/root-overlay/

This is the default directory for IPU-Gateway root overlay files.