9. Appendix
This appendix contains example configuration files for the switches, servers and IPU-Machines.
9.1. Example SPINE switch configurations
9.1.1. Spine downlink member interface
interface Ethernet5/1
description Link-POD123-100G
dcbx mode ieee
flowcontrol send off
flowcontrol receive off
speed 100g-2
channel-group 123 mode active
priority-flow-control on
priority-flow-control priority 0 no-drop
priority-flow-control priority 1 no-drop
priority-flow-control priority 2 no-drop
priority-flow-control priority 3 no-drop
priority-flow-control priority 4 no-drop
priority-flow-control priority 5 no-drop
priority-flow-control priority 6 no-drop
priority-flow-control priority 7 no-drop
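The same nine priority-flow-control lines recur on every port in this appendix. If the switch configuration is templated, they could be generated rather than typed by hand. The following is a minimal sketch; the function name `pfc_no_drop_lines` is hypothetical and not part of any switch tooling.

```python
def pfc_no_drop_lines(priorities=range(8)):
    """Return the PFC stanza used in this appendix for the given priorities."""
    lines = ["priority-flow-control on"]
    lines += [f"priority-flow-control priority {p} no-drop" for p in priorities]
    return lines


# Print the stanza so it can be pasted under an interface definition.
print("\n".join(pfc_no_drop_lines()))
```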
9.1.2. Spine port-channel
interface Port-Channel123
description LINK-POD123-100G
switchport trunk allowed vlan 1000-2000 # VLANs for overcloud/providers
switchport mode trunk
mlag 123
9.2. Example ToR switch configurations
9.2.1. IPUM access port
interface Ethernet9/1
description IPUMs Compute
dcbx mode ieee
switchport access vlan 1234 # managed by Ironic/Neutron/NGS
switchport trunk allowed vlan none
priority-flow-control on
priority-flow-control priority 0 no-drop
priority-flow-control priority 1 no-drop
priority-flow-control priority 2 no-drop
priority-flow-control priority 3 no-drop
priority-flow-control priority 4 no-drop
priority-flow-control priority 5 no-drop
priority-flow-control priority 6 no-drop
priority-flow-control priority 7 no-drop
spanning-tree portfast
9.2.2. Hypervisor access port
interface Ethernet1/1
description host1
dcbx mode ieee
flowcontrol send off
flowcontrol receive off
error-correction encoding reed-solomon
switchport access vlan 1002 # Overcloud provisioning VLAN
channel-group 20 mode active
lacp port-priority 16000
priority-flow-control on
priority-flow-control priority 0 no-drop
priority-flow-control priority 1 no-drop
priority-flow-control priority 2 no-drop
priority-flow-control priority 3 no-drop
priority-flow-control priority 4 no-drop
priority-flow-control priority 5 no-drop
priority-flow-control priority 6 no-drop
priority-flow-control priority 7 no-drop
spanning-tree portfast
9.2.3. Hypervisor port-channel
interface Port-Channel20
description T:host1:bond
switchport trunk allowed vlan 1000-2000 # VLANs for overcloud/providers
switchport mode trunk
port-channel lacp fallback individual
port-channel lacp fallback timeout 5
spanning-tree portfast
9.3. Mellanox ConnectX-5 configuration
Parameters:
<PCIe device address>: Each device in the system has a different PCIe address and therefore a unique <PCIe device address> parameter.
<number of configured VFs>: The number of Virtual Functions (VFs) depends on the maximum number of VFs the ConnectX-5 card supports and how many you choose to set per node.
Enable SR-IOV:
mlxconfig -y -d <PCIe device address> set SRIOV_EN=1
Set number of VFs:
mlxconfig -y -d <PCIe device address> set NUM_OF_VFS=<number of configured VFs>
Enable LLDP DCBX PFC:
mlxconfig -y -d <PCIe device address> set LLDP_NB_DCBX_P1=TRUE LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2
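On a host with several ConnectX-5 devices, the three commands above have to be repeated for each PCIe address. The sketch below only builds the command lines and does not run them. The PCIe address `0000:3b:00.0` and the VF count of 8 are placeholder values, and the helper name `mlxconfig_commands` is hypothetical.

```python
import shlex


def mlxconfig_commands(pcie_address, num_vfs):
    """Build the three mlxconfig invocations from this section for one device."""
    base = ["mlxconfig", "-y", "-d", pcie_address, "set"]
    # LLDP/DCBX settings for both ports, in the same order as the command above.
    lldp = [
        f"{key}_P{port}={value}"
        for port in (1, 2)
        for key, value in (
            ("LLDP_NB_DCBX", "TRUE"),
            ("LLDP_NB_TX_MODE", "2"),
            ("LLDP_NB_RX_MODE", "2"),
        )
    ]
    return [
        base + ["SRIOV_EN=1"],
        base + [f"NUM_OF_VFS={num_vfs}"],
        base + lldp,
    ]


# Placeholder device address and VF count; substitute the values for your host.
for command in mlxconfig_commands("0000:3b:00.0", 8):
    print(shlex.join(command))
```

Each returned list could be passed to `subprocess.run` once the values have been checked.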
9.4. Dell R640 Intel virtualisation
9.4.1. Hypervisor configuration
Key nova.conf settings for the R640 hypervisor:
[libvirt]
cpu_mode = host-passthrough
cpu_model_extra_flags = topoext
[compute]
# dedicate all CPUs, except the first 4 cores from each socket (along with their paired threads)
cpu_dedicated_set = 0-95,^0,^48,^1,^49,^2,^50,^3,^51,^4,^52,^5,^53,^6,^54,^7,^55
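To see how many CPUs a cpu_dedicated_set value leaves for instances, the string can be expanded into a set of CPU IDs. The function below is a simplified re-implementation for illustration, not Nova's own parser. It handles only the range and `^` exclusion forms used in this appendix.

```python
def expand_cpu_set(spec):
    """Expand a CPU spec such as '0-3,^1' into the set of included CPU IDs."""
    included, excluded = set(), set()
    for item in spec.split(","):
        item = item.strip()
        target = included
        if item.startswith("^"):  # a leading caret marks an exclusion
            target, item = excluded, item[1:]
        if "-" in item:
            first, last = (int(x) for x in item.split("-"))
            target.update(range(first, last + 1))
        else:
            target.add(int(item))
    return included - excluded


r640 = expand_cpu_set(
    "0-95,^0,^48,^1,^49,^2,^50,^3,^51,^4,^52,^5,^53,^6,^54,^7,^55"
)
print(len(r640))  # 80
```

The result of 80 dedicated CPUs matches the vcpus value of the r640.xlarge flavour in the next section.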
9.4.2. Virtual Machine flavour configs (Terraform for OpenStack)
When creating virtual machines, the following metadata will create a large high-performance instance.
resource "openstack_compute_flavor_v2" "graphcore_flavor_r640_xlarge" {
  name = "r640.xlarge"
  vcpus = 80
  ram = 573440
  disk = 40
  ephemeral = 0
  is_public = false
  extra_specs = {
    "trait:HW_CPU_X86_INTEL_VMX" = "required"
    "hw:cpu_policy" = "dedicated"
    "hw:numa_nodes" = 2
    "hw:cpu_sockets" = 2
    "hw:cpu_threads" = 2
    "hw:cpu_thread_policy" = "prefer"
    "hw_rng:allowed" = "True"
    "hw:mem_page_size" = "1GB"
    "hw:pci_numa_affinity_policy" = "preferred"
  }
}
9.5. Dell R6525 AMD virtualisation
9.5.1. Hypervisor configuration
Key configuration for nova.conf:
[libvirt]
cpu_mode = host-passthrough
cpu_model_extra_flags = topoext
[compute]
# dedicate all CPUs except one core from each NUMA zone (along with its paired thread)
cpu_dedicated_set = 0-255,^0,^16,^32,^48,^64,^80,^96,^112,^128,^144,^160,^176,^192,^208,^224,^240
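The exclusion list above follows a regular pattern, so it can be derived from the host topology instead of being written out by hand. The sketch below rests on two assumptions inferred from the list itself rather than stated in this document: each NUMA zone spans 16 consecutive core IDs, and the paired thread of CPU n is CPU n + 128. The function name is hypothetical.

```python
def build_cpu_dedicated_set(total_cpus, reserved_cores, sibling_offset):
    """Build a cpu_dedicated_set string that excludes cores and their siblings."""
    reserved = set(reserved_cores)
    # Also exclude the paired (SMT) thread of every reserved core.
    reserved |= {core + sibling_offset for core in reserved_cores}
    exclusions = ",".join(f"^{cpu}" for cpu in sorted(reserved))
    return f"0-{total_cpus - 1},{exclusions}"


# Assumed topology: 256 CPUs, first core of each 16-core NUMA zone reserved,
# paired threads offset by 128.
r6525 = build_cpu_dedicated_set(256, list(range(0, 128, 16)), 128)
print(r6525)
```

Check the real sibling layout on the host (for example with lscpu) before relying on the offset.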
9.5.2. Virtual Machine flavour configs (Terraform for OpenStack)
When creating virtual machines, the following metadata will create a large high-performance instance that uses all the available CPU and RAM from a hypervisor.
Note that the NUMA node setting matches the physical configuration of the host.
resource "openstack_compute_flavor_v2" "graphcore_flavor_r6525_full" {
  name = "r6525.full"
  flavor_id = "18d98749-f2c5-4bcc-a4bf-94eb65d5a101"
  vcpus = 240
  ram = 491520
  disk = 100
  ephemeral = 4300
  is_public = false
  extra_specs = {
    "trait:HW_CPU_X86_AMD_SVM" = "required"
    "hw:cpu_policy" = "dedicated"
    "hw:numa_nodes" = 8
    "hw:cpu_sockets" = 2
    "hw:cpu_threads" = 2
    "hw:cpu_thread_policy" = "require"
    "hw_rng:allowed" = "True"
    "hw:mem_page_size" = "1GB"
    "hw:pci_numa_affinity_policy" = "preferred"
  }
}
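When hw:numa_nodes is set without per-node overrides, the flavour's vCPUs and RAM are split evenly across the guest NUMA nodes. With 1GB pages, each node's share of RAM should therefore be a whole number of GiB. The check below is a rough sanity test under that assumption. The helper name `per_numa_share` is hypothetical and the logic is not taken from Nova.

```python
def per_numa_share(vcpus, ram_mb, numa_nodes, page_mb=1024):
    """Return (vCPUs, whole pages of RAM) per guest NUMA node, or raise."""
    if vcpus % numa_nodes or ram_mb % (numa_nodes * page_mb):
        raise ValueError("flavour does not divide evenly across NUMA nodes")
    return vcpus // numa_nodes, ram_mb // numa_nodes // page_mb


print(per_numa_share(80, 573440, 2))   # r640.xlarge: (40, 280)
print(per_numa_share(240, 491520, 8))  # r6525.full:  (30, 60)
```

Both flavours in this appendix divide cleanly: 40 vCPUs and 280 GiB per node on the R640, and 30 vCPUs and 60 GiB per node on the R6525.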
9.6. IPU-Machine Ironic driver
Ironic can be found here: https://github.com/openstack/ironic/tree/stable/wallaby
We use a modified version of the public driver, as shown below:
Patch:
ironic/drivers/redfish.py | 73 +++++++++++++++++++++++++++++++++++++++
setup.cfg | 2 ++
2 files changed, 75 insertions(+)
diff --git a/ironic/drivers/redfish.py b/ironic/drivers/redfish.py
index d51e58b6f2..23afa0f07a 100644
--- a/ironic/drivers/redfish.py
+++ b/ironic/drivers/redfish.py
@@ -14,12 +14,20 @@
# under the License.
from ironic.drivers import generic
+from ironic.common import states
+from ironic.drivers import base
+from ironic.drivers import generic
+from ironic.drivers import hardware_type
+from ironic.drivers.modules import fake
from ironic.drivers.modules import agent
from ironic.drivers.modules import inspector
from ironic.drivers.modules import ipxe
from ironic.drivers.modules import noop
from ironic.drivers.modules import noop_mgmt
from ironic.drivers.modules import pxe
+from ironic.drivers.modules.network import flat as flat_net
+from ironic.drivers.modules.network import neutron
+from ironic.drivers.modules.network import noop as noop_net
from ironic.drivers.modules.redfish import bios as redfish_bios
from ironic.drivers.modules.redfish import boot as redfish_boot
from ironic.drivers.modules.redfish import inspect as redfish_inspect
@@ -27,6 +35,7 @@
from ironic.drivers.modules.redfish import power as redfish_power
from ironic.drivers.modules.redfish import raid as redfish_raid
from ironic.drivers.modules.redfish import vendor as redfish_vendor
+from ironic.drivers.modules.storage import noop as noop_storage
class RedfishHardware(generic.GenericHardware):
@@ -70,3 +79,67 @@ def supported_vendor_interfaces(self):
def supported_raid_interfaces(self):
"""List of supported raid interfaces."""
return [redfish_raid.RedfishRAID, noop.NoRAID, agent.AgentRAID]
+
+class RedfishNetworkAppliance(hardware_type.AbstractHardwareType):
+    """Redfish appliance moved between networks and rebooted using Ironic."""
+
+    @property
+    def supported_power_interfaces(self):
+        return [redfish_power.RedfishPower, fake.FakePower]
+
+    @property
+    def supported_inspect_interfaces(self):
+        """List of supported inspect interfaces."""
+        # TODO(johng): maybe we only want the port detection?
+        return [redfish_inspect.RedfishInspect, noop.NoInspect]
+
+    @property
+    def supported_network_interfaces(self):
+        """List of supported network interfaces."""
+        return [neutron.NeutronNetwork, flat_net.FlatNetwork,
+                noop_net.NoopNetwork]
+
+    @property
+    def supported_boot_interfaces(self):
+        """List of classes of supported boot interfaces."""
+        return [fake.FakeBoot]
+
+    @property
+    def supported_deploy_interfaces(self):
+        """List of supported deploy interfaces."""
+        return [NetworkOnlyDeploy]
+
+    @property
+    def supported_management_interfaces(self):
+        return [noop_mgmt.NoopManagement]
+
+    @property
+    def supported_raid_interfaces(self):
+        return [noop.NoRAID]
+
+    @property
+    def supported_rescue_interfaces(self):
+        return [noop.NoRescue]
+
+    @property
+    def supported_storage_interfaces(self):
+        return [noop_storage.NoopStorage]
+
+
+class NetworkOnlyDeploy(fake.FakeDeploy):
+    """Class for only doing the network part of a typical deployment.
+    This does only the network setup,
+    then (optionally?) reboots the appliance via redfish,
+    letting DHCP do the heavy lifting.
+    """
+
+    @base.deploy_step(priority=100)
+    def deploy(self, task):
+        task.driver.network.configure_tenant_networks(task)
+        task.driver.power.reboot(task)
+
+    def tear_down(self, task):
+        task.driver.network.unconfigure_tenant_networks(task)
+        # TODO(johng): should we power it off?
+        task.driver.power.reboot(task)
+        return states.DELETED
\ No newline at end of file
diff --git a/setup.cfg b/setup.cfg
index 9d99f6dfff..12c8015fd1 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -91,6 +91,7 @@ ironic.hardware.interfaces.deploy =
fake = ironic.drivers.modules.fake:FakeDeploy
iscsi = ironic.drivers.modules.iscsi_deploy:ISCSIDeploy
ramdisk = ironic.drivers.modules.pxe:PXERamdiskDeploy
+ network-only = ironic.drivers.redfish:NetworkOnlyDeploy
ironic.hardware.interfaces.inspect =
fake = ironic.drivers.modules.fake:FakeInspect
@@ -102,6 +103,7 @@ ironic.hardware.interfaces.inspect =
irmc = ironic.drivers.modules.irmc.inspect:IRMCInspect
no-inspect = ironic.drivers.modules.noop:NoInspect
redfish = ironic.drivers.modules.redfish.inspect:RedfishInspect
+ redfish-network-appliance = ironic.drivers.redfish:RedfishNetworkAppliance
ironic.hardware.interfaces.management =
fake = ironic.drivers.modules.fake:FakeManagement
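A follow-up patch moves the redfish-network-appliance entry point from ironic.hardware.interfaces.inspect to ironic.hardware.types:
setup.cfg | 2 +-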
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/setup.cfg b/setup.cfg
index 12c8015fd1..fd835d426c 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -103,7 +103,6 @@ ironic.hardware.interfaces.inspect =
irmc = ironic.drivers.modules.irmc.inspect:IRMCInspect
no-inspect = ironic.drivers.modules.noop:NoInspect
redfish = ironic.drivers.modules.redfish.inspect:RedfishInspect
- redfish-network-appliance = ironic.drivers.redfish:RedfishNetworkAppliance
ironic.hardware.interfaces.management =
fake = ironic.drivers.modules.fake:FakeManagement
@@ -186,6 +185,7 @@ ironic.hardware.types =
redfish = ironic.drivers.redfish:RedfishHardware
snmp = ironic.drivers.snmp:SNMPHardware
xclarity = ironic.drivers.xclarity:XClarityHardware
+ redfish-network-appliance = ironic.drivers.redfish:RedfishNetworkAppliance
ironic.database.migration_backend =
sqlalchemy = ironic.db.sqlalchemy.migration
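After both patches are applied and Ironic is reinstalled, the new entry points should be visible to Python's packaging metadata. The sketch below is one way to check this. It uses only the standard library and assumes it runs in the same Python environment as Ironic. The helper name `entry_point_names` is hypothetical.

```python
from importlib import metadata


def entry_point_names(group):
    """Return the names registered under an entry-point group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a select() method.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name for ep in selected}


# Both checks print True only where the patched Ironic is installed.
print("redfish-network-appliance" in entry_point_names("ironic.hardware.types"))
print("network-only" in entry_point_names("ironic.hardware.interfaces.deploy"))
```

The hardware type and deploy interface would presumably also need to be listed in enabled_hardware_types and enabled_deploy_interfaces in ironic.conf before a node can use them.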