Preparing a VIOS for Maintenance using the HMC
Did you know that, starting with Hardware Management Console (HMC) V9.2.950.0, you can use the HMC user interface (UI) to prepare a Virtual I/O Server (VIOS) for maintenance on your IBM Power server? If this interests you, read on to find out how.
The VIOS is considered an appliance or (like) firmware rather than an operating system, and with a few exceptions, IBM’s general recommendation for a VIOS is to upgrade to the latest fix pack and its latest service pack on a regular basis to avoid hitting known issues and problems that are (often) fixed in the latest VIOS updates.
Traditionally, VIOS administrators needed to run several commands to ensure a VIOS was ready to be taken offline for maintenance. In particular, the administrator had to ensure that the virtual machines/logical partitions (VMs/LPARs) would not be impacted by the VIOS going down. Some automation is possible, usually by writing “home grown” scripts or implementing an automation solution such as Ansible. The HMC now provides a built-in level of automation to help ease the administrative strain that can be associated with VIOS maintenance operations. The new capability takes the following into consideration:
• A VIOS requires regular maintenance, and this typically requires a VIOS to be restarted.
• Maintenance usually consists of one or more of the following operations: Applying VIOS software updates/fixes, installing new system or adapter firmware, hardware repair/replacement and more.
• When a VIOS is rebooted or powered off for maintenance, this will most likely impact the VIO clients for which it serves storage and/or network devices.
• Traditionally, administrators would need to use a combination of HMC and/or VIOS command line interface (CLI) to validate that a VIOS was ready for maintenance.
• This would include verifying that all VIO clients have redundancy for their storage and network connections: more than one path to storage via Virtual SCSI (VSCSI) or Virtual Fibre Channel (VFC) across more than one VIOS, and network failover via Shared Ethernet Adapter failover (SEA FO) or Virtual Network Interface Card (VNIC) failover on more than one VIOS.
The HMC UI enhancement allows administrators to perform two main functions before VIOS maintenance. The first validates that the VIOS is configured to provide redundancy for network and storage for all VIO clients. The second performs the steps required to prepare the VIOS for maintenance, such as failing over a SEA.
Let’s look at some examples. First we’ll show you how to validate that your VIOS is ready for a maintenance operation. And, then we’ll show you how to prepare your VIOS for maintenance using the HMC UI. Note: All examples are based on HMC V10R1 M1020 code levels (there are newer V10 releases of HMC code available today but the concepts of validating and preparing a VIOS are the same in the newer releases).
Validating your VIOS is ready for maintenance
Let’s say that we plan on applying some updates to a VIOS. These updates will require us to reboot the VIOS after they have been applied to the system. Before we apply the fixes and reboot the VIOS, we first want to ensure that the VIOS in question is configured to provide appropriate redundancy for all the client VIO LPARs that it supports. We can do this using the HMC UI.
To validate that the VIOS is ready for maintenance, from the HMC UI main panel, we click on Resources, then click on All Systems. Next we click on the managed system (the Power server, in this case sys853 below) where the VIOS resides. Then we click on Virtual I/O Servers and check the tick box next to the VIOS to be validated for maintenance (in this case sys853-vios1). Finally we click on Actions and then select the Validate Maintenance Readiness and Prepare option from the menu.
After a few moments, the Validate Maintenance Readiness (for sys853-vios1) window is displayed. Behind the scenes the HMC performed a validation for us, checking for redundancy/failover setup for storage/network provided by VIOS (Virtual SCSI, Virtual FC, SRIOV VNIC and Shared Ethernet Adapters). In the screenshot (below) you can see that the maintenance validation completed successfully and no errors (or warnings) were found.
The HMC can validate the VIOS redundancy based on the current state of the partition and VIOS (as shown in the screenshot below). You can click View System VIOS to view information about all the VIOS on the managed system. The View System Virtual I/O Server Information window displays the name of the system, its state, RMC connection status, and Remarks. RMC connectivity to the VIOS is required for the validate and prepare operations to succeed on the VIOS. The Remarks column indicates whether there were any errors while retrieving inventory information from the VIOS, which is used for the redundancy validation. Fortunately, there are no errors in our output.
It’s worth noting that various types of validation are performed at this point. Let’s look at each validation step in more detail. Below is an overview of each validation step.
VSCSI Validation
Displays the Partition Name (State), Storage Name, Storage Type, and Remarks. The Remarks column indicates whether the client partition has redundancy for the storage. A warning message is displayed if the client partition using the VIOS for virtualized storage and/or network is powered off. An error message is displayed if the VIO client partition is in any other state apart from the ‘Not Activated’ state. The Storage Type can be Physical Volume, Logical Volume, Virtual Optical Media, or Logical Unit. The HMC checks the VSCSI mapping for the Physical Volume or Logical Unit and validates that there is a redundant connection to the storage from another VIOS partition.
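The redundancy rule described above can be illustrated with a short Python sketch. This is only a model of the idea, not how the HMC implements its validation: the function name and the simplified (client, device, VIOS) inventory format are hypothetical, and in practice the same disk may have a different device name on each VIOS.

```python
from collections import defaultdict


def clients_without_redundancy(mappings, vios_under_maintenance):
    """Return (client, device) pairs that are served only by the given VIOS.

    mappings: iterable of (client_lpar, device, vios_name) tuples.
    This is a hypothetical, simplified inventory format for illustration.
    """
    serving = defaultdict(set)
    for client, device, vios in mappings:
        serving[(client, device)].add(vios)

    at_risk = []
    for key, vios_set in sorted(serving.items()):
        # A device is at risk if this VIOS serves it and no other VIOS does.
        if vios_under_maintenance in vios_set and len(vios_set) == 1:
            at_risk.append(key)
    return at_risk


# Illustrative data only: one disk with two paths, one disk with a single path.
example = [
    ("srr_lpar2_vscsi", "hdisk6", "sys853-vios1"),
    ("srr_lpar2_vscsi", "hdisk6", "sys853-vios2"),
    ("srr_lpar2_vscsi", "hdisk10", "sys853-vios1"),
]
print(clients_without_redundancy(example, "sys853-vios1"))
```

An empty result for a given VIOS would correspond to a clean storage validation in this simplified model.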
VFC Validation
Displays the Partition Name (State), vFC Host Adapter, and Remarks. The Remarks column indicates whether the client partition has redundant vFC storage provided through the vfchost-FC Port mapping. The vFC storage redundancy validation is performed only for logical partitions that are in a running state with an active RMC connection. A warning is shown if the client partition is not running or does not have an active RMC connection.
VNIC Validation
Displays the Partition Name (State), Virtual NIC Adapter ID, and Remarks. The Remarks column indicates whether the virtual NIC adapter has an operational redundant backing device for the client partition. A warning message is displayed if the VIO client partition is shut down. An error message is displayed if the VIO client partition is in any other state apart from the “Not Activated” state.
VLAN Validation
Displays the Partition Name (State), Port Virtual LAN ID, Virtual Switch, Virtual Network Name, and Remarks. The Remarks column indicates whether the virtual network that is assigned to the logical partition has redundancy. A warning message is displayed if the VIO client partition is in an inactive state. An error message is displayed if the VIO client partition is in any other state apart from the inactive state. A virtual network has redundancy if the Shared Ethernet Adapter on the VIOS has a redundant Shared Ethernet Adapter on another VIOS.
Let’s take a closer look at some of the detail provided in the Validate Maintenance Readiness output (for sys853-vios1). The following sections are displayed:
• All: Select the All option to view both the errors and the warning message information related to storage or network redundancy. By default, the All option is selected.
• Errors: Select the Errors option to view only the error message information related to storage or network redundancy.
• Warnings: Select the Warnings option to view only the warning message information related to storage or network redundancy.
We can expand each section of the validation results (below).
If we were to expand the Virtual SCSI Storage Validation output, the following information, relating specifically to Virtual SCSI Storage (VSCSI) validation, is displayed:
• Partition Name (State): Displays the name and state of the partition.
• Storage Name: Displays the name of the storage device.
• Storage Type: Displays the type of the storage such as physical volume, logical volume, virtual optical media, and logical units.
• Remarks: Displays the errors and the warning message information related to storage redundancy.
Similarly, the Virtual Fibre Channel Validation section displays the following information:
• Partition Name (State): Displays the name and state of the partition.
• vFC Host Adapter: Displays the name of the virtual Fibre Channel host adapter.
• Remarks: Displays the errors and the warning message information related to virtual Fibre Channel host redundancy.
The Virtual NIC Validation section displays the following information:
• Partition Name (State): Displays the name and state of the partition.
• VNIC Device: Displays the virtual NIC adapter value.
• Remarks: Displays the errors and the warning message information related to virtual NIC adapter redundancy.
The Virtual LAN Validation section displays the following information:
• Partition Name (State): Displays the name and state of the partition.
• Port VLAN ID: Displays the Port VLAN ID value.
• Virtual Switch: Displays the name of the virtual switch.
• Virtual Network Name: Displays the name of the virtual network.
• Remarks: Displays the errors and the warning message information related to virtual network redundancy.
The Audit Trails section can also be expanded to view the actions that were taken to validate the VIOS environment, before preparing for maintenance. The screenshots show an example Audit Trails result. There are multiple sections, split into the following areas:
• Virtual SCSI Validation Results
• Virtual FC Validation Results
• Virtual NIC Validations Results
• Virtual LAN Validations Results
From the output we observe that:
• The Virtual SCSI Validation results show that the disk, hdisk6, assigned to LPAR srr_lpar2_vscsi, has a redundant path through another VIOS, sys853-vios2. There will be no impact to this disk or the LPAR, when this VIOS, sys853-vios1, is stopped for maintenance.
• The Virtual FC Validation results show that the LPAR, srr_lpar1_vfc, has a redundant path to its SAN disk, through another VIOS, sys853-vios2. This is a typical dual VIOS configuration with MPIO SAN disk. No impact to the client.
We also observe the following results for VNIC and VLAN validation:
• The Virtual NIC Validation results show that the VNIC adapter (at location U8286.42A.2143F9V-V3-C2, on the client LPAR) has redundant backing devices on another VIOS. No impact to network connectivity.
• The Virtual LAN Validation results show that the Virtual Network, VLAN1, has a redundant connection. This means that the Shared Ethernet Adapter (SEA) which services VLAN1 is configured for SEA failover, and another VIOS will take over the job of servicing network traffic (for VLAN1) if this VIOS, sys853-vios1, is shut down for maintenance. No impact to network connectivity.
But what happens if validation finds an error? Let’s look at some examples.
If the VIOS validation process finds an error with the VIOS environment, you can click on Errors to view only error messages (and click Warnings to view only warning messages) that are displayed in the Remarks column of all sections. Click All to view both the error and warning messages. Errors will mean that the validation process was unable to verify redundancy of some components of the VIOS environment. This typically means the VIO clients will be impacted by the selected VIOS going down for maintenance.
In our example output below, the validation for VIOS sys853-vios1 has found some errors that need to be addressed before you can perform maintenance on this VIOS without impacting its VIO client partitions.
Under Virtual SCSI Storage Validation, an error is shown stating that the disk, hdisk6, does not have a redundant path through another VIOS. As a result, the LPAR, srr_lpar2_vscsi, will be impacted. Based on this information, the administrator can reconfigure the client partition’s storage or network, using the HMC UI or CLI, to achieve redundancy, then click Re-Validate (the circle arrow/refresh icon) to refresh the data of the impacted VIO client partitions after the configuration changes. For instance, if an error is reported with VSCSI redundancy, you can fix the issue and then simply click on the refresh icon to perform another validation (re-validate), without exiting the Validate Maintenance Readiness window.
Here are some more example errors (below). Under Virtual NIC Validation, an error is shown stating that a VNIC does not have a redundant VNIC backing device on another VIOS. As a result, the LPAR, srr_lpar1_vfcs, will have its VNIC network connectivity impacted when the VIOS, sys853-vios1, is shut down. The Audit Trails section also provides all the validation steps and the results, including any errors or warnings. In the sample output, hdisk10, assigned to LPAR srr_lpar2_vscsi, does not have a redundant connection. It is possible that this disk is assigned to only one VIOS and has only a single path to the client LPAR. If the VIOS was shut down, the client LPAR would lose access to its disk and likely suffer an unexpected outage. If you observe this type of configuration issue, you would resolve the problem by adding a second path, through another VIOS, for hdisk10. Then you would re-validate this VIOS to ensure it is now ready for maintenance.
Once we have validated that a VIOS is ready for maintenance, and we’ve confirmed that VIO client partitions will not be negatively impacted, we can move on to the next step, which is preparation.
Preparing your VIOS for Maintenance - Conventional Methods
Traditionally, preparing a VIOS for maintenance would sometimes require the administrator to perform several actions, such as initiating SEA failover and placing VFC/VSCSI devices into a Defined state. VIOS commands could be used to place a VIOS into the desired state before performing maintenance. For example, to place the primary SEA in standby mode (failing over to the secondary VIOS) and to place the vfchost adapters and VSCSI target devices in a Defined state on the VIOS:
$ chdev -dev entX -attr ha_mode=standby
...
$ rmdev -dev vfchostX -ucfg
...
$ rmdev -dev vtscsiX -ucfg
...
For large environments, with many VIO clients and many VIOS, this can be a challenging task to perform manually. It is prone to human error (typos!) and usually requires scripting to control or automate the changes, which can prove troublesome to test and maintain.
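As a rough illustration of the kind of “home grown” scripting involved, the Python sketch below only builds the list of padmin commands for the manual method; it does not run anything. The function name is hypothetical, the device names are taken from this article’s example environment, and any generated commands should be reviewed before being run on a real VIOS.

```python
def manual_prepare_commands(sea_adapters, vfc_hosts, vscsi_targets):
    """Build, in order, the padmin commands used by the manual method.

    The arguments are lists of device names on the VIOS being prepared.
    Nothing is executed here; the function only returns command strings.
    """
    commands = []
    for sea in sea_adapters:
        # Put the SEA into standby so its partner VIOS takes over.
        commands.append(f"chdev -dev {sea} -attr ha_mode=standby")
    for vfchost in vfc_hosts:
        # Unconfigure the virtual FC server adapter (Defined state).
        commands.append(f"rmdev -dev {vfchost} -ucfg")
    for vtd in vscsi_targets:
        # Unconfigure the VSCSI virtual target device (Defined state).
        commands.append(f"rmdev -dev {vtd} -ucfg")
    return commands


for line in manual_prepare_commands(["ent4"], ["vfchost0"], ["lpar2_rootvg"]):
    print(line)
```

Even a small helper like this needs an accurate device inventory for every VIOS, which is part of what makes the manual approach hard to maintain at scale.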
Preparing your VIOS for Maintenance - HMC UI Method
Once validation is successful, simply click on the Prepare for Maintenance button (as shown below). In this example we are preparing the VIOS, sys853-vios1 for maintenance.
When you click the button, to prepare the selected VIOS for maintenance, the HMC performs the following steps:
• Unconfigures the Virtual SCSI Target Device of the redundant virtual SCSI mapping (rmdev -dev vtscsiX -ucfg).
• Unconfigures the Virtual Fibre Channel server adapter of the Virtual FC Mapping (rmdev -dev vfchostX -ucfg).
• Switches the Virtual NIC backing device to a redundant backing device from the other VIOS (chhwres -r virtualio -m <system name> -o so --rsubtype vnicbkdev -p <target VIOS>)
• Changes the High Availability mode of the Shared Ethernet Adapter of the failover Network Bridge to Standby state so that all the network traffic flows from the redundant Shared Ethernet Adapter of the other VIOS (chdev -dev entX -attr ha_mode=standby).
Once we’ve clicked Prepare for Maintenance, a pop-up window asks for confirmation. Click OK to continue with the prepare for maintenance operation. Select the check-box in the confirmation window if you want the operation to continue even if there are errors or warnings during validation. Be careful with this option! The prepare operation also runs the validation steps (again) before attempting to prepare a VIOS for maintenance. If there are errors or warnings during this validation step, and the check-box in the confirmation window was not selected, then the prepare operation will not be attempted.
If there are no errors or warnings during validation, the operation is attempted by performing the failover operations. If errors occur during the failover operation, the HMC performs a roll-back to revert the VIOS to its original configuration. If the check-box is selected, the prepare continues even if there are errors or warnings during the validation steps or errors during the failover operation; that is, roll-back will not be attempted when the check-box is selected.
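The decision flow described in the two paragraphs above can be summarised in a small Python sketch. It is only a model of the behaviour as described in this article, not HMC code; the function name, parameter names and outcome strings are all hypothetical.

```python
def prepare_outcome(validation_clean, continue_on_errors, failover_ok):
    """Model the outcome of Prepare for Maintenance as described in the text.

    validation_clean:   True if re-validation found no errors or warnings.
    continue_on_errors: True if the confirmation check-box was selected.
    failover_ok:        True if the failover operations completed cleanly.
    """
    if not validation_clean and not continue_on_errors:
        # Validation problems and no override: nothing is changed.
        return "not attempted"
    if failover_ok:
        return "prepared"
    # Failover hit an error: roll back unless the check-box was selected.
    if continue_on_errors:
        return "continued without roll-back"
    return "rolled back"


print(prepare_outcome(validation_clean=True, continue_on_errors=False, failover_ok=False))
```

The sketch makes the trade-off of the check-box visible: selecting it removes both the validation gate and the roll-back safety net.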
Note: While the prepare steps are being performed, no other configuration changes on the VIOS or the client partitions should be attempted.
The prepare operation may take a few minutes to complete. When finished, it reports whether or not the VIOS has been successfully prepared for maintenance. The example output shows that preparing the VIOS, sys853-vios1, for maintenance was successful. After a successful “prepare for maintenance” operation, the administrator can expand the Audit Trails pane to view an audit report of the operations that were performed during the prepare.
Clicking on the Audit Trails section allows us to view the steps taken to prepare the VIOS. Below is output from the audit trail after Prepare for Maintenance is completed.
The Audit Trails section shows the validation steps (performed again) and the prepare steps. The validation results for Virtual SCSI and FC show confirmation of redundant paths for storage (below).
The validation results for Virtual NIC and LAN show confirmation of redundancy for network traffic (below).
The Prepare Results sections are split into the following areas:
• Virtual SCSI Prepare Results
• Virtual FC Prepare Results
• Virtual NIC Prepare Results
• Virtual LAN Prepare Results
The Virtual SCSI Prepare Results confirm the VSCSI Virtual Target Device (VTD), lpar2_rootvg, was put into a Defined state, for LPAR srr_lpar2_vscsi. The Virtual FC Prepare Results confirm the VFC host adapter, vfchost0, was put into a Defined state, for LPAR srr_lpar1_vfc. The Virtual NIC Prepare Results confirm a VNIC failover was performed for the VNIC adapter in slot 2, on the LPAR srr_lpar1_vfc. The Virtual LAN Prepare Results confirm the SEA ent4 (on sys853-vios1) was placed into standby mode, initiating a failover to its SEA VIOS partner (sys853-vios2).
Next, let’s review what changes the HMC performed on the VIOS (sys853-vios1) and how it impacted the VIO client LPARs.
The following changes were performed on the VIOS by the HMC prepare operation (refer to the output below). At this point, the HMC has prepared the VIOS for maintenance, by first changing the SEA HA mode to standby, triggering a SEA failover to its partner VIOS. The HMC has also unconfigured (defined) the vfchost adapter on the VIOS, essentially disabling this VFC path. The HMC has unconfigured (defined) the VSCSI Virtual Target Device (VTD) on the VIOS, essentially disabling the path to this VSCSI device. The HMC has initiated a VNIC failover (to another VIOS) for the VNIC server, vnicserver0, on sys853-vios1. The vnicstat command confirms the VNIC backing device failover state is now inactive. The VIOS errlog also reports that a VNIC failover event has occurred (refer to the command reference https://www.ibm.com/docs/en/power9?topic=commands-vnicstat-command):
• SEA HA mode changed to standby.
[padmin@sys853-vios1]$ lsdev -dev ent4 -attr ha_mode
Value
standby
• vfchostX adapters defined.
[padmin@sys853-vios1]$ lsmap -all -npiv
Name          Physloc                            ClntID ClntName       ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost0      U8286.42A.2143F9V-V1-C30                0
Status:NOT_LOGGED_IN
FC name:                        FC loc code:
Ports logged in:0
Flags:0<>
VFC client name:                VFC client DRC:
[padmin@sys853-vios1]$ lsdev | grep Def | grep vfchost
vfchost0 Defined Virtual FC Server Adapter
• The VSCSI VTD device is now defined.
[padmin@sys853-vios1]$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U8286.42A.2143F9V-V1-C20                     0x00000014
VTD                   lpar2_rootvg
Status                Defined
LUN                   0x8100000000000000
Backing device        hdisk10
Physloc               U78C9.001.WZS00YD-P1-C7-T1-W500507680C12332B-L1000000000000
Mirrored              false
[padmin@sys853-vios1]$ lsdev | grep Def | grep Virtual
lpar2_rootvg Defined Virtual Target Device - Disk
• VNIC failover was performed by the HMC:
[padmin@sys853-vios1]$ oem_setup_env
[root@sys853-vios1]# vnicstat vnicserver0 | grep "Failover State"
Failover State: inactive
[padmin@sys853-vios1]$ errlog
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
8C577CB6 0404232523 I S vnicserver0 VNIC Transport Event
...
Let’s review how the VIO client LPARs were impacted when the VIOS was prepared for maintenance.
The VIO client LPARs, lpar1 and lpar2, were not impacted. As expected, the AIX lspath command reported that the disk paths (VSCSI and VFC) served by sys853-vios1 had failed after being unconfigured, while the remaining paths stayed Enabled.
root@lpar1 / # lspath
Failed  hdisk1 fscsi0
Failed  hdisk1 fscsi0
Failed  hdisk1 fscsi0
Failed  hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
root@lpar2 / # lspath
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1
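If you want to check this kind of lspath output across many client LPARs, a small parser can flag any disk that has no Enabled path left. The Python sketch below is one possible approach, assuming the default three-column lspath output (status, disk, parent adapter); the function name is hypothetical, and the second sample is made-up data showing what a problem would look like.

```python
def disks_without_enabled_path(lspath_output):
    """Return a sorted list of disks that have no path in the Enabled state.

    Expects lines of the form '<status> <disk> <parent adapter>'.
    """
    has_enabled = {}
    for line in lspath_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines or anything that is not a path entry
        status, disk, _parent = fields
        has_enabled.setdefault(disk, False)
        if status == "Enabled":
            has_enabled[disk] = True
    return sorted(disk for disk, ok in has_enabled.items() if not ok)


# Output for lpar2 from this article: one failed path, one enabled path.
lpar2_paths = """\
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1
"""

# Hypothetical output for a client with only a single (failed) path.
single_path = """\
Failed  hdisk0 vscsi0
"""

print(disks_without_enabled_path(lpar2_paths))
print(disks_without_enabled_path(single_path))
```

An empty list means every disk still has at least one working path, which is the expected result after a successful prepare operation.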
The AIX error log reported the failure of the associated disk paths (VSCSI and VFC) being unconfigured on sys853-vios1:
root@lpar1 / # errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DE3B8540   0404232623 P H hdisk1         PATH HAS FAILED
E6DB28E5   0404232623 T H fscsi0         ADAPTER ERROR
FE3E6B3B   0404232523 P S fcs0           Transport event while requests are active
root@lpar2 / # errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DCB47997   0404223523 T H hdisk0         DISK OPERATION ERROR
DE3B8540   0404223523 P H hdisk0         PATH HAS FAILED
VNIC failover was transparent to the VIO client LPAR, lpar1. The AIX entstat command shows that the “LPAR Name” reported under Server Information for the VNIC server has changed to sys853-vios2, which indicates a successful VNIC failover.
root@lpar1 / # entstat -d ent1 | tail -6
Server Information:
    LPAR ID: 2
    LPAR Name: sys853-vios2
    VNIC Server: vnicserver0
    Backing Device: ent5
    Backing Device Location: U78C9.001.WZS00YD-P1-C10-T4-S8
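The same check can be scripted by pulling the “LPAR Name” value out of the entstat Server Information section. The Python sketch below assumes one “Key: value” pair per line, as in the output shown in this article; the function name is hypothetical.

```python
def vnic_server_lpar(entstat_output):
    """Return the 'LPAR Name' value from entstat's Server Information section.

    Returns None if no 'LPAR Name' line is found.
    """
    for line in entstat_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "LPAR Name":
            return value.strip()
    return None


# Server Information section as reported on lpar1 after the failover.
after_failover = """\
Server Information:
    LPAR ID: 2
    LPAR Name: sys853-vios2
    VNIC Server: vnicserver0
    Backing Device: ent5
    Backing Device Location: U78C9.001.WZS00YD-P1-C10-T4-S8
"""
print(vnic_server_lpar(after_failover))
```

Comparing this value before and after the prepare operation shows which VIOS is currently serving the VNIC.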
Your VIOS is now ready for maintenance!
Once the VIOS has been successfully prepared for maintenance, the next step would be to actually perform the planned maintenance activity. When you have completed maintenance on the VIOS, and you have successfully rebooted/restarted the VIOS, you can verify the VIOS is fully functional and operational again. Let’s see what happens after maintenance when using this new method.
After VIOS Maintenance
After successfully performing maintenance on your VIOS and restarting it, the administrator must manually bring the SEA back online (this behaviour may change in a future release of HMC code, but for now it is a manual step). In the example below, we change the ha_mode attribute back to auto, which initiates a fallback of the SEA. This allows the VIOS, sys853-vios1, to become the primary SEA for network traffic again. Its partner VIOS, sys853-vios2, becomes the backup SEA VIOS once again. This can also be achieved from the HMC UI (System -> Virtual Network -> Network Bridges); for more information, refer to “Change High Availability Mode of Shared Ethernet Adapter of VIOS using the HMC UI”.
[padmin@sys853-vios1]$ chdev -dev ent4 -attr ha_mode=auto
ent4 changed
[padmin@sys853-vios1]$ errlog
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E48A73A4   0405000523 I H ent4           BECOME PRIMARY
...
[padmin@sys853-vios2]$ errlog
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
1FE2DD91   0405000523 I H ent4           BECOME BACKUP
After maintenance, the administrator should also switch over (fall back) all virtual NIC backing devices on the current VIOS to another VIOS (which is usually the original VIOS). The chhwres command, shown below, initiates a VNIC failover/fallback of all VNIC server backing devices (vnicbkdev) on sys853-vios2 to other VIOS(es), which in this case is sys853-vios1. In HMC V10.1.1010, a new option was provided to fail over all the VNIC backing devices on one VIOS to redundant backing devices on another VIOS. If a VNIC does not have an active backing device on any other VIOS, no action is taken on the backing device and an error is displayed. The command format is chhwres -r virtualio -m <system name> -o so --rsubtype vnicbkdev -p <target VIOS>; an example is shown below.
hmc1:~> chhwres -r virtualio -m sys853 -o so --rsubtype vnicbkdev -p sys853-vios2
After running the chhwres command, we confirmed that the “LPAR Name” for vnicserver0 was once again reporting as sys853-vios1 (the original VIOS). This confirms that the VNIC fallback was successful. We also need to re-enable auto priority failover (auto_priority_failover=1) after the fallback is completed (see below).
[root@sys853-vios1]# vnicstat vnicserver0 | grep "Failover State"
Failover State: active
root@lpar1 / # entstat -d ent1 | tail -6
Server Information:
    LPAR ID: 1
    LPAR Name: sys853-vios1
    VNIC Server: vnicserver0
...
hmc1:~> lshwres -r virtualio -m sys853 --rsubtype vnic --level lpar -F lpar_name,slot_num,auto_priority_failover
srr_lpar1_vfc,2,0
hmc1:~> chhwres -o s -r virtualio -m sys853 --rsubtype vnic -p srr_lpar1_vfc -s 2 -a auto_priority_failover=1
hmc1:~> lshwres -r virtualio -m sys853 --rsubtype vnic --level lpar -F lpar_name,slot_num,auto_priority_failover
srr_lpar1_vfc,2,1
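On a system with many VNIC adapters, the last step can be tedious to do by hand. The Python sketch below reads lshwres output in the same -F field order used above (lpar_name,slot_num,auto_priority_failover) and builds the matching chhwres command for every adapter that still reports 0. It only generates command strings and runs nothing; the function name is hypothetical.

```python
import shlex


def reenable_commands(system, lshwres_output):
    """Build chhwres commands for VNICs whose auto_priority_failover is 0.

    Expects 'lpar_name,slot_num,auto_priority_failover' lines, matching the
    -F field order used in the lshwres example in this article.
    """
    commands = []
    for line in lshwres_output.splitlines():
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # ignore blank or unexpected lines
        lpar, slot, auto = parts
        if auto == "0":
            args = ["chhwres", "-o", "s", "-r", "virtualio", "-m", system,
                    "--rsubtype", "vnic", "-p", lpar, "-s", slot,
                    "-a", "auto_priority_failover=1"]
            # shlex.join quotes any argument that needs it for the shell.
            commands.append(shlex.join(args))
    return commands


for cmd in reenable_commands("sys853", "srr_lpar1_vfc,2,0\n"):
    print(cmd)
```

For the single adapter in this article, the generated command matches the chhwres command shown above; adapters that already report 1 are skipped.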
That brings us to the end of this article. I hope it has provided you with some insight into how recent HMC enhancements can make your life, as an IBM Power server administrator, a little easier when it comes to performing VIOS maintenance.
Stay tuned, as more HMC enhancements are coming that are aimed at continuing to simplify your VIOS management! For example, with HMC version 10.2.1030.0, you can now update your VIOS using the HMC: https://community.ibm.com/community/user/power/blogs/manjunath-shanbhag1/2023/05/29/update-vios-using-hmc.
More on this next time!