A colleague of mine was planning to modify the max_xfer_size attribute on a couple of FC adapters in one of his AIX LPARs. As he described his plan, I asked him how he intended to back out of the change should the LPAR fail to boot after the modifications. "But what could possibly go wrong?" he fired back. I advised him to use multibos to create a standby (backup) instance of the AIX OS, just in case. He begrudgingly did so, just to keep me happy.
The next day he told me the following tale.
He had modified the FC adapters' max_xfer_size attribute as planned. First, he checked the current value of the attribute on both adapters.
aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True
aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True
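The values that lsattr reports are hexadecimal byte counts, so it can help to convert them before settling on a new size. The following is a minimal sketch, assuming a POSIX-style shell such as ksh or bash; the function name xfer_size_bytes is made up for illustration and is not an AIX command.

```shell
# Hypothetical helper (not part of AIX): convert a hexadecimal
# max_xfer_size value, as printed by lsattr, into decimal bytes.
xfer_size_bytes() {
    # Shell arithmetic accepts the 0x prefix for hexadecimal constants.
    echo "$(( $1 ))"
}

# The current value from the lsattr output and the planned new value.
for value in 0x100000 0x200000; do
    bytes=$(xfer_size_bytes "$value")
    echo "$value = $bytes bytes ($(( bytes / 1024 )) KB)"
done
```

In other words, the planned change doubles the maximum transfer size from 1 MB to 2 MB.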
He created a standby AIX instance before making changes to the adapters. He also used the -t flag to prevent multibos from changing the bootlist to the standby boot logical volume (BLV).
aixlpar1 : / # multibos -sXt
Initializing multibos methods ...
Initializing log /etc/multibos/logs/op.alog ...
Gathering system information ...
+-----------------------------------------------------------------------------+
Setup Operation
+-----------------------------------------------------------------------------+
Verifying operation parameters ...
Creating image.data file ...
He modified the FC adapters as planned. The -P flag records the change in the ODM only, so the new value takes effect at the next restart.
aixlpar1 : / # chdev -l fcs0 -a max_xfer_size=0x200000 -P
fcs0 changed
aixlpar1 : / # chdev -l fcs1 -a max_xfer_size=0x200000 -P
fcs1 changed
aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x200000 Maximum Transfer Size True
aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x200000 Maximum Transfer Size True
He verified that the standby instance still held the original values for both FC adapters.
aixlpar1 : / # multibos -S
Initializing multibos methods ...
Initializing log /etc/multibos/logs/op.alog ...
Gathering system information ...
+-----------------------------------------------------------------------------+
Multibos Shell Operation
+-----------------------------------------------------------------------------+
Verifying operation parameters ...
+-----------------------------------------------------------------------------+
Mount Processing
+-----------------------------------------------------------------------------+
Mounting all standby BOS file systems ...
Mounting /bos_inst
Mounting /bos_inst/usr
Mounting /bos_inst/var
Mounting /bos_inst/opt
+-----------------------------------------------------------------------------+
Multibos Root Shell
+-----------------------------------------------------------------------------+
Starting multibos root shell ...
Active boot logical volume is hd5.
Script command is started. The file is /etc/multibos/logs/scriptlog.120713124518.txt.
aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True
aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True
aixlpar1 : / # exit
Script command is complete. The file is /etc/multibos/logs/scriptlog.120713124518.txt.
Stopping multibos root shell ...
Compressing script log file ...
Compressed script log file is /etc/multibos/logs/scriptlog.120713124518.txt.Z
+-----------------------------------------------------------------------------+
Mount Processing
+-----------------------------------------------------------------------------+
Unmounting all standby BOS file systems ...
Unmounting /bos_inst/opt
Unmounting /bos_inst/var
Unmounting /bos_inst/usr
Unmounting /bos_inst
Log file is /etc/multibos/logs/op.alog
Return Status = SUCCESS
Then he manually changed the LPAR's boot list to include the standby BLV.
aixlpar1 : / # bootlist -m normal hdisk2 blv=hd5 hdisk2 blv=bos_hd5
aixlpar1 : / # bootlist -m normal -o
hdisk2 blv=hd5 pathid=0
hdisk2 blv=hd5 pathid=1
hdisk2 blv=bos_hd5 pathid=0
hdisk2 blv=bos_hd5 pathid=1
He carefully recorded the bootlist output, just in case the boot failed with the new max_xfer_size values. In an emergency, he could use the vdevice name and location to manually select the standby BLV and start the system.
aixlpar1 : / # bootlist -m normal -ov
'ibm,max-boot-devices' = 0x5
NVRAM variable: (boot-device=/vdevice/vfc-client@30000014/disk@50060e8006d0206a,1000000000000:2 /vdevice/vfc-client@3000001e/disk@50060e8006d0207a,1000000000000:2 /vdevice/vfc-client@30000014/disk@50060e8006d0206a,1000000000000:4 /vdevice/vfc-client@3000001e/disk@50060e8006d0207a,1000000000000:4)
Path name: (/vdevice/vfc-client@30000014/disk@50060e8006d0206a,1000000000000:2)
match_specific_info: ut=disk/fcp/htcvspmpio
hdisk2 blv=hd5 pathid=0
Path name: (/vdevice/vfc-client@3000001e/disk@50060e8006d0207a,1000000000000:2)
match_specific_info: ut=disk/fcp/htcvspmpio
hdisk2 blv=hd5 pathid=1
Path name: (/vdevice/vfc-client@30000014/disk@50060e8006d0206a,1000000000000:4)
match_specific_info: ut=disk/fcp/htcvspmpio
hdisk2 blv=bos_hd5 pathid=0
Path name: (/vdevice/vfc-client@3000001e/disk@50060e8006d0207a,1000000000000:4)
match_specific_info: ut=disk/fcp/htcvspmpio
hdisk2 blv=bos_hd5 pathid=1
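Recording the output is less error-prone if it is saved to a file and the paths for the standby BLV are pulled out ahead of time. The sketch below assumes the output of bootlist -m normal -ov has the layout shown above, with each "Path name:" line preceding the hdisk line it describes. The function name blv_paths and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the Open Firmware path of every boot list
# entry for the given BLV, reading saved "bootlist -m normal -ov" output
# from standard input.
blv_paths() {
    awk -v blv="$1" '
        # Remember the most recent path, stripping the surrounding parentheses.
        /^[ \t]*Path name: / {
            path = $0
            sub(/^[ \t]*Path name: \(/, "", path)
            sub(/\)[ \t]*$/, "", path)
        }
        # An hdisk line whose blv= field matches belongs to that path.
        $1 ~ /^hdisk/ && $2 == ("blv=" blv) { print path }
    '
}

# Example usage (assumed file name):
#   bootlist -m normal -ov > /tmp/bootlist.out
#   blv_paths bos_hd5 < /tmp/bootlist.out
```

Each line printed is a path that can be given to the boot command at the Open Firmware prompt.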
He restarted the LPAR using the primary BLV, with the modified FC attributes.
The system hung at LED 554.
hscroot@hmc1:~> lsrefcode -m 795-1 -r lpar --filter "lpar_names=aixlpar1" -F lpar_name:refcode
aixlpar1:0554
He restarted the LPAR in Open Firmware mode.
At the Open Firmware prompt, he entered the following to boot the LPAR from the standby instance BLV, using the bos_hd5 path he had recorded earlier:
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
1 = SMS Menu 5 = Default Boot List
8 = Open Firmware Prompt 6 = Stored Boot List
Memory Keyboard Network SCSI Speaker ok
0 > boot /vdevice/vfc-client@30000014/disk@50060e8006d0206a,1000000000000:4
The system booted OK on the standby AIX instance.
Elapsed time since release of system processors: 28858 mins 55 secs
-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: 02:39:13 07/13/2012
The current time and date: 02:54:02 07/13/2012
processor count: 1; memory size: 4096MB; kernel size: 35062697
boot device: /vdevice/vfc-client@30000014/disk@50060e8006d0206a,1000000000000:4
-------------------------------------------------------------------------------
The FC adapters were running on the previous values, stored in the standby instance of AIX.
aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True
aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True
The original AIX instance, which the running system now treated as the standby, still held the modified values for the FC adapters.
aixlpar1 : / # multibos -S
Initializing multibos methods ...
Initializing log /etc/multibos/logs/op.alog ...
Gathering system information ...
+-----------------------------------------------------------------------------+
Multibos Shell Operation
+-----------------------------------------------------------------------------+
Verifying operation parameters ...
+-----------------------------------------------------------------------------+
Mount Processing
+-----------------------------------------------------------------------------+
Mounting all standby BOS file systems ...
Mounting /bos_inst
Mounting /bos_inst/usr
Mounting /bos_inst/var
Mounting /bos_inst/opt
+-----------------------------------------------------------------------------+
Multibos Root Shell
+-----------------------------------------------------------------------------+
Starting multibos root shell ...
Active boot logical volume is bos_hd5.
Script command is started. The file is /etc/multibos/logs/scriptlog.120713125542.txt.
aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x200000 Maximum Transfer Size True
aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x200000 Maximum Transfer Size True
aixlpar1 : / # exit
Script command is complete. The file is /etc/multibos/logs/scriptlog.120713125542.txt.
Stopping multibos root shell ...
Compressing script log file ...
Compressed script log file is /etc/multibos/logs/scriptlog.120713125542.txt.Z
+-----------------------------------------------------------------------------+
Mount Processing
+-----------------------------------------------------------------------------+
Unmounting all standby BOS file systems ...
Unmounting /bos_inst/opt
Unmounting /bos_inst/var
Unmounting /bos_inst/usr
Unmounting /bos_inst
Log file is /etc/multibos/logs/op.alog
Return Status = SUCCESS
The 554 hang appeared to occur because the physical adapters on the VIOS needed their max_xfer_size value raised to the new value before the client LPAR's virtual Fibre Channel adapters were modified.
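That ordering suggests a simple pre-check before running chdev on the client: compare the value planned for the virtual adapter with the value already set on the VIOS physical adapter. The sketch below only illustrates that comparison. The function name check_xfer_order is invented, and the rule it encodes (the client value must not exceed the VIOS value) is an assumption drawn from this incident rather than from IBM documentation.

```shell
# Hypothetical pre-check: succeed only if the value planned for the client
# adapter is no larger than the value already set on the VIOS adapter.
# Both arguments are hexadecimal strings as printed by lsattr.
check_xfer_order() {
    client=$(( $1 ))
    vios=$(( $2 ))
    if [ "$client" -le "$vios" ]; then
        echo "OK: client $1 <= VIOS $2"
    else
        echo "STOP: raise the VIOS adapter to at least $1 first (currently $2)"
        return 1
    fi
}

# The situation in this story, assuming (not confirmed by the output above)
# that the VIOS adapters were still at 0x100000:
check_xfer_order 0x200000 0x100000 || echo "client change postponed"
```

The VIOS value would have to be collected separately on the VIOS itself; how that is done is left out here.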
My colleague was glad he used multibos. It saved his bacon.