Recently I had the pleasure of configuring a couple of POWER7 720s for a customer. Each 720 was to host roughly 12 LPARs, with two VIO servers and a NIM server per system. Everything went along nicely and according to plan. In a few days we had both systems built. My steps for building the systems were almost identical to those described by Rob McNelly in a recent post.
All four VIO servers were running the latest VIOS code i.e. 2.2.0.10-FP-24 SP-01. All the client LPARs were running AIX 6.1 TL6 SP3. Each VIOS was configured with two FC paths to the SAN and the SAN storage device was an IBM DS5020.
Native AIX MPIO was in use on the VIOS and the AIX LPARs. I did not deploy SDDPCM on the VIOS as this is currently unsupported with the DS5020.
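For reference, with two FC paths per VIOS and the default PCM, each DS5020 hdisk on a VIOS should show one path through each fscsi adapter. A quick way to confirm this is lspath from the padmin shell (a sketch only; hdisk3 is just an example and the exact output will vary by environment):

$ lspath | grep hdisk3
Enabled hdisk3 fscsi0
Enabled hdisk3 fscsi1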
Once the LPAR builds were complete we performed a number of integration tests. These typically involve disconnecting network and SAN cables from each VIOS and observing how the VIOS and LPARs respond to and recover from these types of conditions.
One of the integration tests required that BOTH fibre cables be disconnected from the first VIOS, to confirm that the client LPARs were not impacted i.e. that all I/O travelled via the second VIOS.
During the test we noticed that, with both fibre cables pulled from the first VIOS, I/O on the client LPARs hung for roughly five minutes before it finally failed over to the second VIOS. What was even more puzzling was that if we simply rebooted the first VIOS, everything worked as expected i.e. the client LPARs were not impacted, I/O continued as normal, and when the first VIOS was back up, the paths on the client LPARs recovered quickly.
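For what it's worth, the easiest way to watch this from a client LPAR is lspath against one of its virtual SCSI disks. The output below is illustrative only (it assumes vscsi0 maps to the first VIOS), showing what you would expect to see while that VIOS is off the fabric:

# lspath -l hdisk0
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1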
After doing some research, we discovered the following post on the IBM developerWorks AIX forum:
http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14472352&tstart=0
This post highlighted a number of things we needed to check and also confirmed several decisions we'd made during the design process, such as SDDPCM not being supported with DS5020 storage and VIOS. (This was good, as some people were starting to believe we should have installed SDDPCM to resolve this problem. I'd only be happy to do that if it were a supported combination, and it's not.)
Finally we found the following IBM tech note that related directly to our issue.
IZ66020: ACTIVE/PASSIVE PCM CONTROLLER HCHECK SUPPORT
https://www-304.ibm.com/support/docview.wss?uid=isg1IZ66020
The following statement seemed to match our exact problem.
"For active/passive storage device, such as DS3K, DS4K, or DS5K if
complete access is lost to the storage device, then it may take greater than 5 minutes to fail I/O. This feature is for
Active/Passive storage devices, which are running with the AIX Default A/P PCM. This includes DS3K, DS4K, and DS5K family of
devices.
The new feature was described as follows.
Added feature which health checks controllers when an enabled path becomes unavailable due to transport problems. By default this feature is DISABLED. To enable this feature, set the following ODM attributes for the active/passive storage device. Enabling this feature results in faster I/O failure times.
cntl_delay_time: The amount of time, in seconds, that the storage device's controller(s) will be health checked after a transport failure. At the end of this period, if no paths are detected as good, then all pending and subsequent I/O to the device will be failed, until the device health checker detects that a failed path has returned.
cntl_hcheck_int: The first controller health check is only issued after a storage fabric transport failure has been detected. cntl_hcheck_int is the amount of time, in seconds, after which the next controller health check command will be issued. This value must be less than cntl_delay_time (unless set to "0", i.e. disabled).
If you wish to allow the storage device 30 seconds to come back on the fabric (after leaving the fabric), then you can set cntl_delay_time=30 and cntl_hcheck_int=2.
The device, /dev/hdisk#, must not be in use when setting the ODM values (or the chdev "-P" option must be used, which requires a reboot).
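As a sketch of what the tech note describes, using standard (root) AIX chdev syntax with the deferred "-P" option and hdisk3 purely as an example name, the change would look like this and take effect at the next reboot:

# chdev -l hdisk3 -a cntl_delay_time=30 -a cntl_hcheck_int=2 -P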
CAUTION: There are cases where the storage device may reboot both of the controllers and become inaccessible for a period of time. If the controller health check sequence is enabled, then this may result in an I/O failure. It is recommended to make sure you have a mirrored volume to fail over to, if you are running with controller health check enabled (especially with a cntl_delay_time of under 60 seconds).
And as I suspected, the issue was related to the type of storage we were using. It appears the I/O delay was caused by the default settings of the following attributes on the DS5020 hdisks on the VIOS:
cntl_delay_time 0 Controller Delay Time              True
cntl_hcheck_int 0 Controller Health Check Interval   True
Based on the tech note, I attempted several tests with various values for both parameters e.g.
$ chdev -dev hdiskX -attr cntl_delay_time=30
$ chdev -dev hdiskX -attr cntl_hcheck_int=2
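To hit every DS5020 disk on a VIOS in one pass, a loop along these lines can be run from the padmin shell (a sketch only; it assumes the restricted shell permits the loop, and -perm plus a reboot would be needed if the disks are busy):

$ for d in $(lsdev -type disk | grep DS5020 | awk '{ print $1 }')
> do
>   chdev -dev $d -attr cntl_delay_time=30 cntl_hcheck_int=2
> done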
After making the changes to the hdisks on all VIOS, I performed the same test i.e. disconnected BOTH fibre cables from the first VIOS and continued to write a file to a file system on the client LPARs. By modifying these values on all the DS5020 disks, on all the VIO servers, the I/O delay was reduced to seconds rather than five minutes!
The following attributes were used for the hdisks and adapters in the final configuration.
On the VIO servers:
$ lsdev -type disk | grep DS5020
hdisk3 Available MPIO DS5020 Disk
hdisk4 Available MPIO DS5020 Disk
hdisk5 Available MPIO DS5020 Disk
hdisk6 Available MPIO DS5020 Disk
hdisk7 Available MPIO DS5020 Disk
hdisk8 Available MPIO DS5020 Disk
hdisk9 Available MPIO DS5020 Disk
hdisk10 Available MPIO DS5020 Disk
hdisk11 Available MPIO DS5020 Disk
hdisk12 Available MPIO DS5020 Disk
hdisk13 Available MPIO DS5020 Disk
hdisk14 Available MPIO DS5020 Disk
hdisk15 Available MPIO DS5020 Disk
hdisk16 Available MPIO DS5020 Disk
hdisk17 Available MPIO DS5020 Disk
hdisk18 Available MPIO DS5020 Disk
hdisk19 Available MPIO DS5020 Disk
hdisk20 Available MPIO DS5020 Disk
hdisk21 Available MPIO DS5020 Disk
hdisk22 Available MPIO DS5020 Disk
hdisk23 Available MPIO DS5020 Disk
hdisk24 Available MPIO DS5020 Disk
hdisk25 Available MPIO DS5020 Disk
hdisk26 Available MPIO DS5020 Disk
hdisk27 Available MPIO DS5020 Disk
hdisk28 Available MPIO DS5020 Disk
hdisk29 Available MPIO DS5020 Disk
hdisk30 Available MPIO DS5020 Disk
hdisk31 Available MPIO DS5020 Disk
hdisk32 Available MPIO DS5020 Disk
$ lsdev -dev hdisk3 -attr
attribute value description user_settable
PCM PCM/friend/otherapdisk Path Control Module False
PR_key_value none Persistant Reserve Key Value True
algorithm fail_over Algorithm True
autorecovery no Path/Ownership Autorecovery True
clr_q no Device CLEARS its Queue on error True
cntl_delay_time 15 Controller Delay Time True
cntl_hcheck_int 2 Controller Health Check Interval True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd inquiry Health Check Command True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
location Location Label True
lun_id 0x11000000000000 Logical Unit Number ID False
lun_reset_spt yes LUN Reset Supported True
max_retry_delay 60 Maximum Quiesce Time True
max_transfer 0x40000 Maximum TRANSFER Size True
node_name 0x20040080e5187564 FC Node Name False
pvid 00f6482f7869e92d0000000000000000 Physical volume identifier False
q_err yes Use QERR bit True
q_type simple Queuing TYPE True
queue_depth 24 Queue DEPTH True
reassign_to 120 REASSIGN time out value True
reserve_policy no_reserve Reserve Policy True
rw_timeout 30 READ/WRITE time out value True
scsi_id 0x11600 SCSI ID False
start_timeout 60 START unit time out value True
unique_id 3E21360080E50001875DE000005224D2CD63B0F1814 FAStT03IBMfcp Unique device identifier False
ww_name 0x20150080e5187564 FC World Wide Name False
$ lsdev -dev fscsi0 -attr
attribute value description user_settable
attach switch How this adapter is CONNECTED False
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x11400 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
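The dyntrk and fc_err_recov values shown above are not the adapter defaults; if they need to be set, something like the following would do it on each VIOS FC adapter (a sketch; -perm defers the change until the next VIOS reboot, which is required while the adapter is in use):

$ chdev -dev fscsi0 -attr fc_err_recov=fast_fail dyntrk=yes -perm
$ chdev -dev fscsi1 -attr fc_err_recov=fast_fail dyntrk=yes -perm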
$ r oem
oem_setup_env
# manage_disk_drivers
Device          Present Driver  Driver Options
2810XIV         AIX_AAPCM       AIX_AAPCM,AIX_non_MPIO
DS4100          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4200          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4300          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4500          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4700          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4800          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS3950          AIX_APPCM       AIX_APPCM
DS5020          AIX_APPCM       AIX_APPCM
DS5100/DS5300   AIX_APPCM       AIX_APPCM
DS3500          AIX_APPCM       AIX_APPCM
Usage:
    manage_disk_drivers [-l]
    manage_disk_drivers -d device -o driver_option
        For entries with multiple model names use the first one listed.
        Ex. DS5100/DS5300 use DS5100.
    manage_disk_drivers -h
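No driver change was needed here, since the DS5020 entry above already shows AIX_APPCM as the present driver. For reference, switching a device family's driver follows the usage shown, for example (hypothetical, and my understanding is the change only takes effect after a reboot):

# manage_disk_drivers -d DS5020 -o AIX_APPCM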
# mpio_get_config -Av
Frame id 0:
Storage Subsystem worldwide name: 609e50018345de00004da7998
Controller count: 2
Partition count: 1
Partition 0:
Storage Subsystem Name = 'MyApp-DS5020'
hdisk LUN # Ownership User Label
hdisk3 17 B (preferred) LPAR1
hdisk4 18 A (preferred) LPAR1datavg
hdisk5 19 A (preferred) LPAR1appvg
hdisk6 16 B (preferred) LPAR2
hdisk7 20 A (preferred) LPAR2datavg
hdisk8 21 B (preferred) LPAR2data3vg
hdisk9 15 A (preferred) LPAR3
hdisk10 14 B (preferred) LPAR4
hdisk11 22 B (preferred) LPAR5
hdisk12 23 A (preferred) LPAR6
hdisk13 13 B (preferred) LPAR6datavg
hdisk14 12 A (preferred) LPAR7
hdisk15 24 B (preferred) LPAR7datavg
hdisk16 25 A (preferred) LPAR8
hdisk17 11 B (preferred) LPAR8datavg
hdisk18 10 A (preferred) LPAR8datavg
hdisk19 9 B (preferred) LPAR8datavg
hdisk20 8 A (preferred) LPAR8datavg
hdisk21 26 A (preferred) LPAR9
hdisk22 7 A (preferred) LPAR9datavg
hdisk23 6 B (preferred) LPAR9datavg
hdisk24 27 B (preferred) LPAR9appvg
hdisk25 28 A (preferred) LPAR9binvg
hdisk26 5 A (preferred) LPAR10
hdisk27 4 B (preferred) LPAR10datavg
hdisk28 29 B (preferred) LPAR10datavg
hdisk29 30 A (preferred) LPAR11
hdisk30 3 A (preferred) LPAR11datavg
hdisk31 32 B (preferred) LPAR12
hdisk32 50 B (preferred) LPAR12datavg
On the VIO clients:
- Changed the following attributes for all virtual SCSI disks to:
# lsattr -El hdisk0
PCM PCM/friend/vscsi Path Control Module False
algorithm fail_over Algorithm True
hcheck_cmd test_unit_rdy Health Check Command True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
max_transfer 0x40000 Maximum TRANSFER Size True
pvid 00f6482f7869e92d0000000000000000 Physical volume identifier False
queue_depth 24 Queue DEPTH True
reserve_policy no_reserve Reserve Policy True
- Changed the following attributes for all virtual SCSI adapters to:
# lsattr -El vscsi0
vscsi_err_recov fast_fail N/A True
vscsi_path_to 30 Virtual SCSI Path Timeout True
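As a sketch of how these client-side values can be applied with the standard AIX chdev command ("-P" defers the change to the next reboot, which is needed while the rootvg disk and its vscsi adapters are in use):

# chdev -l hdisk0 -a hcheck_interval=60 -a queue_depth=24 -P
# chdev -l vscsi0 -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P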
Note: With regard to SDDPCM support for DS5020 with VIOS, I've been referring to the following IBM website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000201
The site has a link to the SDDPCM readme file for DS5020 storage (under SDDPCM Package for DS5000):
ftp://ftp.software.ibm.com/storage/subsystem/aix/2.5.2.0/sddpcm.readme.2.5.2.0.txt
The readme states:
Note: VIOS is not supported with SDDPCM on DS4000/DS5000/DS5020/DS3950 subsystem devices.
During our research, IBM support confirmed that, at this time, SDDPCM is not supported on VIOS with DS5000.
I hope this helps others who may be about to implement this type of storage with a VIOS.