Recently I had the pleasure of configuring a couple of POWER7 720s for a customer. Each 720 was to host roughly 12 LPARs, with two VIO servers and a NIM server per system. Everything went along nicely and according to plan. In a few days we had both systems built. My steps for building the systems were almost identical to those described by Rob McNelly in a recent post.
All four VIO servers were running the latest VIOS code i.e. 2.2.0.10-FP-24 SP-01. All the client LPARs were running AIX 6.1 TL6 SP3. Each VIOS was configured with two FC paths to the SAN and the SAN storage device was an IBM DS5020.
Native AIX MPIO was in use on the VIOS and the AIX LPARs. I did not deploy SDDPCM on the VIOS as this is currently unsupported with the DS5020.
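For reference, with two FC paths per VIOS and the default PCM, each DS5020 hdisk on a VIOS should show one path through each fscsi adapter. A quick way to confirm this is lspath from the padmin shell (a sketch only; hdisk3 is just an example and the exact output will vary by environment):

$ lspath | grep hdisk3
Enabled hdisk3 fscsi0
Enabled hdisk3 fscsi1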
Once the LPAR builds were complete we performed a number of integration tests. These typically involve disconnecting network and SAN cables from each VIOS and observing how the VIOS and LPARs respond to and recover from these types of conditions.
One of the integration tests required that BOTH fibre cables be disconnected from the first VIOS, to confirm that the client LPARs were not impacted i.e. that all I/O travelled via the second VIOS.
During the test we noticed that, with both fibre cables pulled from the first VIOS, I/O on the client LPARs hung for roughly five minutes before it finally failed over to the second VIOS. What was even more puzzling was that if we simply rebooted the first VIOS, everything worked as expected i.e. the client LPARs were not impacted, I/O continued as normal, and when the first VIOS was back up, the paths on the client LPARs recovered quickly.
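For what it's worth, the easiest way to watch this from a client LPAR is lspath against one of its virtual SCSI disks. The output below is illustrative only (it assumes vscsi0 maps to the first VIOS), showing what you would expect to see while that VIOS is off the fabric:

# lspath -l hdisk0
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1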
After doing some research, we discovered the following post on the IBM developerWorks AIX forum:
http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14472352&tstart=0
This post highlighted a number of things we needed to check and also confirmed several decisions we'd made during the design process, such as SDDPCM not being supported with DS5020 storage and VIOS. (This was good, as some people were starting to believe we should have installed SDDPCM to resolve this problem. I'd only be happy to do that if it were a supported combination, and it's not.)
Finally we found the following IBM tech note that related directly to our issue.
IZ66020: ACTIVE/PASSIVE PCM CONTROLLER HCHECK SUPPORT
https://www-304.ibm.com/support/docview.wss?uid=isg1IZ66020
The following statement seemed to match our exact problem.
"For active/passive storage device, such as DS3K, DS4K, or DS5K if
complete access is lost to the storage device, then it may take greater than 5 minutes to fail I/O. This feature is for
Active/Passive storage devices, which are running with the AIX Default A/P PCM. This includes DS3K, DS4K, and DS5K family of
devices.
The new feature was described as follows.
Added feature which health checks controllers when an enabled path becomes unavailable due to transport problems. By default this feature is DISABLED. To enable this feature, set the following ODM attributes for the active/passive storage device. Enabling this feature results in faster I/O failure times.
cntl_delay_time: The amount of time, in seconds, that the storage device's controller(s) will be health checked after a transport failure. At the end of this period, if no paths are detected as good, then all pending and subsequent I/O to the device will be failed, until the device health checker detects that a failed path has returned.
cntl_hcheck_int: The first controller health check is only issued after a storage fabric transport failure has been detected. cntl_hcheck_int is the amount of time, in seconds, after which the next controller health check command will be issued. This value must be less than cntl_delay_time (unless set to "0", i.e. disabled).
If you wish to allow the storage device 30 seconds to come back on the fabric (after leaving the fabric), then you can set cntl_delay_time=30 and cntl_hcheck_int=2.
The device, /dev/hdisk#, must not be in use when setting the ODM values (or the chdev "-P" option must be used, which requires a reboot).
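As a sketch of what the tech note describes, using standard (root) AIX chdev syntax with the deferred "-P" option and hdisk3 purely as an example name, the change would look like this and take effect at the next reboot:

# chdev -l hdisk3 -a cntl_delay_time=30 -a cntl_hcheck_int=2 -P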
CAUTION: There are cases where the storage device may reboot both of the controllers and become inaccessible for a period of time. If the controller health check sequence is enabled, then this may result in an I/O failure. It is recommended to make sure you have a mirrored volume to fail over to, if you are running with controller health check enabled (especially with a cntl_delay_time of under 60 seconds).
And as I suspected, the issue was related to the type of storage we were using. It appears the I/O delay was caused by the default settings of the following attributes on the DS5020 hdisks on the VIOS:
cntl_delay_time 0 Controller Delay Time              True
cntl_hcheck_int 0 Controller Health Check Interval   True
Based on the tech note, I attempted several tests with various values for both parameters e.g.
$ chdev -dev hdiskX -attr cntl_delay_time=30
$ chdev -dev hdiskX -attr cntl_hcheck_int=2
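To hit every DS5020 disk on a VIOS in one pass, a loop along these lines can be run from the padmin shell (a sketch only; it assumes the restricted shell permits the loop, and -perm plus a reboot would be needed if the disks are busy):

$ for d in $(lsdev -type disk | grep DS5020 | awk '{ print $1 }')
> do
>   chdev -dev $d -attr cntl_delay_time=30 cntl_hcheck_int=2
> done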
After making the changes to the hdisks on all VIOS, I performed the same test i.e. disconnected BOTH fibre cables from the first VIOS and continued to write a file to a file system on the client LPARs. By modifying these values on all the DS5020 disks, on all the VIO servers, the I/O delay was reduced to seconds rather than five minutes!
The following attributes were used for the hdisks and adapters in the final configuration.
On the VIO servers:
$ lsdev -type disk | grep DS5020
hdisk3 Available MPIO DS5020 Disk
hdisk4 Available MPIO DS5020 Disk
hdisk5 Available MPIO DS5020 Disk
hdisk6 Available MPIO DS5020 Disk
hdisk7 Available MPIO DS5020 Disk
hdisk8 Available MPIO DS5020 Disk
hdisk9 Available MPIO DS5020 Disk
hdisk10 Available MPIO DS5020 Disk
hdisk11 Available MPIO DS5020 Disk
hdisk12 Available MPIO DS5020 Disk
hdisk13 Available MPIO DS5020 Disk
hdisk14 Available MPIO DS5020 Disk
hdisk15 Available MPIO DS5020 Disk
hdisk16 Available MPIO DS5020 Disk
hdisk17 Available MPIO DS5020 Disk
hdisk18 Available MPIO DS5020 Disk
hdisk19 Available MPIO DS5020 Disk
hdisk20 Available MPIO DS5020 Disk
hdisk21 Available MPIO DS5020 Disk
hdisk22 Available MPIO DS5020 Disk
hdisk23 Available MPIO DS5020 Disk
hdisk24 Available MPIO DS5020 Disk
hdisk25 Available MPIO DS5020 Disk
hdisk26 Available MPIO DS5020 Disk
hdisk27 Available MPIO DS5020 Disk
hdisk28 Available MPIO DS5020 Disk
hdisk29 Available MPIO DS5020 Disk
hdisk30 Available MPIO DS5020 Disk
hdisk31 Available MPIO DS5020 Disk
hdisk32 Available MPIO DS5020 Disk
$ lsdev -dev hdisk3 -attr
attribute value description user_settable
PCM PCM/friend/otherapdisk Path Control Module False
PR_key_value none Persistant Reserve Key Value True
algorithm fail_over Algorithm True
autorecovery no Path/Ownership Autorecovery True
clr_q no Device CLEARS its Queue on error True
cntl_delay_time 15 Controller Delay Time True
cntl_hcheck_int 2 Controller Health Check Interval True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd inquiry Health Check Command True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
location Location Label True
lun_id 0x11000000000000 Logical Unit Number ID False
lun_reset_spt yes LUN Reset Supported True
max_retry_delay 60 Maximum Quiesce Time True
max_transfer 0x40000 Maximum TRANSFER Size True
node_name 0x20040080e5187564 FC Node Name False
pvid 00f6482f7869e92d0000000000000000 Physical volume identifier False
q_err yes Use QERR bit True
q_type simple Queuing TYPE True
queue_depth 24 Queue DEPTH True
reassign_to 120 REASSIGN time out value True
reserve_policy no_reserve Reserve Policy True
rw_timeout 30 READ/WRITE time out value True
scsi_id 0x11600 SCSI ID False
start_timeout 60 START unit time out value True
unique_id 3E21360080E50001875DE000005224D2CD63B0F1814 FAStT03IBMfcp Unique device identifier False
ww_name 0x20150080e5187564 FC World Wide Name False
$ lsdev -dev fscsi0 -attr
attribute value description user_settable
attach switch How this adapter is CONNECTED False
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x11400 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
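The dyntrk and fc_err_recov values shown above are not the adapter defaults; if they need to be set, something like the following would do it on each VIOS FC adapter (a sketch; -perm defers the change until the next VIOS reboot, which is required while the adapter is in use):

$ chdev -dev fscsi0 -attr fc_err_recov=fast_fail dyntrk=yes -perm
$ chdev -dev fscsi1 -attr fc_err_recov=fast_fail dyntrk=yes -perm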
$ r oem
oem_setup_env
# manage_disk_drivers
Device          Present Driver  Driver Options
2810XIV         AIX_AAPCM       AIX_AAPCM,AIX_non_MPIO
DS4100          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4200          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4300          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4500          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4700          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS4800          AIX_APPCM       AIX_APPCM,AIX_fcparray
DS3950          AIX_APPCM       AIX_APPCM
DS5020          AIX_APPCM       AIX_APPCM
DS5100/DS5300   AIX_APPCM       AIX_APPCM
DS3500          AIX_APPCM       AIX_APPCM
Usage:
    manage_disk_drivers [-l]
    manage_disk_drivers -d device -o driver_option
        For entries with multiple model names use the first one listed.
        Ex. DS5100/DS5300 use DS5100.
    manage_disk_drivers -h
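No driver change was needed here, since the DS5020 entry above already shows AIX_APPCM as the present driver. For reference, switching a device family's driver follows the usage shown, for example (hypothetical, and my understanding is the change only takes effect after a reboot):

# manage_disk_drivers -d DS5020 -o AIX_APPCM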
# mpio_get_config -Av
Frame id 0:
Storage Subsystem worldwide name: 609e50018345de00004da7998
Controller count: 2
Partition count: 1
Partition 0:
Storage Subsystem Name = 'MyApp-DS5020'
hdisk LUN # Ownership User Label
hdisk3 17 B (preferred) LPAR1
hdisk4 18 A (preferred) LPAR1datavg
hdisk5 19 A (preferred) LPAR1appvg
hdisk6 16 B (preferred) LPAR2
hdisk7 20 A (preferred) LPAR2datavg
hdisk8 21 B (preferred) LPAR2data3vg
hdisk9 15 A (preferred) LPAR3
hdisk10 14 B (preferred) LPAR4
hdisk11 22 B (preferred) LPAR5
hdisk12 23 A (preferred) LPAR6
hdisk13 13 B (preferred) LPAR6datavg
hdisk14 12 A (preferred) LPAR7
hdisk15 24 B (preferred) LPAR7datavg
hdisk16 25 A (preferred) LPAR8
hdisk17 11 B (preferred) LPAR8datavg
hdisk18 10 A (preferred) LPAR8datavg
hdisk19 9 B (preferred) LPAR8datavg
hdisk20 8 A (preferred) LPAR8datavg
hdisk21 26 A (preferred) LPAR9
hdisk22 7 A (preferred) LPAR9datavg
hdisk23 6 B (preferred) LPAR9datavg
hdisk24 27 B (preferred) LPAR9appvg
hdisk25 28 A (preferred) LPAR9binvg
hdisk26 5 A (preferred) LPAR10
hdisk27 4 B (preferred) LPAR10datavg
hdisk28 29 B (preferred) LPAR10datavg
hdisk29 30 A (preferred) LPAR11
hdisk30 3 A (preferred) LPAR11datavg
hdisk31 32 B (preferred) LPAR12
hdisk32 50 B (preferred) LPAR12datavg
On the VIO clients:
- Changed the following attributes for all virtual SCSI disks to:
# lsattr -El hdisk0
PCM PCM/friend/vscsi Path Control Module False
algorithm fail_over Algorithm True
hcheck_cmd test_unit_rdy Health Check Command True
hcheck_interval 60 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
max_transfer 0x40000 Maximum TRANSFER Size True
pvid 00f6482f7869e92d0000000000000000 Physical volume identifier False
queue_depth 24 Queue DEPTH True
reserve_policy no_reserve Reserve Policy True
- Changed the following attributes for all virtual SCSI adapters to:
# lsattr -El vscsi0
vscsi_err_recov fast_fail N/A True
vscsi_path_to 30 Virtual SCSI Path Timeout True
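As a sketch of how these client-side values can be applied with the standard AIX chdev command ("-P" defers the change to the next reboot, which is needed while the rootvg disk and its vscsi adapters are in use):

# chdev -l hdisk0 -a hcheck_interval=60 -a queue_depth=24 -P
# chdev -l vscsi0 -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P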
Note: With regard to SDDPCM support for DS5020 with VIOS, I've been referring to the following IBM website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000201
The site has a link to the SDDPCM readme file for DS5020 storage (under SDDPCM Package for DS5000):
ftp://ftp.software.ibm.com/storage/subsystem/aix/2.5.2.0/sddpcm.readme.2.5.2.0.txt
The readme states:
Note: VIOS is not supported with SDDPCM on DS4000/DS5000/DS5020/DS3950 subsystem devices.
During our research, IBM support confirmed that, at this time, SDDPCM is not supported on VIOS with DS5000.
I hope this helps others who may be about to implement this type of storage with a VIOS.