Virtual I/O Server and DS5020 disk

Recently I had the pleasure of configuring a couple of POWER7 720s for a customer. Each 720 was to host roughly 12 LPARs. There were two VIO servers and a NIM server per 720.
Everything went along nicely and according to plan. In a few days we had both systems built. My steps for building the systems were almost identical to those described by Rob McNelly in a recent post. All four VIO servers were running the latest VIOS code, i.e. 2.2.0.10-FP-24 SP-01. All the client LPARs were running AIX 6.1 TL6 SP3. Each VIOS was configured with two FC paths to the SAN, and the SAN storage device was an IBM DS5020.
Native AIX MPIO was in use on the VIOS and the AIX LPARs. I did not deploy SDDPCM on the VIOS, as this is currently unsupported with the DS5020. Once the LPAR builds were complete, we performed a number of “integration tests”. These typically involve disconnecting network and SAN cables from each VIOS and observing how the VIOS and LPARs respond to, and recover from, these types of conditions.
One of the integration tests required that BOTH fibre cables be disconnected from the first VIOS, to confirm that the client LPARs were not impacted, i.e. that all I/O travelled via the second VIOS. When we ran this test, I/O on the client LPARs hung for roughly five minutes before it resumed via the second VIOS, which was clearly not acceptable.
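For anyone repeating this style of test, what we watched on a client LPAR while the cables were pulled can be seen with standard AIX commands. This is only a rough sketch; hdisk0 and the dd target file are examples, not the actual names from this build:

# lspath -l hdisk0
# dd if=/dev/zero of=/home/testfile bs=1024k count=1024
# errpt | head

lspath shows whether the path through each VIOS is Enabled or Failed, the dd keeps I/O flowing so any hang is immediately obvious, and errpt records the path failure and recovery events afterwards.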
After doing some research, we discovered the following post on the IBM developerWorks AIX forum:
http
This post highlighted a number of things we needed to check, and it also confirmed several decisions we’d made during the design process, such as SDDPCM not being supported with DS5020 storage and VIOS. (This was good, as some people were starting to believe we should have installed SDDPCM to resolve this problem; I’d only be happy to do that if it were a supported combination, and it’s not.)
Finally, we found an IBM tech note that related directly to our issue. The following statement seemed to match our exact problem:
"For active/passive storage device, such as DS3K, DS4K, or DS5K if
complete access is lost to the storage device, then it may take greater than 5 minutes to fail I/O. This feature is for
Active/Passive storage devices, which are running with the AIX Default A/P PCM. This includes DS3K, DS4K, and DS5K family of
devices.” The new feature was described as follows.
“Added feature which health checks controllers, when an enabled path becomes unavailable, due to transport problems. By default this feature is DISABLED. To enable this feature set the following ODM attributes for the active/passive storage device: cntl_delay_time and cntl_hcheck_int. Enabling this feature results in faster I/O failure time. If you wish to allow the storage device 30 seconds to come back on the fabric (after leaving the fabric), then you can set cntl_delay_time=30 and cntl_hcheck_int=2.”
And, as I suspected, the issue was related to the type of storage we were using. It appears the I/O delay was attributable to the following attributes on the DS5020 hdisks on the VIOS, both of which were still at their default of 0, i.e. the controller health check feature was disabled:

cntl_delay_time 0 Controller Delay Time            True
cntl_hcheck_int 0 Controller Health Check Interval True
Based on the tech note, I attempted several tests with various values for both parameters, e.g.:

$ chdev -dev hdiskX -attr cntl_delay_time=30 cntl_hcheck_int=2
After making the changes to the hdisks on all of the VIO servers, I performed the same test, i.e. I disconnected BOTH fibre cables from the first VIOS while continuing to write a file to a file system on the client LPARs. With these values modified on all the DS5020 disks, on all the VIO servers, the I/O delay was reduced to seconds rather than five minutes!
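Rather than running chdev against each disk by hand, a small loop from oem_setup_env can cover every DS5020 hdisk on a VIOS in one pass. This is only a sketch: the grep pattern and the attribute values are assumptions (the values shown are the tech note's example values), and if a disk is in use the change needs the -P flag and only takes effect once the VIOS is rebooted:

# for DISK in $(lsdev -Cc disk | grep DS5020 | awk '{print $1}')
> do chdev -l $DISK -a cntl_delay_time=30 -a cntl_hcheck_int=2 -P
> done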
The following attributes were used for the hdisks and adapters in the final configuration.
On the VIO servers:
$ lsdev -type disk | grep DS5020
hdisk3   Available   MPIO DS5020 Disk
hdisk4   Available   MPIO DS5020 Disk
hdisk5   Available   MPIO DS5020 Disk
hdisk6   Available   MPIO DS5020 Disk
hdisk7   Available   MPIO DS5020 Disk
hdisk8   Available   MPIO DS5020 Disk
hdisk9   Available   MPIO DS5020 Disk
hdisk10  Available   MPIO DS5020 Disk
hdisk11  Available   MPIO DS5020 Disk
hdisk12  Available   MPIO DS5020 Disk
hdisk13  Available   MPIO DS5020 Disk
hdisk14  Available   MPIO DS5020 Disk
hdisk15  Available   MPIO DS5020 Disk
hdisk16  Available   MPIO DS5020 Disk
hdisk17  Available   MPIO DS5020 Disk
hdisk18  Available   MPIO DS5020 Disk
hdisk19  Available   MPIO DS5020 Disk
hdisk20  Available   MPIO DS5020 Disk
hdisk21  Available   MPIO DS5020 Disk
hdisk22  Available   MPIO DS5020 Disk
hdisk23  Available   MPIO DS5020 Disk
hdisk24  Available   MPIO DS5020 Disk
hdisk25  Available   MPIO DS5020 Disk
hdisk26  Available   MPIO DS5020 Disk
hdisk27  Available   MPIO DS5020 Disk
hdisk28  Available   MPIO DS5020 Disk
hdisk29  Available   MPIO DS5020 Disk
hdisk30  Available   MPIO DS5020 Disk
hdisk31  Available   MPIO DS5020 Disk
hdisk32  Available   MPIO DS5020 Disk
$ lsdev -dev hdisk3 -attr
attribute        value
PCM              PCM/friend/otherapdisk
PR_key_value     none
algorithm        fail_over
autorecovery     no
clr_q            no
cntl_delay_time  1
cntl_hcheck_int  2
dist_err_pcnt    0
dist_tw_width    5
hcheck_cmd       inquiry
hcheck_interval  6
hcheck_mode      nonactive
location
lun_id           0x11
lun_reset_spt    yes
max_retry_delay  6
max_transfer     0x40
node_name        0x20
pvid             00f6
q_err            yes
q_type           simple
queue_depth      2
reassign_to      12
reserve_policy   no_reserve
rw_timeout       3
scsi_id          0x11
start_timeout    6
unique_id        3E21
ww_name          0x20
$ lsdev -dev fscsi0 -attr
attribute     value      description                              user_settable
attach        switch     How this adapter is CONNECTED            False
dyntrk        yes        Dynamic Tracking of FC Devices           True
fc_err_recov  fast_fail  FC Fabric Event Error RECOVERY Policy    True
scsi_id       0x11400    Adapter SCSI ID                          False
sw_fc_class   3          FC Class for Fabric                      True
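As an aside, if the fscsi devices don't already have these settings, they can usually be applied from the padmin shell with something like the command below. This is only a sketch; fscsi0 is an example, and the -perm flag writes the change to the ODM only, so the adapter must be unconfigured and reconfigured (or the VIOS rebooted) before it takes effect:

$ chdev -dev fscsi0 -attr fc_err_recov=fast_fail dyntrk=yes -perm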
 
$ oem_setup_env
# manage_disk_drivers
Device              Present Driver     Driver Options
2810XIV             AIX_AAPCM          AIX_AAPCM,AIX_non_MPIO
DS4100              AIX_APPCM          AIX_APPCM,AIX_fcparray
DS4200              AIX_APPCM          AIX_APPCM,AIX_fcparray
DS4300              AIX_APPCM          AIX_APPCM,AIX_fcparray
DS4500              AIX_APPCM          AIX_APPCM,AIX_fcparray
DS4700              AIX_APPCM          AIX_APPCM,AIX_fcparray
DS4800              AIX_APPCM          AIX_APPCM,AIX_fcparray
DS3950              AIX_APPCM          AIX_APPCM
DS5020              AIX_APPCM          AIX_APPCM
DS5100/DS5300       AIX_APPCM          AIX_APPCM
DS3500              AIX_APPCM          AIX_APPCM

Usage : manage_disk_drivers [-l]
        manage_disk_drivers -d device -o driver_option
        manage_disk_drivers -h

For entries with multiple model names use the first one listed.
Ex. DS5100/DS5300 use DS5100.
# mpio_get_config -Av
Frame id 0:
    Storage Subsystem worldwide name: 609e
    Controller count: 2
    Partition count: 1
    Partition 0:
    Storage Subsystem Name = 'MyApp-DS5020'
        hdisk      LUN #   Ownership          User Label
        hdisk3     17      B (preferred)      LPAR1
        hdisk4     18      A (preferred)      LPAR1datavg
        hdisk5     19      A (preferred)      LPAR1appvg
        hdisk6     16      B (preferred)      LPAR2
        hdisk7     20      A (preferred)      LPAR2datavg
        hdisk8     21      B (preferred)      LPAR2data3vg
        hdisk9     15      A (preferred)      LPAR3
        hdisk10    14      B (preferred)      LPAR4
        hdisk11    22      B (preferred)      LPAR5
        hdisk12    23      A (preferred)      LPAR6
        hdisk13    13      B (preferred)      LPAR6datavg
        hdisk14    12      A (preferred)      LPAR7
        hdisk15    24      B (preferred)      LPAR7datavg
        hdisk16    25      A (preferred)      LPAR8
        hdisk17    11      B (preferred)      LPAR8datavg
        hdisk18    10      A (preferred)      LPAR8datavg
        hdisk19    9       B (preferred)      LPAR8datavg
        hdisk20    8       A (preferred)      LPAR8datavg
        hdisk21    26      A (preferred)      LPAR9
        hdisk22    7       A (preferred)      LPAR9datavg
        hdisk23    6       B (preferred)      LPAR9datavg
        hdisk24    27      B (preferred)      LPAR9appvg
        hdisk25    28      A (preferred)      LPAR9binvg
        hdisk26    5       A (preferred)      LPAR10
        hdisk27    4       B (preferred)      LPAR10datavg
        hdisk28    29      B (preferred)      LPAR10datavg
        hdisk29    30      A (preferred)      LPAR11
        hdisk30    3       A (preferred)      LPAR11datavg
        hdisk31    32      B (preferred)      LPAR12
        hdisk32    50      B (preferred)      LPAR12datavg
On the VIO clients:
- Changed the following attributes for all virtual SCSI disks to:
# lsattr -El hdisk0
PCM             PCM/friend/vscsi
algorithm       fail_over
hcheck_cmd      test_unit_rdy
hcheck_interval 60
hcheck_mode     nonactive
max_transfer    0x40
pvid            00f6
queue_depth     2
reserve_policy  no_reserve
- Changed the following attributes for all virtual SCSI adapters to (example chdev commands for the client LPARs follow this listing):

# lsattr -El vscsi0
vscsi_err_recov fast_fail  N/A                        True
vscsi_path_to   30         Virtual SCSI Path Timeout  True
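For completeness, the client-side values above can be set with chdev. The commands below are only a sketch; hdisk0 and vscsi0 are examples, and the -P flag is needed if the devices are busy, with the changes taking effect after the LPAR is rebooted:

# chdev -l hdisk0 -a hcheck_interval=60 -a hcheck_mode=nonactive -a reserve_policy=no_reserve -P
# chdev -l vscsi0 -a vscsi_err_recov=fast_fail -a vscsi_path_to=30 -P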
Note: With regard to SDDPCM support for DS5020 with VIOS, I’ve been referring to the following IBM website:
http
The site has a link to the SDDPCM readme file for DS5020 storage (under SDDPCM Package for DS5000):
ftp:
The readme states:

“Note: VIOS is not supported with SDDPCM on DS4000 and DS5000 systems.”
During our research, IBM support also confirmed that, at this time, SDDPCM is not supported with the DS5020 and VIOS.

I hope this helps others that may be about to implement this type of storage with a VIOS.
good piece this, I am doing this sorta work this week-end, nice to see it
Chris, thanks for sharing this information! All the best, MarkD:-)