Queue depth setting in a Versioned WPAR on AIX

One of my customers was configuring a new AIX 5.3 Versioned WPAR when they came across a very interesting issue. I thought I’d share the experience here, just in case anyone else comes across the problem. We configured the VWPAR to host an old application. The setup was relatively straightforward: restore the AIX 5.3 mksysb into the VWPAR, export the data disk from the Global environment into the VWPAR, import the volume group and mount the file systems. Job done! However, we noticed some fairly poor performance during application load tests. After some investigation we discovered that disk I/O performance was worse in the VWPAR than on the source LPAR. The question was, why?
We initially suspected the customer’s SAN and/or the storage subsystem, but both of these came back clean, with no errors or configuration issues. In the end, the problem was caused by a missing ODM attribute in the PdAt object class, which prevented the VWPAR disk from using the correct queue depth setting.
Let me explain by demonstrating the problem and the workaround.
First, let’s add a new disk to a VWPAR. This will be used for a data volume group and file system. The disk in question is hdisk3.
# uname -W
0
# lsdev -Cc disk
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive
hdisk2 Defined    Virtual SCSI Disk Drive
hdisk3 Available  Virtual SCSI Disk Drive   <<<<<<
We set the disk queue depth to an appropriate number, in this case 256. Note: This value will differ depending on the storage subsystem type, so check with your storage team and/or vendor for the best setting for your environment.
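Before changing the value, you can also ask the ODM which values the driver will accept for the attribute, using lsattr -R. This is just a quick sanity check I’ve added here; it wasn’t part of the original session.

# lsattr -Rl hdisk3 -a queue_depth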
# chdev -l hdisk3 -a queue_depth=256
hdisk3 changed
Using the lsattr command, we verify that the queue depth attribute is set correctly in both the ODM and the AIX kernel.
# lsattr -El hdisk3 -a queue_depth
queue_depth 256 Queue DEPTH True

# lsattr -Pl hdisk3 -a queue_depth
queue_depth 256 Queue DEPTH True
We can also use kdb to verify the setting in the kernel. Remember at this stage, we are concentrating on hdisk3, which is referenced with a specific kernel device address in kdb.
# echo scsidisk | kdb
           START              END <name>
0000000000001000 0000000005840000 start+000FD8
F00000002FF47600 F00000002FFDF9C8 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01E4E20
(0)> scsidisk
"scsidisk_list" address...[0x0]
NAME       ADDRESS             STATE       CMDS_OUT  CURRBUF             LOW
hdisk0     0xF1000A01505DC000  0x00000002  0x0000    0x0000000000000000  0x0
hdisk0     0xF1000A01C0148000  0x00000002  0x0000    0x0000000000000000  0x0
hdisk1     0xF1000A01505D4000  0x00000002  0x0000    0x0000000000000000  0x0
hdisk3
# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x100;
From the output above, we can see that the queue depth is set correctly, i.e. 0x100 in hex (256 in decimal).
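If you want to do the hex-to-decimal conversion at the command line, ksh arithmetic will do it for you (a trivial aside, not part of the original session):

# echo $((16#100))
256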
Next, we export hdisk3 to the VWPAR using the chwpar command. The disk, as expected, enters a Defined state in the Global environment. It is known as hdisk1 in the VWPAR.
# chwpar -D devname=hdisk3 p8wpar1
# lswpar -D p8wpar1 | head -2 ; lswpar -D p8wpar1 | grep hdisk
Name      Device Name    Type   Virtual Device   RootVG   Status
------------------------------------------------------------------
p8wpar1   hdisk3         disk   hdisk1           no       EXPORTED   <<<<<<
p8wpar1   hdisk2         disk   hdisk0           yes      EXPORTED
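As an aside, a quick way to see the Global-to-VWPAR disk name mapping at a glance is to pull the relevant columns out of the lswpar output. This is a convenience one-liner I’ve added here, not part of the original session; on this system it should print something like:

# lswpar -D p8wpar1 | awk '$3 == "disk" { print $2, "->", $4 }'
hdisk3 -> hdisk1
hdisk2 -> hdisk0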
[root@gibopvc1]/ # lsdev -Cc disk
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive
hdisk2 Defined    Virtual SCSI Disk Drive
hdisk3 Defined    Virtual SCSI Disk Drive
In the VWPAR, we run cfgmgr to discover the disk. We create a new data volume group (datavg) and file system (datafs) for application use (note: the steps to create the VG and FS are not shown below). This is for demonstration purposes only; the customer imported the data volume groups on their system.
# clogin p8wpar1
*******************************************************************************
*                                                                             *
*  Welcome to AIX Version 5.3!                                                *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to   *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*******************************************************************************
Last login: Sun Apr 26 21:27:03 2015 on /dev/Global from
# uname -W
11
# lspv
hdisk0 00f9
# cfgmgr ; lspv
hdisk0 00f9
hdisk1 none
# lsvg
rootvg
datavg
# lsvg -l datavg
datavg:
LV NAME    TYPE      LPs    PPs    PVs   LV STATE     MOUNT POINT
datalv     jfs2      1024   1024   1     open/syncd   /datafs
loglv00    jfs2log   1      1      1     open/syncd   N/A
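The creation steps weren’t captured in the original session, but for reference the volume group and file system could be built with something like the following. This is a sketch only; the LV size is taken from the lsvg output above, and the names are just the ones used in this demo.

# mkvg -y datavg hdisk1                      # data volume group on the exported disk
# mklv -y datalv -t jfs2 datavg 1024         # 1024 LPs, as shown by lsvg -l above
# crfs -v jfs2 -d datalv -m /datafs -A yes   # JFS2 file system on the existing LV
# mount /datafs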
We perform a very simple I/O test in the /datafs file system. We write/create a 1GB file and time the execution. We noticed immediately that the task took longer than expected.
# cd /datafs
# time lmktemp Afile 1024M
Afile
real    0m7.22s   <<<<<<<<<<<<<<< SLOW?
user    0m0.04s
sys     0m1.36s
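If lmktemp isn’t available on your system, a roughly equivalent test can be run with dd (an alternative I’m suggesting here, not what we actually ran):

# time dd if=/dev/zero of=Afile bs=1024k count=1024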
We ran the iostat command from the Global environment and noticed that “serv qfull” was constantly non-zero (with very large values) for hdisk3; essentially, the hdisk queue was full all the time. This was bad and unexpected, given the queue depth setting of 256!
# iostat -DTRl 1
System configuration: lcpu=8 drives=4 paths=8 vdisks=2
Disks:        xfers                               read                                write                                queue                                time
------------  %tm    bps     tps     bread  bwrtn   rps  avg  min  max  time fail   wps     avg  min  max   time fail   avg    min    max    avg    avg   serv
              act                                        serv serv serv outs                serv serv serv  outs        time   time   time   wqsz   sqsz  qfull

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:53
hdisk0        0.0    4.1K    1.0     0.0    4.1K    0.0  0.0  0.0  0.0  0    0      1.0     0.3  0.3  0.3   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:53
hdisk2        1.0    12.3K   3.0     0.0    12.3K   0.0  0.0  0.0  0.0  0    0      3.0     0.5  0.3  0.6   0    0      0.4    0.0    0.9    0.0    0.0   3.0     03:11:53
hdisk3        100.0  140.5M  1072.0  0.0    140.5M  0.0  0.0  0.0  0.0  0    0      1072.0  0.9  0.8  24.0  0    0      176.7  107.5  279.8  152.0  0.0   1072.0  03:11:53

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:54
hdisk0        0.0    4.1K    1.0     4.1K   0.0     1.0  0.3  0.3  0.3  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:54
hdisk2        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:54
hdisk3        100.0  154.0M  1175.0  0.0    154.0M  0.0  0.0  0.0  0.0  0    0      1175.0  0.8  0.8  1.8   0    0      161.9  108.0  217.7  303.0  1.0   1175.0  03:11:54

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:55
hdisk0        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:55
hdisk2        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:55
hdisk3        100.0  150.3M  1147.0  0.0    150.3M  0.0  0.0  0.0  0.0  0    0      1147.0  0.9  0.8  10.6  0    0      165.9  108.5  239.4  304.0  1.0   1147.0  03:11:55

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:56
hdisk0        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:56
hdisk2        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:56
hdisk3        100.0  155.5M  1186.0  0.0    155.5M  0.0  0.0  0.0  0.0  0    0      1186.0  0.8  0.8  1.8   0    0      161.6  106.8  217.8  307.0  1.0   1186.0  03:11:56

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:57
hdisk0        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:57
hdisk2        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0     03:11:57
hdisk3        67.0   102.2M  780.0   0.0    102.2M  0.0  0.0  0.0  0.0  0    0      780.0   0.9  0.8  11.8  0    0      166.1  106.3  232.8  53.0   0.0   779.0   03:11:57
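If you only care about one disk, iostat will also accept a drive name as an argument, which makes the qfull column much easier to watch (a convenience variation on the command above, not part of the original capture):

# iostat -DTRl hdisk3 1 5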
Now comes the interesting part. With a little help from our friends in IBM support, we used kdb and found that the queue depth was now reported in the kernel as 1, not 256! You’ll also notice that the hdisk name has changed from hdisk3 to hdisk1. This happened as a result of exporting hdisk3 to the VWPAR: the disk is known as hdisk1 inside the VWPAR (not hdisk3), but the kernel address is the same.
# echo scsidisk | kdb
           START              END <name>
0000000000001000 0000000005840000 start+000FD8
F00000002FF47600 F00000002FFDF9C8 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01E4E20
(0)> scsidisk
"scsidisk_list" address...[0x0]
NAME       ADDRESS             STATE       CMDS_OUT  CURRBUF             LOW
hdisk0     0xF1000A01505DC000  0x00000002  0x0000    0x0000000000000000  0x0
hdisk0     0xF1000A01C0148000  0x00000002  0x0000    0x0000000000000000  0x0
hdisk1     0xF1000A01505D4000  0x00000002  0x0000    0x0000000000000000  0x0
hdisk1     0xF1000A01C014C000  0x00000001  0x0000    0x0000000000000000  0x0   <<<<
# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x1;
# lsattr -Pl hdisk3 -a queue_depth
queue_depth 256 Queue DEPTH True
The kdb output above proved that the queue depth for the exported disk was set to 1 in the kernel, even though the ODM still had the attribute set to 256 (in both the Global and VWPAR environments).
We discovered that this behaviour was the result of a known bug:
IV63665: CAN'T SET THE QUEUE_DEPTH IN A VERSION WPAR.
http
APAR status
Closed as fixed if next.
Error description
When a customer run the following in a version WPAR such as 5.2 or 5.3. The disk in the kernel will not have the queue depth set in kernel.
1. chdev -P -l hdisk5 -a queue_depth=8
2. odmget CuAt | grep -p hdisk5 | less
CuAt:
        name = "hdisk5"
        attribute = "queue_depth"
        value = "8"
        type = "R"
        generic = "UD"
        rep = "nr"
        nls_index = 30
3. stopwpar -N <wpar_name>
The customer then will see that the performance is still the same. When a perfpmr is sent in from the global lpar you will see that the queue_depth is set to 1 for all of the disk exported to the wpar.
Local fix
N/A
Problem summary
When a customer run the following in a version WPAR such as 5.2 or 5.3. The disk in the kernel will not have the queue depth set in kernel.
Fortunately, IBM support was able to provide us with a workaround. The first step was to add the missing vparent PdAt entry to the ODM in the Global environment.
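Before making any ODM changes, it’s worth saving a copy of the object class you’re about to modify, so it can be referred back to if anything goes wrong (our own precaution, not part of IBM’s instructions):

# odmget PdAt > /tmp/PdAt.before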
# cat addo
PdAt:
        uniquetype = "wio
        attribute = "naca_1_spt"
        deflt = "1"
        values = "1"
        width = ""
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0

# odmadd addo

# odmget PdAt | grep -p "wio
PdAt:
        uniquetype = "wio
        attribute = "naca_1_spt"
        deflt = "1"
        values = "1"
        width = ""
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0
We did the same in the VWPAR.
# clogin p8wpar1
# uname -W
11
# odmget PdAt | grep -p "wio
#
# odmadd addo
# odmget PdAt | grep -p "wio
PdAt:
        uniquetype = "wio
        attribute = "naca_1_spt"
        deflt = "1"
        values = "1"
        width = ""
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0
In the VWPAR, we removed the hdisk and then discovered it again, ensuring that the queue depth attribute was set to 256 in the ODM.
# uname -W
11
# rmdev -dl hdisk1
hdisk1 deleted
# cfgmgr
# lspv
hdisk0 00f9
hdisk1 none
# lsattr -El hdisk1 -a queue_depth
queue_depth 256 Queue DEPTH True
# odmget CuAt | grep -p queue
CuAt:
        name = "hdisk1"
        attribute = "queue_depth"
        value = "256"
        type = "R"
        generic = "UD"
        rep = "nr"
        nls_index = 12
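If several disks are exported to the WPAR, the same check can be scripted rather than run disk by disk. This is a small loop I’ve added for convenience, assuming all of the listed disks are in the Available state:

# for d in $(lsdev -Cc disk -F name); do echo "$d: $(lsattr -El $d -a queue_depth | awk '{print $2}')"; done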
Back in the Global environment we checked that the queue depth was set correctly in the kernel. And it was!
# uname -W
0
# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x100;
We re-ran the simple I/O test and immediately found that the test ran faster and the hdisk queue (for hdisk3, as shown by iostat from the Global environment) was no longer full. Subsequent application load tests showed much better performance.
# time lmktemp Afile 1024M
Afile

real    0m3.15s   <<<< BETTER!
user    0m0.03s
sys     0m1.60s
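In throughput terms that works out to roughly 1024 MB / 7.22 s ≈ 142 MB/s before the fix versus 1024 MB / 3.15 s ≈ 325 MB/s after, which lines up nicely with the bwrtn figures iostat reported for hdisk3 before and after.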
# iostat -DTRl 1
Disks:        xfers                               read                                write                                queue                                time
------------  %tm    bps     tps     bread  bwrtn   rps  avg  min  max  time fail   wps     avg  min  max   time fail   avg    min    max    avg    avg   serv
              act                                        serv serv serv outs                serv serv serv  outs        time   time   time   wqsz   sqsz  qfull

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:14
hdisk0        0.0    4.1K    1.0     4.1K   0.0     1.0  0.2  0.2  0.2  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:14
hdisk2        0.0    20.5K   5.0     20.5K  0.0     5.0  0.2  0.2  0.4  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   5.0   03:20:14
hdisk3        86.0   280.6M  2144.0  0.0    280.6M  0.0  0.0  0.0  0.0  0    0      2144.0  1.2  0.3  2.9   0    0      0.0    0.0    0.0    0.0    4.0   0.0   03:20:14

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:15
hdisk0        0.0    8.2K    2.0     0.0    8.2K    0.0  0.0  0.0  0.0  0    0      2.0     1.2  0.6  1.9   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:15
hdisk2        0.0    4.1K    1.0     0.0    4.1K    0.0  0.0  0.0  0.0  0    0      1.0     0.6  0.6  0.6   0    0      0.0    0.0    0.0    0.0    0.0   1.0   03:20:15
hdisk3        100.0  327.0M  2495.0  0.0    327.0M  0.0  0.0  0.0  0.0  0    0      2495.0  1.3  0.8  10.0  0    0      0.0    0.0    0.0    0.0    2.0   0.0   03:20:15

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:16
hdisk0        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:16
hdisk2        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:16
hdisk3        100.0  354.3M  2703.0  0.0    354.3M  0.0  0.0  0.0  0.0  0    0      2703.0  2.1  0.9  16.1  0    0      0.0    0.0    0.0    0.0    5.0   0.0   03:20:16

hdisk1        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:17
hdisk0        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:17
hdisk2        0.0    0.0     0.0     0.0    0.0     0.0  0.0  0.0  0.0  0    0      0.0     0.0  0.0  0.0   0    0      0.0    0.0    0.0    0.0    0.0   0.0   03:20:17
hdisk3        32.0   111.7M  852.0   0.0    111.7M  0.0  0.0  0.0  0.0  0    0      852.0   1.3  0.9  2.8   0    0      0.0    0.0    0.0    0.0    1.0   0.0   03:20:17
Please Note: kdb will not work inside a WPAR. If you attempt to run it, you’ll receive the following error message. Run kdb from the Global environment instead.
# clogin p8wpar1
# kdb
The specified kernel file is a 64-bit kernel
open: A file or directory in the path name does not exist.
cannot open /dev/pmem