One of my customers was configuring a new AIX 5.3 Versioned WPAR when they came across a very interesting issue. I thought I’d share the experience here, just in case anyone else comes across the same problem. We configured the VWPAR to host an old application. The setup was relatively straightforward: restore the AIX 5.3 mksysb into the VWPAR, export the data disk from the Global environment into the VWPAR, import the volume group and mount the file systems. Job done!  However, we noticed some fairly poor performance during application load tests. After some investigation, we discovered that disk I/O performance was worse in the VWPAR than on the source LPAR.  The question was, why?
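
For context, the build boiled down to something like the commands below. This is only a sketch: the mksysb path is made up, and the disk, VG and file system names simply match the demo later in this post (the customer’s real names differed).

# mkwpar -n p8wpar1 -C -B /backup/aix53.mksysb -D devname=hdisk2 rootvg=yes   # create the Versioned WPAR from the 5.3 mksysb
# chwpar -D devname=hdisk3 p8wpar1                                            # export the data disk to the WPAR
# clogin p8wpar1                                                              # log in to the VWPAR, then...
# cfgmgr                                                                      # ...discover the exported disk
# importvg -y datavg hdisk1                                                   # ...import the data volume group
# mount /datafs                                                               # ...and mount the file system(s)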

 

We initially suspected the customer’s SAN and/or storage subsystem, but both came back clean, with no errors or configuration issues.  In the end, the problem was a lack of ODM attributes in the PdAt object class, which prevented the VWPAR disk from using the correct queue depth setting.
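
Incidentally, a quick way to check whether a system is affected is to query the PdAt object class directly, in both the Global environment and the VWPAR; on our systems this returned nothing in either place.

# odmget -q "uniquetype=wio/common/vparent" PdAt   # no output means the PdAt stanza is missing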

 

Let me explain by demonstrating the problem and the workaround.

 

First, let’s add a new disk to a VWPAR. This will be used for a data volume group and file system. The disk in question is hdisk3.

 

# uname -W

0

 

# lsdev -Cc disk

hdisk0 Available  Virtual SCSI Disk Drive

hdisk1 Available  Virtual SCSI Disk Drive

hdisk2 Defined    Virtual SCSI Disk Drive

hdisk3 Available  Virtual SCSI Disk Drive  <<<<<<

 

We set the disk queue depth to an appropriate number, in this case 256.

Note: This value will differ depending on the storage subsystem type, so check with your storage team and/or vendor for the best setting for your environment.
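
If you’re unsure which values the disk driver will accept, lsattr can list the legal values for the attribute before you change it:

# lsattr -Rl hdisk3 -a queue_depth   # display the allowable values for queue_depth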

 

# chdev -l hdisk3 -a queue_depth=256

hdisk3 changed

 

Using the lsattr command, we verify that the queue depth attribute is set correctly: -E shows the current (effective) value and -P shows the permanent value held in the ODM.

 

# lsattr -El hdisk3 -a queue_depth

queue_depth 256 Queue DEPTH True

 

# lsattr -Pl hdisk3 -a queue_depth

queue_depth 256 Queue DEPTH True

 

We can also use kdb to verify the setting in the kernel. Remember, at this stage we are concentrating on hdisk3, which is referenced by a specific kernel device address in kdb.

 

# echo scsidisk | kdb

           START              END <name>

0000000000001000 0000000005840000 start+000FD8

F00000002FF47600 F00000002FFDF9C8 __ublock+000000

000000002FF22FF4 000000002FF22FF8 environ+000000

000000002FF22FF8 000000002FF22FFC errno+000000

F1000F0A00000000 F1000F0A10000000 pvproc+000000

F1000F0A10000000 F1000F0A18000000 pvthread+000000

read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01E4E20

(0)> scsidisk

"scsidisk_list" address...[0x0]

NAME            ADDRESS             STATE   CMDS_OUT  CURRBUF             LOW

hdisk0          0xF1000A01505DC000  0x00000002  0x0000    0x0000000000000000  0x0

hdisk0          0xF1000A01C0148000  0x00000002  0x0000    0x0000000000000000  0x0

hdisk1          0xF1000A01505D4000  0x00000002  0x0000    0x0000000000000000  0x0

hdisk3          0xF1000A01C014C000  0x00000001  0x0000    0x0000000000000000  0x0  <<<<<<<

 

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth

    ushort queue_depth   = 0x100;

 

From the output above, we can see that the queue depth is set correctly, i.e. 0x100 in hex (256 in decimal).

 

Next, we export hdisk3 to the VWPAR using the chwpar command. The disk, as expected, enters a Defined state in the Global environment. It is known as hdisk1 in the VWPAR.

 

# chwpar -D devname=hdisk3 p8wpar1

 

# lswpar -D p8wpar1 | head -2 ; lswpar -D p8wpar1 | grep hdisk

Name     Device Name      Type    Virtual Device  RootVG  Status

-------------------------------------------------------------------

p8wpar1  hdisk3           disk    hdisk1          no      EXPORTED <<<<<<

p8wpar1  hdisk2           disk    hdisk0          yes     EXPORTED

 

[root@gibopvc1]/ # lsdev -Cc disk

hdisk0 Available  Virtual SCSI Disk Drive

hdisk1 Available  Virtual SCSI Disk Drive

hdisk2 Defined    Virtual SCSI Disk Drive

hdisk3 Defined    Virtual SCSI Disk Drive

 

In the VWPAR, we run cfgmgr to discover the disk, then create a new data volume group (datavg) and file system (datafs) for application use (note: the steps to create the VG and FS are not shown in the session output; a rough sketch of them follows after it). This is for demonstration purposes only; the customer imported the data volume groups on their system.

 

# clogin p8wpar1

*******************************************************************************

*                                                                             *

*  Welcome to AIX Version 5.3!                                                *

*                                                                             *

*  Please see the README file in /usr/lpp/bos for information pertinent to    *

*  this release of the AIX Operating System.                                  *

*                                                                             *

*******************************************************************************

Last login: Sun Apr 26 21:27:03 2015 on /dev/Global from

 

# uname -W

11

 

# lspv

hdisk0          00f94f58a0b98ca2                    rootvg          active

 

# cfgmgr ; lspv

hdisk0          00f94f58a0b98ca2                    rootvg          active

hdisk1          none                                None

 

# lsvg

rootvg

datavg

 

# lsvg -l datavg

datavg:

LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT

datalv              jfs2       1024    1024    1    open/syncd    /datafs

loglv00             jfs2log    1       1       1    open/syncd    N/A
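
As mentioned above, the commands used to create the volume group and file system were not captured in the session output. For reference, something along the following lines would produce the layout shown above (the names and LP count match the lsvg output; adjust them for your own environment):

# mkvg -y datavg hdisk1                      # create the data volume group on the exported disk
# mklv -y datalv -t jfs2 datavg 1024         # create a JFS2 logical volume of 1024 LPs
# crfs -v jfs2 -d datalv -m /datafs -A yes   # create the file system (a jfs2log LV is added automatically)
# mount /datafs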

 

We perform a very simple I/O test in the /datafs file system: we create a 1GB file and time the execution. We noticed immediately that the task took longer than expected.

 

# cd /datafs

# time lmktemp Afile 1024M

Afile

 

real    0m7.22s <<<<<<<<<<<<<<< SLOW?

user    0m0.04s

sys     0m1.36s

 

We ran the iostat command from the Global environment and noticed that “serv qfull” was constantly non-zero (and very large) for hdisk3. Essentially, the hdisk queue was full all the time. This was bad and unexpected, given the queue depth setting of 256!

 

# iostat -DTRl 1

 

System configuration: lcpu=8 drives=4 paths=8 vdisks=2

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:53

hdisk0           0.0   4.1K   1.0   0.0    4.1K   0.0   0.0    0.0    0.0     0    0   1.0   0.3    0.3    0.3     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:53

hdisk2           1.0  12.3K   3.0   0.0   12.3K   0.0   0.0    0.0    0.0     0    0   3.0   0.5    0.3    0.6     0    0   0.4    0.0    0.9    0.0   0.0   3.0  03:11:53

hdisk3         100.0 140.5M 1072.0   0.0  140.5M   0.0   0.0    0.0    0.0     0    0 1072.0   0.9    0.8   24.0     0    0 176.7  107.5  279.8  152.0   0.0 1072.0  03:11:53

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:54

hdisk0           0.0   4.1K   1.0   4.1K   0.0    1.0   0.3    0.3    0.3     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:54

hdisk2           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:54

hdisk3         100.0 154.0M 1175.0   0.0  154.0M   0.0   0.0    0.0    0.0     0    0 1175.0   0.8    0.8    1.8     0    0 161.9  108.0  217.7  303.0   1.0 1175.0  03:11:54

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:55

hdisk0           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:55

hdisk2           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:55

hdisk3         100.0 150.3M 1147.0   0.0  150.3M   0.0   0.0    0.0    0.0     0    0 1147.0   0.9    0.8   10.6     0    0 165.9  108.5  239.4  304.0   1.0 1147.0  03:11:55

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:56

hdisk0           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:56

hdisk2           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:56

hdisk3         100.0 155.5M 1186.0   0.0  155.5M   0.0   0.0    0.0    0.0     0    0 1186.0   0.8    0.8    1.8     0    0 161.6  106.8  217.8  307.0   1.0 1186.0  03:11:56

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:57

hdisk0           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:57

hdisk2           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:11:57

hdisk3          67.0 102.2M 780.0   0.0  102.2M   0.0   0.0    0.0    0.0     0    0 780.0   0.9    0.8   11.8     0    0 166.1  106.3  232.8   53.0   0.0 779.0  03:11:57

 

Now comes the interesting part. With a little help from our friends in IBM support, we used kdb and found that the queue depth was reported as 1 in the kernel, not 256! You’ll also notice that the hdisk name has changed from hdisk3 to hdisk1; this happened as a result of exporting hdisk3 to the VWPAR. The disk is known as hdisk1 in the VWPAR (not hdisk3), but the kernel address is the same.

 

# echo scsidisk | kdb

           START              END <name>

0000000000001000 0000000005840000 start+000FD8

F00000002FF47600 F00000002FFDF9C8 __ublock+000000

000000002FF22FF4 000000002FF22FF8 environ+000000

000000002FF22FF8 000000002FF22FFC errno+000000

F1000F0A00000000 F1000F0A10000000 pvproc+000000

F1000F0A10000000 F1000F0A18000000 pvthread+000000

read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01E4E20

(0)> scsidisk

"scsidisk_list" address...[0x0]

NAME            ADDRESS             STATE   CMDS_OUT  CURRBUF             LOW

hdisk0          0xF1000A01505DC000  0x00000002  0x0000    0x0000000000000000  0x0

hdisk0          0xF1000A01C0148000  0x00000002  0x0000    0x0000000000000000  0x0

hdisk1          0xF1000A01505D4000  0x00000002  0x0000    0x0000000000000000  0x0

hdisk1          0xF1000A01C014C000  0x00000001  0x0000    0x0000000000000000  0x0 <<<<

 

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth

    ushort queue_depth   = 0x1;                 <<<< WRONG QUEUE DEPTH!

 

# lsattr -Pl hdisk3 -a queue_depth

queue_depth 256 Queue DEPTH True

 

The kdb output above proved that the queue depth was set to 1 for the disk in the VWPAR, even though the ODM still had the attribute set to 256 (in both the Global and VWPAR environments).

 

We discovered that this behaviour was the result of a bug, documented in the following APAR.

 

IV63665: CAN'T SET THE QUEUE_DEPTH IN A VERSION WPAR.

http://www-01.ibm.com/support/docview.wss?uid=isg1IV63665

 

 

APAR status

 

    Closed as fixed if next.

 

Error description

 

    When a  customer run the following in a version

    WPAR such as 5.2 or 5.3. The disk in the kernel

    will not have the queue depth set in kernel.

 

    1. chdev -P -l hdisk5 -a queue_depth=8

    2.  odmget CuAt | grep -p hdisk5 | less

 

    CuAt:

          name = "hdisk5"

          attribute = "queue_depth"

          value = "8"

          type = "R"

          generic = "UD"

          rep = "nr"

          nls_index = 30

 

    3.  stopwpar -N <wpar_name>

 

    The customer then will see that the performance

    is still the same. When a perfpmr is sent in

    from the global lpar you will see that the

    queue_depth is set to 1 for all of the disk

    exported to the wpar.

 

Local fix

 

    N/A

 

Problem summary

 

    When a  customer run the following in a version

    WPAR such as 5.2 or 5.3. The disk in the kernel

    will not have the queue depth set in kernel.

 

Fortunately, IBM support was able to provide us with a workaround. The first step was to add the missing vparent PdAt entry to the ODM in the Global environment.

 

# cat addodm_pdat_for_vparent.txt

PdAt:

      uniquetype = "wio/common/vparent"

      attribute = "naca_1_spt"

      deflt = "1"

      values = "1"

      width = ""

      type = "R"

      generic = ""

      rep = "n"

      nls_index = 0

 

# odmadd addodm_pdat_for_vparent.txt

 

# odmget PdAt | grep -p "wio/common/vparent"

PdAt:

        uniquetype = "wio/common/vparent"

        attribute = "naca_1_spt"

        deflt = "1"

        values = "1"

        width = ""

        type = "R"

        generic = ""

        rep = "n"

        nls_index = 0

 

We did the same in the VWPAR.

 

# clogin p8wpar1

# uname -W

11

 

# odmget PdAt | grep -p "wio/common/vparent"

#

# odmadd addodm_pdat_for_vparent.txt

# odmget PdAt | grep -p "wio/common/vparent"

PdAt:

        uniquetype = "wio/common/vparent"

        attribute = "naca_1_spt"

        deflt = "1"

        values = "1"

        width = ""

        type = "R"

        generic = ""

        rep = "n"

        nls_index = 0

 

In the VWPAR, we removed the hdisk and then rediscovered it with cfgmgr, ensuring that the queue depth attribute was set to 256 in the ODM.
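
(The chdev used to set the attribute is not repeated in the output below; if the attribute does not come back as 256 after cfgmgr, simply set it again from inside the VWPAR.)

# chdev -l hdisk1 -a queue_depth=256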

 

# uname -W

11

# rmdev -dl hdisk1

hdisk1 deleted

# cfgmgr

# lspv

hdisk0          00f94f58a0b98ca2                    rootvg          active

hdisk1          none                                None

# lsattr -El hdisk1 -a queue_depth

queue_depth     256              Queue DEPTH                True

# odmget CuAt | grep -p queue

CuAt:

        name = "hdisk1"

        attribute = "queue_depth"

        value = "256"

        type = "R"

        generic = "UD"

        rep = "nr"

        nls_index = 12

 

Back in the Global environment we checked that the queue depth was set correctly in the kernel. And it was!

 

# uname -W

0

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth

    ushort queue_depth   = 0x100;

 

We re-ran the simple I/O test and immediately found that the test ran faster and the hdisk queue (for hdisk3, as shown by iostat from the Global environment) was no longer full. Subsequent application load tests showed much better performance.

 

# time lmktemp Afile 1024M

Afile

 

real    0m3.15s <<<< BETTER!

user    0m0.03s

sys     0m1.60s

 

 

# iostat -DTRl 1

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:14

hdisk0           0.0   4.1K   1.0   4.1K   0.0    1.0   0.2    0.2    0.2     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:14

hdisk2           0.0  20.5K   5.0  20.5K   0.0    5.0   0.2    0.2    0.4     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   5.0  03:20:14

hdisk3          86.0 280.6M 2144.0   0.0  280.6M   0.0   0.0    0.0    0.0     0    0 2144.0   1.2    0.3    2.9     0    0   0.0    0.0    0.0    0.0   4.0   0.0  03:20:14

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:15

hdisk0           0.0   8.2K   2.0   0.0    8.2K   0.0   0.0    0.0    0.0     0    0   2.0   1.2    0.6    1.9     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:15

hdisk2           0.0   4.1K   1.0   0.0    4.1K   0.0   0.0    0.0    0.0     0    0   1.0   0.6    0.6    0.6     0    0   0.0    0.0    0.0    0.0   0.0   1.0  03:20:15

hdisk3         100.0 327.0M 2495.0   0.0  327.0M   0.0   0.0    0.0    0.0     0    0 2495.0   1.3    0.8   10.0     0    0   0.0    0.0    0.0    0.0   2.0   0.0  03:20:15

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:16

hdisk0           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:16

hdisk2           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:16

hdisk3         100.0 354.3M 2703.0   0.0  354.3M   0.0   0.0    0.0    0.0     0    0 2703.0   2.1    0.9   16.1     0    0   0.0    0.0    0.0    0.0   5.0   0.0  03:20:16

 

Disks:                     xfers                                read                                write                                  queue                    time

-------------- -------------------------------- ------------------------------------ ------------------------------------ -------------------------------------- ---------

                 %tm    bps   tps  bread  bwrtn   rps    avg    min    max time fail   wps    avg    min    max time fail    avg    min    max   avg   avg  serv

                 act                                    serv   serv   serv outs              serv   serv   serv outs        time   time   time  wqsz  sqsz qfull

hdisk1           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:17

hdisk0           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:17

hdisk2           0.0   0.0    0.0   0.0    0.0    0.0   0.0    0.0    0.0     0    0   0.0   0.0    0.0    0.0     0    0   0.0    0.0    0.0    0.0   0.0   0.0  03:20:17

hdisk3          32.0 111.7M 852.0   0.0  111.7M   0.0   0.0    0.0    0.0     0    0 852.0   1.3    0.9    2.8     0    0   0.0    0.0    0.0    0.0   1.0   0.0  03:20:17

 

 

Please Note: kdb will not work inside a WPAR. If you attempt to run it, you’ll receive the following error message. Run kdb from the Global environment instead.

 

# clogin p8wpar1

# kdb

The specified kernel file is a 64-bit kernel

open: A file or directory in the path name does not exist.

cannot open /dev/pmem