I received the following question from an AIX administrator in Germany.

Hi Chris,

on your blog, you explain how to find out the active value of

num_cmd_elems of an fc-adapter by using the kdb. So you can decide, if the

value of lsattr is active or not ...

I wonder if you can find out the values fc_err_recov and dyntrk of the

fscsiX device.?

# lsattr -El fscsi0

attach switch How this adapter is CONNECTED False

dyntrk yes Dynamic Tracking of FC Devices True

fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True

scsi_id 0x1021f Adapter SCSI ID False

sw_fc_class 3 FC Class for Fabric True

I try to use echo efscsi fscsi0 | kdb .. but I can't figure it out..

Can you help me please?

I did a little research on his behalf and came up with an answer. However, I'm not at all surprised he had trouble finding the right information. It's not easy, clear or documented!

I received the following information from my IBM AIX contacts.

The following relies on internal structures that are subject to change.

The procedure was tested on 6100-06, 6100-07, and 7100-01. I don't have a lab system with physical HBAs and 5.3 at the moment.

Hopefully the same steps should work for 5.3. You may need to first run efscsi without arguments to load the kdb module before running efscsi fscsiX.

# kdb

(0)> efscsi fscsi1 | grep efscsi_ddi

struct efscsi_ddi ddi = 0xF1000A060084A080

(0)> dd 0xF1000A060084A080+20 2

F1000A060084A0A0: 0101020202010200 000000B400000028 ...............(

                          FFDD     NNNNNNNN

FF = fc_err_recov: 01=delayed_fail 02=fast_fail

DD = dyntrk: 00=disabled 01=enabled

NNNNNNNN = num_cmd_elems - 20 (20 reserved)

e.g. 200 - 20 = 180 = 0xB4

So in this example, fc_err_recov is set to fast_fail (02), dyntrk is set to yes (01) and num_cmd_elems is set to 200.
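
If you check these values often, the decoding can be scripted. The following is only a minimal sketch: decode_fscsi_ddi is a hypothetical helper name of my own, and it assumes the FF/DD/NNNNNNNN byte layout described above, which relies on internal structures and has only been checked on the AIX levels mentioned. It takes one line of dd output as its argument.

```shell
#!/bin/sh
# Hypothetical helper: decode one line of "dd <efscsi_ddi address>+20 2".
# ASSUMPTION: byte positions follow the FF/DD/NNNNNNNN layout shown above.
decode_fscsi_ddi() {
    # Split the line into address, first 8-byte word, second 8-byte word.
    read -r addr w1 w2 rest <<EOF
$1
EOF
    ff=$(printf '%s' "$w1" | cut -c9-10)    # fc_err_recov byte
    dd=$(printf '%s' "$w1" | cut -c11-12)   # dyntrk byte
    nn=$(printf '%s' "$w2" | cut -c1-8)     # num_cmd_elems minus 20
    case $ff in
        01) recov=delayed_fail ;;
        02) recov=fast_fail ;;
        *)  recov="unknown($ff)" ;;
    esac
    case $dd in
        00) trk=no ;;
        01) trk=yes ;;
        *)  trk="unknown($dd)" ;;
    esac
    # Add back the 20 reserved command elements.
    echo "fc_err_recov=$recov dyntrk=$trk num_cmd_elems=$(( 0x$nn + 20 ))"
}

decode_fscsi_ddi "F1000A060084A0A0: 0101020202010200 000000B400000028"
```

Run against the dump line above, this prints fc_err_recov=fast_fail dyntrk=yes num_cmd_elems=200, which matches the manual decoding.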

I tested this on lab systems running AIX 6.1 TL6 and AIX 7.1 TL1, starting with an FC adapter with dyntrk set to no, fc_err_recov set to delayed_fail and num_cmd_elems set to 500.

# lsattr -El fscsi1

attach none How this adapter is CONNECTED False

dyntrk no Dynamic Tracking of FC Devices True

fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True

scsi_id Adapter SCSI ID False

sw_fc_class 3 FC Class for Fabric True

# lsattr -El fcs1 -a num_cmd_elems

num_cmd_elems 500 Maximum number of COMMANDS to queue to the adapter True

# kdb

(0)> efscsi fscsi1 | grep efscsi_ddi

struct efscsi_ddi ddi = 0xF1000A060096E080

(0)> dd 0xF1000A060096E080+20 2

F1000A060096E0A0: 0101020201000100 000001E000000028 ...............(

                          FFDD     NNNNNNNN

OK, let's break it down. From the kdb output we can determine the following:

fc_err_recov is currently set to delayed_fail (FF=01 = fc_err_recov = delayed_fail).

dyntrk is currently set to no (DD=00 = dyntrk = disabled).

num_cmd_elems is currently set to 500 (NNNNNNNN=0x1E0 = 480; num_cmd_elems = 480 + 20 = 500).

If I set dyntrk to yes, the value changes immediately in the running kernel configuration. I was able to make this change without a reboot because the device was not in use.

# chdev -l fscsi1 -a dyntrk=yes

# kdb

(0)> efscsi fscsi1 | grep efscsi_ddi

struct efscsi_ddi ddi = 0xF1000A0800CB6080

(0)> dd 0xF1000A0800CB6080+20 2

F1000A0800CB60A0: 0101020201010200 000001E000000028 ...............(

                          FFDD     NNNNNNNN

And now dynamic tracking is enabled (DD=01 = dyntrk = enabled, set to yes).

Poor old AIX 5.3 struggled to provide me with any information using the steps provided.

So what about max_xfer_size? For a physical FC adapter we can find the current value using the following kdb commands:

(0)> efcs fcs1 | grep ddi
struct efc_ddi ddi = 0xF1000A06006D0080
(0)> dd 0xF1000A06006D0080+60 4
F1000A06006D00E0: 00000000000000C8 0000012C900000C1 ...........,....
F1000A06006D00F0: 900000C1000FFC00 0010000000800000 ................

Based on the output, num_cmd_elems is set to 200 (0xC8) and max_xfer_size is set to 1048576 (0x100000).
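
The same decoding can be sketched in shell. This is a rough illustration only: decode_efc_ddi is a hypothetical helper name, and the field positions are my reading of the single sample dump above (num_cmd_elems in the low four bytes of the first word, max_xfer_size in the high four bytes of the last word). They may differ on other AIX levels.

```shell
#!/bin/sh
# Hypothetical helper: decode the two lines printed by
# "dd <efc_ddi address>+60 4".
# ASSUMPTION: field positions are inferred from one sample dump only.
decode_efc_ddi() {
    # $1 = first dump line, $2 = second dump line
    read -r addr1 a1 a2 rest <<EOF
$1
EOF
    read -r addr2 b1 b2 rest <<EOF
$2
EOF
    elems=$(printf '%s' "$a1" | cut -c9-16)   # low 4 bytes of line 1, word 1
    xfer=$(printf '%s' "$b2" | cut -c1-8)     # high 4 bytes of line 2, word 2
    echo "num_cmd_elems=$(( 0x$elems )) max_xfer_size=$(( 0x$xfer ))"
}

decode_efc_ddi \
    "F1000A06006D00E0: 00000000000000C8 0000012C900000C1" \
    "F1000A06006D00F0: 900000C1000FFC00 0010000000800000"
```

For the dump shown above, this prints num_cmd_elems=200 max_xfer_size=1048576.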

The max_xfer_size for a virtual FC (VFC) adapter is trickier because it is held in a structure that can and does change between SPs and TLs. In 6100-06-01, max_xfer_size sits 3932 bytes (0xF5C) into the structure, so we get the value like this:

(0)> vfcs
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs2 0xF100010100B38000 0xFFFF nimlab102-vfchost0 0x00 0x0000
(0)> dcal 3932
Value decimal: 3932 Value hexa: 00000F5C
(0)> dd 0xF100010100B38000+F50
F100010100B38F50: 0000002800000002 000000C800100000 ...(............
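
Note that the dd above starts at +F50 rather than +F5C: the dump is displayed 16 bytes per line, so the field at 0xF5C appears in the last four bytes of the line that starts at 0xF50. The arithmetic can be sketched as follows (locate_offset is a hypothetical helper name, not a kdb subcommand):

```shell
#!/bin/sh
# Hypothetical helper: show where a decimal structure offset lands in a
# dump that prints 16 bytes per line.
locate_offset() {
    off=$1
    line=$(( off / 16 * 16 ))   # start of the 16-byte line holding the field
    pos=$(( off - line ))       # byte position of the field within that line
    printf 'offset=0x%X line=0x%X byte=%d\n' "$off" "$line" "$pos"
}

locate_offset 3932
```

For 3932 this prints offset=0xF5C line=0xF50 byte=12, so max_xfer_size is the fourth 4-byte value on that line (00100000, or 1048576).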

Perhaps the easiest way to handle changes between versions is to rely on the fact that max_xfer_size comes immediately after num_cmd_elems, which is very unlikely to change. Since the structure size does not change by much, you can grep in the general area:

(0)> vfcs fcs2 | grep elems
num_cmd_elems: 0xC8

(0)> dd 0xF100010100B38000 200 | grep 000000C8
F100010100B38F50: 0000002800000002 000000C800100000 ...(............
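
The "value right after num_cmd_elems" rule can also be sketched as a small filter. This is only an illustration under assumptions: find_max_xfer_after is a hypothetical helper name, it assumes both fields are 4 bytes wide and 4-byte aligned as in the dump above, and like the grep it will be fooled if the same value appears earlier in the structure.

```shell
#!/bin/sh
# Hypothetical helper: read dd dump lines on stdin and print the 4-byte
# value that follows the first 4-byte value equal to num_cmd_elems.
# ASSUMPTION: max_xfer_size immediately follows num_cmd_elems.
find_max_xfer_after() {
    want=$1          # num_cmd_elems as 8 hex digits, e.g. 000000C8
    prev=
    while read -r addr w1 w2 rest; do
        # Split the two 8-byte words into four 4-byte values.
        set -- $(printf '%s\n' "$w1$w2" | sed 's/......../& /g')
        for v in "$@"; do
            if [ "$prev" = "$want" ]; then
                echo "max_xfer_size=$(( 0x$v ))"
                return 0
            fi
            prev=$v
        done
    done
    return 1         # no match found in the supplied lines
}

printf '%s\n' "F100010100B38F50: 0000002800000002 000000C800100000" |
    find_max_xfer_after 000000C8
```

With the line from the dump above as input, this prints max_xfer_size=1048576.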

Here are the links to my previous posts on kdb:

https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/checking_num_cmd_elems_for_vfc_adapters_with_kdb1?lang=en

https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/checking_your_queue_depth_with_kdb?lang=en

Enjoy kdb fans!

Attention: just a note about max_xfer_size and virtual FC adapters. In my experience, if the values for this attribute on the VIO client do not match those on the VIO server, then you will have trouble configuring the virtual FC adapters. Possible side effects may include your system never booting again!

So if I change the value to 0x200000 on the client, without mirroring this value on the VIO server, I may encounter the following effects:

# rmdev -Rl fcs1

sfwcomm1 Defined

fscsi1 Defined

fcnet1 Defined

fcs1 Defined

# chdev -l fcs1 -a max_xfer_size=0x200000

fcs1 changed

The cfgmgr command will report errors for the FC adapter.


# cfgmgr

Method error (/usr/lib/methods/cfgefscsi -l fscsi1 ):

0514-061 Cannot find a child device.

Method error (/usr/lib/methods/cfgstorfworkcom -l sfwcomm1 ):

0514-040 Error initializing a device into the kernel.

Errors, similar to the following, may appear in the AIX error report.

# errpt | grep fcs

0E0C5B31 0726123812 U S fcs1 Undefined error

8C9E9221 0726123812 I S fcs1 Informational message

You'll observe messages in the error report that claim a request from the client was rejected by the VIOS.

...

Request was rejected by VIOS

Response was rejected by the client

...

# errpt -aN fcs1

---------------------------------------------------------------------------

LABEL: VFC_ERR8

IDENTIFIER: 0E0C5B31

Date/Time: Thu Jul 26 12:38:29 EETDT 2012

Sequence Number: 1040

Machine Id: 00C123C64C00

Node Id: aixlpar1

Class: S

Type: UNKN

WPAR: Global

Resource Name: fcs1

Description

Undefined error

Probable Causes

PROCESSOR

Failure Causes

PROCESSOR

Recommended Actions

PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data

Error Location

0000 00E0

Error Type

00

RC

FFFF FFFF FFFF FFFF

VIO Server Partition Name

vio2

Physical Adapter Instance Name

vfchost50

Physical Adapter Location Code

U5873.001.8SS0071-P2-C6-T1

Physical Adapter DRC Name

U9119.FHB.87654C6-V7-C1100

Adapter N Port ID

0000 0000 0000 0000

Adapter State

0000 FFFF

Additional Information

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

---------------------------------------------------------------------------

LABEL: VFC_ERR7

IDENTIFIER: 8C9E9221

Date/Time: Thu Jul 26 12:38:29 EETDT 2012

Sequence Number: 1039

Machine Id: 00C123C64C00

Node Id: aixlpar1

Class: S

Type: INFO

WPAR: Global

Resource Name: fcs1

Description

Informational message

Probable Causes

Request was rejected by VIOS

Response was rejected by the client

Failure Causes

PROCESSOR

Recommended Actions

PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data

Error Location

0000 0088

Error Type

00

RC

0000 0000 0010 0000

VIO Server Partition Name

vio2

Physical Adapter Instance Name

vfchost50

Physical Adapter Location Code

U5873.001.8SS0071-P2-C6-T1

Physical Adapter DRC Name

U9119.FHB.87654C6-V7-C1100

Adapter N Port ID

0000 0000 0000 0000

Adapter State

0000 0004

If you encounter this problem, restore the client's FC adapter attributes to their previous values before restarting the system. If you don't, your LPAR may no longer boot and may hang on LED 554. Change your VIOS first, then update your VIO clients.