I received the following question from an AIX administrator in Germany.
Hi Chris,
on your blog, you explain how to find out the active value of
num_cmd_elems of an fc-adapter by using the kdb. So you can decide, if the
value of lsattr is active or not ...
I wonder if you can find out the values fc_err_recov and dyntrk of the
fscsiX device.?
# lsattr -El fscsi0
attach switch How this adapter is CONNECTED False
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x1021f Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
I try to use echo efscsi fscsi0 | kdb .. but I can't figure it out..
Can you help my please?
I did a little research on his behalf and came up with an answer. However, I'm not at all surprised he had trouble finding the right information. It's not easy, clear or documented!
I received the following information from my IBM AIX contacts.
The following relies on internal structures that are subject to change.
The procedure was tested on 6100-06, 6100-07, and 7100-01. I don't have a lab system with physical HBAs and 5.3 at the moment.
Hopefully the same steps should work for 5.3. You may need to first run efscsi without arguments to load the kdb module before running efscsi fscsiX.
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A060084A080
(0)> dd 0xF1000A060084A080+20 2
F1000A060084A0A0: 0101020202010200 000000B400000028 ...............(
                          FFDD     NNNNNNNN
FF = fc_err_recov: 01=delayed_fail 02=fast_fail
DD = dyntrk: 00=disabled 01=enabled
NNNNNNNN = num_cmd_elems - 20 (20 reserved)
e.g. 200 - 20 = 180 = 0xB4
So in this example, fc_err_recov is set to fast_fail (02), dyntrk is set to yes (01) and num_cmd_elems is set to 200.
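The byte positions above can be decoded mechanically. Here is a minimal sketch in POSIX shell, not an IBM-supplied tool. The function name decode_fscsi_ddi is hypothetical, it takes the two 64-bit words from the dd output as arguments, and it assumes the layout described above, which relies on internal structures that may change.

```shell
# decode_fscsi_ddi WORD1 WORD2
# WORD1 and WORD2 are the two 16-digit hex words printed by
# "dd <ddi address>+20 2". Hypothetical helper; layout as described above.
decode_fscsi_ddi() {
    ff=$(printf '%s' "$1" | cut -c9-10)    # fifth byte: fc_err_recov
    dt=$(printf '%s' "$1" | cut -c11-12)   # sixth byte: dyntrk
    nn=$(printf '%s' "$2" | cut -c1-8)     # first 32 bits: num_cmd_elems - 20

    case $ff in
        01) recov=delayed_fail ;;
        02) recov=fast_fail ;;
        *)  recov="unknown($ff)" ;;
    esac
    case $dt in
        00) dyntrk=no ;;
        01) dyntrk=yes ;;
        *)  dyntrk="unknown($dt)" ;;
    esac

    # Add back the 20 reserved command elements.
    echo "fc_err_recov=$recov dyntrk=$dyntrk num_cmd_elems=$(( 0x$nn + 20 ))"
}

decode_fscsi_ddi 0101020202010200 000000B400000028
```

For the words shown above this prints fc_err_recov=fast_fail dyntrk=yes num_cmd_elems=200. The 0x prefix inside $(( )) works in bash and dash; on older ksh levels you may need the 16#... form instead.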
I tested this on lab systems running AIX 6.1 TL6 and AIX 7.1 TL1, starting with an FC adapter with dyntrk set to no, fc_err_recov set to delayed_fail and num_cmd_elems set to 500.
# lsattr -El fscsi1
attach none How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 500 Maximum number of COMMANDS to queue to the adapter True
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A060096E080
(0)> dd 0xF1000A060096E080+20 2
F1000A060096E0A0: 0101020201000100 000001E000000028 ...............(
                          FFDD     NNNNNNNN
OK, let's break it down. From the kdb output we can determine the following:
fc_err_recov is currently set to delayed_fail (FF=01 = fc_err_recov = delayed_fail).
dyntrk is currently set to no (DD=00 = dyntrk = disabled).
num_cmd_elems is currently set to 500 (NNNNNNNN=1E0 = 480, and 480 + 20 = 500).
If I set dyntrk to yes, the value changes immediately in the running kernel configuration. I was able to make this change without a reboot because the device was not in use.
# chdev -l fscsi1 -a dyntrk=yes
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A0800CB6080
(0)> dd 0xF1000A0800CB6080+20 2
F1000A0800CB60A0: 0101020201010200 000001E000000028 ...............(
                          FFDD     NNNNNNNN
And now dynamic tracking is enabled (DD=01 = dyntrk = enabled, set to yes).
Poor old AIX 5.3 struggled to provide me with any information using the steps provided.
So what about max_xfer_size? For a physical FC adapter we can find the current value using the following kdb commands:
(0)> efcs fcs1 | grep ddi
struct efc_ddi ddi = 0xF1000A06006D0080
(0)> dd 0xF1000A06006D0080+60 4
F1000A06006D00E0: 00000000000000C8 0000012C900000C1 ...........,....
F1000A06006D00F0: 900000C1000FFC00 0010000000800000 ................
Based on the output, num_cmd_elems is set to 200 (0xC8) and max_xfer_size is set to 1048576 (0x100000).
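To save doing the hex conversion by hand, the two fields can be pulled out of the four dumped words. This is only a sketch: the function name efc_ddi_fields is hypothetical, and the field positions (low 32 bits of the first word, high 32 bits of the fourth word) are read off the sample output above rather than taken from any documentation, so they may differ on other AIX levels.

```shell
# efc_ddi_fields WORD1 WORD2 WORD3 WORD4
# The four 64-bit words come from "dd <ddi address>+60 4" (two per line).
# Positions are assumptions based on the sample dump; hypothetical helper.
efc_ddi_fields() {
    elems=$(printf '%s' "$1" | cut -c9-16)   # low 32 bits of the first word
    xfer=$(printf '%s' "$4" | cut -c1-8)     # high 32 bits of the fourth word
    echo "num_cmd_elems=$(( 0x$elems )) max_xfer_size=$(( 0x$xfer ))"
}

efc_ddi_fields 00000000000000C8 0000012C900000C1 900000C1000FFC00 0010000000800000
```

With the words from the dump above, this prints num_cmd_elems=200 max_xfer_size=1048576.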
The max_xfer_size for VFC is tricky because it is contained in a structure that can and does change between SPs and TLs. In 6100-06-01, max_xfer_size is offset 3932 bytes into the structure, so we get the value like this:
(0)> vfcs
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs2 0xF100010100B38000 0xFFFF nimlab102-vfchost0 0x00 0x0000
(0)> dcal 3932
Value decimal: 3932
Value hexa: 00000F5C
(0)> dd 0xF100010100B38000+F50
F100010100B38F50: 0000002800000002 000000C800100000 ...(............
Perhaps the easiest way to handle changes between versions is to use the fact that max_xfer_size is immediately after num_cmd_elems, and that is very unlikely to change. So, knowing that the structure size does not change by very much, you can grep in the general area:
(0)> vfcs fcs2 | grep elems
num_cmd_elems: 0xC8
(0)> dd 0xF100010100B38000 200 | grep 000000C8
F100010100B38F50: 0000002800000002 000000C800100000 ...(............
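Once grep has found the line, max_xfer_size is simply the 32-bit word that follows num_cmd_elems. A rough sketch of picking it out is below. The function name next_word_after is hypothetical, and it assumes a dd output line of the form "ADDRESS: WORD WORD ascii" with both words on the same line. If the match is the last 32-bit word on the line, the value you want is at the start of the next line and this sketch prints nothing.

```shell
# next_word_after HEX8 LINE
# HEX8 is the 8-digit hex value to look for (num_cmd_elems),
# LINE is one line of kdb dd output. Hypothetical helper.
next_word_after() {
    echo "$2" | awk -v want="$1" '{
        hex = toupper($2 $3)               # the 16 bytes dumped on this line
        for (i = 1; i < 25; i += 8) {      # 32-bit fields start at 1, 9, 17
            if (substr(hex, i, 8) == toupper(want)) {
                print substr(hex, i + 8, 8)
                exit
            }
        }
    }'
}

word=$(next_word_after 000000C8 'F100010100B38F50: 0000002800000002 000000C800100000 ...(............')
echo "max_xfer_size=0x$word ($(( 0x$word )))"
```

For the line above this prints max_xfer_size=0x00100000 (1048576). Keep in mind that a short value such as 000000C8 could also match unrelated data in the structure, so sanity-check the result.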
Here are the links to my previous posts on kdb:
Enjoy, kdb fans!
Attention: just a note about max_xfer_size and virtual FC adapters. In my experience, if the values for this attribute on the VIO client do not match those on the VIO server, then you will have trouble configuring the virtual FC adapters. Possible side effects may include your system never booting again!
So if I change the value to 0x200000 on the client, without mirroring this value on the VIO server, I may encounter the following effects:
# rmdev -Rl fcs1
sfwcomm1 Defined
fscsi1 Defined
fcnet1 Defined
fcs1 Defined
# chdev -l fcs1 -a max_xfer_size=0x200000
fcs1 changed
The cfgmgr command will report errors for the FC adapter.
# cfgmgr
Method error (/usr/lib/methods/cfgefscsi -l fscsi1 ):
0514-061 Cannot find a child device.
Method error (/usr/lib/methods/cfgstorfworkcom -l sfwcomm1 ):
0514-040 Error initializing a device into the kernel.
Errors, similar to the following, may appear in the AIX error report.
# errpt | grep fcs
0E0C5B31 0726123812 U S fcs1 Undefined error
8C9E9221 0726123812 I S fcs1 Informational message
You'll observe messages in the error report stating that a request from the client was rejected by the VIOS.
...
Request was rejected by VIOS
Response was rejected by the client
...
# errpt -aN fcs1
---------------------------------------------------------------------------
LABEL: VFC_ERR8
IDENTIFIER: 0E0C5B31
Date/Time: Thu Jul 26 12:38:29 EETDT 2012
Sequence Number: 1040
Machine Id: 00C123C64C00
Node Id: aixlpar1
Class: S
Type: UNKN
WPAR: Global
Resource Name: fcs1
Description
Undefined error
Probable Causes
PROCESSOR
Failure Causes
PROCESSOR
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Error Location
0000 00E0
Error Type
00
RC
FFFF FFFF FFFF FFFF
VIO Server Partition Name
vio2
Physical Adapter Instance Name
vfchost50
Physical Adapter Location Code
U5873.001.8SS0071-P2-C6-T1
Physical Adapter DRC Name
U9119.FHB.87654C6-V7-C1100
Adapter N Port ID
0000 0000 0000 0000
Adapter State
0000 FFFF
Additional Information
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: VFC_ERR7
IDENTIFIER: 8C9E9221
Date/Time: Thu Jul 26 12:38:29 EETDT 2012
Sequence Number: 1039
Machine Id: 00C123C64C00
Node Id: aixlpar1
Class: S
Type: INFO
WPAR: Global
Resource Name: fcs1
Description
Informational message
Probable Causes
Request was rejected by VIOS
Response was rejected by the client
Failure Causes
PROCESSOR
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Error Location
0000 0088
Error Type
00
RC
0000 0000 0010 0000
VIO Server Partition Name
vio2
Physical Adapter Instance Name
vfchost50
Physical Adapter Location Code
U5873.001.8SS0071-P2-C6-T1
Physical Adapter DRC Name
U9119.FHB.87654C6-V7-C1100
Adapter N Port ID
0000 0000 0000 0000
Adapter State
0000 0004
If you encounter this problem, restore the client's FC adapter attributes to their previous values before restarting the system. If you don't, your LPAR may no longer boot and may hang on LED 554. Change your VIOS first, then update your VIO clients.
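A simple guard before touching the client is to compare the value you intend to set with the value already active on the VIOS. The sketch below only does the comparison. The function name check_xfer_match is hypothetical, and it assumes you have already collected both values yourself (for example from lsattr output on each side) and pass them in as hex or decimal strings.

```shell
# check_xfer_match CLIENT_VALUE VIOS_VALUE
# Returns 0 when both values are numerically equal, 1 otherwise.
# Hypothetical helper; values may be hex (0x...) or decimal.
check_xfer_match() {
    if [ $(( $1 )) -eq $(( $2 )) ]; then
        echo "match"
    else
        echo "MISMATCH: client=$1 vios=$2"
        return 1
    fi
}

check_xfer_match 0x200000 0x100000 || echo "do not change the client yet"
```

In this example the client value (0x200000) differs from the VIOS value (0x100000), so the guard reports a mismatch and the chdev on the client should wait until the VIOS has been updated.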