I received the following question from an AIX administrator in Germany.
“Hi Chris,
on your blog, you explain how to find out the active value of num_cmd_elems of an fc-adapter by using the kdb. So you can decide, if the value of lsattr is active or not ...
I wonder if you can find out the values fc_err_recov and dyntrk of the fscsiX device?
# lsattr -El fscsi0
attach       switch       How this adapter is CONNECTED         False
dyntrk       yes          Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0x1021f      Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True
I try to use echo efscsi fscsi0 | kdb .. but I can't figure it out..
Can you help me please?”
I did a little research on his behalf and came up with an answer. However, I’m not at all surprised he had trouble finding the right information. It's not easy, clear or documented!
I received the following information from my IBM AIX contacts.
“The following relies on internal structures that are subject to change.
The procedure was tested on 6100-06, 6100-07, and 7100-01. I don't have a lab system with physical HBAs and 5.3 at the moment. Hopefully the same steps should work for 5.3. You may need to first run efscsi without arguments to load the kdb module before running efscsi fscsiX.
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
    struct efscsi_ddi ddi = 0xF1000A060084A080
(0)> dd 0xF1000A060084A080+20 2
F1000A060084A0A0: 0101020202010200 000000B400000028  ...............(
                          FFDD     NNNNNNNN
FF = fc_error_recov: 01=delayed_fail 02=fast_fail
DD = dyntrk: 00=disabled 01=enabled
NNNNNNNN = num_cmd_elems - 20 (20 reserved)
e.g. 200 - 20 = 180 = B4
So in this example, fc_err_recov is set to fast_fail (02), dyntrk is set to yes (01) and num_cmd_elems is set to 200.”
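The field layout in the quoted procedure can be captured in a few lines of Python, which is handy if you want to check several adapters. This is only a sketch: the function name decode_fscsi_ddi is hypothetical, and the byte positions are inferred from the sample dd output on the levels mentioned above, so they may not hold on other AIX levels.

```python
# Sketch: decode the two doublewords printed by "dd <ddi address>+20 2".
# Field positions are inferred from the sample output in this post and
# rely on internal structures that may change between AIX levels.

FC_ERR_RECOV = {0x01: "delayed_fail", 0x02: "fast_fail"}
DYNTRK = {0x00: "no", 0x01: "yes"}
RESERVED_CMD_ELEMS = 20  # "20 reserved" in the quoted procedure


def decode_fscsi_ddi(dd_line):
    """Return (fc_err_recov, dyntrk, num_cmd_elems) from one dd output line."""
    # Example line:
    # F1000A060084A0A0: 0101020202010200 000000B400000028  ...............(
    _, rest = dd_line.split(":", 1)
    first, second = rest.split()[:2]
    ff = int(first[8:10], 16)    # FF: fifth byte of the first doubleword
    dd = int(first[10:12], 16)   # DD: sixth byte of the first doubleword
    nn = int(second[:8], 16)     # NNNNNNNN: upper word of the second doubleword
    return (
        FC_ERR_RECOV.get(ff, "unknown (0x%02X)" % ff),
        DYNTRK.get(dd, "unknown (0x%02X)" % dd),
        nn + RESERVED_CMD_ELEMS,
    )


if __name__ == "__main__":
    line = "F1000A060084A0A0: 0101020202010200 000000B400000028  ...............("
    print(decode_fscsi_ddi(line))
```

Unknown byte values are reported rather than guessed, since only the two values for each field listed above are known.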
I tested this on lab systems running AIX 6.1 TL6 and AIX 7.1 TL1. I started with an FC adapter with dyntrk disabled (set to no), fc_err_recov set to delayed_fail and num_cmd_elems set to 500.
# lsattr -El fscsi1
attach       none         How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id                   Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True
# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 500 Maximum number of COMMANDS to queue to the adapter True
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
    struct efscsi_ddi ddi = 0xF1000A060096E080
(0)> dd 0xF1000A060096E080+20 2
F1000A060096E0A0: 0101020201000100 000001E000000028  ...............(
                          FFDD     NNNNNNNN
OK, let’s break it down. From the kdb output we can determine the following:
· fc_err_recov is currently set to delayed_fail (FF=01).
· dyntrk is currently set to no (DD=00, disabled).
· num_cmd_elems is currently set to 500 (NNNNNNNN=000001E0, that is 480 + 20 reserved = 500).
If I set dyntrk to yes, the value changes immediately in the running kernel configuration. I was able to make this change without a reboot because the device was not in use.
# chdev -l fscsi1 -a dyntrk=yes
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
    struct efscsi_ddi ddi = 0xF1000A0800CB6080
(0)> dd 0xF1000A0800CB6080+20 2
F1000A0800CB60A0: 0101020201010200 000001E000000028  ...............(
                          FFDD     NNNNNNNN
And now dynamic tracking is enabled (DD=01 = dyntrk = enabled, set to yes).
Poor old AIX 5.3 struggled to provide me with any information using the steps provided.
So what about max_xfer_size? For a physical FC adapter we can find the current value using the following kdb commands:
(0)> efcs fcs1 | grep ddi
    struct efc_ddi ddi = 0xF1000A06006D0080
(0)> dd 0xF1000A06006D0080+60 4
F1000A06006D00E0: 00000000000000C8 0000012C900000C1  ...........,....
F1000A06006D00F0: 900000C1000FFC00 0010000000800000  ................
Based on the output, num_cmd_elems is set to 200 (0xC8) and max_xfer_size is set to 1048576 (0x100000).
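The hex-to-decimal conversion is easy to get wrong by hand, so here is a small Python sketch that splits the dd output into 32-bit words and converts the two of interest. The word positions (index 1 for num_cmd_elems, index 6 for max_xfer_size) are simply my reading of the sample output above, not a documented layout, and the helper name dd_words is made up for illustration.

```python
def dd_words(dd_lines):
    """Split kdb dd output lines into a flat list of 32-bit integers."""
    words = []
    for line in dd_lines:
        _, rest = line.split(":", 1)
        for dword in rest.split()[:2]:      # two 64-bit doublewords per line
            words.append(int(dword[:8], 16))
            words.append(int(dword[8:], 16))
    return words


sample = [
    "F1000A06006D00E0: 00000000000000C8 0000012C900000C1  ...........,....",
    "F1000A06006D00F0: 900000C1000FFC00 0010000000800000  ................",
]
words = dd_words(sample)
num_cmd_elems = words[1]   # assumed position, taken from the sample output
max_xfer_size = words[6]   # assumed position, taken from the sample output
print(num_cmd_elems, hex(max_xfer_size), max_xfer_size)
```

Because the positions depend on internal structures, it is worth comparing the result against lsattr on a system where you know the running values before trusting it.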
The max_xfer_size for VFC is tricky because it is contained in a structure that can and does change between service packs (SPs) and technology levels (TLs). In 6100-06-01, max_xfer_size is offset 3932 bytes into the structure, so we get the value like this:
(0)> vfcs
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs2 0xF100010100B38000 0xFFFF nimlab102-vfchost0 0x00 0x0000
(0)> dcal 3932
Value decimal: 3932          Value hexa: 00000F5C
(0)> dd 0xF100010100B38000+F50
F100010100B38F50: 0000002800000002 000000C800100000  ...(............
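Note that dd was started at +F50 rather than +F5C, presumably so that the dump begins on a 16-byte boundary. The value at offset F5C is then the last of the four 32-bit words on the line (00100000). A rough Python sketch of that arithmetic follows; the 3932-byte offset applies to 6100-06-01 only, and the function name word_at_offset is hypothetical.

```python
def word_at_offset(dd_line, dump_offset, field_offset):
    """Pick the 32-bit word at field_offset from a dd line starting at dump_offset."""
    _, rest = dd_line.split(":", 1)
    hexdigits = "".join(rest.split()[:2])       # 16 bytes = 32 hex digits
    index = (field_offset - dump_offset) // 4   # which 32-bit word on the line
    if not 0 <= index < 4:
        raise ValueError("field is not on this dd line")
    return int(hexdigits[index * 8:(index + 1) * 8], 16)


MAX_XFER_OFFSET = 3932                      # 0xF5C, valid for 6100-06-01 only
dump_offset = MAX_XFER_OFFSET & ~0xF        # round down to a 16-byte boundary
line = "F100010100B38F50: 0000002800000002 000000C800100000  ...(............"
max_xfer_size = word_at_offset(line, dump_offset, MAX_XFER_OFFSET)
print(hex(dump_offset), hex(max_xfer_size))
```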
Perhaps the easiest way to handle changes between versions is to use the fact that max_xfer_size is immediately after num_cmd_elems, and that is very unlikely to change. So, knowing that the structure size does not change by very much, you can grep in the general area for the num_cmd_elems value:
(0)> vfcs fcs2 | grep elems
num_cmd_elems: 0xC8
(0)> dd 0xF100010100B38000 200 | grep 000000C8
F100010100B38F50: 0000002800000002 000000C800100000  ...(............
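The same search can be scripted. The sketch below scans dd output for the 32-bit word holding the num_cmd_elems value and returns the word that follows it, relying on the adjacency described above. It is illustrative only: find_word_after is a hypothetical helper, and a small value such as 0xC8 could appear elsewhere in the structure, so every hit should be checked by eye.

```python
def find_word_after(dd_lines, needle):
    """Yield (address, next_word) for every 32-bit word equal to needle."""
    flat = []  # (address, value) pairs for each 32-bit word, in dump order
    for line in dd_lines:
        addr_text, rest = line.split(":", 1)
        base = int(addr_text, 16)
        hexdigits = "".join(rest.split()[:2])
        for i in range(0, len(hexdigits), 8):
            # i counts hex digits, so i // 2 is the byte offset on the line
            flat.append((base + i // 2, int(hexdigits[i:i + 8], 16)))
    for (addr, value), (_, following) in zip(flat, flat[1:]):
        if value == needle:
            yield addr, following


dump = ["F100010100B38F50: 0000002800000002 000000C800100000  ...(............"]
for addr, nxt in find_word_after(dump, 0xC8):
    print("num_cmd_elems at %X, next word = 0x%X (%d)" % (addr, nxt, nxt))
```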
Here are the links to my previous posts on kdb:
https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/checking_num_cmd_elems_for_vfc_adapters_with_kdb1?lang=en
https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/checking_your_queue_depth_with_kdb?lang=en
Enjoy, kdb fans!
Attention: just a note about max_xfer_size and virtual FC adapters. In my experience, if the value of this attribute on the VIO client does not match the value on the VIO server, you will have trouble configuring the virtual FC adapters. Possible side effects may include your system never booting again!
So if I change the value to 0x200000 on the client, without mirroring this value on the VIO server, I may encounter the following effects:
# rmdev -Rl fcs1
sfwcomm1 Defined
fscsi1 Defined
fcnet1 Defined
fcs1 Defined
# chdev -l fcs1 -a max_xfer_size=0x200000
fcs1 changed
The cfgmgr command will report errors for the FC adapter.
# cfgmgr
Method error (/usr/lib/methods/cfgefscsi -l fscsi1 ):
        0514-061 Cannot find a child device.
Method error (/usr/lib/methods/cfgstorfworkcom -l sfwcomm1 ):
        0514-040 Error initializing a device into the kernel.
Errors similar to the following may appear in the AIX error report.
# errpt | grep fcs
0E0C5B31   0726123812 U S fcs1           Undefined error
8C9E9221   0726123812 I S fcs1           Informational message
You’ll observe messages in the error report that claim a request from the client was rejected by the VIOS.
...
Request was rejected by VIOS
Response was rejected by the client
...
# errpt -aN fcs1
---------------------------------------------------------------------------
LABEL:           VFC_ERR8
IDENTIFIER:      0E0C5B31
Date/Time:       Thu Jul 26 12:38:29 EETDT 2012
Sequence Number: 1040
Machine Id:      00C123C64C00
Node Id:         aixlpar1
Class:           S
Type:            UNKN
WPAR:            Global
Resource Name:   fcs1
Description
Undefined error
Probable Causes
PROCESSOR
Failure Causes
PROCESSOR
        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Error Location
0000 00E0
Error Type
00
RC
FFFF FFFF FFFF FFFF
VIO Server Partition Name
vio2
Physical Adapter Instance Name
vfchost50
Physical Adapter Location Code
U5873.001.8SS0071-P2-C6-T1
Physical Adapter DRC Name
U9119.FHB.87654C6-V7-C1100
Adapter N Port ID
0000 0000 0000 0000
Adapter State
0000 FFFF
Additional Information
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL:           VFC_ERR7
IDENTIFIER:      8C9E9221
Date/Time:       Thu Jul 26 12:38:29 EETDT 2012
Sequence Number: 1039
Machine Id:      00C123C64C00
Node Id:         aixlpar1
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   fcs1
Description
Informational message
Probable Causes
Request was rejected by VIOS
Response was rejected by the client
Failure Causes
PROCESSOR
        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Error Location
0000 0088
Error Type
00
RC
0000 0000 0010 0000
VIO Server Partition Name
vio2
Physical Adapter Instance Name
vfchost50
Physical Adapter Location Code
U5873.001.8SS0071-P2-C6-T1
Physical Adapter DRC Name
U9119.FHB.87654C6-V7-C1100
Adapter N Port ID
0000 0000 0000 0000
Adapter State
0000 0004
If you encounter this problem, restore the client's FC adapter attributes to their previous values before restarting the system. If you don’t, your LPAR may no longer boot and may hang on LED 554. Change your VIOS first, then update your VIO clients.
Tags: dyntrk num_cmd_elems fscsi kdb chris_gibson fc_error_recov attributes aix