saposcol and shared memory on AIX

If you are a system administrator who is responsible for managing AIX systems that run SAP, then you’ve probably had an experience similar to the following.

One day my SAP Basis administrator contacts me and says, “I can’t start saposcol... can you please reboot the system?” I quickly reply, “Was there an error message when trying to restart saposcol?” He replies, “No.” Again I return very quickly, “OK, have you checked to see if there are any shared memory segments left for saposcol?” Just as quickly, he replies, “How do I do that?”

Together we try starting saposcol, and what we find is that it thinks it’s already running (as shown below, PID 327924). But there is no such process!

    $ saposcol -l
    09:37:24 22.01.2009 LOG: Effective User Id is root
    ****
    * This is Saposcol Version COLL 20.95 700 - AIX v11.15 5L-64 bit 080317
    * Usage: saposcol -l: Start OS Collector
    *        saposcol -k: Stop OS Collector
    *        saposcol -d: OS Collector Dialog Mode
    *        saposcol -s: OS Collector Status
    * The OS Collector (PID 327924) is already running .....
    ****
    $ ps -fp 327924
         UID     PID    PPID   C    STIME    TTY  TIME CMD

So we try to stop saposcol. This of course fails, as there is no PID 327924 running!

    $ saposcol -k
    Setting Stop Flag :
    09:19:29 22.01.2009 LOG: ==== Stop Flag was set by saposcol (kill_collector()).
    09:19:29 22.01.2009 LOG: ==== The collection process will stop as soon as possible
    ****
    can't kill process 327924.
    kill: No such process
    ERROR:No reaction from collecting process 327924.
    Please kill collecting process.

My
conclusion is that there must be a shared memory segment still allocated for saposcol. There were many other SAP processes still running happily, so there were several shared memory segments to sift through. So, what shared memory ID does saposcol use?

According to the following website, shared memory key 4dbe is used by saposcol on AIX. http

So I run ipcs to check for the existence of 4dbe, and I find an entry for this key. The entry carries several numbers that look like process IDs. (In ipcs -ma output the fields after the owner and group names are NATTCH, SEGSZ, CPID and LPID, so 1870188 is most likely the segment size, 467164 the creating PID and 2293794 the last PID to attach or detach.) I check all three with ps anyway, and only one of them actually exists (PID 2293794).

    # ipcs -ma | grep 4dbe
    m 2097156 0x00004dbe --rw-rw-rw- root sapsys root sapsys 0 1870188 467164 2293794 9:39:43 9:39:43 8:12:43
    # ps -fp 1870188
         UID     PID    PPID   C    STIME    TTY  TIME CMD
    # ps -fp 467164
         UID     PID    PPID   C    STIME    TTY  TIME CMD
    # ps -fp 2293794
         UID     PID    PPID   C    STIME    TTY  TIME CMD
      sapadm 2293794       1   0 19:26:41      -  1:07 sapccm4x -DCCMS pf=/

I ask the SAP admin to stop this process, which he does. Now I remove the shared memory segment, and there is no longer any evidence of 4dbe in the ipcs output.

    # ipcrm -m 2097156
    # ipcs -ma | grep 4dbe

We were
then able to start saposcol again with success. The process is running and the shared memory segment 4dbe has returned.

    # ipcs -ma | grep 4dbe
    m 3145732 0x00004dbe --rw-rw-rw- root sapsys root sapsys 1 1870188 2293818 2207892 10:14:25 10:14:25 10:12:47
    # ps -ef | grep oscol
      sapadm 2064616       1   0 10:12:53      -  0:00 saposcol -l
    # /usr
    ****
    Collector Versions :
      running : COLL 20.95 700 - AIX v11.15 5L-64 bit 080317
      dialog  : COLL 20.95 700 - AIX v11.15 5L-64 bit 080317
    Shared Memory : attached
    Number of records : 17640
    Active Flag : active (01)
    Operating System : AIX aix01 3 5 00C01C704C00
    Collector PID : 2064616 (001F80E8)
    Collector : running
    Start time coll. : Thu Jan 22 10:12:53 2009
    Current Time : Thu Jan 22 10:17:46 2009
    Last write access : Thu Jan 22 10:17:38 2009
    Last Read Access : Thu Jan 22 10:16:16 2009
    Collection Interval : 10 sec (next delay).
    Collection Interval : 10 sec (last ).
    Status : free
    Collect Details : required
    Refresh : required
    Header Extention Structure
    Number of x-header Records : 1
    Number of Communication Records : 60
    Number of free Com. Records : 60
    Resulting offset to 1.data rec. : 61
    Trace level : 2
    Collector in IDLE - mode ? : NO
      become idle after 300 sec without read access.
    Length of Idle Interval : 60 sec
    Length of norm.Interval : 10 sec
    ****

Problem solved and no reboot required.....this is AIX after all! :)
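
If this comes up often, the manual check described in this post (find the segment for key 4dbe, then see which of the recorded PIDs still exist) could be sketched as a few shell functions. This is only a sketch under assumptions: it assumes ipcs -ma prints its fields in the order T, ID, KEY, MODE, OWNER, GROUP, CREATOR, CGROUP, NATTCH, SEGSZ, CPID, LPID, which matches the output shown in this post, and the function names shm_fields, pid_alive and describe_segment are hypothetical helpers, not part of any AIX or SAP tool. It only reports; it never runs ipcrm itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the ipcs -ma column order
# (T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ...) is assumed.

# Read ipcs -ma lines on stdin; for each shared memory segment whose key
# matches $1 (e.g. 4dbe), print "ID NATTCH CPID LPID".
shm_fields() {
    awk -v key="$1" '
        $1 == "m" {
            k = tolower($3)
            sub(/^0x0*/, "", k)          # 0x00004dbe -> 4dbe
            if (k == tolower(key)) print $2, $9, $11, $12
        }'
}

# Succeed if a process with the given PID exists.
# kill -0 can fail for processes owned by other users, so fall back to ps.
pid_alive() {
    kill -0 "$1" 2>/dev/null || ps -p "$1" >/dev/null 2>&1
}

# Arguments: segment ID, NATTCH, CPID, LPID. Prints a short report.
describe_segment() {
    for pid in "$3" "$4"; do
        if pid_alive "$pid"; then state=running; else state=gone; fi
        echo "segment $1: PID $pid is $state"
    done
    if [ "$2" -eq 0 ]; then
        echo "segment $1: nothing attached; candidate for 'ipcrm -m $1'"
    fi
}

# On a live system (as root, like the commands in the post):
#   ipcs -ma | shm_fields 4dbe | while read id nattch cpid lpid; do
#       describe_segment "$id" "$nattch" "$cpid" "$lpid"
#   done
```

Keeping the removal itself as a manual step is deliberate: as in the story above, you still want a human to confirm that saposcol and anything else using the segment are really stopped before running ipcrm.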
Worked like a charm on a Prod SAP NetWeaver system I'm helping with. I wonder whether the initial problem was because they killed saposcol with a kill -9, which may not have cleaned up shared memory. Anyway, problem solved. Thanks, Chris.