If you are a system administrator responsible for managing AIX systems that run
SAP, then you've probably had an experience similar to the following.
OK, so one day my SAP Basis administrator contacts me and says, "I can't start saposcol... can
you please reboot the system?" I quickly reply, "Was there an error message when trying to restart saposcol?" He
replies, "No." Again I return very quickly, "OK, have you checked to see if
there are any shared memory segments left for saposcol?" Just as quickly, he replies, "How do I do that?"
Together we try starting saposcol, and what we
find is that it thinks it's already running (as shown below, PID 327924). But there is no such process!
$ saposcol -l
09:37:24 22.01.2009 LOG: Effective User Id is root
***********************************************************************
* This is Saposcol Version COLL 20.95 700 - AIX v11.15 5L-64 bit 080317
* Usage:  saposcol -l: Start OS Collector
*         saposcol -k: Stop OS Collector
*         saposcol -d: OS Collector Dialog Mode
*         saposcol -s: OS Collector Status
* The OS Collector (PID 327924) is already running .....
************************************************************************
$ ps -fp 327924
     UID     PID    PPID   C    STIME    TTY  TIME CMD
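The same "does this PID really exist?" check can be scripted. This is only a minimal sketch: the helper name pid_alive is my own invention, and it assumes you run it as root (as in this session), because kill -0 also fails for processes you are not allowed to signal.

```shell
#!/bin/sh
# pid_alive PID: succeed if a process with this PID currently exists.
# Hypothetical helper name. kill -0 delivers no signal; it only performs
# the existence/permission check, so a non-root user would see processes
# owned by other users reported as missing.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Demo that is safe to run anywhere: the current shell always exists.
if pid_alive "$$"; then
    echo "PID $$ exists"
fi

# On the AIX host, the stale collector PID could be checked like this:
#   pid_alive 327924 || echo "PID 327924 does not exist"
```

An empty ps -fp listing tells you the same thing; the function form is simply easier to reuse in a loop over several PIDs.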
So we try to stop saposcol. This of course fails, as there is no PID 327924 running!
$ saposcol -k
Setting Stop Flag :
09:19:29 22.01.2009 LOG: ==== Stop Flag was set by saposcol (kill_collector()).
09:19:29 22.01.2009 LOG: ==== The collection process will stop as soon as possible
********************
can't kill process 327924.
kill: No such process
ERROR:No reaction from collecting process 327924.
Please kill collecting process.
My conclusion is that there must be a shared memory segment still allocated for
saposcol. There were many other SAP processes still running happily, so there
were several shared memory segments to sift through. So, what shared memory ID
does saposcol use?
Now, according to the following website, shared memory key 4dbe is used by saposcol on AIX.
http://www.saptechies.com/os-collector-saposcol/
7. Question: How do I remove the shared memory key of saposcol?
Answer: Sometimes it may be
necessary to remove the shared memory key of saposcol (see point 6). Caution:
Please be very careful! This procedure should be performed only after checking
that saposcol is really not running (see point 4) and only in cases when other
options (see point 6) really do not work! For this, execute command ipcs -ma
and note the line that contains saposcol key 4dbe. You need the shared
memory ID. After that, execute command ipcrm -m ID. Now the command saposcol
-s should show that saposcol is not running and that the shared memory is not
attached. The shared memory key will be created automatically by saposcol when
the collector is next started: saposcol -l.
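The "note the line that contains key 4dbe, you need the shared memory ID" step can be sketched as a small filter. The function name shm_id_for_key is hypothetical, and the sketch assumes the ipcs -ma layout seen in this article, where the ID is the second field and the key is the third. It compares the key field exactly, so an unrelated key that merely contains "4dbe" will not match.

```shell
#!/bin/sh
# shm_id_for_key KEY: read `ipcs -ma` output on stdin and print the ID of
# every shared memory segment ("m" lines) whose key equals KEY.
# KEY is given in hex without the 0x prefix, e.g. 4dbe.
# Assumed layout: field 2 is the ID, field 3 is the key (0x00004dbe).
shm_id_for_key() {
    awk -v key="$1" '
        $1 == "m" {
            k = tolower($3)
            sub(/^0x0*/, "", k)          # 0x00004dbe -> 4dbe
            if (k == tolower(key)) print $2
        }'
}

# Demo using the ipcs line from this article as sample input.
sample='m 2097156 0x00004dbe --rw-rw-rw- root sapsys root sapsys 0 1870188 467164 2293794 9:39:43 9:39:43 8:12:43'
printf '%s\n' "$sample" | shm_id_for_key 4dbe    # prints 2097156

# On the AIX host:
#   ipcs -ma | shm_id_for_key 4dbe
```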
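So I run ipcs to check for the existence of key 4dbe, and I find an entry for it.
The entry lists several numbers that I check as process IDs, and only one of them actually
exists (PID 2293794). Note that, assuming the standard ipcs -ma column order for shared
memory (NATTCH, SEGSZ, CPID and LPID after the owner and group columns), 1870188 is the
segment size in bytes rather than a PID, 467164 is the creator PID, and 2293794 is the
PID of the last process to attach or detach.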
# ipcs -ma | grep 4dbe
m 2097156 0x00004dbe --rw-rw-rw- root sapsys root sapsys 0 1870188 467164 2293794 9:39:43 9:39:43 8:12:43
# ps -fp 1870188
     UID     PID    PPID   C    STIME    TTY  TIME CMD
# ps -fp 467164
     UID     PID    PPID   C    STIME    TTY  TIME CMD
# ps -fp 2293794
     UID     PID    PPID   C    STIME    TTY  TIME CMD
  sapadm 2293794       1   0 19:26:41      -  1:07 sapccm4x -DCCMS pf=/sapmnt/SAP/profile/SAP_DVEBMGS00_aix01
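To make the ipcs line easier to read, the relevant fields can be labelled. This sketch assumes the usual ipcs -ma column order for shared memory segments (T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME); the function name shm_attach_info is made up for illustration, and it is worth confirming the header on your own system with a plain ipcs -ma first.

```shell
#!/bin/sh
# shm_attach_info KEY: read `ipcs -ma` output on stdin and print labelled
# fields for the segment whose key equals KEY (hex, without 0x).
# Assumed column order: ... CGROUP NATTCH SEGSZ CPID LPID ...
# i.e. fields 9, 10, 11 and 12.
shm_attach_info() {
    awk -v key="$1" '
        $1 == "m" {
            k = tolower($3)
            sub(/^0x0*/, "", k)
            if (k == tolower(key))
                printf "id=%s nattch=%s segsz=%s cpid=%s lpid=%s\n", $2, $9, $10, $11, $12
        }'
}

# Demo using the ipcs line from this article as sample input.
sample='m 2097156 0x00004dbe --rw-rw-rw- root sapsys root sapsys 0 1870188 467164 2293794 9:39:43 9:39:43 8:12:43'
printf '%s\n' "$sample" | shm_attach_info 4dbe
# prints: id=2097156 nattch=0 segsz=1870188 cpid=467164 lpid=2293794
```

Read this way, only cpid and lpid are candidates for a ps -fp check, and nattch=0 suggests that nothing was attached to the segment at that moment.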
I ask the SAP admin to stop this process, which he does. Now I remove the shared memory
segment and there is no evidence of 4dbe in the ipcs output.
# ipcrm -m 2097156
# ipcs -ma | grep 4dbe
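Given the caution in the FAQ quoted earlier, a dry run before ipcrm is a cheap safety net. The sketch below only prints the ipcrm command and never executes it. It assumes NATTCH is the ninth field of ipcs -ma output, and the function name and exit codes are my own convention, not anything SAP or AIX defines.

```shell
#!/bin/sh
# shm_removal_command KEY: read `ipcs -ma` output on stdin and print the
# ipcrm command for the segment with this key, but only when NATTCH is 0.
# Nothing is removed here; the caller decides whether to run the output.
# Exit status (hypothetical convention):
#   0 = command printed, 1 = no segment with this key, 2 = still attached
shm_removal_command() {
    awk -v key="$1" '
        $1 == "m" {
            k = tolower($3)
            sub(/^0x0*/, "", k)
            if (k != tolower(key)) next
            found = 1
            if ($9 + 0 == 0) print "ipcrm -m " $2
            else busy = 1
        }
        END {
            if (!found) exit 1
            if (busy)   exit 2
        }'
}

# Demo using the stale segment from this article as sample input.
stale='m 2097156 0x00004dbe --rw-rw-rw- root sapsys root sapsys 0 1870188 467164 2293794 9:39:43 9:39:43 8:12:43'
printf '%s\n' "$stale" | shm_removal_command 4dbe    # prints ipcrm -m 2097156

# On the AIX host, review the printed command before running it:
#   ipcs -ma | shm_removal_command 4dbe
```

The NATTCH guard is not a substitute for confirming with saposcol -s and ps that the collector is really gone; it just refuses the obvious mistake of removing a segment that is in use.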
We were then able to start saposcol again
with success. The process is running and the shared memory segment 4dbe has returned.
# ipcs -ma | grep 4dbe
m 3145732 0x00004dbe --rw-rw-rw- root sapsys root sapsys 1 1870188 2293818 2207892 10:14:25 10:14:25 10:12:47
# ps -ef | grep oscol
  sapadm 2064616       1   0 10:12:53      -  0:00 saposcol -l
# /usr/sap/SAP/SYS/exe/run/saposcol -s
**************************************************************
Collector Versions :
running : COLL 20.95 700 - AIX v11.15 5L-64 bit 080317
dialog : COLL 20.95 700 - AIX v11.15 5L-64 bit 080317
Shared Memory : attached
Number of records : 17640
Active Flag : active (01)
Operating System : AIX aix01 3 5 00C01C704C00
Collector PID : 2064616 (001F80E8)
Collector : running
Start time coll. : Thu Jan 22 10:12:53 2009
Current Time : Thu Jan 22 10:17:46 2009
Last write access : Thu Jan 22 10:17:38 2009
Last Read Access : Thu Jan 22 10:16:16 2009
Collection Interval : 10 sec (next delay).
Collection Interval : 10 sec (last ).
Status : free
Collect Details : required
Refresh : required
Header Extention Structure
Number of x-header Records : 1
Number of Communication Records : 60
Number of free Com. Records : 60
Resulting offset to 1.data rec. : 61
Trace level : 2
Collector in IDLE - mode ? : NO
become idle after 300 sec without read access.
Length of Idle Interval : 60 sec
Length of norm.Interval : 10 sec
**************************************************************
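If you want to check the collector from a script instead of reading the whole status report, you can look for the "Collector : running" line. This is a rough sketch that assumes the status line looks like the one in the output above (with any amount of whitespace around the colon); the function name collector_running is hypothetical.

```shell
#!/bin/sh
# collector_running: read `saposcol -s` output on stdin and succeed only if
# it contains a line of the form "Collector : running".
# Lines such as "Collector PID : ..." or "Collector Versions :" do not match.
collector_running() {
    grep -Eq '^[[:space:]]*Collector[[:space:]]*:[[:space:]]*running'
}

# Demo with two lines taken from the status output in this article.
if printf '%s\n' 'Collector PID : 2064616 (001F80E8)' 'Collector : running' | collector_running; then
    echo "saposcol reports a running collector"
fi

# On the AIX host:
#   /usr/sap/SAP/SYS/exe/run/saposcol -s | collector_running && echo OK
```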
Problem solved and no reboot required... this is AIX after all! :)