If you are a system administrator who is responsible for managing AIX systems that run SAP, then you've probably had an experience similar to the following.

OK, so one day my SAP Basis administrator contacts me and says, "I can't start saposcol... can you please reboot the system?" I quickly reply, "Was there an error message when you tried to restart saposcol?" He replies, "No." Again I come straight back: "OK, have you checked whether there are any shared memory segments left over from saposcol?" Just as quickly, he replies, "How do I do that?"

Together we try starting saposcol, and what we find is that it thinks it's already running (as PID 327924, shown below). But there is no such process!

$ saposcol -l

-l

09:37:24 22.01.2009 LOG: Effective User Id is root

***********************************************************************

* This is Saposcol Version COLL 20.95 700 - AIX v11.15 5L-64 bit 080317

* Usage: saposcol -l: Start OS Collector

* saposcol -k: Stop OS Collector

* saposcol -d: OS Collector Dialog Mode

* saposcol -s: OS Collector Status

* The OS Collector (PID 327924) is already running .....

************************************************************************

$ ps -fp 327924

UID PID PPID C STIME TTY TIME CMD
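Checking whether the PID quoted in saposcol's "already running" message belongs to a live process is easy to script. Here is a minimal POSIX shell sketch, not an SAP-supplied tool: the function names collector_pid and pid_alive are made up for illustration, and the parsing assumes the message looks like the one above ("The OS Collector (PID nnnn) is already running").

```shell
#!/bin/sh
# Hypothetical helper: read saposcol output on stdin and print the number
# found inside "(PID nnnn)", if any such line is present.
collector_pid() {
    sed -n 's/.*(PID \([0-9][0-9]*\)).*/\1/p'
}

# Hypothetical helper: succeed if a process with PID $1 exists.
# kill -0 sends no signal, it only checks the PID. Run as root: for a
# non-root user it also fails (EPERM) on other users' processes.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Example (AIX, as root). Note that saposcol -l starts the collector
# if it is not already running.
# pid=$(saposcol -l 2>&1 | collector_pid)
# if [ -n "$pid" ] && ! pid_alive "$pid"; then
#     echo "saposcol claims PID $pid is running, but no such process exists"
# fi
```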

So we try to stop saposcol. This, of course, fails, because no process with PID 327924 is running!

$ saposcol -k

Setting Stop Flag :

09:19:29 22.01.2009 LOG: ==== Stop Flag was set by saposcol (kill_collector()).

09:19:29 22.01.2009 LOG: ==== The collection process will stop as soon as possible

********************

can't kill process 327924.

kill: No such process

ERROR:No reaction from collecting process 327924.

Please kill collecting process.

My conclusion is that a shared memory segment must still be allocated for saposcol. Many other SAP processes are still running happily, so there are several shared memory segments to sift through. So which shared memory key does saposcol use?

According to the following website, saposcol on AIX uses shared memory key 4dbe.

http://www.saptechies.com/os-collector-saposcol/

7. Question: How do I remove the shared memory key of saposcol?


Answer: Sometimes it may be necessary to remove the shared memory key of saposcol (see point 6). Caution: Please be very careful! This procedure should be performed only after checking that saposcol is really not running (see point 4), and only when the other options (see point 6) really do not work! To do this, execute the command ipcs -ma and note the line that contains the saposcol key 4dbe. You need the shared memory ID from that line. Then execute the command ipcrm -m ID. The command saposcol -s should now show that saposcol is not running and that the shared memory is not attached. The shared memory key will be created automatically by saposcol the next time the collector is started with saposcol -l.
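The lookup step from that answer (find the ID on the line with key 4dbe) can be sketched as a small filter over ipcs output. This assumes the AIX layout, where each shared memory line begins with the type "m", then the ID, then the key as ipcs prints it (0x00004dbe here). shmid_for_key is a hypothetical helper name; it only prints the ID and removes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the shared memory ID for key $1 from
# `ipcs -ma` output on stdin. Assumes field 1 is the type ("m"),
# field 2 the ID and field 3 the key, as in AIX ipcs output.
shmid_for_key() {
    awk -v key="$1" '$1 == "m" && $3 == key { print $2 }'
}

# Example (AIX, as root):
# id=$(ipcs -ma | shmid_for_key 0x00004dbe)
# echo "saposcol shared memory ID: $id"
```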

So I run ipcs to check for key 4dbe, and I do find an entry for it.

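The entry contains several numbers that look like process IDs, but only one of them belongs to a process that actually exists (PID 2293794). Going by the usual AIX ipcs -a column order, the fields after the owner and group names are NATTCH (0, so nothing is currently attached), SEGSZ (1870188, the segment size in bytes rather than a PID), CPID (467164, the process that created the segment) and LPID (2293794, the last process to attach or detach).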

# ipcs -ma | grep 4dbe

m 2097156 0x00004dbe --rw-rw-rw- root sapsys root sapsys 0 1870188 467164 2293794 9:39:43 9:39:43 8:12:43

# ps -fp 1870188

UID PID PPID C STIME TTY TIME CMD

# ps -fp 467164

UID PID PPID C STIME TTY TIME CMD

# ps -fp 2293794

UID PID PPID C STIME TTY TIME CMD

sapadm 2293794 1 0 19:26:41 - 1:07 sapccm4x -DCCMS pf=/sapmnt/SAP/profile/SAP_DVEBMGS00_aix01
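Pulling the relevant numbers out of the ipcs line and checking which PIDs still exist can also be scripted. The sketch below assumes the standard AIX ipcs -a column order (T, ID, KEY, MODE, OWNER, GROUP, CREATOR, CGROUP, NATTCH, SEGSZ, CPID, LPID, ATIME, DTIME, CTIME), so it is worth checking that against the header line ipcs prints on your own system. shm_procs and report_pids are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: for the segment with key $1, print NATTCH, CPID
# and LPID (fields 9, 11 and 12 in the assumed AIX layout) from
# `ipcs -ma` output on stdin.
shm_procs() {
    awk -v key="$1" '$1 == "m" && $3 == key { print $9, $11, $12 }'
}

# Hypothetical helper: for each PID argument, print "<pid> alive" or
# "<pid> gone". Uses kill -0, so run as root for reliable results.
report_pids() {
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid alive"
        else
            echo "$pid gone"
        fi
    done
}

# Example (AIX, as root):
# set -- $(ipcs -ma | shm_procs 0x00004dbe)
# echo "attached processes (NATTCH): $1"
# report_pids "$2" "$3"
# ps -fp "$3"    # what was the last process to attach or detach?
```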

I ask the SAP admin to stop this process (sapccm4x, the CCMS agent), which he does. Then I remove the shared memory segment by its ID, and 4dbe no longer appears in the ipcs output.

# ipcrm -m 2097156

# ipcs -ma | grep 4dbe
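Because removing the wrong segment can hurt a running SAP system, a cautious variant of the removal step prints the ipcrm command instead of running it, and only when no process is attached (NATTCH is 0). This is a sketch under the same assumed AIX column layout; shm_remove_cmd is a hypothetical name, and it does not replace checking that saposcol really is stopped.

```shell
#!/bin/sh
# Hypothetical dry-run helper: given `ipcs -ma` output on stdin, print
# "ipcrm -m <ID>" for the segment with key $1, but only if NATTCH
# (field 9 in the assumed AIX layout) is 0. Exits non-zero and prints
# nothing if there is no such segment or processes are still attached.
# Nothing is removed here; review the command and run it yourself.
shm_remove_cmd() {
    awk -v key="$1" '
        $1 == "m" && $3 == key && $9 == 0 { print "ipcrm -m " $2; ok = 1 }
        END { exit (ok ? 0 : 1) }'
}

# Example (AIX, as root), only after confirming saposcol is not running:
# if ! ps -ef | grep '[s]aposcol' >/dev/null; then
#     cmd=$(ipcs -ma | shm_remove_cmd 0x00004dbe) && echo "would run: $cmd"
# fi
```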

We were then able to start saposcol successfully. The process is running, and a segment with key 4dbe has been created again, this time with ID 3145732.

# ipcs -ma | grep 4dbe

m 3145732 0x00004dbe --rw-rw-rw- root sapsys root sapsys 1 1870188 2293818 2207892 10:14:25 10:14:25 10:12:47

# ps -ef | grep oscol

sapadm 2064616 1 0 10:12:53 - 0:00 saposcol -l

# /usr/sap/SAP/SYS/exe/run/saposcol -s

**************************************************************

Collector Versions :

running : COLL 20.95 700 - AIX v11.15 5L-64 bit 080317

dialog : COLL 20.95 700 - AIX v11.15 5L-64 bit 080317

Shared Memory : attached

Number of records : 17640

Active Flag : active (01)

Operating System : AIX aix01 3 5 00C01C704C00

Collector PID : 2064616 (001F80E8)

Collector : running

Start time coll. : Thu Jan 22 10:12:53 2009

Current Time : Thu Jan 22 10:17:46 2009

Last write access : Thu Jan 22 10:17:38 2009

Last Read Access : Thu Jan 22 10:16:16 2009

Collection Interval : 10 sec (next delay).

Collection Interval : 10 sec (last ).

Status : free

Collect Details : required

Refresh : required

Header Extention Structure

Number of x-header Records : 1

Number of Communication Records : 60

Number of free Com. Records : 60

Resulting offset to 1.data rec. : 61

Trace level : 2

Collector in IDLE - mode ? : NO

become idle after 300 sec without read access.

Length of Idle Interval : 60 sec

Length of norm.Interval : 10 sec

**************************************************************
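For a post-fix sanity check (or routine monitoring), the two lines that matter most in this status report are "Collector : running" and "Shared Memory : attached". A minimal sketch that checks for both, assuming the label text stays as shown above (spacing is matched loosely); oscol_healthy is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if `saposcol -s` output on stdin
# contains both "Collector : running" and "Shared Memory : attached".
# "not running" / "not attached" do not match the patterns.
oscol_healthy() {
    awk '
        /^[ \t]*Collector[ \t]*:[ \t]*running/          { run = 1 }
        /^[ \t]*Shared Memory[ \t]*:[ \t]*attached/     { shm = 1 }
        END { exit ((run && shm) ? 0 : 1) }'
}

# Example (AIX):
# /usr/sap/SAP/SYS/exe/run/saposcol -s | oscol_healthy && echo "saposcol OK"
```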

Problem solved, and no reboot required... this is AIX, after all! :)