Examine PowerHA log files for errors with clanalyze (Chris's AIX Blog)


cggibbo Aug 1 2018

The new clanalyze command in PowerHA 7.2.2 (Dec 2017) analyses log files for errors and produces an analysis report.

https://www.ibm.com/support/knowledgecenter/en/SSPHQG_7.2.2/com.ibm.powerha.navigation/powerha_whatsnew.htm

It can perform the following tasks:

Analyse the log files and provide an error report based on error strings or time stamps.
Analyse the core dump file from the AIX error log.
Analyse the log files that are collected through the snap and clsnap utilities.
Analyse a user-specified snap file based on the error strings that are provided, and generate a report.

clanalyze command

https://www.ibm.com/support/knowledgecenter/en/SSPHQG_7.2.2/com.ibm.powerha.cmds/clanalyze.htm

For example, to analyze log files for all recent errors, enter the following clanalyze command (halevel -s is run first to show the installed PowerHA level):

# halevel -s

7.2.2 SP1

# /usr/es/sbin/cluster/clanalyze/clanalyze -a -o recent

Following nodes will be considered for analysis or extraction:

cgha1 cgha2.

Log analyzer may take some time to provide analysis report.

Less than 1% analysis is completed

100% analysis is completed

Recent failure: Node failure occurred at time: 2018-07-04T00:56:14

Time at which Node failure occurred : 2018-07-04T00:56:14

Node name : cgha2

Description for node failure : SYSTEM SHUTDOWN BY USER

Probable cause for node failure : SYSTEM SHUTDOWN

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0

Analysis completed successfully.
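
The report is plain text made of "label : value" lines, so it is easy to post-process. The following is a minimal Python sketch and is not part of PowerHA. It assumes the report text has been saved to a string or file, and that every record begins with a "Time at which ... occurred" line, as in the output above. The function name parse_events and the "event" and "time" keys are invented for illustration.

```python
import re

# First line of every record in the report, for example:
#   Time at which Node failure occurred : 2018-07-04T00:56:14
RECORD_START = re.compile(r"^Time at which (.+?) occurred\s*:\s*(.+)$")


def parse_events(report_text):
    """Turn saved clanalyze report text into a list of dicts, one per event."""
    events = []
    current = None
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        start = RECORD_START.match(line)
        if start:
            current = {"event": start.group(1), "time": start.group(2)}
            events.append(current)
            continue
        if line.startswith("Note:"):
            # Trailing note in the report, not a field of the last record.
            current = None
            continue
        label, sep, value = line.partition(":")
        if current is not None and sep:
            current[label.strip()] = value.strip()
    return events


# Sample copied from the output shown above.
SAMPLE = """\
Recent failure: Node failure occurred at time: 2018-07-04T00:56:14
Time at which Node failure occurred : 2018-07-04T00:56:14
Node name : cgha2
Description for node failure : SYSTEM SHUTDOWN BY USER
Probable cause for node failure : SYSTEM SHUTDOWN
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0
Analysis completed successfully.
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["time"], event["event"], event.get("Node name"))
```

Lines that appear before the first record, such as the "Recent failure" summary line, are ignored because no record is open yet.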

To verify the status of several daemons such as syslogd or errdemon on specific nodes, enter the following command:

# /usr/es/sbin/cluster/clanalyze/clanalyze -v

Following nodes will be considered for analysis or extraction:

cgha1 cgha2.

NODE: cgha1

SYSLOGD STATE : active

ERRDEMON STATE : active

SYSLOGD CONFIGURATION:

----------------------

Facility.Priority                       Destination                    Size  RotationCount
aso.notice                              /var/log/aso/aso.log           1m    8
aso.info                                /var/log/aso/aso_process.log   1m    8
aso.debug                               /var/log/aso/aso_debug.log     32m   8
caa.debug                               /var/adm/ras/syslog.caa        10m   10
*.debug                                 /var/log/syslog
local0.crit                             /dev/console
local0.info;user.notice;daemon.notice   /var/hacmp/adm/cluster.log     1m    8

ERRORDEMON CONFIGURATION:

-------------------------

Log File                  /var/adm/ras/errlog
Log Size                  1048576 bytes
Memory Buffer Size        32768 bytes
Duplicate Removal         true
Duplicate Interval        10000 milliseconds
Duplicate Error Maximum   1000
PureScale Logging         off
PureScale Logstream       CentralizedRAS/Errlog

NODE: cgha2

SYSLOGD STATE : active

ERRDEMON STATE : active

SYSLOGD CONFIGURATION:

----------------------

Facility.Priority                       Destination                    Size  RotationCount
aso.notice                              /var/log/aso/aso.log           1m    8
aso.info                                /var/log/aso/aso_process.log   1m    8
aso.debug                               /var/log/aso/aso_debug.log     32m   8
caa.debug                               /var/adm/ras/syslog.caa        10m   10
*.debug                                 /var/log/syslog
local0.crit                             /dev/console
local0.info;user.notice;daemon.notice   /var/hacmp/adm/cluster.log     1m    8

ERRORDEMON CONFIGURATION:

-------------------------

Log File                  /var/adm/ras/errlog
Log Size                  1048576 bytes
Memory Buffer Size        32768 bytes
Duplicate Removal         true
Duplicate Interval        10000 milliseconds
Duplicate Error Maximum   1000
PureScale Logging         off
PureScale Logstream       CentralizedRAS/Errlog

Verification of log daemons is successful.
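
With more than a couple of nodes, reading the daemon states by eye gets tedious. Here is a rough Python sketch, not an IBM tool, that pulls the "SYSLOGD STATE" and "ERRDEMON STATE" lines out of saved clanalyze -v output and lists anything that is not reported as active. The function names are invented, and the "inoperative" value in the sample is a made-up placeholder: the output above only shows "active", so the real wording for a stopped daemon may differ.

```python
import re

NODE_LINE = re.compile(r"^NODE:\s*(\S+)$")
STATE_LINE = re.compile(r"^(SYSLOGD|ERRDEMON) STATE\s*:\s*(\S+)$")


def daemon_states(verify_text):
    """Return {node: {daemon: state}} from saved `clanalyze -v` output."""
    states = {}
    node = None
    for raw_line in verify_text.splitlines():
        line = raw_line.strip()
        node_match = NODE_LINE.match(line)
        if node_match:
            node = node_match.group(1)
            states[node] = {}
            continue
        state_match = STATE_LINE.match(line)
        if state_match and node is not None:
            states[node][state_match.group(1)] = state_match.group(2)
    return states


def inactive_daemons(states):
    """List (node, daemon, state) for every daemon not reported as active."""
    return [
        (node, daemon, state)
        for node, daemons in states.items()
        for daemon, state in daemons.items()
        if state != "active"
    ]


# Shortened sample; "inoperative" is a hypothetical value used only to
# exercise the check, it does not come from real clanalyze output.
SAMPLE = """\
NODE: cgha1
SYSLOGD STATE : active
ERRDEMON STATE : active
SYSLOGD CONFIGURATION:
NODE: cgha2
SYSLOGD STATE : active
ERRDEMON STATE : inoperative
"""

if __name__ == "__main__":
    for node, daemon, state in inactive_daemons(daemon_states(SAMPLE)):
        print(f"{node}: {daemon} is {state}")
```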

To perform error analysis for all supported errors:

# /usr/es/sbin/cluster/clanalyze/clanalyze -a -o all

Following nodes will be considered for analysis or extraction:

cgha1 cgha2.

Log analyzer may take some time to provide analysis report.

Less than 1% analysis is completed

100% analysis is completed

Time at which Node failure occurred : 2018-07-04T00:56:14

Node name : cgha2

Description for node failure : SYSTEM SHUTDOWN BY USER

Probable cause for node failure : SYSTEM SHUTDOWN

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0

Time at which Node failure occurred : 2018-07-04T00:56:14

Node name : cgha1

Description for node failure : SYSTEM SHUTDOWN BY USER

Probable cause for node failure : SYSTEM SHUTDOWN

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0

Time at which Interface failure occurred : Jul 4 00:54:28

Interface name with failure : en0

Node at which Interface failure occurred : cgha2

Time at which Interface failure occurred : Jul 4 00:54:26

Interface name with failure : en0

Node at which Interface failure occurred : cgha1

Time at which Node failure occurred : 2018-07-04T00:54:22

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-07-04T00:54:19

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Interface failure occurred : Jul 1 19:45:47

Interface name with failure : en0

Node at which Interface failure occurred : cgha1

Time at which Interface failure occurred : Jul 1 19:45:45

Interface name with failure : en0

Node at which Interface failure occurred : cgha2

Time at which Interface failure occurred : Jul 1 19:44:22

Interface name with failure : en0

Node at which Interface failure occurred : cgha2

Time at which Interface failure occurred : Jul 1 19:44:21

Interface name with failure : en0

Node at which Interface failure occurred : cgha1

Time at which Interface failure occurred : Jul 1 19:10:01

Interface name with failure : en0

Node at which Interface failure occurred : cgha2

Time at which Interface failure occurred : Jul 1 19:10:00

Interface name with failure : en0

Node at which Interface failure occurred : cgha1

Time at which Node failure occurred : 2018-07-01T19:09:54

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-07-01T19:09:51

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Interface failure occurred : Jun 26 20:07:49

Interface name with failure : en0

Node at which Interface failure occurred : cgha2

Time at which Interface failure occurred : Jun 26 20:07:48

Interface name with failure : en0

Node at which Interface failure occurred : cgha1

Time at which Node failure occurred : 2018-04-27T08:58:57

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-04-23T13:53:34

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-04-23T13:51:05

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-04-23T13:49:40

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-04-23T13:47:51

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-04-23T13:47:49

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-02-22T16:58:53

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-02-20T22:58:54

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-02-20T22:58:51

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-02-13T21:54:13

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-02-01T17:24:12

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2017-12-20T16:28:48

Node name : cgha2

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2017-12-20T16:28:41

Node name : cgha1

Description for node failure : Information lost. Logs got recycled.

Probable cause for node failure : Information lost. Logs got recycled.

Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Note: Any field left blank indicates that element does not exist in log files

Analysis report is available at /var/hacmp/log/loganalyzer/analysis/report/2018-07-31/report.17170924

Analysis completed successfully.
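
Most of the node failure entries above carry the text "Information lost. Logs got recycled." instead of a description, so only a few of them can still be diagnosed. The Python sketch below, again an illustration rather than anything shipped with PowerHA, counts node failure records per node and separates those that still have a description from those whose detail is gone. It assumes the field order shown above ("Node name" followed by "Description for node failure"); the function name is invented.

```python
from collections import Counter

LOST = "Information lost. Logs got recycled."


def node_failure_summary(report_text):
    """Count node-failure records per node, split by whether detail survives."""
    with_detail = Counter()
    lost_detail = Counter()
    node = None
    for raw_line in report_text.splitlines():
        label, sep, value = raw_line.partition(":")
        if not sep:
            continue
        label, value = label.strip(), value.strip()
        if label == "Node name":
            node = value
        elif label == "Description for node failure" and node is not None:
            if value == LOST:
                lost_detail[node] += 1
            else:
                with_detail[node] += 1
            node = None
    return with_detail, lost_detail


# A few records copied from the output shown above.
SAMPLE = """\
Time at which Node failure occurred : 2018-07-04T00:56:14
Node name : cgha2
Description for node failure : SYSTEM SHUTDOWN BY USER
Time at which Interface failure occurred : Jul 4 00:54:28
Interface name with failure : en0
Node at which Interface failure occurred : cgha2
Time at which Node failure occurred : 2018-07-04T00:54:22
Node name : cgha2
Description for node failure : Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-07-04T00:54:19
Node name : cgha1
Description for node failure : Information lost. Logs got recycled.
"""

if __name__ == "__main__":
    with_detail, lost_detail = node_failure_summary(SAMPLE)
    for node in sorted(set(with_detail) | set(lost_detail)):
        print(node, "with detail:", with_detail[node], "lost:", lost_detail[node])
```

Interface failure records are skipped because they use different field labels.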

Comments (4)


CHARIN_KUMJUDPAI commented Feb 18

Thank you so much cggibbo.

CHARIN_KUMJUDPAI commented Feb 14

Hi cggibbo,

Thank you so much, sir, for always responding so quickly to my questions.
I agree: because the problem happened so long ago, it is hard to find the cause.

Thanks,
CK.

cggibbo commented Feb 14

Hi Charin, looking at this output, no, it is not possible to determine the reason for the node failure(s). The message "Information lost. Logs got recycled." indicates that the PowerHA log files were recycled and that the information relating to those cluster events is no longer available.

CHARIN_KUMJUDPAI commented Feb 13

Hi Chris,

Thank you for your useful technical article.
Today (13 Feb 19) a customer raised a support case for PowerHA version 7220 on AIX 7100-05-02-1810.
The customer needs IBM support to analyse the logs and find out why the cluster failed over from Active to Standby on 21 Oct 18 during 09:30-12:00pm.

The local IBM team then ran the command: # /usr/es/sbin/cluster/clanalyze/clanalyze -a -o all
Here is an excerpt of the output from the report file.
---------------------------START OF excerpt report -------------------------
EVENT: Node Failure
-------------------

Time at which Node failure occurred : 2018-10-21T02:00:23
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-10-21T02:00:26
Node name : intdbs2b
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-10-21T02:02:31
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-10-21T02:02:37
Node name : intdbs2b
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occurred : 2018-10-21T09:53:03
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

No Site failure is observed.
--------------------------- END OF excerpt report -------------------------

My question is:

With reference to the report above, can anyone (a PowerHA expert) analyse the output to work out the root cause of the cluster failover from Active to Standby on 21 Oct 18 during 09:30-12:00pm?

Note: intdbs2a is the Active node and intdbs2b is the Standby node.

Regards,
Charin Kumjudpai (IBM Thailand as AIX L1 support)
