Examine PowerHA log files for errors with clanalyze

The new clanalyze command in PowerHA 7.2.2 (Dec 2017) analyses log files for errors and provides an analysis report.
It can perform several tasks, which are described in the clanalyze command documentation.
For example, to analyze log files for all recent errors, enter the following command:
# halevel -s
7.2.2 SP1
# /usr
Following nodes will be considered for analysis or extraction: cgha1 cgha2.
Log analyzer may take some time to provide analysis report.
Less than 1% analysis is completed
100% analysis is completed
Recent failure: Node failure occurred at time: 2018-07-04T00:56:14
Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0
Analysis completed successfully.
To verify the status of several daemons such as syslogd or errdemon on specific nodes, enter the following command:
# /usr
Following nodes will be considered for analysis or extraction: cgha1 cgha2.

NODE: cgha1
SYSLOGD STATE : active
ERRDEMON STATE : active
SYSLOGD CONFIGURATION:
----
Faci
aso.
aso.
aso.
caa.
*.de
loca
loca
ERRORDEMON CONFIGURATION:
----
Log File /var/adm/ras/errlog
Log Size 1048576 bytes
Memory Buffer Size 32768 bytes
Duplicate Removal true
Duplicate Interval 10000 milliseconds
Duplicate Error Maximum 1000
PureScale Logging off
PureScale Logstream Cent

NODE: cgha2
SYSLOGD STATE : active
ERRDEMON STATE : active
SYSLOGD CONFIGURATION:
----
Faci
aso.
aso.
aso.
caa.
*.de
loca
loca
ERRORDEMON CONFIGURATION:
----
Log File /var/adm/ras/errlog
Log Size 1048576 bytes
Memory Buffer Size 32768 bytes
Duplicate Removal true
Duplicate Interval 10000 milliseconds
Duplicate Error Maximum 1000
PureScale Logging off
PureScale Logstream Cent

Verification of log daemons is successful.
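If you run this verification across several clusters, it can help to check the daemon states programmatically rather than by eye. The Python sketch below is one way to do that, assuming the output keeps the "NODE: <name>", "SYSLOGD STATE : <state>" and "ERRDEMON STATE : <state>" layout shown above (other PowerHA levels may format it differently). The helpers daemon_states and inactive_daemons are hypothetical names for illustration, not part of PowerHA; you would feed them the text captured from the clanalyze command.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed layout, based on the sample output above: each node section starts
# with "NODE: <name>" and reports "SYSLOGD STATE : <state>" and
# "ERRDEMON STATE : <state>". Other PowerHA levels may format this differently.
NODE_RE = re.compile(r"NODE:\s*(\S+)")
STATE_RE = re.compile(r"(SYSLOGD|ERRDEMON) STATE\s*:\s*(\S+)")


def daemon_states(text: str) -> Dict[str, Dict[str, str]]:
    """Map each node name to {daemon: state} (hypothetical helper)."""
    states: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        node = NODE_RE.search(line)
        if node:
            current = node.group(1)
            states[current] = {}
        if current is None:
            continue  # ignore banner lines before the first NODE: section
        # A single line may carry both states if the output was flattened.
        for daemon, state in STATE_RE.findall(line):
            states[current][daemon] = state
    return states


def inactive_daemons(states: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (node, daemon, state) for every daemon not reported as active."""
    return [
        (node, daemon, state)
        for node, daemons in states.items()
        for daemon, state in daemons.items()
        if state != "active"
    ]


# Sample lines taken from the verification output shown above.
SAMPLE = """\
Following nodes will be considered for analysis or extraction: cgha1 cgha2.
NODE: cgha1
SYSLOGD STATE : active
ERRDEMON STATE : active
NODE: cgha2
SYSLOGD STATE : active
ERRDEMON STATE : active
"""

if __name__ == "__main__":
    problems = inactive_daemons(daemon_states(SAMPLE))
    print(problems if problems else "All log daemons reported active")
```

An empty list from inactive_daemons means every node reported both daemons as active; anything else names the node and daemon to look at.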
To perform error analysis for all supported errors:
# /usr
Following nodes will be considered for analysis or extraction: cgha1 cgha2.
Log analyzer may take some time to provide analysis report.
Less than 1% analysis is completed
100% analysis is completed

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): 0

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Interface failure occurred : Jul 1 19:44:21
Interface name with fail
Node at which Interface failure occu

Time at which Interface failure occu
Interface name with failure
Node at which Interface failure occu

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Interface failure occu
Interface name with fail
Node at which Interface failure occu

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.

Time at which Node failure occu
Node name
Description for node fail
Probable cause for node fail
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.
Note: Any field left blank indicates that element does not exist in log files
Analysis report is available at /var
Analysis completed successfully.
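A report this long is easier to digest as counts. The Python sketch below, using hypothetical helper names that are not part of PowerHA, splits saved report text into entries at each "Time at which <Type> failure occurred :" line, counts entries per failure type, and counts how many of them contain the "Information lost. Logs got recycled." message. It assumes the untruncated field layout seen in the report excerpts in this post; the sample lines are borrowed from two different outputs in this post and combined purely for illustration.

```python
import re
from collections import Counter
from typing import Iterator, Tuple

RECYCLED = "Information lost. Logs got recycled."
# Assumed: each entry starts with "Time at which <Type> failure occurred :",
# as in the untruncated report excerpts in this post.
EVENT_RE = re.compile(r"^Time at which (\w+) failure occurred\s*:", re.MULTILINE)


def split_entries(report: str) -> Iterator[Tuple[str, str]]:
    """Yield (failure type, entry text) for each entry in the report."""
    matches = list(EVENT_RE.finditer(report))
    for i, match in enumerate(matches):
        # Each entry runs until the next "Time at which" line (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        yield match.group(1), report[match.start():end]


def summarize(report: str) -> Tuple[Counter, Counter]:
    """Count entries per failure type, and how many contain the recycled message."""
    total: Counter = Counter()
    recycled: Counter = Counter()
    for kind, entry in split_entries(report):
        total[kind] += 1
        if RECYCLED in entry:
            recycled[kind] += 1
    return total, recycled


# Illustrative lines combined from the outputs in this post.
SAMPLE = """\
Time at which Node failure occurred : 2018-10-21T02:00:23
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T02:00:26
Node name : intdbs2b
Description for node failure : Information lost. Logs got recycled.
Time at which Interface failure occurred : Jul 1 19:44:21
"""

if __name__ == "__main__":
    total, recycled = summarize(SAMPLE)
    for kind, count in total.items():
        print(f"{kind} failures: {count} ({recycled[kind]} with recycled details)")
```

To use it on a real cluster, read the saved analysis report into a string and pass it to summarize(); a high recycled count signals that the detail for those events is no longer in the logs.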
Hi cggibbo,
Thank you so much, sir. You always respond to my questions quickly.
I agree with you: because the problem happened a long time ago, it is hard to find the cause.
Thanks,
CK.
Hi Charin, based on this output, no, it is not possible to determine the reason for any of the node failures. The message, "Information lost. Logs got recycled." indicates that the PowerHA log files were recycled and that the information (relating to cluster events) is no longer available.
Hi Chris,
Thank you for your useful technical article.
Today (13 Feb 19) a customer raised a support case for PowerHA version 7220 on AIX 7100-05-02-1810.
The customer needs IBM support to analyze the logs to find out why the cluster failed over from Active to Standby on 21 Oct 18 between 09:30 and 12:00pm.
Local IBM support then ran the command:
# /usr/es/sbin/cluster/clanalyze/clanalyze -a -o all
Here is an excerpt of the output from the report file:
---------------------------START OF excerpt report -------------------------
EVENT: Node Failure
-------------------
Time at which Node failure occurred : 2018-10-21T02:00:23
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T02:00:26
Node name : intdbs2b
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T02:02:31
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T02:02:37
Node name : intdbs2b
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T09:53:03
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Probable cause for node failure : Information lost. Logs got recycled.
Reason for node failure(0=SOFT IPL 1=HALT 2=TIME REBOOT): Information lost. Logs got recycled.
No Site failure is observed.
--------------------------- END OF excerpt report -------------------------
My question is: based on the report output above, can anyone (a PowerHA expert) analyze it to determine the root cause of why the cluster failed over from Active to Standby on 21 Oct 18 between 09:30 and 12:00pm?
Note: intdbs2a is the Active node and intdbs2b is the Standby node.
Regards,
Charin Kumjudpai (IBM Thailand as AIX L1 support)
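For a question like this, a reasonable first step is to isolate the node-failure entries inside the time window of interest and check whether any details survived. The Python sketch below does that for report text in the format of the excerpt above. It is a sketch under stated assumptions: the helper names node_failures and in_window are hypothetical, only the Time, Node name and Description fields are parsed, and the report timestamps are assumed to be in the same time zone as the window. The excerpt does not state a time zone, so confirm that before drawing conclusions.

```python
import re
from datetime import datetime
from typing import Dict, List

RECYCLED = "Information lost. Logs got recycled."
# Only these three fields are parsed; the layout is assumed from the excerpt.
FIELD_RE = re.compile(
    r"^(Time at which Node failure occurred|Node name|Description for node failure)"
    r"[ \t]*:[ \t]*(.*)$",
    re.MULTILINE,
)


def node_failures(report: str) -> List[Dict]:
    """Parse node-failure entries into dicts with a datetime under 'time'."""
    events: List[Dict] = []
    for key, value in FIELD_RE.findall(report):
        value = value.strip()
        if key.startswith("Time at which"):
            # Each timestamp line starts a new entry (ISO 8601, as in the excerpt).
            events.append({"time": datetime.fromisoformat(value)})
        elif events:
            events[-1][key] = value
    return events


def in_window(events: List[Dict], start: datetime, end: datetime) -> List[Dict]:
    """Keep events whose timestamp lies in [start, end] (same time zone assumed)."""
    return [e for e in events if start <= e["time"] <= end]


# Abbreviated from the excerpt above (Probable cause and Reason lines omitted).
SAMPLE = """\
Time at which Node failure occurred : 2018-10-21T02:00:23
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T02:02:37
Node name : intdbs2b
Description for node failure : Information lost. Logs got recycled.
Time at which Node failure occurred : 2018-10-21T09:53:03
Node name : intdbs2a
Description for node failure : Information lost. Logs got recycled.
"""

if __name__ == "__main__":
    window = (datetime(2018, 10, 21, 9, 30), datetime(2018, 10, 21, 12, 0))
    for event in in_window(node_failures(SAMPLE), *window):
        recycled = event.get("Description for node failure") == RECYCLED
        status = "details recycled" if recycled else "details available"
        print(event["time"].isoformat(), event.get("Node name"), status)
```

Against the excerpt, only the 09:53:03 entry for intdbs2a falls inside the 09:30 to 12:00 window, and its details have been recycled, which matches the reply above that the cause cannot be determined from this output.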