Cluster Aware AIX

Using Cluster Aware AIX (CAA) with AIX 7.1 you can create a cluster of AIX nodes. This interconnected cluster of nodes immediately provides several capabilities, such as:
· Clusterwide event management
· Communication and storage events:
  o Node UP and node DOWN
  o Network adapter UP and DOWN
  o Network address change
  o Point-of-contact UP and DOWN
  o Disk UP and DOWN
  o Predefined and user-defined events
· Clusterwide storage naming service
· Clusterwide command distribution
· Clusterwide communication making use of networking and storage connectivity
All of the capabilities of CAA are built into the AIX Version 7.1 operating system. CAA is essentially a set of tools and services embedded into AIX that help to manage a cluster of AIX nodes and assist in running cluster software on AIX. Cluster products, such as PowerHA and RSCT from IBM, can utilise CAA to simplify the configuration, management and monitoring of an AIX cluster.
CAA does not form a cluster by itself; it is a tool set. There is no notion of quorum (if 20 nodes of a 21-node cluster are down, CAA still runs on the remaining node). CAA does not eject nodes from a cluster. It provides the tools to fence a node, but it never fences a node itself and will continue to run on a fenced node.
Cluster products, like PowerHA, will integrate with CAA to help form and manage highly available clusters.
So you will still need some form of cluster product, either from IBM or another vendor, in order to build a cluster that provides high availability capabilities like node failover/takeover.
What follows are some of the most important “snippets” from the CAA documentation (IMHO), available in the AIX 7.1 Information Centre (see the link at the bottom of the page).
Just like any cluster, each node that is added to a cluster using CAA must have common storage devices available, for example via a SAN storage device. These storage devices are used for the cluster repository disk and for any clustered shared data disks.
The Storage Naming Service provides a global device view across all the nodes in the cluster, giving each shared disk a single global device name. The global device name, for example cldisk1, refers to the same physical disk from any node in the cluster.
The cluster repository disk is used as the central repository for the cluster configuration data. It must be accessible from all nodes in the cluster and must be a minimum of 10 GB in size. Given the importance of the cluster configuration data, the cluster repository disk should be backed by a redundant and highly available storage configuration. Even though the cluster repository disk is visible as a disk device to AIX, it should be treated as a special device for the cluster. The use of LVM commands on a cluster repository disk is not supported; the AIX LVM commands are designed as single-node administrative commands and are not applicable in a clustered configuration.
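Before you create a cluster, it is worth confirming that the candidate repository disk meets the size requirement and is visible on every node. A minimal check, assuming hdisk1 is the disk you intend to use as the repository (adjust the name for your environment):

# lspv | grep hdisk1     # is the disk visible on this node, and what is its PVID?
# bootinfo -s hdisk1     # disk size in MB; it must meet the documented minimum

Run the same two commands on each node and compare the PVIDs to make sure every node is looking at the same physical LUN.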
The cluster repository disk is renamed to a private device name (caa_private0). A raw section of the disk and a section of the disk that contains a special volume group and special logical volumes are used during cluster operations.
A multicast address is used for cluster communications between the nodes in the cluster. It is configured automatically during the creation of the cluster. The cluster nodes then support monitoring of cluster events and cluster configuration attributes.
Scalable reliable multicasting is implemented in the cluster with a special gossip protocol over the multicast address. The gossip protocol determines the node configuration and then transmits the gossip packets over all available networking and storage communication interfaces (Fibre Channel and/or SAS adapters). If no storage communication interfaces are configured, only the traditional networking interfaces are used.
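Once a cluster has been created (as shown in the example further down), you can check which multicast address CAA assigned to it. A minimal check, assuming the lscluster -c option behaves at your AIX level as it does on mine:

# lscluster -c     # lists the cluster configuration, including the cluster multicast address

The same multicast address also appears against each interface in the lscluster -i output shown later.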
When you first configure CAA, the following actions are performed:
• The cluster is created using the mkcluster command.
• The cluster configuration is written to the raw section of the cluster repository disk.
• Primary and secondary database nodes are selected from the list of candidate nodes in the mkcluster command.
• Special volume groups and logical volumes are created on the cluster repository disk.
• Cluster file systems are created on the special volume group.
• The cluster repository database is created on both primary and secondary nodes.
• The cluster repository database is started.
• Cluster services are made available to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA.
• Storage framework register lists are created on the cluster repository disk.
• A global device namespace is created and interaction with LVM starts for handling associated volume group events.
• A clusterwide multicast address is established.
• The node discovers all of the available communication interfaces.
• The cluster interface monitoring starts.
• The cluster interacts with Autonomic Health Advisory File System (AHAFS) for clusterwide event distribution.
• The cluster exports cluster messaging and cluster socket services to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA.
In the following example I created a two node cluster using CAA tools. First of all, I had to use the mkcluster command to define the cluster nodes, shared storage and repository disk. The node names are 7502lp01 and 7502lp02. The shared data storage disks are hdisk2, hdisk3 and hdisk4. The repository disk, used to house the cluster configuration data, is hdisk1.
# mkcluster -n mycluster -r hdisk1 -d hdisk2,hdisk3,hdisk4 -m 7502lp01,7502lp02
mkcluster: Cluster shared disks are automatically renamed to names such as
  cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot take
  place while a disk is busy or on a node which is down or not reachable.
  If any disks cannot be renamed now, they will be renamed later by the
  clconfd daemon, when the node is available and the disks are not busy.
You’ll notice that the mkcluster command informed me that the cluster shared disks are automatically renamed to cluster disk names like cldisk1. After I’d run the command I noticed something quite interesting and impressive. Prior to configuring the cluster, the shared disks on both nodes had names like hdiskX. Afterwards, the shared disks had been renamed across both nodes, and the same disk now had the same name on every node!
This is going to simplify cluster configuration and management. No longer will I need to remove and recreate disks in order to resolve disk naming inconsistencies in a cluster. The lspv output (shown below) from both nodes shows that I have three shared disks (cldisk1, cldisk2 and cldisk3), which will be used for shared data. The disk named caa_private0 is my cluster repository disk, used to store and share the cluster configuration data.
7502lp01:
# lspv | sort
caa_private0    00f6
cldisk1         00f6
cldisk2         none
cldisk3         none
hdisk0          00f6

7502lp02:
# lspv | sort
caa_private0    00f6
cldisk1         00f6
cldisk2         none
cldisk3         none
hdisk0          00f6
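A quick way to confirm that the names really do line up on every node is to list the physical volumes on all nodes in one go with the clcmd command (covered in more detail further down):

# clcmd lspv | egrep "NODE|cldisk|caa_private"     # show the cluster disk names reported by each node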
Cluster Aware AIX tells you what nodes are in the cluster plus information on those nodes, including state. A special “gossip” protocol is used over the multicast address to determine node information and implement scalable reliable multicast. No traditional heartbeat mechanism is employed. Gossip packets travel over all interfaces, including storage. Immediately after running mkcluster, I was able to query the status of the nodes in my cluster without any further configuration!
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: 7502lp01
Cluster shorthand id for node: 1
uuid for node: 3cd9
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME     TYPE    SHID    UUID
mycluster        local           267c
Number of points_of_contact for node: 1
Point-of-contact interface & contact state
en0 UP

----

Node name: 7502lp02
Cluster shorthand id for node: 2
uuid for node: d1a4
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME     TYPE    SHID    UUID
mycluster        local           267c
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
CAA tells you what interfaces have been discovered on a node plus information on those interfaces, including state.
# lscluster -i -n
Node 7502lp01
Node uuid = 110b
Number of interfaces discovered = 1
        Interface number 1 en0
                ifnet type = 6 ndd type = 7
                Mac address length = 6
                Mac address = 16.12.a0.0.30.2
                Smoothed rrt across interface = 7
                Mean Deviation in network rrt across interface = 3
                Probe interval for interface = 100 ms
                ifnet flags for interface = 0x1e080863
                ndd flags for interface = 0x21081b
                Interface state UP
                Number of regular addresses configured on interface = 1
                IPV4 ADDRESS: 9.3.28.136 broadcast 9.3.28.159 netmask 255.255.255.224
                Number of cluster multicast addresses configured on interface = 1
                IPV4 MULTICAST ADDRESS: 228.8.8.8 broadcast 0.0.0.0 netmask 0.0.0.0
CAA tells you what disks are in the cluster plus information on those disks, including the state and type.
# lscluster -d
Storage Interface Query

Cluster Name: mycluster
Cluster uuid: a1bd
Number of nodes reporting = 2
Number of nodes expected = 2
Node osca
Node uuid = 9807
Number of disk discovered = 2
        cldisk1
                state : UP
                uDid : 533E Mfcp
                type : CLUSDISK
                uUid : 9807
        hdisk1
                state : UP
                uDid :
                uUid : 600a
                type : REPDISK
You can run clusterwide commands on all nodes immediately after configuring your cluster, using the clcmd command. No further configuration is required.
# clcmd ps -ef

---- NODE 7502lp01 ----
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root       1       0   0   Aug 25      -  0:00 /etc/init
    root  655526       1   0   Aug 25      -  0:18 [cimserve]
    root 1310840       1   0   Aug 25      -  0:00 /usr
    root 2490482       1   0   Aug 25      -  0:42 /usr/sbin/srcmstr
    root 3145874 3801262   0 11:17:13      -  0:00 telnetd -a
    root 3276994 2490482   0   Aug 25      -  0:10 /usr/sbin/snmpd

---- NODE 7502lp02 ----
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root       1       0   1   Aug 28      -  0:00 /etc/init
    root 1441934       1   0   Aug 28      -  0:39 /usr/sbin/syncd 60
    root 1638496       1   0   Aug 28      -  0:00 /usr
    root 2228252 2556054   0   Aug 28      -  0:00 /usr/sbin/hostmibd
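If you later need to grow the cluster or tear it down again, CAA also provides the chcluster and rmcluster commands. A minimal sketch, where hdisk5 and 7502lp03 are purely hypothetical names and the exact option syntax should be checked against your AIX level:

# chcluster -n mycluster -d +hdisk5       # add another shared disk to the cluster
# chcluster -n mycluster -m +7502lp03     # add a third node to the cluster
# rmcluster -n mycluster                  # remove the cluster definition altogether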
I believe that PowerHA SystemMirror 7.1 will be the first cluster product from IBM to provide high availability on AIX using the new Cluster Aware AIX features. An IBM Redbook Residency recently covered the integration of CAA and PowerHA. I’m eagerly awaiting the first draft of this book so that I can learn how these two components will “play” together.
Refer to the AIX 7.1 Information Centre for more information on Cluster Aware AIX:
http
Refer to the AIX 7.1 Information Centre for more information on AHAFS and the new AIX Event Infrastructure:
Can you tune the CAA LUNs to be more tolerant when the storage subsystem experiences an ISL flapping condition, or a scenario where the SVC reboots frequently due to recovery attempts? The last time we had an ISL-related issue, we saw that the CAA LUNs had pending I/O, which caused a DMS timeout.
The following article describes all of the Cluster Aware AIX (CAA) tunables and offers some recommendations.
https://www.ibm.com/developerworks/aix/library/au-aix-powerha-caa/index.html
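For reference, the CAA tunables described in that article are listed and changed with the clctrl command. Tunable names, units and defaults vary by AIX and CAA level, so treat the second line purely as an illustration and take the actual values from the article and from IBM support:

# clctrl -tune -L                    # list the CAA tunables with their current and default values
# clctrl -tune -o deadman_mode=e     # illustration only: raise an event rather than crash on deadman timeout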
The online CAA documentation appears to be outdated. Basically, CAA needs 256 MB and PowerHA needs a 512 MB disk. The initial 7.1.0 release did have a 10 GB LUN requirement documented, which later changed to 1 GB, but going forward, with the new usage of CAA in PowerHA versions 7.1.1 and 7.1.2, only 512 MB is needed. So the smallest size is 512 MB and the largest is 460 GB. The size requirements should not change even for a 16-node cluster, so we typically advise clients to use their standard LUN size if they don't want to jump through hoops with their storage team, or to assign it as small as possible with the 512 MB minimum, since anything beyond that is wasted space. Note that in the first release we would create a database with the solidDB fileset; it is no longer exploited in 7.1.1 or 7.1.2. So you basically only see raw LVs in the private VG that gets created; you no longer see the file systems that we had in 7.1.0. At this stage, it is not possible to have multiple repository disks, only a backup disk (for hot replacement).
I read in an IBM document that the size of the cluster repository disk should be greater than 512 MB and less than 460 GB. Please confirm. I also have a related question: can we have multiple repository disks (mirrored or unmirrored) so that a single disk is not a SPOF? I am aware that we can replace the repository disk.
However, I don't have AIX Express Edition, because my servers are in a production environment, and I'm working at IBM Italy (Rome).
Thanks, but I already have the latest SP of PowerHA, and I have opened a PMR for this problem. Today I also uploaded the "snap" files. When I solve this problem, I'll post the solution here for traceability, OK? Thanks in advance. Bye.
We have often seen this message; it is resolved by unlocking CAA.
I asked my PowerHA guru about this and his advice was: "When testing labs for...class, I ran into it.... What I did to get it to work was create a bare PowerHA cluster, sync, and let it create the CAA cluster as normal. Then go back and remove/delete the cluster definition, and after that I could run mkcluster manually. Another thing I discovered (searching PMRs when I hit it) that might be a cause is that AIX Express Edition does not include CAA support. So there is a chance, if they had Express Edition, that CAA would not be able to be used." Of course, if none of this helps, you should open a new PMR with IBM support.
I have not come across this error before. I would recommend installing the latest updates for PowerHA 7.1 and then trying again.
Yes, I have already installed it. My release is PowerHA 7.1.1. Do you have any idea?
Sure, I have already installed PowerHA 7.1.1:

[dmv02trnx:root:/home/root:] halevel -s
7.1.1 GA
[dmv02trnx:root:/home/root:] mkcluster -r hdisk300 -d hdisk302 -m dmv02trnx,dmv03trnx -v
INFO: START
mkcluster: Cluster product not licensed.
[dmv02trnx:root:/home/root:]

Does anyone know why I have this problem? Thanks in advance.
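When you hit a "Cluster product not licensed" message like this, it is worth confirming exactly which cluster filesets are installed on the node and at what level. A simple check (the fileset names below are the usual ones for CAA and PowerHA; adjust the pattern for your environment):

# oslevel -s                     # AIX level and service pack
# lslpp -l bos.cluster.rte       # the Cluster Aware AIX fileset
# lslpp -l "cluster.es.*"        # the PowerHA SystemMirror filesets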