Cluster Aware AIX

Using Cluster Aware AIX (CAA) with AIX 7.1 you can create a cluster of AIX nodes. This interconnected cluster of nodes immediately provides several capabilities, such as:
· Clusterwide event management
· Communication and storage events:
  o Node UP and node DOWN
  o Network adapter UP and DOWN
  o Network address change
  o Point-of-contact UP and DOWN
  o Disk UP and DOWN
  o Predefined and user-defined events
· Clusterwide storage naming service
· Clusterwide command distribution
· Clusterwide communication making use of networking and storage connectivity
All of the capabilities of CAA are built into the AIX Version 7.1 operating system. CAA is essentially a set of tools and services embedded into AIX that help to manage a cluster of AIX nodes and assist in running cluster software on AIX. Cluster products, such as PowerHA and RSCT from IBM, can utilise CAA to simplify the configuration, management and monitoring of an AIX cluster.
CAA does not form a cluster by itself; it is a tool set. There is no notion of quorum (if 20 nodes of a 21-node cluster are down, CAA still runs on the remaining node). CAA does not eject nodes from a cluster. It provides the tools to fence a node, but it never fences a node itself and will continue to run on a fenced node.
Cluster products, like PowerHA, will integrate with CAA to help form and manage highly available clusters.
So you will still need some form of cluster product, either from IBM or another vendor, in order to build a cluster that provides high availability capabilities like node failover/takeover.
What follows are some of the most important “snippets” from the CAA documentation (IMHO), available in the AIX 7.1 Information Centre (see the link at the bottom of the page).
Just like any cluster, each node that is added to a cluster using CAA must have common storage devices available, for example via a SAN storage device. These storage devices are used for the cluster repository disk and for any clustered shared data disks.
The Storage Naming Service provides a global device view across all the nodes in the cluster, giving each shared disk a single global device name. The global device name, for example cldisk1, refers to the same physical disk from any node in the cluster.
The cluster repository disk is used as the central repository for the cluster configuration data. It must be accessible from all nodes in the cluster and must be a minimum of 10 GB in size. Given the importance of the cluster configuration data, the cluster repository disk should be backed by a redundant and highly available storage configuration. Even though the cluster repository disk is visible as a disk device to AIX, it should be treated as a special device for the cluster. The use of LVM commands on a cluster repository disk is not supported; the AIX LVM commands are designed as single-node administrative commands and are not applicable in a clustered configuration.
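Before you create a cluster, it is worth confirming that the candidate repository disk meets the size requirement and is visible on every node. A minimal check, assuming hdisk1 is the disk you intend to use as the repository (adjust the name for your environment):

# lspv | grep hdisk1     # is the disk visible on this node, and what is its PVID?
# bootinfo -s hdisk1     # disk size in MB; it must meet the documented minimum

Run the same two commands on each node and compare the PVIDs to make sure every node is looking at the same physical LUN.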
The cluster repository disk is renamed to a private device name (caa_private0). A raw section of the disk and a section of the disk that contains a special volume group and special logical volumes are used during cluster operations.
A multicast address is used for cluster communications between the nodes in the cluster. It is configured automatically during the creation of the cluster. The cluster nodes then support monitoring of cluster events and cluster configuration attributes.
Scalable reliable multicasting is implemented in the cluster with a special gossip protocol over the multicast address. The gossip protocol determines the node configuration and then transmits the gossip packets over all available networking and storage communication interfaces (Fibre Channel and/or SAS adapters). If no storage communication interfaces are configured, only the traditional networking interfaces are used.
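Once a cluster has been created (as shown in the example further down), you can check which multicast address CAA assigned to it. A minimal check, assuming the lscluster -c option behaves at your AIX level as it does on mine:

# lscluster -c     # lists the cluster configuration, including the cluster multicast address

The same multicast address also appears against each interface in the lscluster -i output shown later.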
When you first configure CAA, the following actions are performed:
• The cluster is created using the mkcluster command.
• The cluster configuration is written to the raw section of the cluster repository disk.
• Primary and secondary database nodes are selected from the list of candidate nodes in the mkcluster command.
• Special volume groups and logical volumes are created on the cluster repository disk.
• Cluster file systems are created on the special volume group.
• The cluster repository database is created on both primary and secondary nodes.
• The cluster repository database is started.
• Cluster services are made available to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA.
• Storage framework register lists are created on the cluster repository disk.
• A global device namespace is created and interaction with LVM starts for handling associated volume group events.
• A clusterwide multicast address is established.
• The node discovers all of the available communication interfaces.
• The cluster interface monitoring starts.
• The cluster interacts with Autonomic Health Advisory File System (AHAFS) for clusterwide event distribution.
• The cluster exports cluster messaging and cluster socket services to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA.
In the following example I created a two node cluster using CAA tools. First of all, I had to use the mkcluster command to define the cluster nodes, shared storage and repository disk. The node names are 7502lp01 and 7502lp02. The shared data storage disks are hdisk2, hdisk3 and hdisk4. The repository disk, used to house the cluster configuration data, is hdisk1.
# mkcluster -n mycluster -r hdisk1 -d hdisk2,hdisk3,hdisk4 -m 7502lp01,7502lp02
mkcluster: Cluster shared disks are automatically renamed to names such as
  cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot take
  place while a disk is busy or on a node which is down or not reachable.
  If any disks cannot be renamed now, they will be renamed later by the
  clconfd daemon, when the node is available and the disks are not busy.
You’ll notice that the mkcluster command informed me that the cluster shared disks are automatically renamed to cluster disk names like cldisk1. After I’d run the command I noticed something quite interesting and impressive. Prior to configuring the cluster, the shared disks on both nodes had names like hdiskX. Afterwards, the shared disks had been renamed across both nodes, and the same disk now had the same name on every node!
This is going to simplify cluster configuration and management. No longer will I need to remove and recreate disks in order to resolve disk naming inconsistencies in a cluster. The lspv output (shown below) from both nodes shows that I have three shared disks (cldisk1, cldisk2 and cldisk3), which will be used for shared data. The disk named caa_private0 is my cluster repository disk, used to store and share the cluster configuration data.
7502lp01:
# lspv | sort
caa_private0    00f6
cldisk1         00f6
cldisk2         none
cldisk3         none
hdisk0          00f6

7502lp02:
# lspv | sort
caa_private0    00f6
cldisk1         00f6
cldisk2         none
cldisk3         none
hdisk0          00f6
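A quick way to confirm that the names really do line up on every node is to list the physical volumes on all nodes in one go with the clcmd command (covered in more detail further down):

# clcmd lspv | egrep "NODE|cldisk|caa_private"     # show the cluster disk names reported by each node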
Cluster Aware AIX tells you what nodes are in the cluster plus information on those nodes, including state. A special “gossip” protocol is used over the multicast address to determine node information and implement scalable reliable multicast. No traditional heartbeat mechanism is employed. Gossip packets travel over all interfaces, including storage. Immediately after running mkcluster, I was able to query the status of the nodes in my cluster without any further configuration!
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: 7502lp01
Cluster shorthand id for node: 1
uuid for node: 3cd9
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME     TYPE    SHID    UUID
mycluster        local           267c
Number of points_of_contact for node: 1
Point-of-contact interface & contact state
en0 UP

----

Node name: 7502lp02
Cluster shorthand id for node: 2
uuid for node: d1a4
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME     TYPE    SHID    UUID
mycluster        local           267c
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
CAA tells you what interfaces have been discovered on a node plus information on those interfaces, including state.
# lscluster -i -n
Node 7502lp01
Node uuid = 110b
Number of interfaces discovered = 1
        Interface number 1 en0
                ifnet type = 6 ndd type = 7
                Mac address length = 6
                Mac address = 16.12.a0.0.30.2
                Smoothed rrt across interface = 7
                Mean Deviation in network rrt across interface = 3
                Probe interval for interface = 100 ms
                ifnet flags for interface = 0x1e080863
                ndd flags for interface = 0x21081b
                Interface state UP
                Number of regular addresses configured on interface = 1
                IPV4 ADDRESS: 9.3.28.136 broadcast 9.3.28.159 netmask 255.255.255.224
                Number of cluster multicast addresses configured on interface = 1
                IPV4 MULTICAST ADDRESS: 228.8.8.8 broadcast 0.0.0.0 netmask 0.0.0.0
CAA tells you what disks are in the cluster plus information on those disks, including the state and type.
# lscluster -d
Storage Interface Query

Cluster Name: mycluster
Cluster uuid: a1bd
Number of nodes reporting = 2
Number of nodes expected = 2
Node osca
Node uuid = 9807
Number of disk discovered = 2
        cldisk1
                state : UP
                uDid : 533E Mfcp
                type : CLUSDISK
                uUid : 9807
        hdisk1
                state : UP
                uDid :
                uUid : 600a
                type : REPDISK
You can run clusterwide commands on all nodes immediately after configuring your cluster, using the clcmd command. No further configuration is required.
# clcmd ps -ef

---- NODE 7502lp01 ----
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root       1       0   0   Aug 25      -  0:00 /etc/init
    root  655526       1   0   Aug 25      -  0:18 [cimserve]
    root 1310840       1   0   Aug 25      -  0:00 /usr
    root 2490482       1   0   Aug 25      -  0:42 /usr/sbin/srcmstr
    root 3145874 3801262   0 11:17:13      -  0:00 telnetd -a
    root 3276994 2490482   0   Aug 25      -  0:10 /usr/sbin/snmpd

---- NODE 7502lp02 ----
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root       1       0   1   Aug 28      -  0:00 /etc/init
    root 1441934       1   0   Aug 28      -  0:39 /usr/sbin/syncd 60
    root 1638496       1   0   Aug 28      -  0:00 /usr
    root 2228252 2556054   0   Aug 28      -  0:00 /usr/sbin/hostmibd
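If you later need to grow the cluster or tear it down again, CAA also provides the chcluster and rmcluster commands. A minimal sketch, where hdisk5 and 7502lp03 are purely hypothetical names and the exact option syntax should be checked against your AIX level:

# chcluster -n mycluster -d +hdisk5       # add another shared disk to the cluster
# chcluster -n mycluster -m +7502lp03     # add a third node to the cluster
# rmcluster -n mycluster                  # remove the cluster definition altogether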
I believe that PowerHA SystemMirror 7.1 will be the first cluster product from IBM to provide high availability on AIX using the new Cluster Aware AIX features. An IBM Redbook Residency recently covered the integration of CAA and PowerHA. I’m eagerly awaiting the first draft of this book so that I can learn how these two components will “play” together.
Refer to the AIX 7.1 Information Centre for more information on Cluster Aware AIX:
http
Refer to the AIX 7.1 Information Centre for more information on AHAFS and the new AIX Event Infrastructure:
Can you tune the CAA LUNs to be more tolerant when the storage subsystem experiences an ISL flapping condition, or a scenario where the SVC reboots frequently due to recovery attempts? The last time we had an ISL-related issue, we saw that the CAA LUNs had pending I/O, which caused a DMS timeout.
The following article describes all of the Cluster Aware AIX (CAA) tunables and offers some recommendations.
https://www.ibm.com/developerworks/aix/library/au-aix-powerha-caa/index.html
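For reference, the CAA tunables described in that article are listed and changed with the clctrl command. Tunable names, units and defaults vary by AIX and CAA level, so treat the second line purely as an illustration and take the actual values from the article and from IBM support:

# clctrl -tune -L                    # list the CAA tunables with their current and default values
# clctrl -tune -o deadman_mode=e     # illustration only: raise an event rather than crash on deadman timeout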
The online CAA documentation appears to be outdated. Basically, CAA needs 256 MB and PowerHA needs a 512 MB disk. The initial 7.1.0 release did have a 10 GB LUN requirement documented, which later changed to 1 GB, but going forward, with the new usage of CAA in PowerHA versions 7.1.1 and 7.1.2, only 512 MB is needed. So the smallest size is 512 MB and the largest is 460 GB. The size requirements should not change even for a 16-node cluster, so we typically advise clients to use their standard LUN size if they don't want to jump through hoops with their storage team, or to assign it as small as possible with the 512 MB minimum, since anything beyond that is wasted space. Note that in the first release we would create a database with the solidDB fileset; it is no longer exploited in 7.1.1 or 7.1.2. So you basically only see raw LVs in the private VG that gets created; you no longer see the file systems that we had in 7.1.0. At this stage, it is not possible to have multiple repository disks, only a backup disk (for hot replacement).
I read in an IBM document that the size of the cluster repository disk should be greater than 512 MB and less than 460 GB. Please confirm. I also have a related question: can we have multiple repository disks (mirrored or unmirrored) so that a single disk is not a SPOF? I am aware that we can replace the repository disk.
However, I don't have AIX Express Edition, because my servers are in a production environment, and I'm working at IBM Italy (Rome).
Thanks, but I already have the latest SP of PowerHA, and I have opened a PMR for this problem. Today I also uploaded the "snap" files. When I solve this problem, I'll post the solution here for traceability, OK? Thanks in advance. Bye.
We have often seen this message; it is resolved by unlocking CAA.
I asked my PowerHA guru about this and his advice was: "When testing labs for...class, I ran into it.... What I did to get it to work was create a bare PowerHA cluster, sync, and let it create the CAA cluster as normal. Then go back and remove/delete the cluster definition, and after that I could run mkcluster manually. Another thing I discovered (searching PMRs when I hit it) that might be a cause is that AIX Express Edition does not include CAA support. So there is a chance, if they had Express Edition, that CAA would not be able to be used." Of course, if none of this helps, you should open a new PMR with IBM support.
I have not come across this error before. I would recommend installing the latest updates for PowerHA 7.1 and then trying again.
Yes, I have already installed it. My release is PowerHA 7.1.1. Do you have any idea?
Sure, I have already installed PowerHA 7.1.1:

[dmv02trnx:root:/home/root:] halevel -s
7.1.1 GA
[dmv02trnx:root:/home/root:] mkcluster -r hdisk300 -d hdisk302 -m dmv02trnx,dmv03trnx -v
INFO: START
mkcluster: Cluster product not licensed.
[dmv02trnx:root:/home/root:]

Does anyone know why I have this problem? Thanks in advance.
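When you hit a "Cluster product not licensed" message like this, it is worth confirming exactly which cluster filesets are installed on the node and at what level. A simple check (the fileset names below are the usual ones for CAA and PowerHA; adjust the pattern for your environment):

# oslevel -s                     # AIX level and service pack
# lslpp -l bos.cluster.rte       # the Cluster Aware AIX fileset
# lslpp -l "cluster.es.*"        # the PowerHA SystemMirror filesets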