Using Cluster Aware AIX (CAA) with AIX 7.1 you can create a cluster of AIX nodes. This interconnected cluster of nodes immediately provides several capabilities, such as:
Clusterwide event management
Communication and storage events:
o Node UP and node DOWN
o Network adapter UP and DOWN
o Network address change
o Point-of-contact UP and DOWN
o Disk UP and DOWN
o Predefined and user-defined events
Clusterwide storage naming service
Clusterwide command distribution
Clusterwide communication making use of networking and storage connectivity
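These capabilities are surfaced through a handful of new AIX commands, all of which I'll touch on in this post:
mkcluster - create a cluster
chcluster - change a cluster configuration, for example to add or remove nodes and disks
rmcluster - remove a cluster
lscluster - list cluster configuration information
clcmd - run a command on all of the nodes in the cluster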
All of the capabilities of CAA are built into the AIX Version 7.1 operating system. CAA is essentially a set of tools and services embedded into AIX that help to manage a cluster of AIX nodes and assist in running cluster software on AIX. Cluster products, such as PowerHA and RSCT from IBM, can utilise CAA to simplify the configuration, management and monitoring of an AIX cluster.
CAA does not form a cluster by itself; it is a tool set. There is no notion of quorum (if 20 nodes of a 21-node cluster are down, CAA still runs on the remaining node). CAA does not eject nodes from a cluster. CAA provides the tools to fence a node, but it never fences a node itself and will continue to run on a fenced node.
As you can see in the following diagram, cluster products, like PowerHA, will integrate with CAA to help form and manage highly available clusters.
So you will still need some form of cluster product, either from IBM or another vendor, in order to build a cluster that provides high availability capabilities like node failover/takeover.
What follows are some of the most important snippets from the CAA documentation (IMHO), available in the AIX 7.1 Information Centre (see the link at the bottom of the page).
As with any cluster, each node that is added to a cluster by using CAA must have common storage devices available, for example via SAN-attached storage. These storage devices are used for the cluster repository disk and for any clustered shared data disks.
The Storage Naming Service provides a global device view across all the nodes in the cluster. The Storage Naming Service also provides a single global device name for a disk from any node in the cluster. The global device name, for example, cldisk1, refers to the same physical disk from any node in the cluster.
The cluster repository disk is used as the central repository for the cluster configuration data. The cluster repository disk must be accessible from all nodes in the cluster and is a minimum of 10 GB in size. Given the importance of the cluster configuration data, the cluster repository disk should be backed up by a redundant and highly available storage configuration. Even though the cluster repository disk is visible as a disk device to AIX, it should be treated as a special device for the cluster. The use of LVM commands is not supported when used on a cluster repository disk. The AIX LVM commands are designed as single node administrative commands, and are not applicable in a clustered configuration.
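Before creating a cluster it's worth confirming that your candidate repository disk meets the size requirement and is visible from every node. A quick sanity check, assuming hdisk1 is your candidate (disk names will differ on your systems):

# bootinfo -s hdisk1
# lspv

The bootinfo -s command reports the disk size in MB (it must be at least 10240), and running lspv on each node lets you match the disk by its PVID to confirm that all nodes really do see the same physical disk.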
The cluster repository disk is renamed to a private device name (caa_private0). A raw section of the disk and a section of the disk that contains a special volume group and special logical volumes are used during cluster operations.
A multicast address is used for cluster communications between the nodes in the cluster. It is configured automatically during the creation of the cluster and supports cluster monitoring of events and cluster configuration attributes.
Scalable reliable multicasting is implemented in the cluster with a special gossip protocol over the multicast address. The gossip protocol determines the node configuration and then transmits the gossip packets over all available networking and storage communication interfaces (Fibre Channel or SAS adapters). If no storage communication interfaces are configured, only the traditional networking interfaces are used.
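If you want to see which multicast address your cluster is using, the lscluster command can report the cluster configuration. A minimal sketch (I'm assuming the -c flag here, as documented in the Information Centre):

# lscluster -c

This displays the cluster configuration attributes, including the multicast address that was assigned when the cluster was created.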
When you first configure CAA, the following actions are performed:
The cluster is created using the mkcluster command.
The cluster configuration is written to the raw section of the cluster repository disk.
Primary and secondary database nodes are selected from the list of candidate nodes in the mkcluster command.
Special volume groups and logical volumes are created on the cluster repository disk.
Cluster file systems are created on the special volume group.
The cluster repository database is created on both primary and secondary nodes.
The cluster repository database is started.
Cluster services are made available to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA.
Storage framework register lists are created on the cluster repository disk.
A global device namespace is created and interaction with LVM starts for handling associated volume group events.
A clusterwide multicast address is established.
The node discovers all of the available communication interfaces.
The cluster interface monitoring starts.
The cluster interacts with the Autonomic Health Advisory File System (AHAFS) for clusterwide event distribution (see the sketch after this list).
The cluster exports cluster messaging and cluster socket services to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA.
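Because CAA distributes its events through AHAFS, you can take a quick look at the event producers it registers. A minimal sketch, assuming /aha is your mount point and that your AIX level exposes the cluster event producers under /aha/cluster:

# mkdir /aha
# mount -v ahafs /aha /aha
# ls /aha/cluster

The mount command mounts the AIX Event Infrastructure file system (skip it if /aha is already mounted), and the ls should list the cluster event monitor factories, such as nodeState.monFactory, through which consumers like RSCT can subscribe to node UP and DOWN events.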
In the following example I created a two node cluster using CAA tools. First of all, I had to use the mkcluster command to define the cluster nodes, shared storage and repository disk. The node names are 7502lp01 and 7502lp02. The shared data storage disks are hdisk2, hdisk3 and hdisk4. The repository disk, used to house the cluster configuration data, is hdisk1.
# mkcluster -n mycluster -r hdisk1 -d hdisk2,hdisk3,hdisk4 -m 7502lp01,7502lp02
mkcluster: Cluster shared disks are automatically renamed to names such as
cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot
take place while a disk is busy or on a node which is down or not
reachable. If any disks cannot be renamed now, they will be renamed
later by the clconfd daemon, when the node is available and the disks
are not busy.
You'll notice that the mkcluster command informed me that the cluster shared disks are automatically renamed to cluster disk names like cldisk1. After I'd run the command I noticed something quite interesting and impressive. Prior to configuring the cluster, the shared disks on both nodes had names like hdiskX. Afterwards the shared disks had been renamed across both nodes. Now these disks had the same names on both nodes!
This is going to simplify cluster configuration and management. No more will I need to remove and recreate disks in order to resolve disk naming inconsistencies in a cluster. The lspv output (shown below) from both nodes shows that I have three shared disks (cldisk1, cldisk2 and cldisk3). These would be used for shared data. The disk named caa_private0 is my cluster repository disk. This is used to store and share the cluster configuration data.
7502lp01:
# lspv | sort
caa_private0 00f61ab20b97190d caavg_private active
cldisk1 00f61ab20bf28ac6 None
cldisk2 none None
cldisk3 none None
hdisk0 00f61ab2f73e46e2 rootvg active
7502lp02:
# lspv | sort
caa_private0 00f61ab20b97190d caavg_private active
cldisk1 00f61ab20bf28ac6 None
cldisk2 none None
cldisk3 none None
hdisk0 00f61ab2895e4cbe rootvg active
Cluster Aware AIX tells you what nodes are in the cluster plus information on those nodes, including state. A special gossip protocol is used over the multicast address to determine node information and implement scalable reliable multicast. No traditional heartbeat mechanism is employed. Gossip packets travel over all interfaces, including storage. Immediately after running mkcluster, I was able to query the status of the nodes in my cluster without any further configuration!
# lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2
Node name: 7502lp01
Cluster shorthand id for node: 1
uuid for node: 3cd9cb00-bf55-11df-b015-6e8dd0af6304
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
mycluster local 267ce7fc-bf55-11df-a3b9-6e8dd877b814
Number of points_of_contact for node: 1
Point-of-contact interface & contact state
en0 UP
------------------------------
Node name: 7502lp02
Cluster shorthand id for node: 2
uuid for node: d1a46164-bf46-11df-94b3-6e8dd877b814
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of zones this node is a member in: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
mycluster local 267ce7fc-bf55-11df-a3b9-6e8dd877b814
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
CAA tells you what interfaces have been discovered on a node plus information on those interfaces, including state.
# lscluster -i
Node 7502lp01
Node uuid = 110b2422-7efc-11df-aed7-1612a0003002
Number of interfaces discovered = 1
Interface number 1 en0
ifnet type = 6 ndd type = 7
Mac address length = 6
Mac address = 16.12.a0.0.30.2
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x1e080863
ndd flags for interface = 0x21081b
Interface state UP
Number of regular addresses configured on interface = 1
IPV4 ADDRESS: 9.3.28.136 broadcast 9.3.28.159 netmask 255.255.255.224
Number of cluster multicast addresses configured on interface = 1
IPV4 MULTICAST ADDRESS: 228.8.8.8 broadcast 0.0.0.0 netmask 0.0.0.0
CAA tells you what disks are in the cluster plus information on those disks, including the state and type.
# lscluster -d
Storage Interface Query
Cluster Name: mycluster
Cluster uuid: a1bde416-83b9-11df-b438-1612a0004002
Number of nodes reporting = 2
Number of nodes expected = 2
Node oscar-dev8.austin.ibm.com
Node uuid = 9807465b-3194-f1e5-8a04-1f044bc82593
Number of disk discovered = 2
cldisk1
state : UP
uDid : 533E3E213600A0B8000475C200000E7114BC825930F1818 FAStT03IBMfcp05VDASD03AIXvscsi
type : CLUSDISK
uUid : 9807465b-3194-f1e5-8a04-1f044bc82593
hdisk1
state : UP
uDid :
uUid : 600a0b80-0047-5d0a-0000-e6094bc82690
type : REPDISK
You can run clusterwide commands on all nodes immediately after configuring your cluster by using the clcmd command. No further configuration is required.
# clcmd ps -ef
-------------------------------
NODE 7502lp01
-------------------------------
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Aug 25 - 0:00 /etc/init
root 655526 1 0 Aug 25 - 0:18 [cimserve]
root 1310840 1 0 Aug 25 - 0:00 /usr/ccs/bin/shlap64
root 2490482 1 0 Aug 25 - 0:42 /usr/sbin/srcmstr
root 3145874 3801262 0 11:17:13 - 0:00 telnetd -a
root 3276994 2490482 0 Aug 25 - 0:10 /usr/sbin/snmpd
-------------------------------
NODE 7502lp02
-------------------------------
UID PID PPID C STIME TTY TIME CMD
root 1 0 1 Aug 28 - 0:00 /etc/init
root 1441934 1 0 Aug 28 - 0:39 /usr/sbin/syncd 60
root 1638496 1 0 Aug 28 - 0:00 /usr/ccs/bin/shlap64
root 2228252 2556054 0 Aug 28 - 0:00 /usr/sbin/hostmibd
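Since clcmd simply prefixes each node's output with a NODE banner, any read-only command is a good candidate for clusterwide execution. For example, checking that all nodes are running the same AIX level becomes a one-liner:

# clcmd oslevel -s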
I believe that PowerHA SystemMirror 7.1 will be the first cluster product from IBM to provide high availability on AIX using the new Cluster Aware AIX features. An IBM Redbook Residency recently covered the integration of CAA and PowerHA. I'm eagerly awaiting the first draft of this book so that I can learn how these two components will play together.
Refer to the AIX 7.1 Information Centre for more information on Cluster Aware AIX:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/claware_main.htm
Refer to the AIX 7.1 Information Centre for more information on AHAFS and the new AIX Event Infrastructure: