Update, May 14th, 2019: It's been 7 years since I first published this post. Power Systems and PowerVM technology has progressed significantly since then, hence the need for an update to this blog post. So, if you're planning on using, or are already using, Simplified Remote Restart (SRR) and/or offline LPM, please do NOT enable ghostdev on your systems. If ghostdev is set to 1 and you attempt to use SRR (or offline LPM), the AIX LPAR will reset the ODM during boot. This is (most likely) not desired behaviour. If the ODM is cleared, the system will need to be reconfigured so that TCP/IP and LVM are operational again. If you require a "ghostdev like" behaviour for your AIX disaster recovery (DR) process, I would recommend you set the sys0 attribute, clouddev, to 1, immediately after you have booted from your replicated rootvg. Rebooting your AIX system with this setting enabled will "Recreate ODM devices on next boot" and allow you to reconfigure your LPAR for DR. Once you've booted with clouddev=1 and reconfigured your AIX LPAR at DR, immediately disable clouddev (i.e. set it to 0, the default), so that the ODM is not cleared again on the next system reboot. Some more details on clouddev below.
"Details on the "ghostdev" version-2 function (aka clouddev) we need from AIX:
The desired behavior we want to see occur in AIX to manage resetting device-id naming in the OS is:
1) A user (or script run when installing 'cloud-init') runs: /usr/sbin/chdev -l sys0 -a clouddev=1
2) AIX then defines the NVRAM variable "ibm,aix-clearODMdata" and sets it to false when that chdev command is run.
3) During AIX startup of ODM, if this clouddev flag is set, AIX looks for an "ibm,aix-clearODMdata" NVRAM variable and if it does not
exist, it sets it to true.
4) If the "ibm,aix-clearODMdata" flag is set to true during that boot, ODM will then do the same ghostdev-style cleanup and set
"ibm,aix-clearODMdata" to false when finished
This solution leverages the fact that NVRAM data is preserved across Live Partition Migration but not across the image/instance IaaS
lifecycle. It also means we are no longer dependent upon detecting different LPAR ids, i.e. we don't break if parent image is
deployed to same lpar-id on same host. It also means we no longer have to toggle the ghostdev setting at boot or AE reset (the goal
being no need to do any sort of reset, be able to do capture anytime)."
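As a minimal sketch (my illustration only, not part of the IBM description above), the clouddev toggle could be applied at DR like this, immediately after the first boot from the replicated rootvg:

#Enable clouddev so the ODM devices are recreated on the NEXT boot only.
chdev -l sys0 -a clouddev=1
lsattr -El sys0 -a clouddev
shutdown -Fr
#After the reboot, reconfigure the LPAR for DR (TCP/IP, volume groups, etc.),
#then immediately disable clouddev so the ODM is not cleared again.
chdev -l sys0 -a clouddev=0
lsattr -El sys0 -a clouddev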
If you are looking for a more modern and automated solution for your AIX DR, I would highly recommend you take a look at the IBM VM Recovery Manager for IBM Power Systems: "Streamline site switches with a more economical, automated, easier to implement high availability and disaster recovery solution for IBM Power Systems."
https://www.ibm.com/au-en/marketplace/vm-recovery-manager
---- Original Post from 2012 below ----
My team and I have recently been trying to streamline our AIX disaster recovery process. We've been looking for ways to reduce our overall recovery time. Several ideas were tossed around, such as a) using a standby DR LPAR with AIX already installed and using rsync/scp to keep the Prod & DR LPARs in sync, and b) using alt_disk_copy (with the -O flag for a device reset) to clone rootvg to an alternate disk which is then replicated to DR. These methods may work but are cumbersome to administer and (in the case of alt_disk_copy) require additional (permanent) resources on every production system. With over 120 production instances of AIX, the disk space requirements start to add up.
So far we've concluded that the best way to achieve our goal is by using SAN replicated rootvg volumes at our DR site.
Our current DR process relies on recovery of AIX systems from mksysb images from a NIM master. All our data (non-rootvg) LUNs are already replicated to our DR site. The aim was to change the process and recover our AIX images using replicated rootvg LUNs. This will reduce our overall recovery time at DR (which is crucial if we are to meet the proposed recovery time objectives set by our business). Based on current IBM documentation we were relatively comfortable with the proposed approach. The following IBM developerWorks article (originally published in 2009 and updated in late 2010) describes scenarios in which remapping, copying, and reuse of SAN disks is allowed and supported, making it possible to more easily switch AIX environments from one system to another and help achieve higher availability and reduced down time. These scenarios also allow for fast deployment of new systems using cloning.
The document focuses on fully virtualised environments that utilise shared processors and VIO servers. One area where this document is currently lacking in information is the use of NPIV and virtual fibre channel adapters in a DR scenario. We reached out to our contacts in the AIX development space and asked the following question:
Hoping you can help us find some statements regarding support for a replicated rootvg environment using NPIV/Virtual Fibre Channel adapters?
The following IBM developerWorks article discusses VSCSI and we are looking for something similar for NPIV.
http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html
My guess is that restrictions similar to those for physical FC adapters will apply here? But I'm hoping that given the adapters are virtual the limitations may be relaxed.
Are you aware of any statement regarding support (or not) for booting from another system using a disk subsystem image of rootvg replicated to another disk subsystem when using NPIV? And what, if any, additional requirements/restrictions may apply when using NPIV?
We received the following responses:
There are some additional considerations when using NPIV for booting from a replicated rootvg. With NPIV the client partition has virtual Fibre Channel adapter ports, but has physical access to the actual (physical) disk devices. There may be an increased chance of needing to update the boot list via the Open Firmware SMS menu. Since the clients have access to the actual disks, you have the possibility of running multipathing software besides AIX MPIO. If you are using multipathing software other than AIX MPIO to manage the NPIV attached disks, then you should contact the vendor that provided the software to check their support statement.
Since one or more of the physical devices will change when booting from an NPIV replicated rootvg, it is recommended to set the ghostdev attribute. The ghostdev attribute will trigger when it detects the AIX image is booting from either a different partition or server. The ghostdev attribute should not trigger during LPM (Live Partition Mobility) operations. Once triggered, ghostdev will clear the customized ODM database. This will cause detected devices to be discovered as new devices (with default settings), and avoids the issue of missing/stale device entries in the ODM. Since ghostdev clears the entire customized ODM database, this will require you to import your data (non-rootvg) volume groups again, and to perform any (device) attribute customization. To set ghostdev, run "chdev -l sys0 -a ghostdev=1". Ghostdev must be set before the rootvg is replicated.
As with virtual devices, the client partition is booting an existing rootvg where the hardware may be different. It's possible some applications have a dependency on tracking the actual physical devices (instead of the data on the disks). For example, PowerHA may keep track of a disk for cluster health checks. If you do have applications that have a dependency on tracking physical devices, then additional setup (of those applications) may be required after the first boot from the replicated rootvg.
We do have multiple customers using NPIV for such scenarios. I believe most of them worked with IBM Lab Based Services to assist with implementing such a configuration, and some of the customers required some custom scripts to further customize their system after booting from the replicated rootvg. Those customers set the ghostdev attribute, and had custom scripts to import their data (non-rootvg) volume groups, and update PowerHA to point to the new health check disk.
You should get support for such an NPIV setup from IBM as long as you follow the considerations listed in the white paper.
Development has approved using NPIV to do this for one customer. Below are more detailed requirements for this DR strategy using NPIV. If a Disaster Recovery (DR) environment not using PowerHA Enterprise Edition is used, then we believe the white paper located at http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html provides the guidelines regarding the setup, prerequisites, and limitations of such a DR deployment. Deployments as detailed in the white paper are supported by IBM. However, note that such a deployment places many manual responsibilities on the customer to set up and maintain the environment. IBM expects the customer to carefully manage these manual steps without any mistakes. The white paper does not currently cover using NPIV in such a DR scenario.
We have the following guidelines regarding the configuration, which includes NPIV as an option: All of the system configuration should be virtualized, with the possible exception of disk devices when using NPIV. If NPIV is used then AIX MPIO must be used as the multipathing solution. If multipathing software other than AIX MPIO is used, then the vendor of that software must be contacted regarding a support statement. Install AIX (at least the minimum required TLs/SPs for the desired AIX version) and the software stack (middleware and applications) on the primary systems, ensuring it is compatible with the systems at both sites. The primary and secondary sites should be using systems with similar hardware, the same microcode levels, and the same VIOS levels.
Many manual steps are needed to set up the virtual and physical devices accurately on the secondary site VIOS. If Virtual SCSI disks are being used, then discover the unique identification of the disks on the primary site and map them to the corresponding replicated disks; create the same mappings on the VIOS at the secondary site. The level of VIOS should support the attribute to open the secondary devices passively, and this setting needs to be configured correctly on the VIOS at the secondary site. The operating environment should not have subnet dependencies. Manage the replication relationships accurately; you may need to manually switch the secondary disks to the primary node. Raw disk usage may cause problems: some middleware products may bypass the operating system and use the disk directly, and they may have their own restrictions for this environment (e.g. anything that is dependent on device location codes or storage LUN unique IDs may have issues when the cloned image is restarted on the secondary system with replicated storage).
Set the "ghostdev" attribute using chdev command (must be done on the primary). This attribute can be set using the command "chdev -l sys0 a ghostdev=1". The ghostdev attribute will delete customized ODM database on rootvg when AIX detects it has booted from a different LPAR or system. If the "ghostdev" attribute is not set, then the booting from the alternate site will result in devices in ODM showing up in "Defined" or "Missing" state.
After a failover to the secondary site, you may need to reset the boot device list for each LPAR, using the SMS menus of the firmware, before booting the LPAR. Note that this is not an exhaustive list of issues. Refer to the white paper and study it as it applies to your environment. So, as long as you are using MPIO, you're OK. If you are not using MPIO and are using OEM storage multipathing software, then the storage vendor must also support it.
While these responses indicate that this form of recovery is supported by IBM, we were still looking to IBM for clarity on the support position. It has been noted that other IBM customers have had mixed responses when contacting AIX support for feedback and assistance with this type of DR procedure. And it's not hard to see why when you read statements like this from the Supported Methods of Duplicating an AIX System document:
Unsupported Methods
1. Using a bitwise copy of a rootvg disk to another disk.
This bitwise copy can be a one-time snapshot copy such as flashcopy, from one disk to another, or a continuously-updating copy method, such as Metro Mirror.
While these methods will give you an exact duplicate of the installed AIX operating system, the copy of the OS may not be bootable. A typical scenario where this is tried is when one system is a production host and there is a desire to create a duplicate system at a disaster recovery site in a remote location.
2. Removing the rootvg disks from one system and inserting into another.
This also applies to re-zoning SAN disks that contain the rootvg so another host can see them and attempt to boot from them.
Why don't these methods work?
The reason for this is that there are many objects in an AIX system that are unique to it: hardware location codes, World-Wide Port Names, partition identifiers, and Vital Product Data (VPD), to name a few. Most of these objects or identifiers are stored in the ODM and used by AIX commands.
If a disk containing the AIX rootvg in one system is copied bit-for-bit (or removed), then inserted in another system, the firmware in the second system will describe an entirely different device tree than the AIX ODM expects to find, because it is operating on different hardware. Devices that were previously seen will show as missing or removed, and the system will typically fail to boot with LED 554 (unknown boot disk).
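To illustrate the point, the following commands (output omitted; hdisk0 and fcs0 are just example device names) show the sorts of hardware-specific records held in the ODM and VPD that no longer match when the same rootvg image is booted on different hardware:

#Physical location codes and VPD recorded for a disk and a Fibre Channel adapter.
lscfg -vpl hdisk0
lscfg -vpl fcs0
#Customized device and attribute entries held in the ODM for the same disk.
odmget -q "name=hdisk0" CuDv
odmget -q "name=hdisk0 and attribute=pvid" CuAt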
So, as a secondary objective, we have been working closely with our local IBM representatives to obtain some surety from IBM that our proposed DR strategy for AIX is fully supported by both the AIX development and support teams.
With that in mind I'll provide an overview of our new DR approach, in the hope that it offers others insight into an alternative method of recovery, and also assists IBM in further understanding what some of the larger AIX customers are looking for in terms of simplified AIX disaster recovery.
What follows is a detailed description of our IBM AIX, PowerVM/Power Systems environment, the proposed recovery steps and other items for consideration.
Our Environment:
- AIX 5.3*, 6.1 & 7.1 - 5300-12-04-1119, 6100-06-05-1115 and 7100-01-04-1216.
- VIOS 2.2.1.3 both production and DR.
- VIOS physical FC adapters both production and DR - 8Gb PCI Express Dual Port FC Adapter (5735) Firmware level: 2.00X7.
- All client LPARs utilise Virtual FC adapters (NPIV) for disk storage.
- Production: POWER7 795 (9119-FHB). Firmware level: AH730_078.
- DR: POWER6 595 (9119-FHA). Firmware level: EH350_120.
- HDS VSP storage at both production and DR sites.
- AIX MPIO only. The HDS ODM driver (devices.fcp.disk.Hitachi.array.mpio.rte) is installed; HDLM is not installed. (A quick MPIO check is sketched after this list.)
- Production: Dual VIOS.
- DR: Single VIOS.
- Production: LUNs are mapped directly to client LPARs via NPIV, virtual FC adapter pass thru via dual VIOS. MPIO is in use i.e. one path per VIOS/Physical FC.
- DR: LUNs are mapped directly to client LPARs via NPIV, virtual FC adapter pass thru via a single VIOS. i.e. single path only.
- Please refer to the following table for a summary of the environment details.
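As a quick sanity check (a sketch only; hdisk0 is an example device name), the following commands can be used to confirm that AIX MPIO with the Hitachi MPIO ODM package is managing the disks, and to review the available paths:

#Disk descriptions should show MPIO-capable Hitachi devices.
lsdev -Cc disk
#Confirm the Hitachi MPIO ODM fileset is installed (and HDLM is not).
lslpp -l | grep -i hitachi
#List the paths for a given disk (two paths in production, one at DR).
lspath -l hdisk0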
Our Recovery Procedure:
1. - Change the sys0 ghostdev attribute value to 1 on the source production AIX system. Set the "ghostdev" attribute using the chdev command (this must be done on the primary): "chdev -l sys0 -a ghostdev=1". The ghostdev attribute will delete the customized ODM database on rootvg when AIX detects it has booted from a different LPAR or system. (A sketch of the preparation and verification commands appears after this list.)
2. - Take note of the source system's rootvg hdisk PVID.
3. - Select the source production rootvg LUN for replication on the HDS VSP.
4. - Replicate the LUN from the production site to the DR HDS VSP.
5. - In a DR test, suspend HDS replication from production to DR.
6. - Assign the replicated LUN to the target LPAR on the DR POWER6 595 i.e. map the LUN to the WWPN of the virtual FC adapter on the DR LPAR.
7. - Attempt to boot the DR LPAR using the replicated rootvg LUN. If necessary, enter SMS menu to update the boot list i.e. select the correct boot disk, check for the same PVID as the source host.
8. - Once the LPAR has successfully booted, the AIX administrator would configure the necessary devices i.e. import data volume groups, configure network interfaces, etc. This may also be scripted for execution during the first boot process.
9. - Please refer to the following diagrams for a visual representation of the proposed process.
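A minimal sketch of the preparation on the production LPAR (steps 1 and 2) and the post-boot checks at DR (step 7), using example disk names:

#On the production (source) LPAR, before the rootvg LUN is replicated.
chdev -l sys0 -a ghostdev=1
lsattr -El sys0 -a ghostdev
#Record the rootvg hdisk PVID for later comparison.
lspv | grep rootvg
#On the DR LPAR, after booting from the replicated LUN:
#the PVID should match the value recorded on the source system,
#and the boot list should point at the replicated rootvg disk.
lspv | grep rootvg
bootlist -m normal -o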
Some Notes/Caveats:
The following is a list of items that we understand are possible limitations and issues with our new DR process.
- Booting from replicated rootvg disks may fail for several reasons, such as: a) there is unexpected corruption in the replicated LUN image due to rootvg not being quiesced during replication, or b) there is an unidentified issue with the AIX system that only becomes apparent the next time the system is booted; this could be misconfiguration by the administrator or some other unforeseen problem.
- In the event that an LPAR fails to boot via a replicated rootvg LUN, a backup method is available for recovery. Switching back to manual NIM mksysb restore provides a sufficient backup should the replicated rootvg be unusable.
- If the "ghostdev" attribute is not set, then booting from the DR site will result in devices in the ODM showing up in a "Defined" or "Missing" state.
- Once a DR test is completed, the DR LPAR should be de-activated immediately so that SAN disk replication can be restarted between production and DR. Failure to perform this step may result in the DR LPAR failing as a result of file system corruption.
- At present we are using AIX MPIO only. There is discussion of using HDLM in the future. We will contact HDS for a support statement regarding booting from replicated rootvg LUNs with HDLM installed.
- The ghostdev attribute is not implemented in AIX 5.3. AIX 5.3 is no longer supported*.
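If ghostdev was not set and the LPAR does still manage to boot, a rough manual cleanup (our own assumption, not an IBM-documented procedure) might look something like this:

#Remove stale (Defined) device definitions left over from the source system,
#then rediscover the devices presented at the DR site and reset the boot list.
for dev in `lsdev -C | awk '$2 == "Defined" {print $1}'`
do
rmdev -dl $dev
done
cfgmgr
bootlist -m normal hdisk0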
So far all of our testing has been successful. We verified that we could replicate an SOE rootvg image of AIX 6.1 and 7.1 to DR and successfully boot an LPAR using the replicated disk. Based on these tests there doesn't appear to be anything stopping us from using this method for DR purposes. The following table outlines the different versions of AIX we tested and the results.
Once the system was booted we needed to perform some post-boot configuration tasks. These tasks were handled by two scripts that were called from /etc/inittab. On the source system we installed the new scripts (in /etc) and added new entries to the /etc/inittab file. These scripts only run if the systemid matches that of the DR system. Note: Only partial contents of each script are shown below, but you get the idea.
# mkitab -i srcmstr "AIXDRimportvg:2:once:/etc/AIX_DRimportvg.ksh > /dev/console 2>&1 #Reconfigure the system for DR on boot."
# mkitab -i srcmstr "AIXDRconfig:2:once:/etc/AIX_DRconfig.ksh > /dev/console 2>&1 #Reconfigure the system for DR on boot."
# lsitab -a | grep DR
AIXDRconfig:2:once:/etc/AIX_DRconfig.ksh > /dev/console 2>&1 #Reconfigure the system for DR on boot.
AIXDRimportvg:2:once:/etc/AIX_DRimportvg.ksh > /dev/console 2>&1 #Reconfigure the system for DR on boot.
AIX_DRconfig.ksh:
#!/bin/ksh
#set -xv
#################################################################
#
# This script will configure the AIX environment for DR purposes.
# This script will only run if the systemid matches the systemid of the DR system.
# aixdr : / # lsattr -EOl sys0 -a systemid | grep -v systemid | awk -F, '{print $2}'
#
# 0211D11C1 = DR POWER System
#
#################################################################
MYNAME=$(basename $0)
DR_SYSTEMID=0211D11C1
LSATTR_SYSTEMID_DR=`lsattr -EOl sys0 -a systemid | grep -v systemid | awk -F, '{print $2}'`
if [ "$LSATTR_SYSTEMID_DR" = "$DR_SYSTEMID" ]
then
#Set the DR hostname, IP address, netmask and gateway.
#mktcpip -h <hostname> -a <IP to use in DR> -m <network mask> -i en0 -g <gateway IP>
#en0:10.1.5.99:255.255.252.0:10.1.1.10,255.255.255.0 10.2.2.10,255.255.255.0
dr_hostname=`cat /usr/local/dr/dr_hostname.txt`
dr_defgw=`cat /usr/local/dr/dr_defgw.txt`
dr_intf=`cat /usr/local/dr/dr_en0.txt | awk -F: '{print $1}'`
dr_ip=`cat /usr/local/dr/dr_en0.txt | awk -F: '{print $2}'`
dr_netm=`cat /usr/local/dr/dr_en0.txt | awk -F: '{print $3}'`
echo "$MYNAME: Setting DR hostname and IP address."
mktcpip -h $dr_hostname -a $dr_ip -m $dr_netm -i $dr_intf -g $dr_defgw
#Configure IP aliases
#chdev -l en0 -a alias4=10.1.1.10,255.255.255.0
#Remove IP aliases
#chdev -l en0 -a delalias4=10.1.1.10,255.255.255.0
for i in `cat /usr/local/dr/dr_en0.txt | awk -F: '{print $4}'`
do
chdev -l $dr_intf -a alias4=$i
done
for inet in `cat /usr/local/dr/dr_inet.txt | grep -v en0`
do
for i in `cat /usr/local/dr/dr_$inet.txt | awk -F: '{print $4}'`
do
chdev -l $inet -a alias4=$i
done
done
#Configure the AIX environment for DR.
echo "$MYNAME: Configure the AIX environment for DR."
echo
#Delete the MDC DNS entries
echo "$MYNAME: Configuring DNS for DR."
namerslv -d -i 10.1.6.21
namerslv -d -i 10.1.4.21
namerslv -d -i 10.1.5.21
#Add the DR DNS entry
namerslv -a -i 10.1.7.38
echo
echo "$MYNAME: Configuring bootlist for DR."
bootlist -m normal -o
bootlist -m normal hdisk0
bootlist -m normal -o
echo
echo "$MYNAME: Configuring /etc/inittab for DR."
lsitab -a | grep nim
chitab "nimclient:2:off:/usr/sbin/nimclient -S running > /dev/console 2>&1 # inform nim we're running"
lsitab -a | grep nim
echo
else
MYNAME=$(basename $0)
echo "$MYNAME: The systemid $LSATTR_SYSTEMID_DR does not match the expected DR systemid of $DR_SYSTEMID."
echo "$MYNAME: This script should only be executed at DR."
echo "$MYNAME: If you are not booting the system at the DR site, then you can ignore this message."
echo "$MYNAME: No changes have been perfomed. Script is exiting."
fi
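The companion AIX_DRimportvg.ksh script is not shown above. Purely as an illustration, a minimal version of such a script might look something like the sketch below. The /usr/local/dr/dr_datavgs.txt file and its "vgname:pvid" format are assumptions made for this example, not the contents of the actual script.

#!/bin/ksh
#################################################################
#
# Illustrative sketch only - NOT the actual script.
# Import the data (non-rootvg) volume groups after booting from the
# replicated rootvg at DR.
# This sketch only runs if the systemid matches the systemid of the DR system.
# Assumes /usr/local/dr/dr_datavgs.txt contains one "vgname:pvid" entry per line.
#
#################################################################
MYNAME=$(basename $0)
DR_SYSTEMID=0211D11C1
LSATTR_SYSTEMID_DR=`lsattr -EOl sys0 -a systemid | grep -v systemid | awk -F, '{print $2}'`
if [ "$LSATTR_SYSTEMID_DR" = "$DR_SYSTEMID" ]
then
for line in `cat /usr/local/dr/dr_datavgs.txt`
do
vgname=`echo $line | awk -F: '{print $1}'`
pvid=`echo $line | awk -F: '{print $2}'`
#Find the hdisk that now holds this PVID and import the volume group from it.
hdisk=`lspv | grep -w $pvid | awk '{print $1}'`
if [ -n "$hdisk" ]
then
echo "$MYNAME: Importing $vgname from $hdisk (PVID $pvid)."
importvg -y $vgname $hdisk
else
echo "$MYNAME: No disk found with PVID $pvid for $vgname. Skipping."
fi
done
#Mount any file systems from the imported volume groups
#(ignore messages about file systems that are already mounted).
mount all
else
echo "$MYNAME: The systemid $LSATTR_SYSTEMID_DR does not match the expected DR systemid of $DR_SYSTEMID."
echo "$MYNAME: No changes have been performed. Script is exiting."
fi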
The ghostdev attribute essentially provides us with a clean ODM and allows the system to discover new devices and build the ODM from scratch. If you attempt to boot from a replicated rootvg disk without first setting the ghostdev attribute, your system may fail to boot (hang at LED 554) because of a new device tree and/or missing devices. You might be able to recover from this situation (without restoring from mksysb) by performing the steps outlined on pages 16-20 of the following document (thanks to Dominic Lancaster at IBM for the presentation).