I was working at a client site today, on a NIM master that I configured a month or so ago. I was there to install the TSM backup client software on about 30 LPARs. Of course I was going to use NIM to accomplish this task.
The software install via NIM worked for the majority of the LPARs, but I noticed a few of them were failing. This was very odd, as the last time I'd used the same NIM method to install software, everything was fine.
I suspected that perhaps something had changed on the client LPARs, maybe with their /etc/niminfo file, for instance. So I performed the following steps to rebuild the /etc/niminfo file and reconfigure the nimsh subsystem on the client LPAR.
lpar1# mv /etc/niminfo /etc/niminfo.old
lpar1# niminit -a master=nim1 -a name=`hostname`
lpar1# stopsrc -s nimsh
lpar1# smit nim_config_services
Configure Client Communication Services
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Communication Protocol used by client [nimsh] +
NIM Service Handler Options
* Enable Cryptographic Authentication [disable] +
for client communication?
Install Secure Socket Layer Software (SSLv3)? [no] +
Absolute path location for INSTALLP package [/dev/cd0] /
-OR-
lpp_source which contains INSTALLP package [] +
Alternate Port Range for Secondary Connections
(reserved values will be used if left blank)
Secondary Port Number [] #
Port Increment Range [] +#
The last step failed with the following error message:
0042-358 niminit: The connect attribute may only be assigned a service value of "shell" or "nimsh".
I checked the NIM client and confirmed it was configured for nimsh and it was fine. However, I did notice something odd when I ran the following command:
lpar1# egrep 'nimsh|nimaux' /etc/services
lpar1#
The entries for nimsh were missing from the /etc/services file!
Somebody had decided that these entries were not required and had simply removed them! Gee, thanks so much for that!
After adding the following entries back into the services file, everything started working again!
nimsh 3901/tcp # NIM Service Handler
nimsh 3901/udp # NIM Service Handler
nimaux 3902/tcp # NIMsh Auxiliary Port
nimaux 3902/udp # NIMsh Auxiliary Port
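To spot this problem quickly across many clients, a check along these lines could be run on each LPAR. It is only a minimal sketch: the function name check_nim_services is my own invention (not an AIX command), and it assumes the four entries shown above are the ones your NIM environment expects.

```shell
#!/bin/sh
# Hypothetical helper: report which NIM service entries are missing
# (or commented out) in a services file. Defaults to /etc/services.
check_nim_services() {
    file="${1:-/etc/services}"
    missing=0
    for entry in "nimsh 3901/tcp" "nimsh 3901/udp" \
                 "nimaux 3902/tcp" "nimaux 3902/udp"; do
        name="${entry% *}"    # service name, e.g. nimsh
        port="${entry#* }"    # port/protocol, e.g. 3901/tcp
        # Only an uncommented line starting with the service name counts.
        if ! grep -E "^${name}[[:space:]]+${port}([[:space:]]|\$)" "$file" >/dev/null 2>&1; then
            echo "missing: ${name} ${port}"
            missing=1
        fi
    done
    return $missing
}
```

Calling check_nim_services with no argument inspects /etc/services. It returns non-zero when at least one entry is absent, so it can be dropped into a loop over your clients.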
I've also encountered this error when there is another process (other than nimsh) using port 3901 or 3902.
Another error message you might encounter on the NIM master, if those entries are either missing or commented out, is:
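A rough way to see whether anything is already listening on those two ports is to filter the output of netstat -an. The sketch below is an assumption-laden example: nim_port_listeners is a made-up name, and it assumes the local address is the fourth column and ends in .3901 or .3902 (or :3901 / :3902 on systems that print a colon).

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -an` output on stdin and print
# only the sockets in LISTEN state on port 3901 or 3902.
nim_port_listeners() {
    awk '$NF == "LISTEN" && $4 ~ /[.:](3901|3902)$/'
}

# Example use on a client:
#   netstat -an | nim_port_listeners
```

This only shows that a listener exists. Working out which process owns the socket, and whether it is nimsh or an intruder, is a separate step not shown here.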
nimmast# nim -o showlog -a log_type=lppchk lpar1
0042-001 nim: processing error encountered on "master":
0042-006 m_showlog: (From_Master) connect Error 0
poll: setup failure
I thought I'd also mention another error message that can potentially drive you insane (especially if you haven't had your morning coffee!). The error doesn't relate to the missing nimsh service entries at all, but I thought I'd describe it anyway. The message appears when running the nim -o showlog command against a client LPAR.
nimmast# nim -o showlog lpar1
0042-001 nim: processing error encountered on "master":
0042-006 m_showlog: (From_Master) connect Error 0
0042-008 nimsh: Request denied wronghostname
I've modified the output a little to make it easier to identify the problem. Can you see it? I thought so! Upon investigation you may find that the IP address for the NIM master is resolving to a different hostname on the client. For example:
On the NIM master:
nimmast# host nimmast
nimmast is 172.29.150.177
nimmast# host 172.29.150.177
nimmast is 172.29.150.177
nimmast# grep 177 /etc/hosts
172.29.150.177 nimmast
On the NIM client:
lpar1# host nimmast
nimmast is 172.29.150.177
lpar1# host 172.29.150.177
wronghostname is 172.29.150.177
lpar1# grep 172.29.150.177 /etc/hosts
172.29.150.177 wronghostname
172.29.150.177 nimmast
In this example, someone placed two host entries in /etc/hosts with the same IP address. The client was resolving the NIM master's IP address to the first, incorrect hostname, so nimsh denied the request. This resulted in our nim -o showlog command failing.
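Duplicate entries like this are easy to miss by eye in a long hosts file. Here is a small sketch that lists any IP address appearing on more than one uncommented line. The function name find_duplicate_host_ips is hypothetical, and it assumes the usual "address hostname [aliases]" layout.

```shell
#!/bin/sh
# Hypothetical helper: print each IP address that appears on more than
# one uncommented line of a hosts file, followed by the first hostname
# from each of those lines. Defaults to /etc/hosts.
find_duplicate_host_ips() {
    file="${1:-/etc/hosts}"
    awk '
        { sub(/#.*/, "") }              # drop comments before counting
        NF >= 2 {
            count[$1]++
            names[$1] = names[$1] " " $2
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip ":" names[ip]
        }
    ' "$file"
}
```

Run against the client's hosts file from the example above, it would flag 172.29.150.177 with both wronghostname and nimmast, which is the cue to remove or correct the stray line.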