Tips and tricks with dsh on AIX
I received the following errors whilst running dsh on a NIM master recently.
root@nim1 : / # dsh -waixlpar1 date
0042-053 lsnim: there is no NIM object named "aixlpar1"
The node aixlpar1 is not defined in NIM database.
aixlpar1: Mon Aug 4 14:01:57 EET 2014
To fix this, I had to set the DSH_CONTEXT environment variable, as shown below. Setting DSH_CONTEXT to DSH stops the dsh command from referring to the NIM database and forces it to query a user-defined node list instead.
root@nim1 : / # export DSH_CONTEXT=DSH
root@nim1 : / # dsh -waixlpar1 date
aixlpar1: Mon Aug 4 14:02:22 EET 2014
root@nim1 : / # env | grep -i dsh
DSH_CONTEXT=DSH
DSH_
root@nim1 : / # dsh -q
DSH:
DSH:DCP_DEVICE_RCP=
DSH:
DSH:
DSH:DSH_CONTEXT=DSH
DSH:
DSH:
DSH:DSH_DEVICE_RCP=
DSH:DSH_DEVICE_RSH=
DSH:
DSH:DSH_FANOUT=
DSH:DSH_LOG=
DSH:
DSH:
DSH:DSH_NODE_OPTS=
DSH:DSH_NODE_RCP=
DSH:
DSH:DSH_OUTPUT=
DSH:DSH_PATH=
DSH:DSH_REPORT=
DSH:DSH_SYNTAX=
DSH:DSH_TIMEOUT=
DSH:RSYNC_RSH=
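With this many empty entries, it can be easier to show only the settings that actually have a value. The following is a minimal sketch, assuming the output keeps the DSH:NAME=VALUE form seen in the listing; show_set_values is a hypothetical helper name, and the sample lines are copied from the listing rather than produced by a live dsh.

```shell
# Hypothetical helper: keep only "DSH:NAME=VALUE" lines whose value is not
# empty, and strip the leading "DSH:" prefix.
show_set_values() {
    awk '/^DSH:[A-Z_]+=./ { sub(/^DSH:/, ""); print }'
}

# Sample input copied from the listing; on a real NIM master the assumption
# is that you would pipe "dsh -q" into the function instead.
printf '%s\n' \
    'DSH:DCP_DEVICE_RCP=' \
    'DSH:DSH_CONTEXT=DSH' \
    'DSH:DSH_FANOUT=' | show_set_values
```

For the sample input this prints DSH_CONTEXT=DSH, the only entry with a value.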
Here’s another dsh tip I picked up. By default, dsh uses the standard ssh port when connecting to nodes; sshd on an AIX node normally listens on port 22. I recently came across a customer environment where sshd had been configured to listen on port 6666 (not the real port number!). The customer wanted to run dsh from a NIM master against all the nodes defined in their custom node list. When they ran it, they got the following error message:
# dsh date
aixlpar1: ssh: connect to host aixlpar1 port 22: Connection refused
dsh: 2617-009 aixlpar1 remote shell had exit code 255
On the AIX node, we could see that sshd was listening on port 6666:
# netstat -a | grep 6666 | grep LIST
tcp6 0 0 *.66
tcp4 0 0 *.66
We needed to find a way to force dsh to use a different port number when starting the ssh connection. This was accomplished by setting the DSH_REMOTE_OPTS variable, as shown below.
[root@nim1]/ # export DSH_
[root@nim1]/ # dsh date
aixlpar1: Tue Aug 5 17:37:16 2014
[root@nim1]/ # env | grep DSH
DSH_
DSH_
DSH_
DSH_
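As a rough illustration of the idea, the sketch below sets DSH_REMOTE_OPTS to an ssh port option and prints the ssh command line that setting is assumed to lead to. The value "-p 6666" is my assumption, not the value from the session above: -p is the standard ssh option for the remote port, and 6666 is the placeholder port used in this post. preview_remote_cmd is a hypothetical helper that only echoes a command; it does not call dsh or ssh.

```shell
# Assumed value: "-p" is ssh's port option, 6666 is the placeholder port
# from this post (not the customer's real port).
DSH_REMOTE_OPTS="-p 6666"
export DSH_REMOTE_OPTS

# Hypothetical dry-run helper: print the ssh command line that this setting
# is assumed to produce for one node. Nothing is executed remotely.
preview_remote_cmd() {
    node=$1
    shift
    echo "ssh $DSH_REMOTE_OPTS $node $*"
}

preview_remote_cmd aixlpar1 date
```

Running the printed command by hand (here, ssh -p 6666 aixlpar1 date) is a quick way to confirm the port is reachable before involving dsh.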
DSH CONTEXT
The DSH CONTEXT is the in-built context for all the DSH Utilities commands.
It permits a user-defined node group database contained in the local
file system. The DSH_NODEGROUP_PATH environment variable specifies the
path to the node group database. Each file in this directory represents a
node group, and contains one host name or TCP/IP address for each node
that is a group member. Blank lines and comment lines beginning with a #
symbol are ignored. If all nodes are requested for the DSH CONTEXT, a
full node list is built from all groups in the DSH_NODEGROUP_PATH
directory, and cached in /var
http
DSH_REMOTE_OPTS Includes the options specified in the remote command when the command is forwarded to the remote nodes.
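To make the node group description concrete, here is a minimal sketch of a node group directory. The directory location, the group name aixlpars, and the node aixlpar2 are hypothetical. list_group is also a hypothetical helper: it only imitates the documented rule (skip blank lines and lines beginning with #) and is not how dsh itself reads the file.

```shell
# Hypothetical location for the node group database.
DSH_NODEGROUP_PATH=${TMPDIR:-/tmp}/dsh_nodegroups.$$
export DSH_NODEGROUP_PATH
mkdir -p "$DSH_NODEGROUP_PATH"

# Each file is one node group: one host name or TCP/IP address per line.
cat > "$DSH_NODEGROUP_PATH/aixlpars" <<'EOF'
# test LPARs
aixlpar1

aixlpar2
EOF

# Imitate the documented parsing rule: drop blank lines and lines that
# begin with a # symbol, leaving only the group members.
list_group() {
    grep -v -e '^[[:space:]]*$' -e '^#' "$DSH_NODEGROUP_PATH/$1"
}

list_group aixlpars
```

For the sample file this prints aixlpar1 and aixlpar2, one per line.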
Chris,
Sorry I failed to notice you updated this thread; I'll be watching it
daily going forward. I tried to email you but it refuses to go through.
The command passed to dsh doesn't seem to have an effect. Running
sshd in debug mode on the remote host shows that it never even attempts
to connect. Using ssh to connect works just fine.
I have the following relevant packages installed:
dsm.dsh 7.1.3.45 APPLIED Distributed Systems Management
openssh.base.client 6.0.0.6108 COMMITTED Open Secure Shell Commands
openssl.base 1.0.1.513 COMMITTED Open Secure Socket Layer
Here are dsh related shell settings:
DSH_CONTEXT=DSH
DSH_NODE_RSH=/usr/bin/ssh
DSH_NODE_OPTS=-v -q -o BatchMode=yes -t 30
Mike, I was able to reproduce the problem. Try changing
DSH_NODE_OPTS="-v -q -o BatchMode=yes -t 30" to
DSH_NODE_OPTS="-v -q -o BatchMode=yes". For some reason, the -t option is
causing the command to fail. I haven't figured out why just yet, but I
wanted to let you know what I'd found in case it helps.
Thanks Chris! It seems that the DSH_NODE_OPTS are being passed to the ssh command rather than being applied to dsh. Changing to DSH_NODE_OPTS="-v -q -o BatchMode=yes" allowed it to run, but in addition to the command output it also prints the version of ssh in use. Changing to DSH_NODE_OPTS="-q -o BatchMode=yes" drops the ssh version output. I don't know why this worked in the past, as I didn't change the settings when it stopped working. In any case, thanks again for your help. I think I'm just going to clear the DSH_NODE_OPTS variable.
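One possible explanation for the -t failure, offered as an assumption rather than something confirmed in this thread: in OpenSSH, -t is a flag that forces pseudo-terminal allocation and takes no argument. If the options are handed to ssh ahead of the host name, the 30 in "-t 30" is left over as the first non-option argument, which ssh treats as the host. That would be consistent with the "Pseudo-terminal will not be allocated" message and with sshd on the real host never seeing a connection attempt. The sketch below imitates that argument splitting with the shell's getopts, using only the four options mentioned here; first_operand is a hypothetical helper, not part of ssh or dsh.

```shell
# Hypothetical helper: consume ssh-style options and print the first
# non-option argument, which ssh would treat as the host name.
# Only the options from this thread are modelled:
#   -v, -q, -t take no argument; -o takes one.
first_operand() {
    OPTIND=1
    while getopts "vqto:" opt; do
        :   # options are consumed here, not acted on
    done
    shift $((OPTIND - 1))
    echo "$1"
}

# With "-t 30", the stray 30 ends up where the host name is expected.
first_operand -v -q -o BatchMode=yes -t 30 remote.host date

# Without it, the real host is the first operand.
first_operand -v -q -o BatchMode=yes remote.host date
```

The first call prints 30 and the second prints remote.host.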
Chris,
thanks for the primer. I've used dsh for years but learned about the
DSH_NODEGROUP_PATH setting here. Anyway, I recently upgraded AIX to use
OpenSSH 6.0p1 and OpenSSL 1.0.1e. The change seems to have broken dsh.
Now I just get errors like:
remote.host: OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013
remote.host: Pseudo-terminal will not be allocated because stdin is not a terminal.
dsh: 2617-009 remote.host remote shell had exit code 255
Have you seen this problem? More importantly, are you aware of a solution?
Thanks!
Hi, I have the same levels installed in my lab and dsh is working fine. What command/commands are you passing to dsh? Does a simple command work? For example, # dsh date ? Let me know. If you want to discuss over email, you can contact me at cg@gibsonnet. Cheers, Chris
Hi Chris,
I am in the same situation as Mike_Pete.
I have OpenSSH ver 6.0.0.6104 and openssl 1.0.1.513, and I experience the same error (except for the pseudo-terminal part).
As you can see, even a simple command such as date fails with rc 255.
root@nim01:/#dsh -n nim02-int date
dsh: 2617-009 nim02-int remote shell had exit code 255
Thanks in advance for suggestions
Bye
Brian,
Love the articles and find them very useful. I have a question on using a WCOLL file for DSH. I want to have comments in my WCOLL file for nodes:
nodename1 # my test server
Of course DSH ignores any line starting with "#", but it also ignores ANYTHING after the 1st space. I have been trying to find where this rule is set but with no luck. Any suggestions?
Thanks
James