ID #1293

How do I run an external Torque server with Bright?

How do I set up a Bright cluster to use an external Torque server?

1. Enable Torque using the wlm-setup utility:

[root@ma-b71-c6 ~]# wlm-setup -s -w torque
               Disabling torque services  .....   [  OK  ]
          Creating default torque config  .....   [  OK  ]
               Initializing torque setup  .....   [  OK  ]
                     Setting permissions  .....   [  OK  ]
                Enabling torque services  .....   [  OK  ]
                              Finalizing  .....   [  OK  ]

Please note that the changes in the software image(s) have not
been propagated to the running nodes. This will happen when
the node(s) is/are rebooted.  

2. Set the externalserver property to yes so that CMDaemon does not complain about the stopped torque_server service:

[root@ma-b71-c6 ~]# cmsh
[ma-b71-c6]% device roles master
[ma-b71-c6->device[ma-b71-c6]->roles]% use torqueserver
[ma-b71-c6->device[ma-b71-c6]->roles[torqueserver]]% get externalserver
[ma-b71-c6->device[ma-b71-c6]->roles[torqueserver]]% set externalserver yes
[ma-b71-c6->device*[ma-b71-c6*]->roles*[torqueserver*]]% commit

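3. Freeze the Torque configuration so that CMDaemon does not write out the configuration files. Make sure the following directive is set in /cm/local/apps/cmd/etc/cmd.conf, then restart CMDaemon: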

[root@ma-b71-c6 ~]# grep Torqu /cm/local/apps/cmd/etc/cmd.conf

FreezeChangesToTorqueConfig = true

[root@ma-b71-c6 ~]# service cmd restart
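
The edit to cmd.conf in step 3 could also be scripted. The following is a minimal sketch, not a Bright-supplied tool: set_directive is a hypothetical helper name, and it assumes GNU sed (for the -i option) and that the directive is written as "Key = value" at the start of a line, as in the grep output above. CMDaemon still has to be restarted afterwards.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bright): make sure "KEY = VALUE" is
# present exactly once in a cmd.conf-style file.
# Assumes GNU sed, which provides the in-place -i option.
set_directive() {
    file=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # The directive already exists: rewrite its value in place.
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        # The directive is missing: append it to the end of the file.
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example call, left commented out so that sourcing this file changes nothing:
# set_directive /cm/local/apps/cmd/etc/cmd.conf FreezeChangesToTorqueConfig true
```

Taking a backup copy of cmd.conf before running such a helper is a sensible precaution.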

4. In the following files, set the server name to the hostname of the external Torque server (ma-b70-c6 in this example):

[root@ma-b71-c6 ~]# cat /cm/shared/apps/torque/var/spool/torque.cfg


[root@ma-b71-c6 ~]# cat /cm/shared/apps/torque/var/spool/server_name
ma-b70-c6
[root@ma-b71-c6 spool]# cat /cm/local/apps/torque/var/spool/server_name
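
Writing the hostname into the two server_name files could be sketched as below. This is an illustration under assumptions, not a Bright utility: set_server_name is a hypothetical helper, and it assumes that each server_name file should contain only the hostname on a single line, as the output for the shared copy suggests. It does not touch torque.cfg.

```shell
#!/bin/sh
# Hypothetical helper: write the external server's hostname into one or
# more server_name files, replacing their previous contents.
set_server_name() {
    host=$1
    shift
    for f in "$@"; do
        # Each file is assumed to hold just the hostname on one line.
        printf '%s\n' "$host" > "$f"
    done
}

# Example call, left commented out so that sourcing this file changes nothing:
# set_server_name ma-b70-c6 \
#     /cm/shared/apps/torque/var/spool/server_name \
#     /cm/local/apps/torque/var/spool/server_name
```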


5. Add the following firewall rules to /etc/shorewall/rules on the head node to allow communication with the external torque server:

ACCEPT   net            fw              tcp     15004
ACCEPT   net            fw              udp     15004
ACCEPT   net            fw              tcp     15003
ACCEPT   net            fw              udp     15003
ACCEPT   net            fw              tcp     15002
ACCEPT   net            fw              udp     15002
ACCEPT   net            fw              tcp     15001
ACCEPT   net            fw              udp     15001

[root@ma-b71-c6 ~]# /etc/init.d/shorewall restart
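
The eight rules in step 5 follow a regular pattern (TCP and UDP for each of the ports 15001 to 15004), so they could be generated rather than typed by hand. A minimal sketch, where torque_rules is a hypothetical function name and the output is meant to be reviewed before it is appended to /etc/shorewall/rules:

```shell
#!/bin/sh
# Hypothetical helper: print the Shorewall ACCEPT rules from step 5,
# one TCP and one UDP rule for each port, in the same order as listed.
torque_rules() {
    for port in 15004 15003 15002 15001; do
        for proto in tcp udp; do
            printf 'ACCEPT   net            fw              %s     %s\n' "$proto" "$port"
        done
    done
}

# Example, left commented out so that sourcing this file changes nothing:
# torque_rules >> /etc/shorewall/rules
```

Shorewall still has to be restarted after the rules file changes.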

6. Restart the trqauthd and torque_mom services and make sure that the torque_server is stopped:

[root@ma-b71-c6 ~]# service torque_mom restart

[root@ma-b71-c6 ~]# service trqauthd restart
[root@ma-b71-c6 ~]# service torque_server stop
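
The end state of step 6 could be checked with a small script. This is a sketch under an assumption: it relies on "service NAME status" returning exit code 0 when the service is running and a non-zero code otherwise, which is the usual init-script convention but is not confirmed here for the Torque init scripts. check_torque_services is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical check: succeed only when trqauthd and torque_mom are running
# and torque_server is stopped.
# Assumes "service NAME status" exits 0 for a running service.
check_torque_services() {
    for svc in trqauthd torque_mom; do
        if ! service "$svc" status >/dev/null 2>&1; then
            echo "$svc is not running" >&2
            return 1
        fi
    done
    if service torque_server status >/dev/null 2>&1; then
        echo "torque_server is still running" >&2
        return 1
    fi
    return 0
}

# Example, left commented out so that sourcing this file runs nothing:
# check_torque_services && echo "services are in the expected state"
```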
