Problem: cannot start parallel task, Problem: bad performance, Problem: cannot start process on front end – PAR Technologies PARASTATION5 V5 User Manual

ParaStation5 Administrator's Guide
Or, logged on to this node, run psiadmin, which also starts up the ParaStation daemon psid. See Section 6.1, “Problem: psiadmin returns error” for more details.
Check the logfile /var/log/messages on this node for error messages. Verify that all nodes have an identical configuration (/etc/parastation.conf).
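The logfile check can be narrowed with a small filter. The following is only a sketch: it assumes that daemon messages contain the string "psid" and that problems are flagged by words such as "error" or "fail". Both are assumptions about the local syslog output, not something this guide specifies, and the function name psid_error_lines is hypothetical.

```python
from typing import Iterable, List

# Assumption: lines written by the daemon contain the marker "psid".
# Assumption: the keywords below are typical of error messages; adjust
# both to match what your syslog configuration actually writes.
KEYWORDS = ("error", "fail", "denied")


def psid_error_lines(lines: Iterable[str], marker: str = "psid") -> List[str]:
    """Return the lines that mention the marker and any of the keywords."""
    hits = []
    for line in lines:
        lower = line.lower()
        if marker in lower and any(word in lower for word in KEYWORDS):
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open /var/log/messages on the affected node (this usually requires root privileges) and pass the file object to psid_error_lines.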
6.3. Problem: cannot start parallel task
Problem: a parallel task cannot be launched and an error is reported:
PSI: PSI_createPartition: Resource temporarily unavailable
Check for available nodes and active parallel tasks. Check for user or group restrictions.
If the error
PSI: dospawn: spawn to node 1 failed.
PSE: Could not spawn './mpi_latency' process 1, error = Bad \
file descriptor.
is reported, check if the current directory holding the program mpi_latency is accessible on all nodes.
Verify that the program is executable on all nodes.
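The two checks above (directory accessible, program executable) can be sketched as a local test. This is a minimal illustration, not a ParaStation tool: check_program is a hypothetical helper, it only inspects the node it runs on, and it has to be run on every node (by whatever remote execution mechanism the cluster provides) to cover the whole partition.

```python
import os
from typing import List


def check_program(path: str) -> List[str]:
    """Return a list of problems found for `path` on the local node."""
    problems = []
    directory = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(directory):
        # The directory itself is missing, e.g. a file system not mounted here.
        problems.append(f"directory not accessible: {directory}")
    elif not os.path.isfile(path):
        problems.append(f"program not found: {path}")
    elif not os.access(path, os.X_OK):
        problems.append(f"program not executable: {path}")
    return problems
```

An empty list means the program passed both checks on this node; for the example from the error message, call check_program("./mpi_latency") from the directory in which the job is started.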
6.4. Problem: bad performance
Verify that the proper interconnect and/or transport is used: check for environment variables controlling
transport (see Section 5.8, “Controlling ParaStation5 communication paths” and ps_environment(5)).
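To see which transport-related variables are set in the environment a job is started from, a short listing can help. The sketch below assumes that the relevant variables share the prefix PSP_; this prefix is an assumption made here for illustration, and ps_environment(5) remains the authoritative list of names and meanings. The function name transport_settings is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def transport_settings(
    env: Optional[Mapping[str, str]] = None, prefix: str = "PSP_"
) -> Dict[str, str]:
    """Return all variables whose name starts with `prefix`, sorted by name."""
    source = os.environ if env is None else env
    # Assumption: transport-controlling variables start with "PSP_".
    return {name: value for name, value in sorted(source.items())
            if name.startswith(prefix)}
```

Calling transport_settings() without arguments inspects the current process environment; an empty result means no variable with that prefix is set.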
Watch protocol counters, e.g. counters indicating timeouts, retries, errors or other bad conditions. For p4sock, check recv_net_data and recv_user. See Section 5.2, “ParaStation5 protocol p4sock”.
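Watching counters usually means comparing two readings taken some time apart. The following sketch shows only that comparison; how the readings are obtained is left open, and counter_deltas is a hypothetical helper that expects each reading as a mapping from counter name to integer value.

```python
from typing import Dict, Mapping


def counter_deltas(before: Mapping[str, int], after: Mapping[str, int]) -> Dict[str, int]:
    """Return the change of every counter whose value differs between readings."""
    deltas = {}
    for name, value in after.items():
        # A counter missing from the first reading is treated as starting at 0.
        change = value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas
```

For example, with the readings {"recv_net_data": 10, "recv_user": 5} and {"recv_net_data": 10, "recv_user": 9}, only recv_user is reported, with a change of 4.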
Look for a crystal ball!
Or contact
.
6.5. Problem: different groups of nodes are seen as up or down
Problem: depending on which node psiadmin is run on, different groups of nodes are seen as "up" or "down".
Check for identical configuration on each node, e.g. compare the configuration file /etc/parastation.conf on each node.
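One way to compare the configuration files is to group the nodes by a checksum of the file content. This is a sketch under stated assumptions: group_by_config is a hypothetical helper, the node names are placeholders, and collecting the content of /etc/parastation.conf from each node is left to whatever remote copy mechanism is available.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Mapping


def group_by_config(configs: Mapping[str, bytes]) -> Dict[str, List[str]]:
    """Group node names by the SHA-256 digest of their configuration file."""
    groups = defaultdict(list)
    for node, content in sorted(configs.items()):
        groups[hashlib.sha256(content).hexdigest()].append(node)
    return dict(groups)
```

If the result contains more than one group, the nodes in different groups have differing configuration files and should be brought in line.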
6.6. Problem: cannot start process on front end
Problem: starting a job is canceled with the error message:
Connecting client 139.27.166.22:44784 (rank 6) failed : Network is
unreachable
PSIlogger: Child with rank 12 exited with status 1.