HP XC System 3.x Software User Manual

For detailed information about a finished job, add the -l option to the bhist command, as shown in
Example 10-6. The -l option specifies that the long format is requested.
Example 10-6 Using the bhist Command (Long Output)
$ bhist -l 24
Job <24>, User
Interactive pseudo-terminal shell mode,
Extsched
date and time stamp: Submitted from host
to Queue
4 Processors Requested, Requested Resources
date and time stamp: Dispatched to 4 Hosts/Processors
<4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
date and time stamp: Starting (Pid 4785);
Summary of time in seconds spent in various states by
date and time stamp
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
11 0 124 0 0 0 135
10.12 Translating SLURM and LSF-HPC JOBIDs
LSF-HPC and SLURM are independent resource management components of the HP XC system. They
maintain their own job identifiers (JOBIDs). It can be useful to determine which SLURM JOBID (the value
of the SLURM_JOBID environment variable) corresponds to a given LSF JOBID, and vice versa.
When a job is submitted to LSF-HPC, it is given an LSF JOBID, as in this example:
$ bsub -o %J.out -n 8 sleep 300
Job <99> is submitted to default queue
The following is the sequence of events when a SLURM JOBID is assigned:
• No SLURM_JOBID exists while the job is PENDing in LSF-HPC.
• After LSF-HPC determines that the resources are available in SLURM for this job, LSF-HPC requests
an allocation in SLURM.
• After the SLURM allocation is established, there is a corresponding SLURM JOBID for the LSF JOBID.
Use the bjobs command to view the SLURM JOBID:
$ bjobs -l 99 | grep slurm
date and time stamp: slurm_id=123;ncpus=8;slurm_alloc=n[13-16];
The SLURM JOBID is 123 for the LSF JOBID 99.
You can also find the allocation information in the output of the bhist command:
$ bhist -l 99 | grep slurm
date and time stamp: slurm_id=123;ncpus=8;slurm_alloc=n[13-16];
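The LSF-to-SLURM lookup above can be scripted. The following is a minimal sketch, not part of LSF-HPC or SLURM: the function names extract_slurm_id and lsf_to_slurm_id are hypothetical, and the sketch assumes the slurm_id=<number>; field appears whole on a single line of output, as in the examples above.

```shell
# Hypothetical helper: read `bjobs -l` or `bhist -l` output on stdin and
# print the first SLURM JOBID found in a "slurm_id=<number>;" field.
# Prints nothing if no such field is present (for example, while the
# job is still pending and no SLURM allocation exists).
extract_slurm_id() {
    sed -n 's/.*slurm_id=\([0-9][0-9]*\);.*/\1/p' | head -n 1
}

# Hypothetical wrapper: print the SLURM JOBID for the LSF JOBID given
# as the first argument. Assumes bjobs is on the PATH.
lsf_to_slurm_id() {
    bjobs -l "$1" | extract_slurm_id
}
```

With the job from the example above, lsf_to_slurm_id 99 would be expected to print 123 once the allocation is established. An empty result suggests that no SLURM JOBID has been assigned yet.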
When LSF-HPC creates an allocation in SLURM, it constructs a name for the allocation by combining the
LSF cluster name with the LSF-HPC JOBID. You can see this name with the scontrol and sacct
commands while the job is running:
$ scontrol show job | grep Name
Name=hptclsf@99
$ sacct -j 123
Jobstep Jobname Partition Ncpus Status Error
---------- ------------------ ---------- ------- ---------- -----
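Because the allocation name combines the LSF cluster name and the LSF JOBID, the name can be split to translate in the other direction, from a SLURM allocation back to its LSF JOBID. The following is a minimal sketch that assumes the <cluster>@<JOBID> form shown in the scontrol output above; the function names are hypothetical.

```shell
# Hypothetical helper: print the LSF JOBID part of an allocation name
# of the form <cluster>@<JOBID>. Returns non-zero if there is no "@".
alloc_name_to_lsf_id() {
    case "$1" in
        *@*) printf '%s\n' "${1##*@}" ;;   # text after the last "@"
        *)   return 1 ;;
    esac
}

# Hypothetical helper: print the LSF cluster name part of the same name.
alloc_name_to_cluster() {
    case "$1" in
        *@*) printf '%s\n' "${1%@*}" ;;    # text before the last "@"
        *)   return 1 ;;
    esac
}
```

For the name shown above, alloc_name_to_lsf_id hptclsf@99 prints 99 and alloc_name_to_cluster hptclsf@99 prints hptclsf.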