Discussion:
[gridengine users] Interpretation of Qacct output
Kidwai, Hashir Karim
2014-01-08 12:11:59 UTC
Hello,

I am sure lots of people have asked a similar question, but I couldn't find the exact answer I am looking for.


I ran `qacct -j` for a particular user on a cluster of 18 compute nodes with 12 cores each, covering all jobs from the past 30 days, and compiled the results as follows.




Job           Wall Clock (h)   CPU Time (h)   Slots
#933               69.1           822.17        96
#932              165.99            0.0006      48
#935                3.301          39.38        60
#936                0.0005          0.0006      60
#937               13.18          157           60
#939                7.7             0.0002      60
#940                0.0005          0.0006      60
#934              122            1450           60
#944               13.49          159           60
#943               52.56          626           60
#942              207            2472           60
#931                0               0           48
#930                0               0           48
#929                0.012           0           48
#927                0.002           0.0002      48
Dec-13 total      654.334        5725.5522
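
(For anyone who wants to reproduce the table: a rough sketch along these lines pulls the same fields out of the accounting data. `someuser` is a placeholder for the actual account name; qacct reports seconds, so the values are converted to hours here.)

    # per-job wallclock/CPU/slots for one user over the last 30 days;
    # a tightly integrated job may produce several records, so cpu is summed
    qacct -o someuser -d 30 -j '*' | awk '
        /^jobnumber/    { job = $2 }
        /^slots/        { slots[job] = $2 }
        /^ru_wallclock/ { wc[job] = $2 }
        /^cpu /         { cpu[job] += $2 }
        END {
            for (j in wc)
                printf "Job #%s: wallclock %.3f h, cpu %.4f h, %d slots\n",
                       j, wc[j]/3600, cpu[j]/3600, slots[j]
        }'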

I am analyzing and comparing the CPU time and the wall clock time in hours (from the qacct command) for jobs submitted and finished in December 2013. These are my findings; please correct me if I am mistaken.



1. Wall clock time is the time from job submission to job finish.

2. CPU time is the usage time during the job execution. Since every node is equipped with 12 cores, I expected that dividing the CPU time by 12 (except in a few instances) would give roughly the same value as the wall clock. But it is in fact the total time of all the cores involved in running the job (??). If my assumption is right, what exactly is the logic behind it?

3. Slots are basically the total number of cores involved in job execution (slots = cores)??

4. In some instances (not shown in the above table), the wall clock is quite significant but the CPU usage time is close to 0. What could be the reason for this? Could it be a problem with the job, or some other factor?

5. While analyzing the jobs, I noticed that there is only one hostname (compute node) associated with each job. Why is that? What about the other nodes running the same job; is there a way to trace them?



I would really appreciate somebody's feedback on the above.

Thanks
Hashir


Reuti
2014-01-08 13:16:02 UTC
Hi,
> I am sure lots of people have asked a similar question, but I couldn't find the exact answer I am looking for.
> I ran `qacct -j` for a particular user on a cluster of 18 compute nodes with 12 cores each, covering all jobs from the past 30 days, and compiled the results as follows.
> <snip>
> I am analyzing and comparing the CPU time and the wall clock time in hours (from the qacct command) for jobs submitted and finished in December 2013. These are my findings; please correct me if I am mistaken.
> 1. Wall clock time is the time from job submission to job finish.
No. It's the start time of the job to the stop time of the job. When the job was submitted is unrelated to the measured wall clock time.
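
You can see this directly in the raw accounting record (933 is just one job id from the table above):

    # qsub_time is the submission; start_time/end_time bound the
    # measured wall clock (ru_wallclock, in seconds)
    qacct -j 933 | egrep 'qsub_time|start_time|end_time|ru_wallclock'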
> 2. CPU time is the usage time during the job execution.
Yes.
> Since every node is equipped with 12 cores, I expected that dividing the CPU time by 12 (except in a few instances) would give roughly the same value as the wall clock.
(Not by 12, but by the number of requested and used cores for this job. If the division doesn't work out, the slave tasks are probably not tightly integrated into SGE [see below]. Unless you need X11 forwarding inside the cluster, you can even disable ssh/rsh in the cluster [I do this to allow only admin staff to reach the nodes by a direct `ssh`].)

This depends on the implementation of the application. If it scales perfectly linearly with an increasing number of cores: yes. But often the speedup comes at a much smaller rate, e.g. the runtime shrinks to 66% instead of the ideal 50% for each doubling of the number of cores. And due to the algorithm and the communication between the processes, some cores might be idle for some time, and hence the overall CPU time across all cores divided by the number of cores is less than the wall clock time. There is nothing you can do about it, except not using too many cores for this particular application.

E.g. shrinking to only 66% instead of 50% per doubling, using 8 cores (i.e. three doublings) would mean lowering the execution time (from 1.0) to 0.66^3 ≈ 0.287 (the needed wall clock time) instead of 0.5^3 = 0.125: roughly half of the computing time is wasted. The real used CPU time might lie somewhere between 0.287 * 8 and 0.125 * 8.
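
To make the arithmetic explicit (awk used purely as a calculator, with the serial runtime taken as 1.0):

    # 8 cores = 3 doublings of the core count:
    # remaining runtime 0.66^3 (real) vs. 0.5^3 (ideal)
    awk 'BEGIN {
        real = 0.66 ^ 3; ideal = 0.5 ^ 3
        printf "wall clock: %.3f (real) vs %.3f (ideal)\n", real, ideal
        printf "CPU time on 8 cores: between %.2f and %.2f\n",
               8 * ideal, 8 * real
    }'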
> But it is in fact the total time of all the cores involved in running the job (??). If my assumption is right, what exactly is the logic behind it?
> 3. Slots are basically the total number of cores involved in job execution (slots = cores)??
Yes.
> 4. In some instances (not shown in the above table), the wall clock is quite significant but the CPU usage time is close to 0. What could be the reason for this? Could it be a problem with the job, or some other factor?
This might happen if a parallel library is not tightly integrated into SGE. I.e.: the main job script starting the `mpiexec` doesn't consume any computing time at all (only a fraction for the startup), and the child processes are not tracked by SGE. Which library was used for these jobs showing almost no computing time, and what are the settings of the requested PE and of the submission command itself?
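
You can inspect the PE configuration yourself (the PE name `mpi` below is just an assumption; take the one shown in the job's granted_pe field):

    qconf -spl      # list the defined parallel environments
    qconf -sp mpi   # show one PE's settings; for tight integration look
                    # at control_slaves (TRUE) and start_proc_args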
> 5. While analyzing the jobs, I noticed that there is only one hostname (compute node) associated with each job. Why is that? What about the other nodes running the same job; is there a way to trace them?
For a tightly integrated parallel job where all slave tasks are tracked by SGE, you would get an entry for each started remote process in `qacct` (unless the parallel environment (PE) has "accounting_summary TRUE" set; see `man sge_pe`). If you get several entries, all entries must be summed up for this job.
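
E.g., to sum them up for one job (934 is just one of the job ids from the table above) - each record carries its own hostname, which also answers where the other nodes went:

    qacct -j 934 | awk '
        /^hostname/ { n++; print "record on " $2 }
        /^cpu /     { total += $2 }
        END { printf "%d record(s), total cpu %.1f s\n", n, total }'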

-- Reuti
