[gridengine users] qsub and reservation

Discussion:

Roberto Nunnari

2017-03-08 17:33:23 UTC

Hello.

I am using Oracle Grid Engine 6.2u7 and have some trouble understanding
reservation (qsub -R y ..).

I'm trying to use this because big jobs are starving: the queues are
always full of smaller jobs.

Apparently the -R y switch doesn't help at all. Somebody told me long ago
that it's a bug in my version of Grid Engine.

Is there a way to find out what is going on with reservation? qstat -j
jobID doesn't show anything about it.

Any ideas or hints?

Thank you and best regards.

--
Roberto Nunnari
Servizi Informatici Ti-Edu SUPSI USI
Via Pobiette 11 - 6928 Manno - Switzerland

Reuti

2017-03-08 17:48:35 UTC

Hi,

I am using Oracle Grid Engine 6.2u7 and have some trouble understanding reservation (qsub -R y ..).
I'm trying to use this because of big jobs starving because of queues always full of smaller jobs..
Apparently the -R y switch doesn't help at all.. somebody long ago told me it's a bug in my version of grid engine..
Is there a way to find out what is going on with reservation? qstat -j jobID doesn't show nothing about it..

- do you request any expected runtime in the job submissions (-l h_rt=…)?
- is a sensible default set in `qconf -msconf` for the runtime (default_duration 8760:00:00)?
- is a sensible default set in `qconf -msconf` for the number of reservations (max_reservation 20)?

-- Reuti

Roberto Nunnari

2017-03-09 13:24:38 UTC

Hi Reuti.
Hi William.

Here are the settings you asked for:
params MONITOR=1
max_reservation 32
default_duration 0:10:0

I cannot understand how what I see in
${SGE_ROOT}/${SGE_CELL}/common/schedule can help me. Here's a short
extract for a job submitted with -R y; it keeps repeating without change:
...
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:Q:***@node19.cluster:slots:32.000000
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:Q:***@node19.cluster:slots:32.000000
...

Thank you for your help.
Roberto

--
Roberto Nunnari
Servizi Informatici Ti-Edu
Via Pobiette 11 - 6928 Manno - Switzerland
helpdesk email: mailto: ***@ti-edu.ch
direct email: mailto:***@supsi.ch
tel: +41-58-6666561

Reuti

2017-03-09 14:14:24 UTC

Hi,

Post by Roberto Nunnari
Hi Reuti.
Hi William.
params MONITOR=1
max_reservation 32
default_duration 0:10:0
I cannot understand how What I see in ${SGE_ROOT}/${SGE_CELL}/common/schedule can help me.. here's a little extract for a job submitted with -R y, and it keeps repeating without change
...
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000

What else is running in the cluster? Are there other jobs blocked which would otherwise slip in? Do all of them request -l h_rt=…?

(When no job requests -l h_rt=… and only the default length applies [which won't be enforced], SGE might look for another node to make the reservation.)

-- Reuti

Roberto Nunnari

2017-03-09 16:41:26 UTC

Hi,

What else is running in the cluster? Are there other jobs blocked which would otherwise slip in? All request -l h_rt=…?

Hi.

There are always smaller jobs (without -R y) pending in the queue that
get in front of bigger jobs (with -R y).
The user of this big job doesn't make use of options like h_rt,
mem_free, etc.. but only asks for a particular node, ie:
hostname=node19.cluster

(When no job requests -l h_rt=… and only the default length apply [which won't be enforced], SGE might look for another node to make the reservation.)

The other users usually use -l h_rt=.. and mem_free=.., and as theirs are
serial jobs or parallel jobs that ask for fewer resources, they slip in
front of the job that asks for more resources, even though it was submitted
long before and uses -R y.

One more question: how can I tell that something is moving with
reservation (i.e. see that the scheduler has started reserving slots) by
looking at the file ${SGE_ROOT}/${SGE_CELL}/common/schedule?

Thank you.
Roberto

Reuti

2017-03-09 17:52:38 UTC

Post by Reuti
Hi,

What else is running in the cluster? Are there other jobs blocked which would otherwise slip in? All request -l h_rt=…?

Hi.
There are always smaller jobs (without -R y) pending in the queue that get in front of bigger jobs (with -R y).
The user of this big job doesn't make use of options like h_rt, mem_free, etc.. but only asks for a particular node, ie: hostname=node19.cluster

So essentially the node19 should get drained over time.

Post by Reuti
(When no job requests -l h_rt=… and only the default length apply [which won't be enforced], SGE might look for another node to make the reservation.)

What you may be seeing, of course, is backfilling of node19. Can you check the h_rt requests of the other jobs already running on node19? As long as the longest-running job on this node is still running, shorter jobs can be backfilled, provided their runtime is shorter than the time this longest job will continue to run.
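
The backfilling condition described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Grid Engine's scheduler code: the function name and its arguments are made up for the example, and it assumes that a job without an h_rt request is treated as lasting default_duration.

```python
def can_backfill(now, job_runtime, reservation_start):
    """Return True if a job started at `now` would end before the reservation begins.

    All values are in seconds. `job_runtime` stands for the job's h_rt request,
    or for default_duration when no h_rt was requested (assumption of this sketch).
    """
    return now + job_runtime <= reservation_start


# Hypothetical numbers: node19 is expected to be free for the big job in one hour.
now = 0
reservation_start = 3600

print(can_backfill(now, 600, reservation_start))   # a 10-minute job fits in the gap
print(can_backfill(now, 7200, reservation_start))  # a 2-hour job does not
```

With a short default_duration such as 0:10:0, a small job without h_rt would be treated as a 10-minute job under this assumption, so it would nearly always look like it fits, however long it really runs.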

One more question. how can I understand that something is moving with reservation (ie see that the scheduler has started reserving slots) by looking in the file ${SGE_ROOT}/${SGE_CELL}/common/schedule ?

When you request a specific node, the reservation can't move to another node. I saw this only when the job with -R y may be scheduled freely inside the cluster, the already running jobs have no h_rt (hence the default_duration applies) and they run much longer than anticipated, so that at some point the reservation can be fulfilled sooner by moving it to another node.

-- Reuti

Roberto Nunnari

2017-03-09 18:29:25 UTC

Post by Reuti

Hi,

What else is running in the cluster? Are there other jobs blocked which would otherwise slip in? All request -l h_rt=…?

Hi.
There are always smaller jobs (without -R y) pending in the queue that get in front of bigger jobs (with -R y).
The user of this big job doesn't make use of options like h_rt, mem_free, etc.. but only asks for a particular node, ie: hostname=node19.cluster

So essentially the node19 should get drained over time.

Yes, I expect that over time slots on node19 will be reserved for the
job requesting reservation, as they become free when jobs running on
node19 exit.

Post by Reuti

(When no job requests -l h_rt=… and only the default length apply [which won't be enforced], SGE might look for another node to make the reservation.)

I don't mean move from node to node.. by moving I mean that something
happens in the scheduler.. that the scheduler reserves a slot for the
pending job requesting reservation.. in the schedule file, I see only
lines with the word RESERVING.. and never something like RESERVED.. or
little changes that tell me that something is changing.. I always see
lines like these:
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:Q:***@node19.cluster:slots:32.000000
I believe that if the scheduler reserves a slot, something in these
lines should change..

Thank you.

William Hay

2017-03-10 08:46:23 UTC

Post by Roberto Nunnari
I don't mean move from node to node.. by moving I mean that something
happens in the scheduler.. that the scheduler reserves a slot for the
pending job requesting reservation.. in the schedule file, I see only lines
with the word RESERVING.. and never something like RESERVED.. or little
changes that tell me that something is changing.. I always see lines like
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
I believe that if the scheduler reserves a slot, something in these lines
should change..

Nope. RESERVING is what the schedule file says when the scheduler
reserves a resource. The file is describing what the scheduler is
doing not the state of the resource. One quirk is that the scheduler
reconsiders where the reservation goes with each scheduling cycle.
The reservation just prevents jobs from starting that would conflict with
the reservation this scheduling cycle. This means that the same resources
should have the same predicted availability next scheduling cycle..

As long as the number in the 4th field isn't continually increasing, your job should
eventually get the resources marked as RESERVING.
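
One way to check this is to compare the 4th field of the job's RESERVING records from one scheduling run to the next. The Python sketch below does that on a small sample; the helper names are made up, the records are the ones quoted in this thread, and the "::::::::" separator lines are written by hand on the assumption that runs are separated by lines made up only of colons.

```python
def reservation_starts(schedule_text, job_id):
    """Collect the reserved start time (4th field) of `job_id`, once per scheduling run."""
    starts = []
    seen_in_run = False
    for line in schedule_text.splitlines():
        line = line.strip()
        if line and set(line) == {":"}:
            # A line consisting only of colons ends the current scheduling run.
            seen_in_run = False
            continue
        fields = line.split(":")
        if len(fields) >= 5 and fields[0] == job_id and fields[2] == "RESERVING":
            if not seen_in_run:
                starts.append(int(fields[3]))
                seen_in_run = True
    return starts


def keeps_slipping(starts):
    """True if the reserved start time grew in every run compared with the previous run."""
    return len(starts) > 1 and all(b > a for a, b in zip(starts, starts[1:]))


# Two scheduling runs with an unchanged reservation start time.
sample = """\
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
::::::::
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
::::::::
"""

starts = reservation_starts(sample, "3653372")
print(starts, keeps_slipping(starts))
```

In practice the text would come from reading ${SGE_ROOT}/${SGE_CELL}/common/schedule; a result of True over many runs would suggest the reservation keeps being pushed back.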

Another quirk is that if you heavily weight a per-user functional share
policy (like we do at UCL), then small jobs from a user backfilling can
deprive large jobs from that same user of the priority needed to hold
onto a reservation. The workaround for this is to educate said users
to have their small jobs wait for (depend on) the large job to run.
If you use functional share based on something other than users, have fun
convincing said users to co-ordinate.

William

William Hay

2017-03-10 08:20:55 UTC

Post by Roberto Nunnari
Hi Reuti.
Hi William.
params MONITOR=1
max_reservation 32
default_duration 0:10:0
I cannot understand how What I see in
${SGE_ROOT}/${SGE_CELL}/common/schedule can help me.. here's a little
extract for a job submitted with -R y, and it keeps repeating without change
...
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
...
Thank you for your help.
Roberto

The 4th field is the date of the reservation in seconds since the unix epoch.

date -d @1489043424

Will convert it to something readable (early yesterday morning). The 5th field
(despite what the man page says) is the duration of the reservation. You usually
want to look at the information from the last scheduling run (scheduling runs are
separated from each other by lines of consecutive colons).
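
As a rough illustration of reading one of these records, the Python sketch below splits a line into its colon-separated fields and converts the 4th field to a UTC timestamp. The function name and the dictionary keys are made up for the example; the 5th field is treated as a duration in seconds, as described above, and the meaning given to the remaining fields is an assumption based on the sample lines in this thread.

```python
from datetime import datetime, timezone


def parse_schedule_line(line):
    """Split one record of the schedule file into named fields (names are illustrative)."""
    (job_id, task_id, state, start, duration,
     level, obj, resource, amount) = line.strip().split(":")
    return {
        "job_id": job_id,
        "task_id": task_id,
        "state": state,                # e.g. RESERVING
        "start_utc": datetime.fromtimestamp(int(start), tz=timezone.utc),
        "duration_s": int(duration),   # 5th field, taken as a duration in seconds
        "level": level,                # P or Q in the sample lines
        "object": obj,                 # e.g. the PE name "smp"
        "resource": resource,          # e.g. "slots"
        "amount": float(amount),
    }


rec = parse_schedule_line("3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000")
print(rec["start_utc"], rec["duration_s"], rec["amount"])
# prints: 2017-03-09 07:10:24+00:00 660 32.0
```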

That default_duration looks a little short. The scheduler is assuming any running jobs
will terminate in under 10 minutes and as a result is probably trying to reserve
resources that won't actually be free when the reservation comes due.

To make reservations work you really need most jobs to have a hard time limit associated with them
and a long default_duration (as in Reuti's example) to encourage the scheduler not to schedule jobs
on resources currently occupied by jobs without such a limit.

Post by Roberto Nunnari

- do you request any expected runtime in the job submissions (-l h_rt=…)?
- is a sensible default set in `qconf -msconf` for the runtime (default_duration 8760:00:00)?
- is a sensible default set in `qconf -msconf` for the number of reservations (max_reservation 20)?

Bear in mind that, with your current config, only the 32 highest priority jobs in the queue that request
one will get a reservation.

William

William Hay

2017-03-09 09:53:36 UTC

Post by Roberto Nunnari
Hello.
I am using Oracle Grid Engine 6.2u7 and have some trouble understanding
reservation (qsub -R y ..).
I'm trying to use this because of big jobs starving because of queues always
full of smaller jobs..
Apparently the -R y switch doesn't help at all.. somebody long ago told me
it's a bug in my version of grid engine..
Is there a way to find out what is going on with reservation? qstat -j jobID
doesn't show nothing about it..
Any ideas or hints?

Make sure you have max_reservation set to some positive number in the scheduler configuration.

In order to see what is going on you can set MONITOR=1 in the scheduler's params.

The scheduler's view of what is happening (including reservations) will then be recorded in
${SGE_ROOT}/${SGE_CELL}/common/schedule

William