Discussion:
[gridengine users] qmaster: hard descriptor limit and soft descriptor limit?
Brodie, Kent
2012-10-12 18:55:07 UTC
Seeing this on my head node:

10/12/2012 08:56:18| main|lima|I|read job database with 0 entries in 0 seconds
10/12/2012 08:56:18| main|lima|W|nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
10/12/2012 08:56:18| main|lima|I|qmaster hard descriptor limit is set to 1024
10/12/2012 08:56:18| main|lima|I|qmaster soft descriptor limit is set to 1024
10/12/2012 08:56:18| main|lima|I|qmaster will use max. 1004 file descriptors for communication
10/12/2012 08:56:18| main|lima|I|qmaster will accept max. 979 dynamic event clients
10/12/2012 08:56:18| main|lima|I|starting up SGE 8.1.2 (lx-amd64)


I'm confused; I can't figure out where the limit of 1024 is coming from.



I have 8192 (soft) and 65536 (hard) set for my system-wide file descriptor limits.

[***@lima sge]# ulimit -aS
....
open files (-n) 8192


[***@lima sge]# ulimit -aH
...
open files (-n) 65536



Hopefully missing something obvious, --Kent
Reuti
2012-10-12 19:40:03 UTC
Post by Brodie, Kent
10/12/2012 08:56:18| main|lima|I|read job database with 0 entries in 0 seconds
10/12/2012 08:56:18| main|lima|W|nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
10/12/2012 08:56:18| main|lima|I|qmaster hard descriptor limit is set to 1024
10/12/2012 08:56:18| main|lima|I|qmaster soft descriptor limit is set to 1024
10/12/2012 08:56:18| main|lima|I|qmaster will use max. 1004 file descriptors for communication
10/12/2012 08:56:18| main|lima|I|qmaster will accept max. 979 dynamic event clients
10/12/2012 08:56:18| main|lima|I|starting up SGE 8.1.2 (lx-amd64)
I'm confused; I cannot figure out why the limit of 1024 is coming up?
It is set during boot when the daemon is started (the limit can be different from a later user login). But you can check the parameters:

S_DESCRIPTORS, H_DESCRIPTORS

in `man sge_conf`.

-- Reuti
Post by Brodie, Kent
I have 8192 (soft) and 65536 (hard) set for my system-wide file descriptor limits .
....
open files (-n) 8192
...
open files (-n) 65536
Hopefully missing something obvious, --Kent
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users
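Reuti's point is that qmaster runs with whatever limit it inherited when it was started, which need not match what a later login shell reports. A minimal sketch for comparing the two, assuming a Linux host with /proc mounted; `nofile_of` is a hypothetical helper name, not part of Grid Engine:

```sh
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max open files" limits of a
# running process, read from /proc/<pid>/limits (Linux only).
nofile_of() {
    # The line looks like "Max open files  1024  4096  files";
    # fields 4 and 5 are the soft and hard limits.
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

echo "this shell (ulimit): $(ulimit -S -n) $(ulimit -H -n)"
echo "this shell (/proc):  $(nofile_of $$)"
```

On the head node, `nofile_of "$(pgrep -o qmaster)"` shows what the daemon actually got; if it prints `1024 1024` while your shell reports `8192 65536`, the daemon was started with different limits than your login.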
Brodie, Kent
2012-10-12 20:30:49 UTC
I'm confused. I thought those parameters relate to execution hosts (for which I have confirmed they're correct). And I have my execd_params set as follows:

execd_params S_DESCRIPTORS=8192,H_DESCRIPTORS=65536

I'm referring to the qmaster messages file in the qmaster spool directory. I am concerned that 1024 is too low.
Brodie, Kent
2012-10-12 20:51:53 UTC
Never mind. I have it working somehow. I must have set those S_ and H_ descriptor settings after the qmaster started. *Swore* I rebooted everything, but who knows. I restarted qmaster and everything looks fine. Happy weekend, everyone.
Brodie, Kent
2012-10-12 20:59:26 UTC
Scratch that. OK, now I'm stumped.

Here is my qmaster log when simply restarting the qmaster daemon:

10/12/2012 15:49:19| main|lima|I|controlled shutdown 8.1.2
10/12/2012 15:49:25| main|lima|I|read job database with 1 entries in 0 seconds
10/12/2012 15:49:25| main|lima|I|qmaster hard descriptor limit is set to 65536
10/12/2012 15:49:25| main|lima|I|qmaster soft descriptor limit is set to 8192
10/12/2012 15:49:25| main|lima|I|qmaster will use max. 8172 file descriptors for communication
10/12/2012 15:49:25| main|lima|I|qmaster will accept max. 1000 dynamic event clients
10/12/2012 15:49:25| main|lima|I|starting up SGE 8.1.2 (lx-amd64)

Note the 8192 and 65536. Yay! This comes from /etc/security/limits.conf and probably also from the S_DESCRIPTORS / H_DESCRIPTORS setting.



Now, look what happens when the system is rebooted:

10/12/2012 15:50:27| main|lima|I|controlled shutdown 8.1.2
10/12/2012 15:55:47| main|lima|I|read job database with 1 entries in 0 seconds
10/12/2012 15:55:47| main|lima|W|nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
10/12/2012 15:55:47| main|lima|I|qmaster hard descriptor limit is set to 1024
10/12/2012 15:55:47| main|lima|I|qmaster soft descriptor limit is set to 1024
10/12/2012 15:55:47| main|lima|I|qmaster will use max. 1004 file descriptors for communication
10/12/2012 15:55:47| main|lima|I|qmaster will accept max. 979 dynamic event clients
10/12/2012 15:55:47| main|lima|I|starting up SGE 8.1.2 (lx-amd64)


WHOA. Where the heck is the 1024 limit coming from?
Alex Chekholko
2012-10-12 21:19:31 UTC
Post by Brodie, Kent
Scratch that. OK, now I'm stumped.
10/12/2012 15:49:19| main|lima|I|controlled shutdown 8.1.2
10/12/2012 15:49:25| main|lima|I|read job database with 1 entries in 0 seconds
10/12/2012 15:49:25| main|lima|I|qmaster hard descriptor limit is set to 65536
10/12/2012 15:49:25| main|lima|I|qmaster soft descriptor limit is set to 8192
10/12/2012 15:49:25| main|lima|I|qmaster will use max. 8172 file descriptors for communication
10/12/2012 15:49:25| main|lima|I|qmaster will accept max. 1000 dynamic event clients
10/12/2012 15:49:25| main|lima|I|starting up SGE 8.1.2 (lx-amd64)
Note, the 8192 and 65536. Yay! This comes from the /etc/security/limits.conf and also probably from the S_DESCRIPTORS / H_DESCRIPTORS setting.
10/12/2012 15:50:27| main|lima|I|controlled shutdown 8.1.2
10/12/2012 15:55:47| main|lima|I|read job database with 1 entries in 0 seconds
10/12/2012 15:55:47| main|lima|W|nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
10/12/2012 15:55:47| main|lima|I|qmaster hard descriptor limit is set to 1024
10/12/2012 15:55:47| main|lima|I|qmaster soft descriptor limit is set to 1024
10/12/2012 15:55:47| main|lima|I|qmaster will use max. 1004 file descriptors for communication
10/12/2012 15:55:47| main|lima|I|qmaster will accept max. 979 dynamic event clients
10/12/2012 15:55:47| main|lima|I|starting up SGE 8.1.2 (lx-amd64)
WHOA. Where the heck is the 1024 limit coming from?
I think those env settings are inherited from the parent process. So
maybe qmaster is started up by some init process that didn't get its
settings from limits.conf?

I bet if you restart the process, it'll get the settings from your
current root shell, and all will be well.

Regards,
--
Alex Chekholko ***@stanford.edu
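Alex's explanation can be checked directly: resource limits are copied from parent to child process, so a daemon started from a context with a low limit keeps that limit regardless of what limits.conf says for later logins. A minimal demonstration (the value 256 is arbitrary; it only needs to be at or below the current hard limit):

```sh
#!/bin/sh
# Change the soft open-files limit inside a subshell only, then start a child
# process from that subshell and ask the child which limit it sees.
child_limit=$( (ulimit -S -n 256 && sh -c 'ulimit -S -n') )

echo "parent soft limit: $(ulimit -S -n)"   # the outer shell is unaffected
echo "child soft limit:  $child_limit"      # inherited from the subshell: 256
```

The same mechanism explains the thread: after a reboot qmaster is started by the init scripts, not from a root login shell, so it never passes through the login-time limits.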
Brodie, Kent
2012-10-12 21:26:03 UTC
Yes, this is the case. Restarting (qmaster) again, it's fine. (8192 / 65536)

OK, so how the heck do I get my qmaster process to have the right number of file descriptors after a reboot? I can't have a setup where I have to restart qmaster by hand after every boot. And I'd be floored to discover I'm the first to see this...

With the increased numbers in both /etc/sysctl.conf and /etc/security/limits.conf, I've followed every Red Hat document I can find.

Obviously, something is missing. I agree, it has to do with the init process.
Post by Brodie, Kent
WHOA. Where the heck is the 1024 limit coming from?
I think those env settings are inherited from the parent process. So maybe qmaster is started up by some init process that didn't get its settings from limits.conf?
I bet if you restart the process, it'll get the settings from your current root shell, and all will be well.
Regards,
--
Brodie, Kent
2012-10-12 21:52:49 UTC
Got around it by editing /etc/init.d/sgemaster.{cellname} and adding "ulimit -n 8192" near the top.

If anyone has a more elegant solution, I'd love to know it!

Happy weekend everyone.
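A slightly more defensive version of the same init-script workaround, as a sketch: it raises the soft and hard open-files limits so that qmaster inherits them, but never lowers a limit that is already higher. The function name `ensure_nofile_limits` is hypothetical, and the 8192/65536 targets are simply the values used in this thread; only root can raise a hard limit.

```sh
#!/bin/sh
# Hypothetical helper for the top of /etc/init.d/sgemaster.{cellname}:
# raise (never lower) the open-files limits that qmaster will inherit.
ensure_nofile_limits() {
    local want_soft=$1 want_hard=$2 cur

    # Hard limit first. Only root may raise it, so a failure is reported
    # rather than treated as fatal.
    cur=$(ulimit -H -n)
    if [ "$cur" != unlimited ] && [ "$cur" -lt "$want_hard" ]; then
        ulimit -H -n "$want_hard" 2>/dev/null ||
            echo "warning: cannot raise hard nofile limit above $cur" >&2
    fi

    # The soft limit may not exceed the (possibly unchanged) hard limit.
    cur=$(ulimit -H -n)
    if [ "$cur" != unlimited ] && [ "$cur" -lt "$want_soft" ]; then
        want_soft=$cur
    fi
    cur=$(ulimit -S -n)
    if [ "$cur" != unlimited ] && [ "$cur" -lt "$want_soft" ]; then
        ulimit -S -n "$want_soft"
    fi
}

ensure_nofile_limits 8192 65536
```

Because the call runs in the init script's own shell, every process the script starts afterwards (including qmaster) inherits the raised limits.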
Dave Love
2012-10-15 10:47:59 UTC
Post by Reuti
S_DESCRIPTORS, H_DESCRIPTORS
in `man sge_conf`.
To avoid possible confusion: those are execd parameters. MAX_DYN_EC is
the relevant qmaster one, limited by the process file descriptor limit
and a number set aside for static connexions. (It was raised a while
ago from a much lower value which only allowed a few clients.)
--
Community Grid Engine: http://arc.liv.ac.uk/SGE/
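The qmaster log lines earlier in the thread are consistent with the following arithmetic. This is an inference from those log lines only, not from the Grid Engine source: 20 descriptors are held back from the soft limit (1024 gives 1004 for communication), 25 more are set aside for static connections (1004 gives 979 dynamic clients), and the result is capped at a default of 1000 dynamic event clients. The offsets and the helper name `dyn_ec_for` are assumptions.

```sh
#!/bin/sh
# Hypothetical reconstruction from the log lines in this thread, not from the
# Grid Engine source: given the qmaster soft descriptor limit, print the
# number of descriptors used for communication and the resulting MAX_DYN_EC.
dyn_ec_for() {
    local soft=$1 comm dyn
    comm=$((soft - 20))   # 1024 -> 1004, 8192 -> 8172 in the logs
    dyn=$((comm - 25))    # 1004 -> 979 in the logs
    if [ "$dyn" -gt 1000 ]; then
        dyn=1000          # cap seen in the log when descriptors are plentiful
    fi
    echo "$comm $dyn"
}

dyn_ec_for 1024   # prints: 1004 979
dyn_ec_for 8192   # prints: 8172 1000
```

On this reading, the "exceeds max file descriptor limit" warning should disappear once the soft limit reaches about 1045.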
Dave Love
2012-10-15 10:44:38 UTC
Post by Brodie, Kent
10/12/2012 08:56:18| main|lima|I|read job database with 0 entries in 0 seconds
10/12/2012 08:56:18| main|lima|W|nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
10/12/2012 08:56:18| main|lima|I|qmaster hard descriptor limit is set to 1024
10/12/2012 08:56:18| main|lima|I|qmaster soft descriptor limit is set to 1024
10/12/2012 08:56:18| main|lima|I|qmaster will use max. 1004 file descriptors for communication
10/12/2012 08:56:18| main|lima|I|qmaster will accept max. 979 dynamic event clients
10/12/2012 08:56:18| main|lima|I|starting up SGE 8.1.2 (lx-amd64)
I'm confused; I cannot figure out why the limit of 1024 is coming up?
That's normal on GNU/Linux systems:

# grep files /proc/$(pgrep qmaster)/limits
Max open files 1024 1024 files

Probably I should take the default MAX_DYN_EC down to avoid the harmless
message.

Do you really need >~1000 dynamic clients?
--
Community Grid Engine: http://arc.liv.ac.uk/SGE/