I got it working again: an execd process was already running and had to be
killed, after which I restarted the services.
I'm trying to run a script now:
#!/bin/bash
#$-cwd
#$-N SA
#$-S /bin/sh
#$-t 1-4200:1
/var/software/packages/Mathematica/7.0/Executables/math -run "teller=$SGE_TASK_ID;<< ModelCaCO31.m"
but it gives the following output:
stdin: is not a tty
and this is the output of my qstat -f:
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
***@camilla.UGent.be BIP 0/1/1 0.70 lx26-amd64
35 0.50000 SA root r 11/14/2012 09:57:47 1 1
---------------------------------------------------------------------------------
***@node0 BIP 0/24/24 27.71 lx26-amd64
35 0.50000 SA root r 11/14/2012 09:57:47 1 2
35 0.50000 SA root r 11/14/2012 09:57:47 1 3
35 0.50000 SA root r 11/14/2012 09:57:47 1 4
35 0.50000 SA root r 11/14/2012 09:57:47 1 5
35 0.50000 SA root r 11/14/2012 09:57:47 1 6
35 0.50000 SA root r 11/14/2012 09:57:47 1 7
35 0.50000 SA root r 11/14/2012 09:57:47 1 8
35 0.50000 SA root r 11/14/2012 09:57:47 1 9
35 0.50000 SA root r 11/14/2012 09:57:47 1 10
35 0.50000 SA root r 11/14/2012 09:57:47 1 11
35 0.50000 SA root r 11/14/2012 09:57:47 1 12
35 0.50000 SA root r 11/14/2012 09:57:47 1 13
35 0.50000 SA root r 11/14/2012 09:57:47 1 14
35 0.50000 SA root r 11/14/2012 09:57:47 1 15
35 0.50000 SA root r 11/14/2012 09:57:47 1 16
35 0.50000 SA root r 11/14/2012 09:57:47 1 17
35 0.50000 SA root r 11/14/2012 09:57:47 1 18
35 0.50000 SA root r 11/14/2012 09:57:47 1 19
35 0.50000 SA root r 11/14/2012 09:57:47 1 20
35 0.50000 SA root r 11/14/2012 09:57:47 1 21
35 0.50000 SA root r 11/14/2012 09:57:47 1 22
35 0.50000 SA root r 11/14/2012 09:57:47 1 23
35 0.50000 SA root r 11/14/2012 09:57:47 1 24
35 0.50000 SA root r 11/14/2012 09:57:47 1 25
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
35 0.50000 SA root qw 11/14/2012 09:57:38 1 26-4200:1
***@camilla:/nfs/share/sge# qstat -explain c -j 35
==============================================================
job_number: 35
exec_file: job_scripts/35
submission_time: Wed Nov 14 09:57:38 2012
owner: root
uid: 0
group: root
gid: 0
sge_o_home: /root
sge_o_log_name: root
sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sge_o_shell: /bin/bash
sge_o_workdir: /nfs/share/sge
sge_o_host: camilla
account: sge
cwd: /nfs/share/sge
mail_list: ***@camilla
notify: FALSE
job_name: SA
jobshare: 0
shell_list: NONE:/bin/sh
env_list:
script_file: HistDisCaCO31.sh
job-array tasks: 1-4200:1
usage 1: cpu=00:05:20, mem=105.16135 GBs, io=0.01537, vmem=1.110G, maxvmem=1.110G
usage 2: cpu=00:04:17, mem=179.44371 GBs, io=0.01395, vmem=3.643G, maxvmem=3.643G
usage 3: cpu=00:04:37, mem=191.69532 GBs, io=0.01394, vmem=3.657G, maxvmem=3.657G
usage 4: cpu=00:04:34, mem=188.12645 GBs, io=0.01394, vmem=3.655G, maxvmem=3.655G
usage 5: cpu=00:04:16, mem=180.18292 GBs, io=0.01394, vmem=3.636G, maxvmem=3.636G
usage 6: cpu=00:04:22, mem=183.47616 GBs, io=0.01394, vmem=3.644G, maxvmem=3.644G
usage 7: cpu=00:04:15, mem=179.89624 GBs, io=0.01400, vmem=3.640G, maxvmem=3.640G
usage 8: cpu=00:04:55, mem=207.28643 GBs, io=0.01394, vmem=3.669G, maxvmem=3.669G
usage 9: cpu=00:04:27, mem=184.86707 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
usage 10: cpu=00:04:14, mem=179.09446 GBs, io=0.01394, vmem=3.635G, maxvmem=3.635G
usage 11: cpu=00:04:47, mem=195.80372 GBs, io=0.01400, vmem=3.668G, maxvmem=3.668G
usage 12: cpu=00:04:49, mem=203.43895 GBs, io=0.01394, vmem=3.665G, maxvmem=3.665G
usage 13: cpu=00:04:45, mem=196.67175 GBs, io=0.01394, vmem=3.663G, maxvmem=3.663G
usage 14: cpu=00:04:24, mem=185.68047 GBs, io=0.01400, vmem=3.648G, maxvmem=3.648G
usage 15: cpu=00:04:40, mem=195.96253 GBs, io=0.01394, vmem=3.656G, maxvmem=3.656G
usage 16: cpu=00:04:11, mem=179.84016 GBs, io=0.01394, vmem=3.633G, maxvmem=3.633G
usage 17: cpu=00:04:43, mem=196.21689 GBs, io=0.01394, vmem=3.662G, maxvmem=3.662G
usage 18: cpu=00:04:37, mem=197.39875 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
usage 19: cpu=00:04:35, mem=191.55982 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
usage 20: cpu=00:04:26, mem=191.62928 GBs, io=0.01394, vmem=3.643G, maxvmem=3.643G
usage 21: cpu=00:04:42, mem=197.87398 GBs, io=0.01394, vmem=3.660G, maxvmem=3.660G
usage 22: cpu=00:04:36, mem=193.43107 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
usage 23: cpu=00:04:32, mem=193.12103 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
usage 24: cpu=00:04:25, mem=186.56485 GBs, io=0.01400, vmem=3.644G, maxvmem=3.644G
usage 25: cpu=00:04:51, mem=201.81706 GBs, io=0.01400, vmem=3.669G, maxvmem=3.669G
scheduling info: queue instance "***@camilla" dropped because it is full
queue instance "***@node0" dropped because it is full
All queues dropped because of overload or full
not all array task may be started due to 'max_aj_instances'
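The "dropped because it is full" messages can be cross-checked against the slot counts in the `qstat -f` listing. The following is only a sketch: it assumes the column layout shown above (third column `resv/used/tot.`), and the function name `sum_slots` is made up for illustration.

```shell
#!/bin/bash
# Sum the used and total slots over all queue instances in `qstat -f` output
# read from stdin. Assumes the third column of every queue-instance line has
# the form resv/used/tot. (job lines and the header do not match the pattern).
sum_slots() {
    awk '$3 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
             split($3, s, "/")          # s[1]=reserved, s[2]=used, s[3]=total
             used += s[2]; total += s[3]
         }
         END { printf "used=%d total=%d\n", used, total }'
}

# Typical use on the master node:
#   qstat -f | sum_slots
```

For the listing in this mail (0/1/1 on camilla and 0/24/24 on node0) this gives `used=25 total=25`, which matches the 25 running tasks: every slot is occupied, so tasks 26-4200 stay in `qw` until a slot becomes free.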
Does anyone know how this can be solved?
Post by jan roels
http://verahill.blogspot.be/2012/06/setting-up-sun-grid-engine-with-three.html
on how to install the SGE. It all went fine on my master node, but on my exec
node I have some trouble:
11/13/2012 13:44:43| main|node0|E|communication error for "node0/execd/1" running on port 6445: "can't bind socket"
Is there already something running on this port - any older version of the execd?
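One way to answer this on the exec node is to look for a listening socket on the port. This is a minimal sketch, assuming `ss` from iproute2 is installed and prints the local address in the fourth column; `listening_on` is a hypothetical helper name, not part of SGE.

```shell
#!/bin/bash
# Succeed if the given TCP port shows up as a LISTEN socket in `ss -ltn`
# output read from stdin.
listening_on() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # local address, e.g. 0.0.0.0:6445 or [::]:6445
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}

# Typical use on node0 (6445 is the execd port from the error message):
#   ss -ltn | listening_on 6445 && echo "port 6445 is already taken"
```

Running `ss -ltnp` as root additionally shows which process owns the socket, which helps to spot a leftover sge_execd.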
Post by jan roels
11/13/2012 13:44:44| main|node0|E|commlib error: can't bind socket (no additional information available)
11/13/2012 13:45:12| main|node0|C|abort qmaster registration due to communication errors
11/13/2012 13:45:14| main|node0|W|daemonize error: child exited before sending daemonize state
But then I killed the process and restarted the gridengine-execd, and then got:
/etc/init.d/gridengine-exec restart
* Restarting Sun Grid Engine Execution Daemon sge_execd
error: can't resolve host name
error: can't get configuration from qmaster -- backgrounding
What can I do to fix this?
Any firewall on the machines? Ports 6444 and 6445 need to be excluded.
-- Reuti
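Whether a firewall is in the way can be tested by trying to open a TCP connection to each port. The sketch below uses bash's built-in `/dev/tcp` redirection; `port_open` is a hypothetical helper name, and the assignment of 6444 to the qmaster is an assumption based on the usual defaults (the thread itself only confirms 6445 for execd).

```shell
#!/bin/bash
# Succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
port_open() {
    local host=$1 port=$2
    # Open the connection on fd 3 inside a subshell; the descriptor is
    # closed again automatically when the subshell exits.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Typical use, with the host names from this thread:
#   port_open camilla 6444 || echo "cannot reach port 6444 on camilla"   # run on node0
#   port_open node0 6445   || echo "cannot reach port 6445 on node0"     # run on camilla
```

A refused connection fails immediately, while a firewall that silently drops packets can make the call hang until the TCP timeout expires, so a long wait is itself a hint that something is filtering the port.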