TORQUE provides control over batch jobs and distributed computing resources. It is an advanced open-source product based on the original PBS project and incorporates the best of both community and professional development. It incorporates significant advances in the areas of scalability, reliability, and functionality.
===== Prerequisite =====
In order to retrieve your output and error files in TORQUE, you need password-less SSH connections between nodes. If you have not set this up before, execute the following commands. They create a public and private key pair so that when a node transfers a file to your home folder, it does not ask for your password.
After connecting to polyps, enter:

<code bash>
ssh-keygen -N ""
</code>

Then just press ENTER for any question. After that, type the following commands:

<code bash>
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
</code>
Now you will get the error log and output log files for your jobs.
===== Hardware =====
===== Submitting Jobs =====
Jobs can be submitted either using a submission file or directly from the command line. First we explain how it is done, and then we discuss the options.
#PBS -o /
#PBS -l nodes=1:
#PBS -l pmem=2GB:
#PBS -q batch
</code>
If you do not want to write a submission script, you can do it just by calling ''qsub'' directly from the command line. In that case you run the same code, but set the job parameters using ''qsub'' options instead of ''#PBS'' directives.
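For example, a one-line submission roughly equivalent to a small script could look like the following sketch (the program name ''./myprog'' is a placeholder, not a real file on the cluster):

```shell
# Pipe the command to qsub and pass the #PBS directives as options instead.
# './myprog' is a placeholder for your own executable.
echo "./myprog" | qsub -q batch -l nodes=1:ppn=1
```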
===== Options =====
^ Option ^ Description ^
| ''-N name'' | Name of the job |
| ''-o path'' | Path of the standard output file |
| ''-e path'' | Path of the standard error file |
| ''-j oe'' | Join standard output and standard error into one file |
| ''-q queue'' | Queue to submit the job to |
| ''-l nodes=N:ppn=M'' | Request N nodes with M processors per node |
| ''-l pmem=size'' | Request physical memory per process |
| ''-l vmem=size'' | Request virtual memory |
| ''-l walltime=hh:mm:ss'' | Maximum wall-clock time for the job |
| ''-m abe'' | Send e-mail when the job aborts (a), begins (b), or ends (e) |
| ''-M address'' | E-mail address for notifications |
| ''-V'' | Export all environment variables to the job |
| ''-v list'' | Export only the listed environment variables |
| ''-I'' | Run an interactive job |
| ''-t range'' | Submit a job array |
| ''-d path'' | Working directory of the job |
| ''-r y/n'' | Mark the job as rerunnable or not |
You can find detailed information about all options on the ''qsub'' manual page (''man qsub'').

<note tip>You need to use the ''-l'' option to request resources such as memory and walltime; otherwise the queue defaults apply.</note>
===== Monitoring and Removing jobs =====
To show the jobs, use ''qstat'':
<code bash>
qstat
</code>
To show the jobs of a particular user, use ''qstat -u USERNAME''. To delete a job, use:
<code shell>
qdel JOB_ID
</code>
Moreover, you can use the following command to see the full status of a single job:
<code shell>
qstat -f JOB_ID
</code>
==== Queues ====
We have a few queues: ''gpu'', ''medium'', ''short'', ''long'', ''batch'', ''verylong'', ''AMPL'', and ''MOSEK''. You can list them with ''qstat -q'':
<code>
Queue            Memory CPU Time Walltime  Node Run Que Lm State
---------------- ------ -------- --------  ---- --- --- -- -----
gpu                --      --        --      --   0   0 --  E R
medium             --      --        --      --   0   0 --  E R
short              --      --        --      --   0   0 --  E R
long               --      --        --      --   0   0 --  E R
batch              --      --        --      --   0   0 --  E R
verylong           --      --   240:00:00    --   0   0 --  E R
AMPL               --      --        --      --   0   0 --  E R
MOSEK              --      --        --      --   0   0 --  E R
</code>
If you want to use AMPL or MOSEK, you have to use the ''AMPL'' or ''MOSEK'' queue, because we have a limited number of licenses for them.

You can see the queue limits using the command ''qstat -Qf'':
| very long | 240:00:00 |
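For instance, a job that needs a MOSEK license must be submitted to its dedicated, license-limited queue (the script name below is hypothetical):

```shell
# MOSEK licenses are limited, so MOSEK jobs must go through the MOSEK queue.
qsub -q MOSEK solve_model.pbs
```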
===== Examples =====
==== Submitting a Small or Large Memory Job ====
You can use the ''-l'' option to request a specific amount of memory for your job, for example:

<code bash>
qsub -l pmem=4gb,vmem=4gb test.pbs
</code>

Sometimes your job needs more memory. You can choose a larger memory size with the same option:

<code bash>
qsub -l pmem=8gb,vmem=8gb test.pbs
</code>
To see what resources have been assigned by the batch queuing system, run the ''ulimit'' command (bash) or the ''limit'' command (csh) inside your job:
<code bash>
ulimit -a
</code>
<code>
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256554
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
</code>

More details about requesting resources are available in the TORQUE documentation.
==== Running MATLAB ====
You just have to create a submission script that looks like this:
#PBS -o /
#PBS -l nodes=1:
#PBS -l pmem=2GB:
#PBS -q batch
</code>
<note tip>Use **-singleCompThread** when starting MATLAB if you requested only a single processor; otherwise MATLAB will try to use all cores of the node. See the MATLAB documentation for details.</note>

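As a sketch, the MATLAB line inside such a submission script could look like this (the script name ''my_script'' is a placeholder for your own ''.m'' file):

```shell
# Run MATLAB headless and restricted to one computational thread;
# 'my_script' is a placeholder .m file name.
matlab -nodisplay -nosplash -singleCompThread -r "my_script; exit"
```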
==== Running Solvers ====

In order to run solvers (such as Gurobi or CPLEX), you need to submit your job with the ''-V'' flag:

<code bash>
qsub -V job.pbs
</code>

This flag exports your environment variables to the job, which enables the solver to find the necessary authentication (license) information.
==== Interactive Jobs ====
If you do not care where you run your job, just use ''qsub -I''. To request a specific node, for example polyp15, use:
<code bash>
qsub -I -l nodes=polyp15
</code>
and you will be running an interactive session on polyp15.
==== Using GPUs ====

However, first you have to have permission to use the GPU (given by Prof. Takac) -- this is just a formality that allows certain users to use the video driver on polyp30.
If you are using TensorFlow in Python, you can set a limit on the amount of GPU memory using:
<code python>
config_tf = tf.ConfigProto()
config_tf.gpu_options.per_process_gpu_memory_fraction = p
</code>
in which **//p//** is the fraction of GPU memory to use (a number between zero and one).

==== Running MPI and Parallel Jobs ====
<code bash mpi.pbs>
c2
</code>
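The ''$PBS_NODEFILE'' variable points to a file listing one hostname per allocated processor. The sketch below uses a hand-made sample file so it can run outside a job; inside a job you would use ''"$PBS_NODEFILE"'' instead of ''sample_nodefile'':

```shell
# Build a sample node file like the one TORQUE provides via $PBS_NODEFILE
# (one line per allocated processor; node names are illustrative).
printf 'c1\nc1\nc2\nc2\n' > sample_nodefile

NPROCS=$(grep -c '' sample_nodefile)            # total allocated processors
NNODES=$(sort -u sample_nodefile | grep -c '')  # distinct nodes
echo "procs=$NPROCS nodes=$NNODES"              # prints procs=4 nodes=2
```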
===== Mass Operations =====
</code>
to cancel all of your running jobs.
<code bash>
qselect -u USERNAME | xargs qdel
</code>
will cancel all of your jobs (both running and queued).
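The reverse mass operation, submitting many jobs at once, is usually done with a shell loop. A minimal sketch, assuming a hypothetical script ''run.pbs'' (the ''qsub'' commands are only printed here; drop the ''echo'' on the cluster to actually submit):

```shell
# Print one submission command per processor count; remove 'echo' to submit.
for p in 1 2 4 8; do
  echo "qsub -l nodes=1:ppn=$p -N job_ppn$p run.pbs"
done
```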
===== Advanced =====
The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The values of the following variables are taken from the environment of the qsub command:
  * **HOME** (the path to your home directory)
  * **LANG** (which language you are using)
  * **LOGNAME** (the name that you logged in with)
  * **PATH** (standard path to executables)
  * **MAIL** (location of the user's mail file)
  * **SHELL** (command shell, i.e. bash)
  * **TZ** (time zone)
These values will be assigned to a new name, which is the current name prefixed with the string "PBS_O_". In addition, the following variables are available to the job:
  * **PBS_O_HOST** (the name of the host upon which the qsub command is running)
  * **PBS_SERVER** (the hostname of the pbs_server to which qsub submits the job)
  * **PBS_O_QUEUE** (the name of the original queue to which the job was submitted)
  * **PBS_O_WORKDIR** (the absolute path of the current working directory of the qsub command)
  * **PBS_ARRAYID** (each member of a job array is assigned a unique identifier)
  * **PBS_ENVIRONMENT** (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)
  * **PBS_JOBID** (the job identifier assigned to the job by the batch system)
  * **PBS_JOBNAME** (the job name supplied by the user)
  * **PBS_NODEFILE** (the name of the file containing the list of nodes assigned to the job)
  * **PBS_QUEUE** (the name of the queue from which the job is executed)
  * **PBS_WALLTIME** (the walltime requested by the user or the default walltime allotted by the scheduler)
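A submission script can use these variables directly. A small sketch (the ''${VAR:-fallback}'' defaults are only there so the snippet also runs outside a job):

```shell
# Inside a TORQUE job these variables are set automatically by the server.
echo "Job ${PBS_JOBNAME:-unknown} (${PBS_JOBID:-no-id}) in queue ${PBS_QUEUE:-none}"
cd "${PBS_O_WORKDIR:-$PWD}"   # return to the directory the job was submitted from
pwd
```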
==== Tensorflow with GPU ====
To use TensorFlow with a specific GPU, say GPU 1, you can simply set
<code bash>
export CUDA_VISIBLE_DEVICES=1
</code>
and then schedule your jobs with TORQUE to perform experiments on GPU 1.