tutorial:torque [2016/10/07 09:22] sertalpbilal [ADVANCED]
tutorial:torque [2017/04/08 15:03] sertalpbilal [Running Solvers (Gurobi/CPLEX/Mosek/AMPL/...)]
TORQUE provides control over batch jobs and distributed computing resources. It is an advanced open-source product based on the original PBS project and incorporates the best of both community and professional development. It incorporates significant advances in the areas of scalability,

===== Prerequisite =====
To receive the output and error logs of your jobs from Torque, you need password-less SSH connections between the nodes. If you have not set this up before, run the commands below. They create a public/private key pair, so that when a node transfers a file to your home folder, it does not ask for a password.
After connecting to polyps, enter:

<code bash>
ssh-keygen -N ""
</code>

Press ENTER to accept the default answer to every question. Then type the following commands:

<code bash>
touch ~/
chmod 600 ~/
cat ~/
</code>
Now you will receive the error and output log files of your jobs.
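
A sketch of the same setup as a function that is safe to run more than once is shown below. It assumes that ''ssh-keygen'' wrote the default public key ''~/.ssh/id_rsa.pub'' and that your SSH server reads ''~/.ssh/authorized_keys'' (the usual OpenSSH defaults); the function name ''add_authorized_key'' is hypothetical, and both paths can be passed as arguments if your setup differs.

<code bash>
# Hypothetical helper: authorize your own public key for password-less SSH.
# Assumed defaults: the key from "ssh-keygen" at ~/.ssh/id_rsa.pub and the
# standard OpenSSH authorized_keys file; pass other paths as arguments.
add_authorized_key() {
  local pub="${1:-$HOME/.ssh/id_rsa.pub}"
  local auth="${2:-$HOME/.ssh/authorized_keys}"
  if [ ! -f "$pub" ]; then
    echo "No public key at $pub; run ssh-keygen first." >&2
    return 1
  fi
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  chmod 600 "$auth"   # owner-only access; OpenSSH may ignore a writable-by-others file
  # Append the key only if that exact line is not already in the file.
  if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
}
</code>

Call ''add_authorized_key'' once after ''ssh-keygen -N ""''; running it again does not add a duplicate line.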
===== Hardware =====
Now, we will run the code but we are setting the job parameters using ''
===== Options =====
^ Option
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
| ''
You can find detailed information
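
As a sketch of how several common ''qsub'' options fit together (''-N'' job name, ''-q'' queue, ''-l'' resource list, ''-j oe'' to merge the error log into the output log), the hypothetical helper below builds one submission command. The function name and all default values (''batch'' queue, one hour, one processor) are assumptions chosen for illustration.

<code bash>
# Hypothetical helper: submit a job script with a few common qsub options.
# The same options can also be written as directives at the top of the script:
#   #PBS -N myjob
#   #PBS -q batch
#   #PBS -l nodes=1:ppn=4,walltime=01:00:00
#   #PBS -j oe
submit_job() {
  local script="$1"                # job script to submit
  local name="$2"                  # job name shown by qstat (-N)
  local queue="${3:-batch}"        # queue (-q); "batch" is an assumed default
  local walltime="${4:-01:00:00}"  # maximum run time HH:MM:SS (assumed default)
  local ppn="${5:-1}"              # processors per node (assumed default)
  # -j oe merges the error stream into the output log file.
  qsub -N "$name" -q "$queue" \
       -l "nodes=1:ppn=${ppn},walltime=${walltime}" \
       -j oe "$script"
}
</code>

For example, ''submit_job job.pbs test short 00:10:00 2'' runs ''qsub -N test -q short -l nodes=1:ppn=2,walltime=00:10:00 -j oe job.pbs'', where ''job.pbs'' is a placeholder name.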
===== Monitoring and Removing jobs =====
To show the jobs, use ''
<
To show the jobs of a specific user, use ''
<code shell>
qdel JOB_ID
</code>
Moreover, you can use the following command:
<
<
<
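
If a script has to wait until a job has finished (for example, before processing its output), a minimal polling sketch is shown below. It assumes that ''qstat -f JOB_ID'' prints a ''job_state = ...'' line, that completed jobs show state ''C'', and that ''qstat'' prints nothing useful once the server has forgotten the job; the function name and the 60-second default interval are hypothetical choices.

<code bash>
# Hypothetical helper: block until a Torque job has completed.
# Usage: wait_for_job JOB_ID [POLL_SECONDS]
wait_for_job() {
  local job_id="$1" interval="${2:-60}" state
  while true; do
    # Take the value from the "job_state = X" line of the full listing.
    # For an unknown job, qstat's error message is discarded and the
    # state comes back empty.
    state=$(qstat -f "$job_id" 2>/dev/null | awk -F' = ' '/job_state/ { print $2 }')
    if [ -z "$state" ] || [ "$state" = "C" ]; then
      echo "Job $job_id finished."
      return 0
    fi
    sleep "$interval"
  done
}
</code>

For example, ''wait_for_job 1234.polyps 30'' (a made-up job ID) checks the job every 30 seconds.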
==== Queues ====
We have a few queues ''
<code>
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
gpu -- --
medium
short -- --
long
batch -- --
verylong
AMPL -- --
MOSEK
</code>

If you want to use AMPL or MOSEK, you must submit your job to the ''AMPL'' or ''MOSEK'' queue, respectively, because we have a limited number of licenses for them.
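
Because the queue depends on the solver, a tiny hypothetical helper can pick it for you. It only encodes the rule above; sending every other job to ''batch'' is an assumption, so change the fallback to whichever queue suits your job.

<code bash>
# Hypothetical helper: print the queue a solver job should be submitted to.
# AMPL and MOSEK jobs must use their own license-limited queues;
# "batch" as the fallback is an assumption.
queue_for_solver() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    ampl)  echo "AMPL" ;;
    mosek) echo "MOSEK" ;;
    *)     echo "batch" ;;
  esac
}
</code>

For example, ''qsub -q "$(queue_for_solver mosek)" job.pbs'' submits the placeholder script ''job.pbs'' to the ''MOSEK'' queue.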

You can see the limits using the command ''
| very long | 240:00:00 |
===== Examples =====
==== Submitting ====
You can use the option

<code bash limited.sh>
qsub -l mem=4gb,
</code>

Sometimes your job needs more memory. You can choose a larger memory size with the same option:

<code bash large.pbs>
</code>

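
To choose a sensible ''mem'' value, it can help to compare what a job requested with what it actually used. The sketch below assumes the usual Torque ''qstat -f'' attribute names ''Resource_List.mem'' (requested) and ''resources_used.mem'' (used so far); the function name is hypothetical.

<code bash>
# Hypothetical helper: show requested vs. used memory of a job.
# Usage: mem_report JOB_ID
mem_report() {
  local job_id="$1"
  # qstat -f prints one "attribute = value" pair per line; the patterns
  # match ".mem" exactly, so the ".vmem" lines are ignored.
  qstat -f "$job_id" | awk -F' = ' '
    /Resource_List\.mem/  { print "requested: " $2 }
    /resources_used\.mem/ { print "used: " $2 }'
}
</code>

If ''used'' stays far below ''requested'' over several runs, you can probably lower the ''-l mem'' request for similar jobs.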
==== Running MATLAB ====
You just need to create a submission script that looks like this:
</code>
<note tip>Use **-singleCompThread** [[https://</note>

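
A hedged sketch of how the MATLAB call inside such a job script might look. ''-nodisplay'', ''-nosplash'', ''-singleCompThread'' and ''-r'' are standard MATLAB command-line options; ''-singleCompThread'' keeps MATLAB to one computational thread, so it does not use more cores than a one-processor job was given. The wrapper name ''run_matlab'' and the try/catch pattern are illustrative choices.

<code bash>
# Hypothetical helper for use inside a PBS job script: run a MATLAB script
# (given without the .m extension) non-interactively on a single thread.
run_matlab() {
  local script="$1"
  # -nodisplay/-nosplash: no GUI or splash screen on the compute node.
  # -singleCompThread:    limit MATLAB to one computational thread.
  # try/catch + exit:     quit afterwards (exit code 1 on error) instead of
  #                       waiting for input once the script ends.
  matlab -nodisplay -nosplash -singleCompThread \
    -r "try, ${script}; catch err, disp(getReport(err)); exit(1); end; exit(0)"
}
</code>

In a job script you might then run ''cd "$PBS_O_WORKDIR"'' followed by ''run_matlab my_experiment'', where ''my_experiment.m'' is a placeholder for your own script in the submission directory.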
==== Running Solvers (Gurobi/CPLEX/Mosek/AMPL/...) ====

To run solvers (such as Gurobi/

<

This flag enables the solver to find the necessary authentication information.
==== Interactive Jobs ====
If you do not care where you run your job, just use ''

and you will be running an interactive session on polyp15.
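
''qsub -I'' starts an interactive job, and ''-l nodes=NAME'' requests a specific node. The sketch below wraps both in a hypothetical ''interactive'' function; the node name polyp15 comes from this page, everything else is an assumption.

<code bash>
# Hypothetical helper: start an interactive Torque session.
# With no argument the scheduler picks the node; with a node name
# (e.g. polyp15) the session is requested on that node.
interactive() {
  local node="$1"
  if [ -n "$node" ]; then
    qsub -I -l "nodes=${node}"
  else
    qsub -I
  fi
}
</code>

''interactive'' alone lets Torque choose, while ''interactive polyp15'' asks for polyp15 specifically.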
==== Using GPUs ====
However, you first need permission to use the GPU (granted by Prof. Takac). This is just a formality that allows certain users to use the video driver on polyp30.
==== Running MPI and Parallel Jobs ====
<code bash mpi.pbs>
</code>
Allocating more than one CPU under PBS can be done in a number of ways, using the ''-l'' flag and the following resource descriptions:
  * nodes - specifies the number of separate nodes that should be allocated
  * ppn - specifies how many processes to allocate for each node

The allocation made by PBS is reflected in the contents of the nodefile, which can be accessed via the ''$PBS_NODEFILE'' environment variable.
The difference between ''ncpus'' and ''ppn'' is a bit subtle. ''ppn'' is used when you actually want to allocate multiple processes per node. ''ncpus'' is used to qualify the sort of nodes you want, and only secondarily to allocate multiple CPU slots. Some examples should help.
c2
</code>
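
Because the nodefile has one line per allocated slot, a node name appears as many times as processes were placed on it. The sketch below summarizes a nodefile; the function name is hypothetical, and it defaults to ''$PBS_NODEFILE'', which is only set inside a running job.

<code bash>
# Hypothetical helper: summarize a PBS nodefile (one line per slot).
nodefile_summary() {
  local nodefile="${1:-$PBS_NODEFILE}" slots nodes
  if [ ! -r "$nodefile" ]; then
    echo "Nodefile not readable: $nodefile" >&2
    return 1
  fi
  # $(( )) strips any padding that wc may print around the count.
  slots=$(( $(wc -l < "$nodefile") ))
  nodes=$(( $(sort -u "$nodefile" | wc -l) ))
  echo "$nodes node(s), $slots slot(s)"
  # Slots per node, printed as e.g. "  c1: 2".
  sort "$nodefile" | uniq -c | awk '{ print "  " $2 ": " $1 }'
}
</code>

In an MPI job script the slot count is what you would typically pass to the launcher, e.g. ''mpirun -np $(wc -l < $PBS_NODEFILE) -machinefile $PBS_NODEFILE ./my_mpi_program'', where ''my_mpi_program'' is a placeholder and the exact ''mpirun'' options depend on your MPI installation.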

===== Mass Operations =====

==== Submitting multiple jobs ====
An easy way to submit multiple jobs via PBS is to use a Bash loop. Suppose we want to pass the name of every file with the ''.mps'' extension in a folder to our solver. We can write a PBS script such as
<code bash submit.pbs>
cd /
/
</code>
and a Bash script:
<code bash bashloop.sh>
for f in dataset/
do
    qsub -q batch -v FILENAME="$f" submit.pbs
done
</code>
Here, the option ''

Once you have these two files, simply calling
<code>
./
</code>
will submit all jobs to Torque.
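
A slightly more defensive version of the loop, as a sketch: it takes the folder as an argument (''dataset'' by default, as in the loop above), skips the literal pattern when no ''.mps'' file exists, and reports how many jobs were submitted. Inside ''submit.pbs'' the value passed with ''-v'' is available as the environment variable ''$FILENAME''. The function name is hypothetical.

<code bash>
# Hypothetical helper: submit submit.pbs once per .mps file in a folder.
submit_all_mps() {
  local dir="${1:-dataset}" count=0 f
  for f in "$dir"/*.mps; do
    [ -e "$f" ] || continue   # no .mps files: the glob stayed literal
    # -v passes FILENAME to the job as an environment variable.
    qsub -q batch -v FILENAME="$f" submit.pbs
    count=$((count + 1))
  done
  echo "Submitted $count job(s)."
}
</code>

Real ''qsub'' also prints each new job ID, so the summary line appears after the list of IDs.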

==== Cancelling all jobs ====
You can call
<code bash>
qselect -u <
</code>
to cancel all of your running jobs.

<code bash>
qselect -u <
</code>
will cancel all of your jobs (both running and queued).
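
''qselect'' can also filter by job state with ''-s'' (for example ''R'' for running, ''Q'' for queued). The hypothetical function below cancels your jobs, optionally only those in one state, and does nothing when no job matches; taking the user name from ''$USER'' is an assumption about your environment.

<code bash>
# Hypothetical helper: cancel your jobs, optionally only those in one state.
# Usage: cancel_my_jobs       # all of your jobs
#        cancel_my_jobs Q     # only queued jobs
cancel_my_jobs() {
  local state="$1" user="${USER:-$(id -un)}" ids
  if [ -n "$state" ]; then
    ids=$(qselect -u "$user" -s "$state")
  else
    ids=$(qselect -u "$user")
  fi
  if [ -z "$ids" ]; then
    echo "No matching jobs."
    return 0
  fi
  # Unquoted on purpose: qdel accepts several job IDs at once.
  qdel $ids
}
</code>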

===== Advanced =====
==== TensorFlow with GPU ====
To use TensorFlow with a specific GPU, say GPU 1, simply set
<code bash>
export CUDA_VISIBLE_DEVICES=1
</code>
and then schedule your jobs with Torque to run the experiments on GPU 1.
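
With Torque's defaults, a variable exported in the shell you submit from reaches the job only if you use ''qsub -V'' (or list it with ''-v CUDA_VISIBLE_DEVICES''), so it is safer to set it inside the job script itself. The sketch below, with a hypothetical helper name, sets ''CUDA_VISIBLE_DEVICES'' for a single command; inside that command CUDA renumbers the visible GPUs from 0, so GPU 1 is seen by TensorFlow as device 0.

<code bash>
# Hypothetical helper: run one command with only the given GPU(s) visible.
# Example inside a job script submitted to the gpu queue ("#PBS -q gpu"):
#   run_on_gpu 1 python train.py     # train.py is a placeholder name
run_on_gpu() {
  local gpus="$1"
  shift
  # The assignment applies to this one command only, not to the caller.
  CUDA_VISIBLE_DEVICES="$gpus" "$@"
}
</code>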