This shows you the differences between two versions of the page.
tutorial:torque [2016/11/10 14:59] sertalpbilal Change of Order |
tutorial:torque [2024/02/28 13:06] mjm519 [Table] |
||
---|---|---|---|
Line 31: | Line 31: | ||
| polyp1--polyp15 | | polyp1--polyp15 | ||
| polyp30 | 24 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 128 GB | 2x K80 (4GPUs) | | | polyp30 | 24 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 128 GB | 2x K80 (4GPUs) | | ||
+ | |||
+ | |||
+ | Configured resources as reported by the Maui scheduler (pulled from Torque): | ||
+ | PROCS: 16 | ||
+ | MEM: 31G | ||
+ | SWAP: 63G | ||
===== Submitting Jobs ===== | ===== Submitting Jobs ===== | ||
Line 44: | Line 50: | ||
#PBS -o / | #PBS -o / | ||
#PBS -l nodes=1: | #PBS -l nodes=1: | ||
+ | #PBS -l pmem=2GB: | ||
#PBS -q batch | #PBS -q batch | ||
Line 61: | Line 68: | ||
</ | </ | ||
If you do not want to write a submission script, you can submit the job directly by calling | If you do not want to write a submission script, you can submit the job directly by calling | ||
- | < | + | < |
Now, we will run the code but we are setting the job parameters using '' | Now, we will run the code but we are setting the job parameters using '' | ||
- | ===== Important | + | ===== Options ===== |
+ | ^ Option | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
+ | | '' | ||
- | * '' | + | You can find detailed information |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | + | ||
- | See [[http:// | + | |
+ | <note tip>You need to use option '' | ||
===== Monitoring and Removing jobs ===== | ===== Monitoring and Removing jobs ===== | ||
To show the jobs use '' | To show the jobs use '' | ||
< | < | ||
- | To show jobs of some user use '' | + | To show jobs of some user use '' |
+ | <code shell> | ||
+ | qdel JOB_ID | ||
+ | </ | ||
+ | Moreover, you can use the following command: | ||
+ | < | ||
+ | < | ||
+ | < | ||
==== Queues ==== | ==== Queues ==== | ||
- | We have few queues '' | + | We have few queues '' |
< | < | ||
- | Queue | + | Queue |
- | ---------------- ------ -------- -------- ---- | + | ---------------- |
- | gpu -- -- | + | MOSEK |
- | medium | + | AMPL |
- | short -- -- | + | long 30 1 |
- | long | + | gpu 4 |
- | batch | + | verylong |
- | verylong | + | medium |
- | AMPL | + | coraverylong |
- | MOSEK -- -- | + | special |
+ | batch | ||
+ | short 0 | ||
+ | urgent | ||
+ | background | ||
+ | mediumlong | ||
</ | </ | ||
Line 115: | Line 134: | ||
You can see limits using this command '' | You can see limits using this command '' | ||
- | ^ Queue ^ Wall Time ^ | + | ^ Queue |
- | | batch | 01: | + | | urgent |
- | | short | 02: | + | | batch | 01: |
- | | medium | 04: | + | | short |
- | | long | 72: | + | | medium |
- | | very long | + | | mediumlong |
+ | | long | 72: | ||
+ | | verylong | ||
+ | | special | ||
+ | | background | ||
+ | | gpu | ||
+ | | AMPL | | | 8 | 6 | ||
+ | | MOSEK | ||
- | ==== Examples ==== | ||
- | === Submitting Large Memory Job === | ||
- | Sometimes your job needs more memory. This can be achieved by '' | + | Notes: |
- | < | + | * The urgent queue has no limits, and its jobs have higher priority than all other jobs in the queues. Please | ||
+ | | ||
+ | ===== Examples ===== | ||
- | === Running MATLAB | + | ==== Submitting a Small or Large Memory Job ==== |
+ | |||
+ | You can use the option '' | ||
+ | |||
+ | <code bash limited.sh> | ||
+ | qsub -l pmem=4gb, | ||
+ | </ | ||
+ | |||
+ | Sometimes your job needs more memory. You can choose a larger memory size with the same option: | ||
+ | |||
+ | <code bash large.pbs> | ||
+ | |||
+ | To see what resources have been assigned by the batch queuing system, run the ulimit command (bash) or the limit command: | ||
+ | <code bash pbs job submission command> | ||
+ | <code bash ulimit> | ||
+ | core file size (blocks, -c) 0 | ||
+ | data seg size | ||
+ | scheduling priority | ||
+ | file size | ||
+ | pending signals | ||
+ | max locked memory | ||
+ | max memory size | ||
+ | open files (-n) 65536 | ||
+ | pipe size (512 bytes, -p) 8 | ||
+ | POSIX message queues | ||
+ | real-time priority | ||
+ | stack size (kbytes, -s) unlimited | ||
+ | cpu time | ||
+ | max user processes | ||
+ | virtual memory | ||
+ | file locks (-x) unlimited</ | ||
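To check a few individual limits rather than the full listing, a minimal sketch along these lines could be placed near the top of a job script. It assumes a POSIX-style shell such as bash; the function name `report_limits` is illustrative and not part of Torque.

```shell
#!/bin/bash
# Print a few of the limits that apply to this shell.
# report_limits is an illustrative helper name, not a Torque command.
report_limits() {
    echo "max memory size (kbytes): $(ulimit -m)"
    echo "virtual memory (kbytes): $(ulimit -v)"
    echo "open files: $(ulimit -n)"
}

report_limits
```

Each value is either a number or `unlimited`, matching the corresponding row of the full `ulimit -a` listing.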
+ | |||
+ | **[[https:// | ||
+ | ==== Running MATLAB ==== | ||
You just have to create a submission script that looks like this | You just have to create a submission script that looks like this | ||
Line 137: | Line 196: | ||
#PBS -o / | #PBS -o / | ||
#PBS -l nodes=1: | #PBS -l nodes=1: | ||
+ | #PBS -l pmem=2GB: | ||
#PBS -q batch | #PBS -q batch | ||
Line 142: | Line 202: | ||
</ | </ | ||
- | === Interactive Jobs === | + | <note tip>Use **-singleCompThread** [[https:// |
+ | |||
+ | ==== Running Solvers | ||
+ | |||
+ | In order to run solvers (such as Gurobi/ | ||
+ | |||
+ | < | ||
+ | |||
+ | This flag enables the solver to find necessary authentication information. | ||
+ | ==== Interactive Jobs ==== | ||
If you do not care where you run your job just use '' | If you do not care where you run your job just use '' | ||
Line 152: | Line 221: | ||
and you will be running an interactive session on polyp15. | and you will be running an interactive session on polyp15. | ||
- | === Using GPU's === | + | ==== Using GPU' |
Line 160: | Line 229: | ||
However, you first need permission to use the GPUs (granted by Prof. Takac). This is just a formality that allows certain users to use the video driver on polyp30 | ||
- | === Running MPI and Parallel Jobs === | + | If you are using TensorFlow in Python, you can limit the amount of GPU memory it uses with: | ||
+ | < | ||
+ | config_tf.gpu_options.per_process_gpu_memory_fraction = p</ | ||
+ | where **//p//** is the fraction of GPU memory to use (a number between zero and one). | ||
+ | |||
+ | ==== Running MPI and Parallel Jobs ==== | ||
<code bash mpi.pbs> | <code bash mpi.pbs> | ||
Line 283: | Line 357: | ||
* **PBS_WALLTIME** (the walltime requested by the user or default walltime allotted by the scheduler) | * **PBS_WALLTIME** (the walltime requested by the user or default walltime allotted by the scheduler) | ||
+ | |||
+ | ==== TensorFlow with GPU ==== | ||
+ | To use TensorFlow with a specific GPU, say GPU 1, you can simply set | ||
+ | <code bash> | ||
+ | export CUDA_VISIBLE_DEVICES=1 | ||
+ | </ | ||
+ | and then schedule your jobs with Torque to perform experiments on GPU 1. | ||
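As a hedged sketch, the same setting could be placed inside the job script itself so that it travels with the job. The `gpu` queue name is taken from the queue table on this page; `train.py` is a hypothetical program name, left commented out.

```shell
#!/bin/bash
#PBS -q gpu
# Assumption: the gpu queue from the queue table is the right target for this job.

# Expose only GPU 1 to the processes started below.
export CUDA_VISIBLE_DEVICES=1
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

# Hypothetical training script; replace with your own program.
# python train.py
```

Inside the job, the visible GPUs are renumbered from zero, so the program sees the selected card as device 0.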
+ | |||
+ | |||
+ | ====== MOAB Scheduler ====== | ||
+ | PBS Torque schedules and runs jobs on our cluster. Running a job involves two PBS processes: on the PBS server, the pbs_server process accepts your job and adds it to the queue, then dispatches it to the compute nodes, where it runs under the pbs_mom process. | ||
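From the user's side, this hand-off can be followed with the standard Torque client commands. The sketch below is illustrative only: `submit_and_check` is a hypothetical helper name and `job.pbs` a hypothetical script. It returns early when `qsub` is not available on the current host.

```shell
#!/bin/bash
# Sketch: submit a job and look at its state.
# submit_and_check and job.pbs are hypothetical names used for illustration.
submit_and_check() {
    script="$1"
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found: run this on a Torque submission host"
        return 1
    fi
    # pbs_server accepts the job and prints its identifier.
    job_id=$(qsub "$script") || return 1
    echo "submitted: $job_id"
    # Show the job's current state as reported by the server.
    qstat "$job_id"
    # To remove the job again: qdel "$job_id"
}
```

A typical call would be `submit_and_check job.pbs`.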
+ | |||
+ | |||
+ | ==== Useful MOAB Commands ==== | ||
+ | 1. [[https:// | ||
+ | |||
+ | 2. [[https:// | ||
+ | |||
+ | 3. [[https:// | ||
+ | |||
+ | ====Useful External Resources==== | ||
+ | [[https:// | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | [[http:// | ||
+ | |||
+ | [[https:// | ||