This shows you the differences between two versions of the page.
tutorial:torque [2016/10/18 15:13] sertalpbilal Minor fixes
tutorial:torque [2017/04/08 15:03] sertalpbilal [Running Solvers (Gurobi/CPLEX/Mosek/AMPL/...)]
Line 2 → Line 2:
  TORQUE provides control over batch jobs and distributed computing resources. It is an advanced open-source product based on the original PBS project and incorporates the best of both community and professional development. It incorporates significant advances in the areas of scalability,
+
+
+
+ ===== Prerequisite =====
+ In order to get your output and error logs from Torque, you need a password-less SSH connection between nodes. If you have not set it up before, execute the following commands. These commands create a public and private key pair so that when a node wants to transfer a file to your home folder, it does not require a password.
+ After connecting to polyps, enter:
+
+ <code bash>
+ ssh-keygen -N ""
+ </code>
+
+ Then just press ENTER for every question. After that, type the following commands:
+
+ <code bash>
+ touch ~/
+ chmod 600 ~/
+ cat ~/
+ </code>
+ Now you will get the error log and output log files for your jobs.
+
+
+
  ===== Hardware =====
Line 11 → Line 33:
  ===== Submitting Jobs =====
-
- Check [[#
  Jobs can be submitted either with a submission file or directly from the command line. First we explain how each is done, and then we discuss the options.
Line 44 → Line 64:
  Now we will run the code, but set the job parameters using ''
- ===== Important
+ ===== Options =====
+ ^ Option
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
+ | ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
- * ''
-
- See [[http://
+ You can find detailed information
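A minimal submission file using standard Torque `#PBS` directives, as a sketch: the job name, the `batch` queue and the resource values are illustrative assumptions, not recommended settings for polyps. Because bash treats the `#PBS` lines as ordinary comments, the same file can be run with `bash job.pbs` on a login node as a quick smoke test before submitting it with `qsub job.pbs`.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=1gb
#PBS -j oe

# qsub reads the #PBS lines above as options; bash sees them as comments.
# -j oe merges the error stream into the output log.

main() {
  # Torque starts a job in your home directory; switch to the directory
  # qsub was run from. Outside Torque, stay in the current directory.
  cd "${PBS_O_WORKDIR:-$PWD}" || return 1
  echo "Running on $(uname -n) in $(pwd)"
  echo "Job id: ${PBS_JOBID:-not running under Torque}"
}

main "$@"
```

Options given on the command line, e.g. `qsub -l walltime=01:00:00 job.pbs`, take precedence over the matching `#PBS` line in the file.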
  ===== Monitoring and Removing Jobs =====
  To list the jobs, use ''
  <
- To show the jobs of a specific user, use ''
+ To show the jobs of a specific user, use ''
+ <code shell>
+ qdel JOB_ID
+ </code>
+ You can also use the following command:
+ <
+ <
+ <
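A small polling helper can be built on `qstat -f`, whose full listing contains a `job_state = ...` line. This is a sketch: the function names are hypothetical, the 60-second default interval is arbitrary, and it assumes a finished job shows state `C` until the server purges it, as Torque usually does.

```shell
# Print the state letter (Q, R, C, ...) that `qstat -f` reports for a job.
# Prints nothing if qstat does not know the job (e.g. it was already purged).
job_state() {
  qstat -f "$1" 2>/dev/null |
    awk -F' = ' '$1 ~ /^[ \t]*job_state$/ { print $2; exit }'
}

# Block until the job is completed (state C) or no longer listed by qstat,
# checking every $2 seconds (default: 60).
wait_for_job() {
  while :; do
    state=$(job_state "$1")
    if [ -z "$state" ] || [ "$state" = "C" ]; then
      return 0
    fi
    sleep "${2:-60}"
  done
}
```

For example, a driver script could run `jid=$(qsub job.pbs)` (qsub prints the new job ID) and then `wait_for_job "$jid"` before post-processing the results.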
  ==== Queues ====
- We have a few queues ''
+ We have a few queues ''
  <
  Queue Memory CPU Time Walltime Node Run Que Lm State
  ---------------- ------ -------- -------- ---- --- --- -- -----
+ gpu -- --
  medium
  short -- --
  long
  batch -- --
- verylong
-
- 0
+ verylong
+ AMPL -- --
+ MOSEK
  </
+
+ To use AMPL or MOSEK, you have to submit your job to the AMPL or MOSEK queue, respectively, because we have a limited number of licenses for them.
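The queue rule above can be captured in a tiny helper, sketched here with a hypothetical function name. It assumes `batch` as the fallback queue for jobs that need neither AMPL nor MOSEK; that default is an assumption, not something this page states.

```shell
# Hypothetical helper: choose the queue for a job from the solver it needs.
# AMPL and MOSEK jobs must use their own queues (limited licenses); the
# fallback "batch" is an assumed default.
queue_for_solver() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    ampl)  echo "AMPL" ;;
    mosek) echo "MOSEK" ;;
    *)     echo "batch" ;;
  esac
}
```

Usage: `qsub -q "$(queue_for_solver mosek)" job.pbs` submits to the MOSEK queue.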
+
+
  You can see the limits using the command ''
Line 96 → Line 128:
  | very long | 240:00:00 |
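To check a walltime request against a queue limit before submitting, both HH:MM:SS values can be compared in seconds; for example, 240:00:00 is 240 * 3600 = 864000 seconds. A sketch with hypothetical function names, assuming limits are written as HH:MM:SS as in the table above.

```shell
# Convert a Torque walltime of the form HH:MM:SS (hours may exceed 24,
# e.g. 240:00:00) to seconds. awk parses each field as a decimal number,
# so leading zeros such as "08" are handled correctly.
walltime_to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the requested walltime ($1) fits within a queue limit ($2).
fits_limit() {
  [ "$(walltime_to_seconds "$1")" -le "$(walltime_to_seconds "$2")" ]
}
```

Usage: `fits_limit 200:00:00 240:00:00 && echo "fits the very long queue"`.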
- ==== Examples ====
- === Submitting Large Memory Job ===
- Sometimes your job needs more memory. This can be achieved by ''
- <
- === Running MATLAB
+ ===== Examples
+ ==== Submitting
+ You can use the option
+
+ <code bash limited.sh>
+ qsub -l mem=4gb,
+ </code>
+
+ Sometimes your job needs more memory. You can request a larger memory size with the same option:
+
+ <code bash large.pbs>
+
+ ==== Running MATLAB
  You just have to create a submission file that looks like this:
Line 116 → Line 155:
  </
- === Interactive Jobs ===
+ <note tip>Use **-singleCompThread** [[https://
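A sketch of a MATLAB submission file that applies this tip; `my_analysis` is a hypothetical script name and the resource values are placeholders. Requesting one core (`ppn=1`) together with `-singleCompThread` keeps MATLAB from starting more computational threads than the job was given. The `try/catch ... exit` wrapper is one common way to make sure MATLAB exits instead of waiting at a prompt.

```shell
#!/bin/bash
#PBS -N matlab_job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00

# Hypothetical script name: runs my_analysis.m from the submission directory.
# Override at submission time with: qsub -v MATLAB_SCRIPT=other matlab.pbs
MATLAB_SCRIPT="${MATLAB_SCRIPT:-my_analysis}"

# MATLAB statement passed to -r: run the script, print any error report,
# and always exit so the job does not sit at a MATLAB prompt.
matlab_statement() {
  printf '%s\n' "try, $1; catch err, disp(getReport(err)); exit(1); end; exit(0);"
}

# Only start MATLAB inside a Torque job (PBS_JOBID is set there), so running
# this file on the login node does not launch MATLAB there.
if [ -n "${PBS_JOBID:-}" ]; then
  cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
  matlab -nodisplay -nosplash -singleCompThread -r "$(matlab_statement "$MATLAB_SCRIPT")"
fi
```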
+
+ ==== Running Solvers
+
+ To run solvers (such as Gurobi/
+
+ <
+
+ This flag allows the solver to find the necessary authentication information.
+ ==== Interactive Jobs ====
  If you do not care which node your job runs on, just use ''
Line 126 → Line 174:
  and you will be running an interactive session on polyp15.
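Using the `PBS_ENVIRONMENT` variable described in the Advanced section below, a shell can tell whether it is running inside an interactive session. A minimal sketch; the function name is hypothetical, and putting it in `~/.bashrc` to print a reminder is just one possible use.

```shell
# Succeed when the current shell is inside an interactive Torque session.
# Torque sets PBS_ENVIRONMENT to PBS_INTERACTIVE there and to PBS_BATCH in
# batch jobs; outside Torque the variable is unset.
in_interactive_job() {
  [ "${PBS_ENVIRONMENT:-}" = "PBS_INTERACTIVE" ]
}

if in_interactive_job; then
  echo "Interactive job ${PBS_JOBID:-?} on $(uname -n)"
fi
```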
- === Using GPU's ===
+ ==== Using GPU'
Line 134 → Line 182:
  However, you first need permission to use the GPU (granted by Prof. Takac). This is just a formality that allows certain users to use the video driver on polyp30.
- === Running MPI and Parallel Jobs ===
+ ==== Running MPI and Parallel Jobs ====
  <code bash mpi.pbs>
Line 196 → Line 244:
  c2
  </
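Inside an MPI job, the process count is usually derived from `PBS_NODEFILE` (listed in the Advanced section below), which Torque fills with one line per assigned core, so a node requested with `ppn=4` appears four times. A sketch, assuming an `mpirun` that accepts `-np` and `-machinefile` (MPICH and Open MPI both do); `./my_mpi_program` is a hypothetical executable.

```shell
#!/bin/bash
# Count the process slots in a node file (one line per assigned core).
count_slots() {
  wc -l < "$1" | tr -d ' '
}

# List each distinct node once.
unique_nodes() {
  sort -u "$1"
}

# Only launch inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  NP=$(count_slots "$PBS_NODEFILE")
  echo "Starting $NP MPI processes on: $(unique_nodes "$PBS_NODEFILE" | tr '\n' ' ')"
  mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_mpi_program
fi
```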
-
- ===== Advanced =====
-
-
- The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The values of the following variables will be taken from the environment of the qsub command:
- * **HOME** (the path to your home directory)
- * **LANG** (which language you are using)
- * **LOGNAME** (the name that you logged in with)
- * **PATH** (standard path to executables)
- * **MAIL** (location of the user's mail file)
- * **SHELL** (command shell, e.g. bash,
- * **TZ** (time zone)
- These values will be assigned to a new name which is the current name prefixed with the string "
- * **PBS_O_HOST** (the name of the host upon which the qsub command is running)
- * **PBS_SERVER** (the hostname of the pbs_server which qsub submits the job to)
- * **PBS_O_QUEUE** (the name of the original queue to which the job was submitted)
- * **PBS_O_WORKDIR** (the absolute path of the current working directory of the qsub command)
- * **PBS_ARRAYID** (each member of a job array is assigned a unique identifier)
- * **PBS_ENVIRONMENT** (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)
- * **PBS_JOBID** (the job identifier assigned to the job by the batch system)
- * **PBS_JOBNAME** (the job name supplied by the user)
- * **PBS_NODEFILE** (the name of the file containing the list of nodes assigned to the job)
- * **PBS_QUEUE** (the name of the queue from which the job was executed)
- * **PBS_WALLTIME** (the walltime requested by the user or default walltime allotted by the scheduler)
  ===== Mass Operations =====
Line 250 → Line 274:
  </
  to cancel all of your running jobs.
-
- ===== Prerequisite =====
- In order to get your output and error logs from Torque, you need a password-less SSH connection between nodes. If you have not set it up before, execute the following commands. These commands create a public and private key pair so that when a node wants to transfer a file to your home folder, it does not require a password.
- After connecting to polyps, enter:
  <code bash>
- ssh-keygen -N ""
+ qselect
  </code>
+ will cancel all jobs (both running and queued).
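A variant that cancels only jobs in a given state, for example only queued jobs while running ones keep going. This is a sketch with a hypothetical function name; it relies on the `qselect` options `-u` (user) and `-s` (job state) and calls `qdel` once per job ID, which also means `qdel` is never called with an empty argument list when nothing matches.

```shell
# Cancel your own jobs via qselect/qdel. With an argument, only jobs in that
# state are cancelled (e.g. Q = queued, R = running); without one, all of them.
cancel_my_jobs() {
  local user="${USER:-$(id -un)}"
  if [ -n "${1:-}" ]; then
    qselect -u "$user" -s "$1"
  else
    qselect -u "$user"
  fi | while IFS= read -r job_id; do
    # One qdel per job ID; nothing runs when no job matches.
    if [ -n "$job_id" ]; then
      qdel "$job_id"
    fi
  done
}
```

Usage: `cancel_my_jobs Q` cancels only your queued jobs; `cancel_my_jobs` cancels all of them.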
- Then just press ENTER for every question. After that, type the following commands:
- <code bash>
- touch ~/
- chmod 600 ~/
- cat ~/
- </code>
- Now you will get the error log and output log files for your jobs.
+ ===== Advanced =====
+ The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The values of the following variables will be taken from the environment of the qsub command:
+ * **HOME** (the path to your home directory)
+ * **LANG** (which language you are using)
+ * **LOGNAME** (the name that you logged in with)
+ * **PATH** (standard path to executables)
+ * **MAIL** (location of the user's mail file)
+ * **SHELL** (command shell, e.g. bash,
+ * **TZ** (time zone)
+ These values will be assigned to a new name which is the current name prefixed with the string "
+ * **PBS_O_HOST** (the name of the host upon which the qsub command is running)
+ * **PBS_SERVER** (the hostname of the pbs_server which qsub submits the job to)
+ * **PBS_O_QUEUE** (the name of the original queue to which the job was submitted)
+ * **PBS_O_WORKDIR** (the absolute path of the current working directory of the qsub command)
+ * **PBS_ARRAYID** (each member of a job array is assigned a unique identifier)
+ * **PBS_ENVIRONMENT** (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)
+ * **PBS_JOBID** (the job identifier assigned to the job by the batch system)
+ * **PBS_JOBNAME** (the job name supplied by the user)
+ * **PBS_NODEFILE** (the name of the file containing the list of nodes assigned to the job)
+ * **PBS_QUEUE** (the name of the queue from which the job was executed)
+ * **PBS_WALLTIME** (the walltime requested by the user or default walltime allotted by the scheduler)
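A sketch of how a job script might read some of these variables. The function names and the `input_<N>.txt` naming scheme are hypothetical. Torque creates job arrays with `qsub -t`, e.g. `qsub -t 1-10 job.pbs`, and each task then sees its own `PBS_ARRAYID`.

```shell
# Summarise the Torque variables of the current job; "-" marks a variable
# that is unset (e.g. when the script runs outside Torque).
job_summary() {
  echo "job=${PBS_JOBID:--} name=${PBS_JOBNAME:--} queue=${PBS_QUEUE:--} from=${PBS_O_HOST:--}"
}

# Path of the input file for the current job-array task. The input_<N>.txt
# naming scheme is a hypothetical example.
array_input_file() {
  if [ -z "${PBS_ARRAYID:-}" ]; then
    echo "PBS_ARRAYID is not set; is this a job array task?" >&2
    return 1
  fi
  echo "${PBS_O_WORKDIR:-$PWD}/input_${PBS_ARRAYID}.txt"
}
```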
+
+
+ ==== TensorFlow with GPU ====
+ To use TensorFlow with a specific GPU, say GPU 1, you can simply set
+ <code bash>
+ export CUDA_VISIBLE_DEVICES=1
+ </code>
+ and then schedule your jobs with Torque to run your experiments on GPU 1. Put this line in the job script itself (or pass the variable with ''qsub -v CUDA_VISIBLE_DEVICES=1''), since a job does not automatically inherit variables exported in your login shell.
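A sketch of a TensorFlow job script that sets the variable inside the job, with the GPU index chosen at submission time. `GPU_ID` and the file name `tf_job.pbs` are hypothetical, and sending such jobs to the `gpu` queue from the queue listing above is an assumption.

```shell
#!/bin/bash
#PBS -q gpu
#PBS -l nodes=1:ppn=1

# Restrict CUDA programs such as TensorFlow to a single GPU index.
# Submit with e.g.: qsub -v GPU_ID=1 tf_job.pbs   (GPU_ID is hypothetical)
select_gpu() {
  case "$1" in
    '' | *[!0-9]*)
      echo "invalid GPU id: '$1'" >&2
      return 1
      ;;
  esac
  CUDA_VISIBLE_DEVICES=$1
  export CUDA_VISIBLE_DEVICES
}

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
select_gpu "${GPU_ID:-1}" || exit 1
echo "Using GPU $CUDA_VISIBLE_DEVICES"
# The TensorFlow program would start here (not shown).
```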