tutorial:torque

Differences

This shows you the differences between two versions of the page.

tutorial:torque [2017/06/08 22:45]
afo214
tutorial:torque [2023/12/07 09:49]
mjm519 [Queues]
Line 44: Line 44:
 #PBS -o /home/mat614/TEST.out #PBS -o /home/mat614/TEST.out
 #PBS -l nodes=1:ppn=4  #PBS -l nodes=1:ppn=4 
 +#PBS -l pmem=2GB:vmem=1GB
 #PBS -q batch #PBS -q batch
  
Line 61: Line 62:
 </code> </code>
 If you do not want to write a submission script, you can pass the same options directly on the qsub command line: If you do not want to write a submission script, you can pass the same options directly on the qsub command line:
-<code>qsub -N JobName -q batch -l nodes=1:pnn=2  myscript.sh</code>+<code>qsub -N JobName -q batch -l nodes=1:ppn=2  myscript.sh</code>
 This runs the script with the job parameters given as command-line options introduced by the ''-'' character (e.g. ''-N JobName'') This runs the script with the job parameters given as command-line options introduced by the ''-'' character (e.g. ''-N JobName'')
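As a rough sketch of the command-line form above, the helper below only assembles and prints a qsub command so it can be checked before submitting. The function name ''build_qsub_cmd'' and its argument order are hypothetical, chosen for illustration; only the qsub options themselves come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble (but do not run) a qsub command line.
# Arguments: job name, queue, cores per node (ppn), script path.
build_qsub_cmd() {
    printf 'qsub -N %s -q %s -l nodes=1:ppn=%s %s\n' "$1" "$2" "$3" "$4"
}

# Print the command for inspection; copy it to the shell to submit.
build_qsub_cmd JobName batch 2 myscript.sh
```

Printing the command rather than running it makes a typo in a resource name such as ''ppn'' easy to spot before the job reaches the queue.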
  
Line 123: Line 124:
  
 You can see limits using this command ''qstat -f -Q'' You can see limits using this command ''qstat -f -Q''
-^ Queue ^ Wall Time ^ +^ Queue      ^ Wall Time  ^ Total Max Queued Jobs      ^
-| batch  | 01:00:00 | +| batch      | 01:00:00   | 2000                       |
-| short  | 02:00:00 | +| short      | 02:00:00   | 1000                       |
-| medium | 04:00:00 | +| medium     | 04:00:00   | 500                        |
-| long  | 72:00:00 | +| long       | 72:00:00   | 100                        |
-| very long  | 240:00:00 |+| very long  | 240:00:00  | 100                        |
 +| AMPL       |            | 200                        |
 +| MOSEK      |            | 50                         |
 + 
 + 
 +**Note: the maximum number of queued jobs can be altered. It was set artificially low to keep the queuing system from being flooded and the scheduler from choking, which would prevent any jobs from running.**
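The wall-time limits in the table can be used to choose the shortest queue that still fits a job. The sketch below hard-codes the limits from the table; the function names ''to_seconds'' and ''pick_queue'' are hypothetical, and the queue names (in particular "very long", which contains a space) are copied from the table as displayed, so the exact names accepted by ''qsub -q'' should be checked with ''qstat -f -Q''.

```shell
#!/usr/bin/env bash
# Convert HH:MM:SS to seconds. awk reads each field as a decimal number,
# so values with leading zeros such as "08" are handled correctly.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the first (shortest) queue whose wall-time limit covers the request.
# Limits are copied from the table above, in increasing order.
pick_queue() {
    want=$(to_seconds "$1")
    while IFS='|' read -r name limit; do
        if [ "$want" -le "$(to_seconds "$limit")" ]; then
            echo "$name"
            return 0
        fi
    done <<EOF
batch|01:00:00
short|02:00:00
medium|04:00:00
long|72:00:00
very long|240:00:00
EOF
    echo "no queue allows $1" >&2
    return 1
}

pick_queue 03:30:00   # prints "medium"
```

Keeping the limits in one list means that a change to the queue configuration only needs to be mirrored in one place.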
  
 ===== Examples ===== ===== Examples =====
Line 134: Line 140:
 ==== Submitting a Small or Large Memory Job ==== ==== Submitting a Small or Large Memory Job ====
  
-You can use the option ''-l mem=size,vmem=size'' to limit memory usage of your job.+You can use the option ''-l pmem=size,vmem=size'' to limit memory usage of your job.
  
 <code bash limited.sh> <code bash limited.sh>
-qsub -l mem=4gb,vmem=4gb test.pbs+qsub -l pmem=4gb,vmem=4gb test.pbs
 </code> </code>
  
 Sometimes your job needs more memory. You can choose a larger memory size with the same option: Sometimes your job needs more memory. You can choose a larger memory size with the same option:
  
-<code bash large.pbs>qsub  -l mem=20gb  test.pbs</code>+<code bash large.pbs>qsub  -l pmem=20gb  test.pbs</code> 
 + 
 +To see what resources have been assigned by the batch queuing system, run the ulimit command (bash) or the limit command (csh/tcsh):
 +<code bash pbs job submission command>qsub -I -l nodes=1:ppn=1 -l pmem=30GB:vmem=4GB -q short -N test -e TEST.err -o TEST.out -w e</code> 
 +<code bash ulimit>user@polyp13:~$ ulimit -a 
 +core file size          (blocks, -c) 0 
 +data seg size           (kbytes, -d) 31457280 
 +scheduling priority             (-e) 0 
 +file size               (blocks, -f) unlimited 
 +pending signals                 (-i) 128344 
 +max locked memory       (kbytes, -l) unlimited 
 +max memory size         (kbytes, -m) 31457280 
 +open files                      (-n) 65536 
 +pipe size            (512 bytes, -p) 8 
 +POSIX message queues     (bytes, -q) 819200 
 +real-time priority              (-r) 0 
 +stack size              (kbytes, -s) unlimited 
 +cpu time               (seconds, -t) unlimited 
 +max user processes              (-u) 128344 
 +virtual memory          (kbytes, -v) unlimited 
 +file locks                      (-x) unlimited</code>
  
 +**[[https://www.geeksforgeeks.org/ulimit-soft-limits-and-hard-limits-in-linux|For more information on the ulimit command review this link.]]**
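The ulimit values are reported in kbytes, which makes them hard to compare with the ''pmem'' request. As a small sketch (the function name ''kb_to_gb'' is hypothetical), the conversion below shows that the 31457280 kbytes in the output above correspond to the 30GB requested with ''pmem=30GB''.

```shell
#!/usr/bin/env bash
# Convert a limit reported in kbytes to whole GB (1 GB = 1024 * 1024 kbytes).
kb_to_gb() {
    echo $(( $1 / 1024 / 1024 ))
}

# 31457280 is the "max memory size" value from the ulimit output above.
kb_to_gb 31457280   # prints 30

# Inside a job, the live value can be read instead. It may be the word
# "unlimited", which is not a number, so check before converting.
current=$(ulimit -m)
if [ "$current" = "unlimited" ]; then
    echo "max memory size: unlimited"
else
    echo "max memory size: $(kb_to_gb "$current") GB"
fi
```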
 ==== Running MATLAB ==== ==== Running MATLAB ====
  
Line 152: Line 179:
 #PBS -o /home/mat614/TEST.out #PBS -o /home/mat614/TEST.out
 #PBS -l nodes=1:ppn=4  #PBS -l nodes=1:ppn=4 
 +#PBS -l pmem=2GB:vmem=1GB
 #PBS -q batch #PBS -q batch
  
Line 184: Line 212:
 However, you first need permission to use the GPU (given by Prof. Takac). This is just a formality that allows certain users to use the video driver on polyp30. However, you first need permission to use the GPU (given by Prof. Takac). This is just a formality that allows certain users to use the video driver on polyp30.
  
-If you are using TensorFlow, you can set the limit on amount of GPU memory using: +If you are using TensorFlow in Python, you can limit the amount of GPU memory using: 
-<code>      config_tf = tf.ConfigProto() +<code>config_tf = tf.ConfigProto() 
-        config_tf.gpu_options.per_process_gpu_memory_fraction = p</code> +config_tf.gpu_options.per_process_gpu_memory_fraction = p</code> 
-in which <code>p<code> is the percent of GPU memory (a number between zero and one). +in which **//p//** is the fraction of GPU memory the process may use (a number between zero and one). 
  
 ==== Running MPI and Parallel Jobs ==== ==== Running MPI and Parallel Jobs ====
Line 319: Line 347:
 </code> </code>
 and then schedule your jobs with Torque to perform experiments on GPU 1. and then schedule your jobs with Torque to perform experiments on GPU 1.
 +
 +
 +====== MOAB Scheduler ======
 +PBS Torque is used to schedule and run jobs on our cluster. Two PBS processes are required to run jobs. The pbs_server process runs on the PBS server; it accepts your job, adds it to the queue, and dispatches it to the compute nodes. On each node, the job runs under the pbs_mom process.
 +
 +
 +==== Useful MOAB Commands ====
 +1. [[https://docs.adaptivecomputing.com/maui/commands/showq.php|showq]] - Displays information about active, eligible, blocked, and/or recently completed jobs.
 +
 +2. [[https://docs.adaptivecomputing.com/maui/commands/showstart.php|showstart]] - Displays the estimated start time of a job based on a number of analysis types.
 +
 +3. [[https://docs.adaptivecomputing.com/maui/commands/checkjob.php|checkjob]] - Allows end users to view the status of their own jobs.
 +
 +==== Useful External Resources ====
 +[[https://www.icer.msu.edu/sites/default/files/files/understand_job_scheduler_v2.pdf|MSU - Understand job scheduler and resource manager]] - Describes the batch queuing system and has some useful diagrams explaining the relationship between the scheduler and the server.
 +
 +[[https://wvuhpc.github.io/2019-Intro-HPC/07-jobs/index.html|WVU - Job Submission (Torque and Moab)]] - Lists frequently used commands for Torque and Moab. Also includes information on Prologue and Epilogue scripts.
 +
 +[[http://docs.adaptivecomputing.com/mwm/7-1-3/help.htm#pbsintegration.html|Moab-TORQUE/PBS Integration Guide]] - Guide for administrators and integrators on deploying and integrating PBS Torque and Moab into a computer system.
 +
 +[[https://silas.net.br/tech/hpc/torque.html|Torque Notes]] - Information about the processes involved in using torque and debugging information.
 +
 +
tutorial/torque.txt · Last modified: 2024/02/28 13:12 by mjm519