CONDOR is a job manager to schedule computational jobs.
Check the following link for an introduction to CONDOR.
To submit a job via CONDOR, you need to create a .sub file. This file must name the program you want to execute (e.g., matlab or cplex) along with the arguments for that program (such as the file to be run). This gives you an automated way to run programs.
Provide your files here to show different usage of programs!
Here is an example .sub file which submits the matlab file 'test.m' to condor for running:
# Specify the executable software, i.e. mathematica, mosek, etc
Executable = /usr/local/matlab/latest/bin/matlab
Universe = vanilla
getenv = true
# Specify argument file
arguments = -nosplash -nodesktop -logfile test.log -r test
#request_cpus = 16
#request_memory = 2
# name output file
output = ./out.txt
# name error file
error = ./error.txt
# name log file
log = ./log.txt
transfer_executable = false
# Submit to queue
Queue
After making sure all the files you specified exist in the correct directory, use
condor_submit myexp.sub
to submit the file to condor.
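The check for required files before submitting can also be scripted. The following is a minimal sketch and not part of the original instructions: `missing_files` and `submit_if_ready` are hypothetical helper names, and the list of required files is something you supply yourself, since the sketch does not parse the .sub file.

```python
import os
import subprocess


def missing_files(paths, directory="."):
    """Return the entries of `paths` that do not exist inside `directory`."""
    return [p for p in paths if not os.path.isfile(os.path.join(directory, p))]


def submit_if_ready(sub_file, required, directory="."):
    """Run condor_submit only when every required file is present.

    Returns the list of missing files (empty if the job was submitted).
    """
    missing = missing_files([sub_file] + list(required), directory)
    if not missing:
        # Same command as typing `condor_submit myexp.sub` inside `directory`
        subprocess.run(["condor_submit", sub_file], cwd=directory, check=True)
    return missing


# Example usage (assumed file names; needs a working condor installation):
# submit_if_ready("myexp.sub", ["test.m"])
```

If the returned list is non-empty, nothing was submitted and the list tells you which files to put in place first.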
You can reuse the same executable, options, etc. in one .sub file and change only some of the settings to define new jobs. Then, when you submit the file using condor_submit, all of these jobs are placed in the queue at the same time.
For your experiments, you can create a script to generate multiple jobs. Below, you will find an example Python script that generates multiple experiments with a changing argument.
# This create.py script creates a condor submission file (condor.sub)
# for the same problem with different arguments

# Common part shared by every job
common_command = (
    'Executable = ../test/portfolio\n'
    'Universe = vanilla\n'
    'getenv = true\n'
    'transfer_executable = false\n\n'
)

# Open file and write common part (the file is closed automatically)
with open('condor.sub', 'w') as cfile:
    cfile.write(common_command)
    # Loop over various values of an argument and create a different output file for each
    # Then put it in the queue
    for a in range(5, 8):  # range replaces the Python 2 only xrange
        run_command = (
            'arguments = -a %d\n'
            'output = out.%d.txt\n'
            'queue 1\n\n' % (a, a)
        )
        cfile.write(run_command)
This script will generate the following condor file
Executable = ../test/portfolio
Universe = vanilla
getenv = true
transfer_executable = false

arguments = -a 5
output = out.5.txt
queue 1

arguments = -a 6
output = out.6.txt
queue 1

arguments = -a 7
output = out.7.txt
queue 1
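The same pattern extends to experiments where more than one argument changes. This is a sketch only: the `-b` flag and its values are hypothetical (the script above varies only `-a`), and `build_submit_text` is an illustrative helper name. It builds one job per combination of the two arguments.

```python
import itertools

# Common part, identical to the one used for the single-argument case
COMMON = (
    "Executable = ../test/portfolio\n"
    "Universe = vanilla\n"
    "getenv = true\n"
    "transfer_executable = false\n\n"
)


def build_submit_text(a_values, b_values):
    """Return submit-file text with one job per (a, b) combination."""
    parts = [COMMON]
    for a, b in itertools.product(a_values, b_values):
        # -b is a made-up second argument used only for illustration
        parts.append(
            "arguments = -a %d -b %d\n"
            "output = out.%d.%d.txt\n"
            "queue 1\n\n" % (a, b, a, b)
        )
    return "".join(parts)


# Three values of a times two values of b gives six jobs
text = build_submit_text(range(5, 8), [1, 2])
```

Writing `text` to a file and passing that file to condor_submit queues all six jobs in one submission. Building the text in a function, rather than writing it directly, makes it easy to inspect before anything is submitted.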
To check job progress, use the following commands:
condor_q -global #this checks all the jobs on condor
condor_q -run #this checks all running jobs
condor_q userid #this checks all jobs under specific user name
To remove a job from the queue, first find the ID of the job you want to terminate:
condor_q userid
Then call
condor_rm ID
Example:
I call condor_q sec312 to list all jobs belonging to my username. This gives a list similar to this:
-- Submitter: polyp1.ie.lehigh.edu : <128.180.35.200:50671> : polyp1.ie.lehigh.edu
 ID       OWNER   SUBMITTED    RUN_TIME    ST PRI SIZE CMD
42989.0   sec312  10/25 19:56  0+00:00:29  R  0   0.0  symphony -F air04.
42989.1   sec312  10/25 19:56  0+00:00:29  R  0   0.0  symphony -F air05.
42989.5   sec312  10/25 19:56  0+00:00:28  R  0   0.0  symphony -F dsbmip
Now let's say I want to terminate 42989.5. I call condor_rm 42989.5. CONDOR confirms by saying
Job 42989.5 marked for removal
You can remove all of your jobs using the command condor_rm username.
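When many jobs are queued, picking out the IDs to remove by hand gets tedious. The sketch below parses a listing in the column layout of the example above and builds the matching condor_rm commands without running them. It rests on assumptions: `parse_condor_q` and `removal_commands` are hypothetical helper names, and the layout is taken from the example listing only; other HTCondor versions may print a different default layout, in which case the field positions need adjusting.

```python
def parse_condor_q(listing):
    """Extract (job_id, owner, status, command) tuples from condor_q text.

    Assumes the columns of the example listing:
    ID OWNER SUBMITTED(date time) RUN_TIME ST PRI SIZE CMD...
    """
    jobs = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue
        # Job rows start with an ID of the form cluster.process, e.g. 42989.5
        job_id = fields[0]
        if "." not in job_id or not job_id.replace(".", "", 1).isdigit():
            continue
        owner, status = fields[1], fields[5]
        command = " ".join(fields[8:])
        jobs.append((job_id, owner, status, command))
    return jobs


def removal_commands(jobs, cmd_substring):
    """Return condor_rm argument lists for jobs whose command contains cmd_substring."""
    return [["condor_rm", job_id] for job_id, _, _, cmd in jobs if cmd_substring in cmd]


# Listing copied from the example (truncated commands kept as printed)
SAMPLE = """\
-- Submitter: polyp1.ie.lehigh.edu : <128.180.35.200:50671> : polyp1.ie.lehigh.edu
 ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
42989.0 sec312 10/25 19:56 0+00:00:29 R 0 0.0 symphony -F air04.
42989.1 sec312 10/25 19:56 0+00:00:29 R 0 0.0 symphony -F air05.
42989.5 sec312 10/25 19:56 0+00:00:28 R 0 0.0 symphony -F dsbmip
"""

jobs = parse_condor_q(SAMPLE)
to_remove = removal_commands(jobs, "dsbmip")
```

Each entry of `to_remove` is an argument list that could be handed to `subprocess.run` once you have reviewed it. Keeping the removal step separate from the parsing step avoids deleting jobs because of a parsing mistake.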
A summary of frequently used commands in CONDOR:
Command | Action | Basic Usage | Example |
---|---|---|---|
condor_submit | submit a job | condor_submit [submit file] | $ condor_submit job.condor |
condor_q | show status of jobs | condor_q [cluster] | $ condor_q 1170 |
condor_rm | remove jobs from the queue | condor_rm [cluster] | $ condor_rm 1170 |
condor_rm | remove all jobs of a user | condor_rm [userid] | $ condor_rm sec312 |
To submit MPI jobs to our condor pool you can check Dr. Takac's MPI tutorial