

Chapter 6. Submitting Batch Jobs


This chapter describes how to use the bsub command. Command options are divided into groups with related functions. Topics covered in this chapter include input and output, resource requirements and limits, host selection and preference, pre-execution commands, job dependencies, remote file access, start and termination times, parallel jobs, job scripts, and other bsub options.

The options to the bsub command related to job checkpointing and migration are described in 'Checkpointing and Migration'.

Input and Output

When a batch job completes or exits, LSF Batch by default sends you a job report by electronic mail. The report includes the standard output (stdout) and error output (stderr) of the job. The output from stdout and stderr is merged in the order printed, as if the job were run interactively. The default standard input (stdin) file is the null device (for UNIX systems, /dev/null).

If you want mail sent to another user, use the -u username option to the bsub command. Mail associated with the job will be sent to the named user instead of to you.
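For example, the following command (the user name is illustrative) sends the job report for myjob to user alice:

% bsub -u alice myjob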

If you do not want output to be sent by mail, you can specify stdout and stderr files. You can also specify the standard input file if the job needs to read input from stdin. For example:

% bsub -q night -i job_in -o job_out -e job_err myjob

submits myjob to the night queue. The job reads its input from file job_in. Standard output is stored in file job_out, and standard error is stored in file job_err. If you specify a -o outfile argument and do not specify a -e errfile argument, the standard output and error are merged and stored in outfile.

The output file created by the -o option to the bsub command normally contains job report information as well as the job output. This information includes the submitting user and host, the execution host, the CPU time (user plus system time) used by the job, and the exit status. If you want to separate the job report information from the job output, use the -N option to specify that the job report information should be sent by email.
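For example, the following command (the file name is illustrative) stores the output of myjob in job_out and mails the job report separately:

% bsub -o job_out -N myjob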

The output files specified by the -o and -e options are created on the execution host. See 'Remote File Access' for an example of copying the output file back to the submission host if the job executes on a file system that is not shared between the submission and execution hosts.

Resource Requirements

If you need to explicitly specify resource requirements for your job, use the -R option to the bsub command. For example:

% bsub -R "swp > 15 && hpux order[cpu]" myjob

runs myjob on an HP-UX host that is lightly loaded in terms of CPU utilization and has at least 15 megabytes of swap space available. See 'Resource Requirement Strings' for a complete discussion of resource requirements.

You do not have to specify resource requirements every time you submit a job. The LSF administrator may have already configured the resource requirements for your jobs, or you can put your executable name together with its resource requirements into your personal remote task list. The bsub command automatically uses the resource requirements of the job from the remote task lists. See 'Configuring Resource Requirements' for more information about displaying task lists and putting tasks into your remote task list.

Dynamic Resource Requirements

When a job is dispatched, the system assumes that the resources that the job consumes will be reflected in the load information. However, many jobs often do not consume the resources they require when they first start. Instead, they will typically use the resources over a period of time. For example, a job requiring 100 megabytes of swap is dispatched to a host having 150 megabytes of available swap. The job starts off initially allocating 5 megabytes, gradually increasing the amount consumed to 100 megabytes over a 30-minute period. During this period, another job requiring more than 50 megabytes of swap should not be started on the same host to avoid overcommitting the resource.

Specifying Resource Reservation

When submitting a job, you can specify the amount of resources to be reserved through the resource usage (rusage) section of the resource requirement string argument to the bsub command. The syntax of a resource reservation in the rusage section of the resource requirement string is:

res=value[:res=value]...[:duration=value][:decay=value]

The res parameter can be any load index. The value parameter is the initial reserved amount. If res or value is not given, the default is to not reserve that resource.

The duration parameter is the time period within which the specified resources should be reserved. It is specified in minutes by default. If the value is followed by the letter 'h', it is specified in hours. For example, 'duration=30' and 'duration=2h' specify a duration of 30 minutes and two hours respectively. If duration is not specified, the default is to reserve the total amount for the lifetime of the job.

The decay parameter indicates how the reserved amount should decrease over the duration. A value of 1, 'decay=1', indicates that the system should linearly decrease the amount reserved over the duration. The default decay value is 0, which causes the total amount to be reserved for the entire duration. Values other than 0 or 1 are unsupported. If duration is not specified, decay is ignored.

When deciding whether to schedule a job on a host, the LSF Batch system considers the reserved resources of jobs that have previously started on that host. For each load index, the amount reserved by all jobs on that host is summed up and subtracted from (or added to, if the index is increasing) the current value of the resource as reported by the LIM to get the amount available for scheduling new jobs:

available amount = current value - reserved amount for all jobs 

Reservation of the resources mem and swap is handled as a special case. For these resources, the run-time usage is used to determine the amount to reserve (see 'Monitoring Resource Consumption of Jobs'). The reserved amount is the specified amount minus the run-time usage. The duration and decay parameters are ignored for these resources.

For example:

% bsub -R "rusage[swap=50]" my_job

will reserve 50 megabytes of swap for the job.

% bsub -R "rusage[tmp=30:duration=30:decay=1]" my_job

will reserve 30 megabytes of /tmp space for the job. As the job runs, the amount reserved will decrease at approximately 1 megabyte/minute such that the reserved amount is 0 after 30 minutes.
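More than one resource can be reserved in a single rusage section, and a duration can be given in hours using the 'h' suffix. For example, the following sketch (the amounts are illustrative) reserves /tmp space and disk I/O bandwidth for two hours:

% bsub -R "rusage[tmp=500:io=40:duration=2h]" my_job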

The queue-level resource requirement parameter RES_REQ may also specify resource reservation. If a queue reserves a certain amount of a resource, you cannot use the -R option of the bsub command to reserve a greater amount of that resource. For example, if the output of the bqueues -l command contains:

RES_REQ: rusage[mem=40:swp=80:tmp=100]

the following submission is rejected because the requested amounts of mem and swp exceed the queue's specification:

% bsub -R "rusage[mem=50:swp=100]" my_job

Viewing Reserved Resources

The amount of resources reserved on each host can be viewed through the -l option of the bhosts command.
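For example, the following command (the host name is illustrative) shows the load and reserved resources on hostA:

% bhosts -l hostA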

Host Selection

If you want to restrict the set of candidate hosts for running your batch job, use the -m option to bsub.

% bsub -q idle -m "hostA hostD hostB" myjob

This command submits myjob to the idle queue and tells LSF Batch to choose one host from hostA, hostD and hostB to run the job. All other LSF Batch scheduling conditions still apply, so the selected host must be eligible to run the job.

If you have applications that need specific resources, it is more flexible to create a new boolean resource and configure that resource for the appropriate hosts in the LSF cluster. This must be done by the LSF administrator. If you specify a host list using the -m option to bsub, you must change the host list every time you add a new host that supports the desired resources. By using a boolean resource, the LSF administrator can add, move or remove resources without forcing users to learn about changes to resource configuration.

Host Preference

When several hosts can satisfy the resource requirements of a job, the hosts are ordered by load. However, in certain situations it may be desirable to override this behaviour to give preference to specific hosts, even if they are more heavily loaded.

For example, you may have licensed software that runs on several groups of hosts, but prefer a particular host group because jobs finish faster there, freeing the software license for other jobs sooner.

Another situation arises in clusters consisting of dedicated batch servers and desktop machines which can also run jobs when no user is logged in. You may prefer to run on the batch servers and only use the desktop machines if no server is available.

The -m option of the bsub command allows you to specify preference by using '+' after the hostname. The special hostname, others, can be used to refer to all the hosts that are not explicitly listed. For example:

% bsub -R "solaris && mem> 10" -m "hostD+ others" myjob

will select all solaris hosts having more than 10 megabytes of memory available. If hostD satisfies these criteria, it is picked over any other host that also meets them. If hostD does not satisfy the criteria, the least loaded host among the others is selected. All the other hosts are considered as a group and are ordered by load.

You can specify different levels of preference by specifying a number after the '+'. The larger the number, the higher the preference for that host or host group. For example:

% bsub -m "groupA+2 groupB+1 groupC" myjob

gives first preference to hosts in groupA, second preference to hosts in groupB and last preference to those in groupC. The ordering within a group is still determined by the load. You can use the bmgroup command to display the host groups configured in the system.

Note
A queue may also define the host preference for jobs via the HOSTS parameter. The queue specification is ignored if a job specifies its own preference.

You can also exclude a host by specifying a resource requirement using the hname resource:

% bsub -R "hname!=hostb && type==sgi6" myjob

Resource Limits

Resource limits are constraints you or your LSF administrator can specify to limit the use of resources. Jobs that consume more than the specified amount of a resource are signalled or have their priority lowered.

Resource limits can be specified either at the queue level by your LSF administrator or at the job level when you submit a job. Resource limits specified at the queue level are hard limits, while those specified with job submission are soft limits. See the setrlimit(2) man page for the concepts of hard and soft limits.

The following resource limits can be specified to the bsub command (a combined example follows the list):

-c cpu_limit[/host_spec]
Set the soft CPU time limit to cpu_limit for this batch job. The default is no limit. This option is useful for preventing erroneous jobs from running away, or to avoid using up too many resources. A SIGXCPU signal is sent to all processes belonging to the job when it has accumulated the specified amount of CPU time. If the job has no signal handler for SIGXCPU, this causes it to be killed. LSF Batch keeps track of the CPU time used by all processes of the job.
cpu_limit is in the form [hour:]minute, where minute can be greater than 59. So, 3.5 hours can either be specified as 3:30 or 210. The CPU limit is scaled by the host CPU factors of the submitting and execution hosts. This is done so that the job does approximately the same amount of processing for a given CPU limit, even if it is sent to a host with a faster or slower CPU. For example, if a job is submitted from a host with a CPU factor of 2 and executed on a host with a CPU factor of 3, the CPU time limit is multiplied by 2/3 because the execution host can do the same amount of work as the submission host in 2/3 of the time.
The optional host_spec specifies a host name or a CPU model name defined by LSF. The lsinfo command displays CPU model information. If host_spec is not given, the CPU limit is scaled based on the DEFAULT_HOST_SPEC shown by the bparams -l command. (If DEFAULT_HOST_SPEC is not defined, the fastest batch host in the cluster is used as the default.) If host_spec is given, the appropriate CPU scaling factor for the specified host or CPU model is used to adjust the actual CPU time limit at the execution host. The following example specifies that myjob can run for 10 minutes on a DEC3000 host, or the corresponding time on any other host:
bsub -c 10/DEC3000 myjob 
-W run_limit[/host_spec]
Set the wall-clock run time limit of this batch job. The default is no limit. If the accumulated time the job has spent in the RUN state exceeds this limit, the job is sent a SIGUSR2 signal. If the job does not terminate within 10 minutes after being sent this signal, it is killed. run_limit and host_spec have the same format as the argument to the bsub -c option.
-F file_limit
Set a per-process (soft) file size limit for each process that belongs to this batch job. If a process of this job attempts to write to a file such that the file size would increase beyond file_limit kilobytes, the kernel sends that process a SIGXFSZ signal. This condition normally terminates the process, but may be caught. The default is no soft limit.
-D data_limit
Set a per-process (soft) data segment size limit for each process that belongs to this batch job. An sbrk() or malloc() call to extend the data segment beyond data_limit kilobytes returns an error. The default is no soft limit.
-S stack_limit
Set a per-process (soft) stack segment size limit for each process that belongs to this batch job. An sbrk() call to extend the stack segment beyond stack_limit kilobytes causes the process to be terminated. The default is no soft limit.
-C core_limit
Set a per-process (soft) core file size limit for each process that belongs to this batch job. On some systems, no core file is produced if the image for the process is larger than core_limit kilobytes. On other systems only the first core_limit kilobytes of the image are dumped. The default is no soft limit.
-M mem_limit
Set the per-process (soft) process resident set size limit to mem_limit kilobytes for all processes that belong to this batch job. Exceeding this limit when free physical memory is in short supply results in a low scheduling priority being assigned to the process. That is, the process is reniced. The default is no soft limit. On HP-UX and Sun Solaris 2.x, a resident set size limit cannot be set, so this option has no effect.
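For example, the following command (the limit values are illustrative) submits myjob with a CPU time limit of 3.5 hours, a wall-clock run time limit of 5 hours, and a per-process data segment limit of 10000 kilobytes:

% bsub -c 3:30 -W 5:00 -D 10000 myjob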

Pre-execution Commands

Some batch jobs require resources that LSF does not directly support. For example, a batch job may need to reserve a tape drive or check for the availability of a software license.

The -E pre_exec_command option to the bsub command specifies an arbitrary command to run before starting the batch job. When LSF Batch finds a suitable host on which to run a job, the pre-execution command is executed on that host. If the pre-execution command runs successfully, the batch job is started.

An alternative to using the -E pre_exec_command option is for the LSF administrator to set up a queue level pre-execution command. See 'Queue-Level Pre-/Post-Execution Commands' of the LSF Administrator's Guide for more information.

The standard input, output and error files for the pre-execution command are opened to the same files as for the job. Standard input and output from the pre-execution command cannot be redirected.

The pre-execution command is run under the same user ID, environment, and home and working directories as the batch job. If the pre-execution command is not in your normal execution path, the full path name of the command must be specified.

For parallel batch jobs, the pre-execution command is run on the first selected host.

The pre-execution command returns information to LSF Batch using the exit status. If the pre-execution command exits with non-zero status, the batch job is not dispatched. The job goes back to the PEND state, and LSF Batch tries to dispatch another job to that host. The next time LSF Batch tries to dispatch jobs this process is repeated.

LSF Batch assumes that the pre-execution command runs without side effects. For example, if the pre-execution command reserves a software license or other resource, you must take care not to reserve the same resource more than once for the same batch job.

The following example shows a batch job that requires a tape drive. The tapeCheck program is a site specific program that exits with status zero if the specified tape drive is ready, and one otherwise:

% bsub -E "/usr/local/bin/tapeCheck /dev/rmt0l" myjob

Job Dependencies

Some batch jobs depend on the results of other jobs. For example, a series of jobs could process input data, run a simulation, generate images based on the simulation output, and finally, record the images on a high-resolution film output device. Each step can only be performed when the previous step completes and all subsequent steps must be aborted if any step fails.

The -w depend_cond option to the bsub command specifies a dependency condition, which is a logical expression based on the execution states of preceding batch jobs. When the depend_cond expression evaluates to TRUE, the batch job can be started. Complex conditions can be written using the logical operators '&&' (AND), '||' (OR), '!' (NOT) and parentheses '()'.

If the expression string contains a space character, a logical operator, or parentheses, the string must be enclosed in single or double quotes (' or ") to prevent the shell from interpreting the special characters.

Batch jobs are identified by job ID number or job name. The job ID number is displayed by the bsub command when the job is submitted. The job name is a string specified by the -J job_name option.

In job dependency expressions, numeric job names must be enclosed in quotes.

Job names refer to jobs submitted by the same user. If more than one of your jobs has the same name, the condition is tested on the last job submitted with that name.

A wildcard character '*' can be specified at the end of a job name to indicate all jobs matching the name. For example, jobA* will match jobA, jobA1, jobA_test, jobA.log etc. There must be at least one match.

The conditions that can be tested are:

started(jobID | jobName)
If the specified batch job has started running or has run to completion, the condition is TRUE; that is, the job is not in the PEND or PSUSP state, and also is not currently running the pre-execution command if the bsub -E option was specified.
done(jobID | jobName)
If the specified batch job has completed successfully and is in the DONE state, the condition is TRUE, otherwise FALSE.
exit(jobID | jobName)
If the specified batch job has terminated abnormally and is in the EXIT state, the condition is TRUE, otherwise FALSE.
ended(jobID | jobName)
If the specified batch job has finished (either in the EXIT or DONE state), the condition is TRUE, otherwise FALSE.

Specifying only jobID or jobName is equivalent to done(jobID | jobName). Note that a numeric job name should be doubly quoted, e.g. -w "'210'", since the Unix shell treats -w "210" the same as -w 210.

If any one of the batch jobs named in the dependency condition is not found, bsub fails and the job is not submitted.

The following are examples of job dependency conditions:

done(312) && (started(Job2)||exit(Job3))

The submitted job will not start until job 312 has completed successfully, and either the job named Job2 has started or the job named Job3 has terminated abnormally.

1532 || jobName2 || ended(jobName3*)

The submitted job will not start until either job 1532 has completed, the job named jobName2 has completed, or all jobs with names beginning with jobName3 have finished.
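To attach a dependency condition to a job, quote the expression on the bsub command line. In the following sketch (the job name, program names, and job ID are illustrative), the second job starts only after the first completes successfully:

% bsub -J filter myfilter data_in
Job <2001> is submitted to default queue <normal>.
% bsub -w 'done(filter)' myjob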

Note
If you require more extensive dependencies, for example, calendar or event dependencies, you may want to examine the LSF JobScheduler component of LSF Suite. See the LSF JobScheduler User's Guide for further information.

Remote File Access

LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.

The -f "[lfile op [rfile]]" option to the bsub command copies a file between the submission host and the execution host. lfile is the file name on the submission host, and rfile is the name on the execution host. op is the operation to perform on the file. lfile and rfile can be absolute or relative file path names. If one of the files is not specified, it defaults to the other, which must be given.

The -f option may be repeated to specify multiple files.

op must be surrounded by white space. The possible values for op are:

>
lfile on the submission host is copied to rfile on the execution host before job execution. rfile is overwritten if it exists.
<
rfile on the execution host is copied to lfile on the submission host after the job completes. lfile is overwritten if it exists.
<<
rfile is appended to lfile after the job completes. lfile is created if it does not exist.
><, <>
equivalent to performing the > and then the < operation. lfile is copied to rfile before the job executes, and rfile is copied back (replacing the previous lfile) after the job completes. '<>' is the same as '><'.

You must include lfile with op; otherwise, a syntax error results. When rfile is not given, it is assumed to be the same as lfile.

If the input file specified with the -i argument to bsub is not found on the execution host, the file is copied from the submission host using LSF's remote file access facility and is removed from the execution host after the job finishes.

The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system. For example, the following command stores the job output in the job_out file and copies the file back to the submission host:

% bsub -o job_out -f 'job_out <' myjob

If the submission and execution hosts have different directory structures, you must ensure that the directory where rfile and lfile will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.

You should specify rfile as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.

For example, to submit myjob to LSF Batch, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:

% bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3

To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:

% bsub -f "batch_data <>" batch_update batch_data

LSF Batch uses the lsrcp(1) command to transfer files. lsrcp contacts the RES on the remote host to perform the file transfer. If the RES is not available, rcp(1) is used. Because LSF client hosts do not run the RES daemon, jobs that are submitted from client hosts should only specify the -f option to bsub if rcp is allowed. You must set up the permissions for rcp if account mapping is used.

Start and Termination Time

If you do not want LSF Batch to start your job immediately, use the bsub -b option to specify the time after which the job should be dispatched.

% bsub -b 5:00 myjob

The submitted job remains pending until after the local time on the LSF master host reaches 5 A.M. You can also specify a time after which the job should be terminated with the -t option to bsub. The command

% bsub -b 11:12:5:40 -t 11:12:20:30 myjob

submits myjob to the default queue to start after November 12 at 05:40 A.M. If the job is still running on Nov 12 at 8:30 P.M., it is killed.

Parallel Jobs

LSF Batch can allocate more than one host or processor to run a job, and it automatically keeps track of the job status while the parallel job is running. To submit a parallel job, use the -n option of bsub:

% bsub -n 10 lsmake

This command submits lsmake as a parallel job. The job is started when 10 job slots are available.

For parallel jobs, LSF Batch only starts one controlling process for the batch job. This process is started on the first host in the list of selected hosts. The controlling process is responsible for starting the actual parallel components on all the hosts selected by LSF Batch.

LSF Batch sets a number of environment variables for each batch job. The variable LSB_JOBID is set to the LSF Batch job ID number as printed by bsub. The LSB_HOSTS variable is set to the names of the hosts running the batch job. For a sequential job, LSB_HOSTS is set to a single host name. For a parallel batch job, LSB_HOSTS contains the complete list of hosts that LSF Batch has allocated to that job. Parallel batch jobs must get the list of hosts from the LSB_HOSTS variable and start up all of the job components on the allocated hosts.
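For example, a minimal parallel job script might read LSB_HOSTS and use lsrun, which executes a task on a specified host through the RES, to start one component per allocated host (my_component is a hypothetical program; this is a sketch, not a complete parallel application):

#!/bin/sh
# Start one component of the job on each host allocated by LSF Batch.
for host in $LSB_HOSTS
do
    lsrun -m "$host" my_component &
done
# Wait for all components to finish.
wait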

In the lsmake example above, LSF Batch starts lsmake on the first host. lsmake reads the LSB_HOSTS environment variable to get the list of hosts and uses the RES to execute subtasks on those hosts.

LSF includes scripts for running PVM, P4, and MPI parallel programs as batch jobs. See 'Parallel Jobs' and the pvmjob(1), p4job(1), and mpijob(1) manual pages for more information.

The following features support parallel jobs running through the LSF Batch system.

Minimum and Maximum Number of Processors

When submitting a parallel job that requires multiple processors, you can specify the minimum and maximum number of processors using the -n option to the bsub command. The syntax of the -n option is:

bsub -n min_proc[,max_proc] <other bsub options>

If max_proc is not specified then it is assumed to be equal to min_proc. For example:

% bsub -n 4,16 myjob

At most, 16 processors can be allocated to this job. If fewer than 16 processors are eligible to run the job, the job can still be started as long as at least 4 eligible processors are available. Once the job starts, no more processors are allocated to it, even if more become available later.

If the specified maximum number is greater than the value of PROCLIMIT defined for the queue to which the job is submitted, the job will be rejected.

Specifying Locality

Sometimes you need to control how the processors selected for a parallel job are distributed across the hosts in the cluster. You can specify "select all the processors for this parallel batch job on the same host" or "do not choose more than one processor on any host" by using the span section in the -R option string. For example:

% bsub -n 4 -R "span[hosts=1]" my_job

This job should be dispatched to a multiprocessor that has at least 4 processors currently eligible to run the 4 components of this job.

% bsub -n 4 -R "span[ptile=1]" my_job

This job should be dispatched to 4 hosts even though some of the 4 hosts may have more than one processor currently available.

Note
The queue may also define the locality for parallel jobs using the RES_REQ parameter. The queue specification is ignored if your job specifies its own locality.

Processor Reservation

The scheduling of parallel jobs supports the notion of processor reservation. Parallel jobs requiring a large number of processors often cannot be started if there are many lower-priority sequential jobs in the system: there may not be enough resources at any one instant to satisfy a large parallel job, even though there are enough to allow a sequential job to be started. The processor reservation feature reduces this starvation of parallel jobs.

When a parallel job cannot be dispatched because there are not enough execution slots to satisfy its minimum processor requirements, the currently available slots are reserved for the job. These reserved job slots are accumulated until there are enough available to start the job. When a slot is reserved for a job, it is unavailable to any other job.

To use this feature, a queue must have the processor reservation policy enabled through the SLOT_RESERVE parameter (see 'Processor Reservation for Parallel Jobs' of the LSF Administrator's Guide). To avoid deadlock situations, the period of reservation is specified through the MAX_RESERVE_TIME parameter. The system accumulates reserved slots for a job for up to MAX_RESERVE_TIME minutes; if enough slots have not been accumulated by then, all slots are freed and made available to other jobs. The MAX_RESERVE_TIME parameter takes effect from the start of the first reservation for a job, and a job can go through multiple reservation cycles before it accumulates enough slots to actually be started.

Reserved slots can be displayed with the bjobs command. The number of reserved slots can be displayed with the bqueues, bhosts, bhpart, and busers commands. Look in the RSV column.

Re-initializing Job Environment on the Execution Host

By default LSF Batch copies the environment of the job from the submission host when the job is submitted. The environment is recreated on the execution host when the job is started. This is convenient, in many cases, because the job runs as if it were run interactively on the submission host.

There are cases where you want to use a platform specific or host specific environment to run the job, rather than using the same environment as on the submission host. For example, you may want to set up different search paths on the execution host.

The -L shell option to the bsub command causes LSF Batch to emulate a login on the execution host before starting your job. This makes sure that the login start-up files (.profile for /bin/sh, or .cshrc and .login for /bin/csh) are sourced before the job is started. The shell argument specifies the login shell to use.

% bsub -L /a/b/shell myjob
Job <1234> is submitted to default queue <normal>.

This tells LSF Batch to use /a/b/shell as the login shell to reinitialize the environment.

Note
This does not affect the shell under which the job is run. When a login shell is specified with the -L shell option to the bsub command, that shell is only used as a login shell to set the environment. The job is run using /bin/sh, unless you specify otherwise as described in 'Running a Job Under a Particular Shell'. For example, if your job script is written in /bin/sh and your regular login shell is /bin/csh, you can run your job under /bin/sh but use /bin/csh to reinitialize the job environment by sourcing your .cshrc and .login files.

Other bsub Options

This section lists some other bsub options; a combined example follows the list. For details on these options, see the bsub(1) manual page.

-x
The job must run exclusively on a host. The job is started on a host that has no other LSF Batch jobs running on it. The host is locked (status lockU) while this job is running so that no other LSF jobs are sent to the host.
-r
Specify that the job is rerunnable. See 'Automatically Rerunning and Restarting Jobs'.
-B
Send email to the job submitter when the job begins executing.
-I
An interactive batch job is submitted to the LSF Batch system. See 'Interactive Batch Job Support' for more details.
-k "checkdir[ interval ]"
Specify the checkpoint directory and interval. See 'Submitting Checkpointable Jobs'.
-P project
Associate a project name with a job. Project names are logged in the lsb.acct file and you can use the bacct command to gather accounting information on a per-project basis.
On systems running IRIX 6, before the submitted job begins execution, a new array session is created and the project ID corresponding to the project name is assigned to the session.
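For example, the following command (the project name is illustrative) submits myjob as a rerunnable job under project sim_proj and requests mail when the job begins executing:

% bsub -r -B -P sim_proj myjob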

Job Scripts

If bsub is run without giving a command to submit, it reads job command lines from the standard input. If the standard input is a controlling terminal, you are prompted with bsub> for each line. For example:

% bsub -q simulation
bsub> cd /work/data/myhomedir
bsub> myjob arg1 arg2 ......
bsub> rm myjob.log
bsub> ^D
Job <1234> submitted to queue <simulation>.

In this case, the three command lines are submitted to LSF Batch and run as a /bin/sh script. Note that only valid /bin/sh command lines are acceptable in this case. Here is another example:

% bsub -q simulation < command_file
Job <1234> submitted to queue <simulation>.

command_file must contain /bin/sh command lines.

On NT systems, commands must be specified using batch file (BAT) syntax. For example:

C:\> bsub -q simulation
bsub> cd \\server\data\myhomedir
bsub> myjob arg1 arg2 ......
bsub> del myjob.log
bsub> ^Z
Job <1234> submitted to queue <simulation>.

Embedded Submission Options

You can specify job submission options in the script read from the standard input by the bsub command using lines starting with '#BSUB':

% bsub -q simulation
bsub> #BSUB -q test
bsub> #BSUB -o outfile -R "mem>10"
bsub> myjob arg1 arg2
bsub> #BSUB -J simjob
bsub> ^D
Job <1234> submitted to queue <simulation>.

There is one important thing to note: options on the bsub command line override embedded options. In the example above, the job is submitted to the simulation queue given on the command line rather than to the test queue given in the #BSUB line.

As a second example, you can redirect a script to the standard input of the bsub command:

% bsub < myscript
Job <1234> submitted to queue <test>.
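In this example, the myscript file might look like the following sketch (the queue, output file, and command are illustrative):

#!/bin/sh
#BSUB -q test
#BSUB -o outfile
myjob arg1 arg2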

The myscript file contains job submission options as well as command lines to execute. When the bsub command reads a script from its standard input, the script file is actually spooled by the LSF Batch system; therefore, the script can be modified right after bsub returns for the next job submission. When the script is specified on the bsub command line, the script is not spooled:

% bsub myscript
Job <1234> submitted to default queue <normal>.

In this case the command line myscript is spooled by LSF Batch, instead of the contents of the myscript file. Later modifications to the myscript file can affect the job's behaviour.

Note
The bsub command interprets embedded options only if the script is supplied as the stdin of its command line. When the script is specified on the bsub command line, as is the case with the above example, the options embedded in the script file are ignored.

Running a Job Under a Particular Shell

By default, LSF runs job scripts using the /bin/sh shell. You can specify the shell under which the job is run. This is done by specifying an interpreter in the first line of the script.

% bsub
bsub> #!/bin/csh -f
bsub> set coredump=`ls |grep core`
bsub> if ( "$coredump" != "") then
bsub> mv core core.`date | cut -d" " -f1`
bsub> endif
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.

The bsub command must read the job script from the standard input to set the execution shell.

If you do not specify a shell in the script, the script is run using /bin/sh. If the first line of the script starts with a '#' not immediately followed by a '!', then /bin/csh is used to run the job. For example:

% bsub
bsub> # This is a comment line. This tells the system to use /bin/csh to
bsub> # interpret the script.
bsub>
bsub> setenv DAY `date | cut -d" " -f1`
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.

If running jobs under a particular shell is a system-wide or queue-wide requirement, you can ask your system administrator to configure that shell as the job starter for your queue. You can find out whether your queue has a job starter configured by running the bqueues -l command.

See 'Using A Job Starter' of the LSF Administrator's Guide for more details.

Submitting Jobs Using xbsub

LSF Batch provides a GUI for submitting jobs. The main window of xbsub is shown in 'Figure 3. xbsub Job Submission Window'. All the job submission options can be selected using xbsub.

Detailed parameters can be set by clicking the 'Advanced' button. Figure 10 shows the resulting window.

Figure 10. Advanced Parameters of xbsub





Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.