

Chapter 6. Managing LSF Batch


This chapter describes the operating concepts and maintenance tasks of the batch queuing system, LSF Batch. It builds on concepts from 'Managing LSF Base'. The topics covered in this chapter are described in the sections below.

Managing LSF Batch Logs

Managing the error log files for LSF Batch daemons was described in 'Managing Error Logs'. This section discusses the other important log files that the LSF Batch daemons produce. The LSF Batch log files are found in the directory LSB_SHAREDIR/cluster/logdir.

LSF Batch Accounting Log

Each time a batch job completes or exits, an entry is appended to the lsb.acct file. This file can be used to create accounting summaries of LSF Batch system use. The bacct(1) command produces one form of summary. The lsb.acct file is a text file suitable for processing with awk, perl, or similar tools. See the lsb.acct(5) manual page for details of the contents of this file. Additionally, the LSF Batch API supports calls to process the lsb.acct records. See the LSF Programmer's Guide for details of LSF Batch API.
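
Because lsb.acct is plain text, simple summaries can also be produced directly with standard tools. The sketch below counts finished-job records per user with awk; the field position of the user name ($12 here) is hypothetical and must be checked against the record layout in the lsb.acct(5) manual page.

% awk '{ njobs[$12]++ } END { for (u in njobs) print u, njobs[u] }' lsb.acct    # $12 is a hypothetical field position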

You should move the lsb.acct file to a backup location, and then run your accounting on the backup copy. The daemon automatically creates a new lsb.acct file to replace the moved file. This prevents problems that might occur if the daemon writes new log entries while the accounting programs are running. When the accounting is complete, you can remove or archive the backup copy.
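
For example, the following sketch rotates the log and then summarizes the backup copy; the path is as described above, and the -f option of bacct (to read an alternate log file) should be verified against the bacct(1) manual page.

% cd LSB_SHAREDIR/cluster/logdir      # substitute your actual LSB_SHAREDIR and cluster name
% mv lsb.acct lsb.acct.backup
% bacct -f lsb.acct.backup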

LSF Batch Event Log

The LSF Batch daemons keep an event log in the lsb.events file. The mbatchd daemon uses this information to recover from server failures, host reboots, and LSF Batch reconfiguration. The lsb.events file is also used by the bhist command to display detailed information about the execution history of batch jobs, and by the badmin command to display the operational history of hosts, queues and LSF Batch daemons.

For performance reasons, the mbatchd automatically backs up and rewrites the lsb.events file after every 1000 batch job completions (this is the default; the value is controlled by the MAX_JOB_NUM parameter in the lsb.params file). The old lsb.events file is moved to lsb.events.1, and each old lsb.events.n file is moved to lsb.events.n+1. The mbatchd never deletes these files. If disk storage is a concern, the LSF administrator should arrange to archive or remove old lsb.events.n files occasionally.
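
For example, to switch the event log only after every 2000 completed jobs, the Parameters section of lsb.params might contain the following sketch; see 'The lsb.params File' for the exact syntax.

Begin Parameters
MAX_JOB_NUM = 2000
End Parameters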

CAUTION!
Do not remove or modify the lsb.events file. Removing or modifying the lsb.events file could cause batch jobs to be lost.

Controlling LSF Batch Servers

The lsadmin command is used to control the LSF Base daemons, LIM and RES. LSF Batch provides the badmin command to perform similar operations on the LSF Batch daemons.

LSF Batch System Status

To check the status of LSF Batch server hosts and queues, use the bhosts and bqueues commands:

% bhosts
HOST_NAME          STATUS    JL/U  MAX  NJOBS  RUN  SSUSP USUSP  RSV
hostA                ok        2     1     0     0     0     0     0
hostB              closed      2     2     2     2     0     0     0
hostD                ok        -     8     1     1     0     0     0
% bqueues
QUEUE_NAME     PRIO      STATUS     MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
night           30    Open:Inactive  -     -    -    -    4     4     0    0
short           10    Open:Active    50    5    -    -    1     0     1    0
simulation      10    Open:Active    -     2    -    -    0     0     0    0
default          1    Open:Active    -     -    -    -    6     4     2    0

If the status of a batch server host is 'closed', it will not accept any more jobs. A server host can become closed if, for example, the LSF administrator has closed it with the badmin hclose command, the host has reached its configured job slot limit (as hostB has in the output above), or a dispatch window defined for the host is currently closed.

An inactive queue will accept new job submissions, but will not dispatch any new jobs. A queue can become inactive if the LSF cluster administrator explicitly inactivates it with the badmin command, or if the queue has a dispatch or run window defined and the current time is outside that time window.

mbatchd automatically logs the history of the LSF Batch daemons in the LSF Batch event log. You can display the administrative history of the batch system using the badmin command.

The badmin hhist command displays the times when LSF Batch server hosts are opened and closed by the LSF administrator.

The badmin qhist command displays the times when queues are opened, closed, activated, and inactivated.

The badmin mbdhist command displays the history of the mbatchd daemon, including the times when the master starts, exits, reconfigures, or changes to a different host.

The badmin hist command displays all LSF Batch history information, including all the events listed above.

Remote Start-up of sbatchd

You can use the badmin hstartup command to start sbatchd on some or all remote hosts from a single host:

% badmin hstartup all
Start up slave batch daemon on <hostA> ......done
Start up slave batch daemon on <hostB> ......done
Start up slave batch daemon on <hostD> ......done

Note that you do not have to be root to use the badmin command to start LSF Batch daemons.

For remote start-up to work, the /etc/lsf.sudoers file must be set up properly and you must be able to run rsh across all LSF hosts without entering a password. See 'The lsf.sudoers File' for configuration details of lsf.sudoers.
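
A minimal sketch of the relevant lsf.sudoers entries is shown below. The keyword names and values are assumptions for illustration only and must be checked against 'The lsf.sudoers File' for your release.

# illustrative values only; verify the keywords in 'The lsf.sudoers File'
LSF_STARTUP_USERS="lsfadmin"
LSF_STARTUP_PATH=/usr/local/lsf/etc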

Restarting sbatchd

mbatchd is restarted by the badmin reconfig command. sbatchd can be restarted using the badmin hrestart command:

% badmin hrestart hostD
Restart slave batch daemon on <hostD> ...... done

You can specify more than one host name to restart sbatchd on multiple hosts, or use 'all' to refer to all LSF Batch server hosts. Restarting sbatchd on a host does not affect batch jobs that are running on that host.

Shutting Down LSF Batch Daemons

The badmin hshutdown command shuts down the sbatchd.

% badmin hshutdown hostD
Shut down slave batch daemon on <hostD> .... done

If sbatchd is shut down, that host will no longer be available for running new jobs. Existing jobs running on that host will continue to completion, but the results will not be sent to the user until sbatchd is restarted.

To shut down mbatchd you must first use the badmin hshutdown command to shut down the sbatchd on the master host, and then run the badmin reconfig command. The mbatchd is normally restarted by sbatchd; if there is no sbatchd running on the master host, badmin reconfig causes mbatchd to exit.

If mbatchd is shut down, all LSF Batch services are temporarily unavailable; however, existing jobs are not affected. When mbatchd is later restarted, the previous status is restored from the event log file and job scheduling continues.

Opening and Closing of Batch Server Hosts

Occasionally you may want to drain a batch server host for rebooting, maintenance, or removal from the cluster. This can be done with the badmin hclose command:

% badmin hclose hostB
Close <hostB> ...... done

When a host is open, LSF Batch can dispatch jobs to it. When a host is closed, no new batch jobs are dispatched to it, but jobs already dispatched to the host continue to execute. To reopen a batch server host, run the badmin hopen command:

% badmin hopen hostB
Open <hostB> ...... done

To view the history of a batch server host, run the badmin hhist command:

% badmin hhist hostB
Wed Nov 20 14:41:58: Host <hostB> closed by administrator <lsf>.
Wed Nov 20 15:23:39: Host <hostB> opened by administrator <lsf>.

Controlling LSF Batch Queues

Each batch queue can be open or closed, active or inactive. Users can submit jobs to open queues, but not to closed queues. Active queues start jobs on available server hosts, and inactive queues hold all jobs. The LSF administrator can change the state of any queue. Queues may also become active or inactive because of queue run or dispatch windows.

bqueues - Queue Status

The current status of a particular queue or all queues is displayed by the bqueues(1) command. The bqueues -l option also gives current statistics about the jobs in a particular queue such as the total number of jobs in this queue, the number of jobs running, suspended, etc.

% bqueues normal
QUEUE_NAME     PRIO      STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
normal          30    Open:Active      -    -    -    2     6     4    2     0

Opening and Closing Queues

When a batch queue is open, users can submit jobs to the queue. When a queue is closed, users cannot submit jobs to the queue. If a user tries to submit a job to a closed queue, an error message is printed and the job is rejected. If a queue is closed but still active, previously submitted jobs continue to be processed. This allows the LSF administrator to drain a queue.

% badmin qclose normal
Queue <normal> is closed
% bqueues normal
QUEUE_NAME     PRIO      STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
normal          30   Closed:Active     -    -    -    2     6     4    2     0
% bsub -q normal hostname
normal: Queue has been closed
% badmin qopen normal
Queue <normal> is opened

Activating and Inactivating Queues

When a queue is active, jobs in the queue are started if appropriate hosts are available. When a queue is inactive, jobs in the queue are not started. Queues can be activated and inactivated by the LSF administrator using the badmin qact and badmin qinact commands, or by configured queue run or dispatch windows.

If a queue is open and inactive, users can submit jobs to this queue but no new jobs are dispatched to hosts. Currently running jobs continue to execute. This allows the LSF administrator to let running jobs complete before removing queues or making other major changes.

% badmin qinact normal
Queue <normal> is inactivated
% bqueues normal
QUEUE_NAME     PRIO      STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
normal          30   Open:Inactive     -    -    -    -     0     0     0     0
% badmin qact normal
Queue <normal> is activated

Managing LSF Batch Configuration

The LSF Batch cluster is a subset of the LSF Base cluster. All servers used by LSF Batch must belong to the base cluster; however, not every server in the base cluster has to provide LSF Batch services.

LSF Batch configuration consists of four files: lsb.params, lsb.hosts, lsb.users, and lsb.queues. These files are stored in LSB_CONFDIR/cluster/configdir, where cluster is the name of your cluster.

All these files are optional. If any of these files does not exist, LSF Batch will assume a default configuration.

The lsb.params file defines general parameters of LSF Batch operation, such as the name of the default queue used when the user does not specify one, the scheduling intervals for mbatchd and sbatchd, and so on. Detailed parameters are described in 'The lsb.params File'.
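
As a sketch, a simple lsb.params file might look like the following; the values are illustrative, and the full list of parameters is given in 'The lsb.params File'.

Begin Parameters
DEFAULT_QUEUE  = normal    # queue used when bsub is given no -q option
MBD_SLEEP_TIME = 60        # mbatchd scheduling interval, in seconds
End Parameters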

The lsb.hosts file defines LSF Batch server hosts together with their attributes. Not all LSF hosts defined by LIM configuration have to be configured to run batch jobs. Batch server host attributes include scheduling load thresholds, dispatch windows, job slot limits, etc. This file is also used to define host groups and host partitions. See 'The lsb.hosts File' for details of this file.

The lsb.users file contains user-related parameters such as user groups, user job slot limits, and account mapping. See 'The lsb.users File' for details.

The lsb.queues file defines job queues. Numerous controls are available at the queue level to allow cluster administrators to customize site resource allocation policies. See 'The lsb.queues File' for more details.

When you first install LSF on your cluster, some example queues are already configured. You should customize these queues or define new queues to meet your site's needs.

Note
After changing any of the LSF Batch configuration files, you must run badmin reconfig to tell mbatchd to pick up the new configuration. You must also run this command every time you change the LIM configuration.

Adding a Batch Server Host

You can add a batch server host to the LSF Batch configuration by following the steps below:

Step 1.
If you are adding a host that has not yet been added to the LSF Base cluster, first follow the steps described in 'Adding a Host to a Cluster'.
Step 2.
Modify the LSB_CONFDIR/cluster/configdir/lsb.hosts file to add the new host together with its attributes. If you want to limit the added host for use only by some queues, also update the lsb.queues file. Since host types, host models, and the virtual host name 'default' can be used to refer to all hosts of that type or model, or to every LSF host not covered by the other definitions, you may not need to change either file if the new host is already covered.
Step 3.
Run badmin reconfig to tell mbatchd to pick up the new configuration.
Step 4.
Start sbatchd on the added host by running badmin hstartup or simply start it by hand.
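
For example, if the new host is hostE (an illustrative name) and it is already covered by the default entry in lsb.hosts, the last two steps might look like:

% badmin reconfig
% badmin hstartup hostE
Start up slave batch daemon on <hostE> ......done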

Removing a Batch Server Host

To remove a host as a batch server host, follow the steps below:

Step 1.
If you need to permanently remove a host from your cluster, you should use badmin hclose to prevent new batch jobs from starting on the host, and wait for any running jobs on that host to finish. If you wish to shut the host down before all jobs complete, use bkill to kill the running jobs.
Step 2.
Modify lsb.hosts and lsb.queues in the LSB_CONFDIR/cluster/configdir directory and remove the host from all sections in which it appears.
Step 3.
Run badmin hshutdown to shut down sbatchd on that host.
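
A typical drain-and-remove sequence might look like the following sketch; the host name follows the earlier examples, and the -m option of bjobs (to list jobs on a particular host) should be verified against the bjobs(1) manual page.

% badmin hclose hostB
Close <hostB> ...... done
% bjobs -u all -m hostB       # wait for these jobs to finish, or bkill them
% badmin hshutdown hostB
Shut down slave batch daemon on <hostB> .... done
% badmin reconfig             # after removing hostB from lsb.hosts and lsb.queues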

CAUTION!
You should never remove the master host from LSF Batch. Change LIM configuration to assign a different default master host if you want to remove your current default master from the LSF Batch server pool.

Adding a Batch Queue

Adding a batch queue does not affect pending or running LSF Batch jobs. To add a batch queue to a cluster:

Step 1.
Log in as the LSF administrator on any host in the cluster.
Step 2.
Edit the LSB_CONFDIR/cluster/configdir/lsb.queues file to add the new queue definition. You can copy another queue definition from this file as a starting point; remember to change the QUEUE_NAME of the copied queue. Save the changes to lsb.queues. See 'The lsb.queues File' for a complete description of LSF Batch queue configuration.
Step 3.
Run the command badmin ckconfig to check the new queue definition. If any errors are reported, fix the problem and check the configuration again. See 'Checking the LSF Configuration' for an example of normal output from badmin ckconfig.
Step 4.
When the configuration files are ready, run badmin reconfig. The master batch daemon (mbatchd) is unavailable for approximately one minute while it reconfigures. Pending and running jobs are not affected.
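
A minimal new queue definition might look like the following sketch; the queue name and values are illustrative, and 'The lsb.queues File' lists all available parameters.

Begin Queue
QUEUE_NAME  = overnight
PRIORITY    = 20
NICE        = 20
DESCRIPTION = Example queue for low priority overnight work
End Queue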

Removing a Batch Queue

Before removing a queue, you should make sure there are no jobs in that queue. If you remove a queue that has jobs in it, the jobs are temporarily moved to a lost and found queue. Jobs in the lost and found queue remain pending until the user or the LSF administrator uses the bswitch command to switch the jobs into regular queues. Jobs in other queues are not affected.

The following example moves all pending and running jobs in the night queue to the idle queue, and then deletes the night queue.

Step 1.
Log in as the LSF administrator on any host in the cluster.
Step 2.
Close the queue to prevent any new jobs from being submitted:
% badmin qclose night
Queue <night> is closed
Step 3.
Move all pending and running jobs into another queue. The bswitch -q night argument chooses jobs from the night queue, and the job ID number 0 specifies that all jobs should be switched:
% bjobs -u all -q night
JOBID USER  STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
5308  user5 RUN   night    hostA       hostD       sleep 500  Nov 21 18:16
5310  user5 PEND  night    hostA                   sleep 500  Nov 21 18:17

% bswitch -q night idle 0
Job <5308> is switched to queue <idle>
Job <5310> is switched to queue <idle>
Step 4.
Edit the LSB_CONFDIR/cluster/configdir/lsb.queues file. Remove (or comment out) the definition for the queue being removed. Save the changes.
Step 5.
Run the command badmin reconfig. If any problems are reported, fix them and run badmin reconfig again. The batch system is unavailable for about one minute while the system rereads the configuration.

Controlling LSF Batch Jobs

The LSF administrator can control batch jobs belonging to any user. Other users may control only their own jobs. Jobs can be suspended, resumed, killed, and moved within and between queues.

Moving Jobs - bswitch, btop, and bbot

The bswitch command moves pending and running jobs from queue to queue. The btop and bbot commands change the dispatching order of pending jobs within a queue. The LSF administrator can move any job. Other users can move only their own jobs.

The btop and bbot commands do not allow users to move their own jobs ahead of those submitted by other users; only the execution order of the user's own jobs is changed. The LSF administrator can move one user's job ahead of another user's. The btop, bbot, and bswitch commands are described in the LSF User's Guide and in the btop(1) and bswitch(1) manual pages.

Signalling Jobs - bstop, bresume, and bkill

The bstop, bresume and bkill commands send UNIX signals to batch jobs. See the kill(1) manual page for a discussion of the UNIX signals.

bstop sends SIGSTOP to sequential jobs and SIGTSTP to parallel jobs.

bresume sends a SIGCONT.

bkill sends the specified signal to the process group of the specified jobs. If the -s option is not present, the default operation of bkill is to send a SIGKILL signal to the specified jobs to kill these jobs. Twenty seconds before SIGKILL is sent, SIGTERM and SIGINT are sent to give the job a chance to catch the signals and clean up.

Users are only allowed to send signals to their own jobs. The LSF administrator can send signals to any job. See the LSF User's Guide and the manual pages for more information about these commands.

This example shows the use of the bstop and bkill commands:

% bstop 5310
Job <5310> is being stopped

% bjobs 5310
JOBID USER  STAT  QUEUE    FROM_HOST   EXEC_HOST  JOB_NAME   SUBMIT_TIME
5310  user5 PSUSP night    hostA                  sleep 500  Nov 21 18:17

% bkill 5310
Job <5310> is being terminated

% bjobs 5310
JOBID USER  STAT  QUEUE    FROM_HOST   EXEC_HOST  JOB_NAME   SUBMIT_TIME
5310  user5 EXIT  night    hostA                  sleep 500  Nov 21 18:17

Tuning LSF Batch

Each batch job has its own resource requirements, and batch server hosts that match those requirements are the candidate hosts. When the batch daemon wants to schedule a job, it first asks the LIM for the load index values of all the candidate hosts. The load values for each host are compared to the scheduling conditions. Jobs are dispatched to a host only if all load values are within the scheduling thresholds.

When a job is running on a host, the batch daemon periodically gets the load information for that host from the LIM. If the load values cause the suspending conditions to become true for that particular job, the batch daemon performs the SUSPEND action on the process group of that job. The batch daemon allows some time for changes to the system load to register before it considers suspending another job.

When a job is suspended, the batch daemon periodically checks the load on that host. If the load values cause the scheduling conditions to become true, the daemon performs the RESUME action on the process group of the suspended batch job.

The SUSPEND and RESUME actions are configurable as described in 'Configurable Job Control Actions'.

LSF Batch has a wide variety of configuration options. This section describes only a few of the options to demonstrate the process. For complete details, see 'LSF Batch Configuration Reference'. The algorithms used to schedule jobs and concepts involved are described in 'How LSF Batch Schedules Jobs'.

Controlling Interference via Load Conditions

LSF is often used on systems that support both interactive and batch users. On one hand, users are often concerned that load sharing will overload their workstations and slow down their interactive tasks. On the other hand, some users want to dedicate some machines to critical batch jobs so that those jobs have guaranteed resources. Even if all your workload consists of batch jobs, you still want to reduce resource contention and operating system overhead to maximize the use of your resources.

Numerous parameters in LIM and LSF Batch configurations can be used to control your resource allocation and to avoid undesirable contention.

Since interference is often reflected in the load indices, LSF Batch responds to load changes to avoid or reduce contention. LSF Batch can take actions on jobs to reduce interference before or after jobs are started. These actions are triggered by different load conditions. Most of the conditions can be configured at both the queue level and the host level. Conditions defined at the queue level apply to all hosts used by the queue, while conditions defined at the host level apply to all queues using the host.

To effectively reduce interference between jobs, the appropriate load indices must be used properly. The following are examples of a few frequently used parameters.

Paging Rate (pg)

The paging rate (pg) load index relates strongly to the perceived interactive performance. If a host is paging applications to disk, the user interface feels very slow.

The paging rate is also a reflection of a shortage of physical memory. When an application is being paged in and out frequently, the system spends a lot of time on paging overhead, resulting in reduced performance.

The paging rate load index can be used as a threshold either to stop sending more jobs to the host, or to suspend an already running batch job so that interactive users are not interfered with.

This parameter can be used in different configuration files to achieve different purposes. By defining a paging rate threshold in lsf.cluster.cluster, the host becomes busy from LIM's point of view when the threshold is exceeded, so LIM will not advise placing any more jobs on that host.

By including the paging rate in LSF Batch queue or host scheduling conditions, batch jobs can be prevented from starting on machines with a high paging rate, or can be suspended or even killed if they are interfering with the interactive user on the console.

A batch job suspended because of the pg threshold will not be resumed, even if the resume conditions are met, unless the machine has been interactively idle for more than PG_SUSP_IT minutes, as described in 'Parameters'.
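
For example, to require 30 minutes of interactive idle time before such a job can be resumed, the Parameters section of lsb.params might contain the following sketch; see 'Parameters' for the exact definition of PG_SUSP_IT.

Begin Parameters
PG_SUSP_IT = 30
End Parameters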

Interactive Idle Time (it)

Stricter control can be achieved using the idle time (it) index. This index measures the number of minutes since any interactive terminal activity. Interactive terminals include hard-wired ttys, rlogin and lslogin sessions, and X shell windows such as xterm. On some hosts, LIM also detects mouse and keyboard activity.

This index is typically used to prevent batch jobs from interfering with interactive activity. By defining the suspending condition in an LSF Batch queue as 'it==0 && pg >50', a batch job from this queue is suspended if the machine is not interactively idle and the paging rate is higher than 50 pages per second. Furthermore, by defining the resume condition as 'it>5 && pg <10' in the queue, a suspended job from the queue will not resume unless the machine has been idle for at least 5 minutes and the paging rate is less than 10 pages per second.
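
Expressed in lsb.queues, such a policy might look like the following sketch. The queue name is illustrative, and the exact syntax of STOP_COND and RESUME_COND is given in 'The lsb.queues File'.

Begin Queue
QUEUE_NAME  = nonurgent
STOP_COND   = it==0 && pg > 50     # suspend when the machine is in use and paging is heavy
RESUME_COND = it>5 && pg < 10      # resume only after 5 idle minutes and light paging
...
End Queue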

The it index is non-zero only if no interactive users are active. Setting the it threshold to 5 minutes allows a reasonable amount of think time for interactive users, while still making the machine available for load sharing if the users are logged in but away.

For lower-priority batch queues, it is appropriate to set an it scheduling threshold of 10 minutes and a suspending threshold of 2 minutes in the lsb.queues file, as sketched below. Jobs in these queues are suspended while the execution host is in use, and resume after the host has been idle for a longer period. For hosts where all batch jobs, no matter how important, should be suspended, set a per-host suspending threshold in the lsb.hosts file.
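
Using the scheduling/suspending threshold pair notation shown in the example queues later in this chapter, such a policy might be sketched as follows; the queue name is illustrative.

Begin Queue
QUEUE_NAME = lowprio
it         = 10/2      # schedule only after 10 idle minutes; suspend when idle time drops to 2
...
End Queue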

CPU Run Queue Length (r15s, r1m, r15m)

Running more than one CPU-bound process on a machine (or more than one process per CPU on a multiprocessor) can reduce total throughput because of operating system overhead, and can also interfere with interactive users. Some tasks, such as compiling, can create more than one CPU-intensive process.

Batch queues should normally set CPU run queue scheduling thresholds below 1.0, so that hosts already running compute-bound jobs are left alone. LSF Batch scales the run queue thresholds for multiprocessor hosts by using the effective run queue lengths, so multiprocessors automatically run one job per processor in this case. For the concept of effective run queue lengths, see lsfintro(1).

For short to medium-length jobs, the r1m index should be used. For longer jobs, you may wish to add an r15m threshold. An exception is high-priority queues, where turnaround time is more important than total throughput; for high-priority queues, an r1m scheduling threshold of 2.0 is appropriate.

CPU Utilization (ut)

The ut parameter measures the amount of CPU time being used. When all the CPU time on a host is in use, there is little to gain from sending another job to that host unless the host is much more powerful than others on the network. The lsload command reports ut in percent, but the configuration parameter in the lsf.cluster.cluster file and the LSF Batch configuration files is set as a fraction in the range from 0 to 1. A ut threshold of 0.9 prevents jobs from going to a host where the CPU does not have spare processing cycles.

If a host has very high pg but low ut, then it may be desirable to suspend some jobs to reduce the contention.

The commands bhist and bjobs are useful for tuning batch queues. bhist shows the execution history of batch jobs, including the time spent waiting in queues or suspended because of system load. bjobs -p shows why a job is pending.

Understanding Suspended Jobs

A batch job is suspended when the load level of the execution host causes the suspending condition to become true. The bjobs -lp command shows the reason why the job was suspended together with the scheduling parameters. Use bhosts -l to check the load levels on the host, and adjust the suspending conditions of the host or queue if necessary.

The bhosts -l command gives the most recent load values used for the scheduling of jobs.

% bhosts -l hostB
HOST:  hostB
STATUS        CPUF  JL/U  MAX NJOBS RUN SSUSP USUSP RSV  DISPATCH_WINDOWS
ok           20.00    2    2    0    0     0     0    0        -

CURRENT LOAD USED FOR SCHEDULING:
           r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem
Total       0.3   0.8   0.9   61%   3.8    72   26     0    6M  253M  297M
Reserved    0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M

LOAD THRESHOLD USED FOR SCHEDULING:
           r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem
loadSched   -      -     -      -    -      -    -     -    -     -     -
loadStop    -      -     -      -    -      -    -     -    -     -     -

A '-' in the output indicates that the particular threshold is not defined. If no suspending threshold is configured for a load index, LSF Batch does not check the value of that load index when deciding whether to suspend jobs. Normally, the swp and tmp indices are not considered for suspending jobs, because suspending a job does not free up the space being used. However, if swp and tmp are specified by the STOP_COND parameter in your queue, these indices are considered for suspending jobs.

The load indices most commonly used for suspending conditions are the CPU run queue lengths, paging rate and idle time. To give priority to interactive users, set the suspending threshold on the it load index to a non-zero value. Batch jobs are stopped (within about 1.5 minutes) when any user is active, and resumed when the host has been idle for the time given in the it scheduling condition.

To tune the suspending threshold for paging rate, it is desirable to know the behaviour of your application. On an otherwise idle machine, check the paging rate using lsload. Then start your application. Watch the paging rate as the application runs. By subtracting the active paging rate from the idle paging rate, you get a number for the paging rate of your application. The suspending threshold should allow at least 1.5 times that amount. A job may be scheduled at any paging rate up to the scheduling threshold, so the suspending threshold should be at least the scheduling threshold plus 1.5 times the application paging rate. This prevents the system from scheduling a job and then immediately suspending it because of its own paging.

The effective CPU run queue length condition should be configured like the paging rate. For CPU-intensive sequential jobs, the effective run queue length indices increase by approximately one for each job. For jobs that use more than one process, you should make some test runs to determine your job's effect on the run queue length indices. Again, the suspending threshold should be equal to at least the scheduling threshold plus 1.5 times the load for one job.

Suspending thresholds can also be used to enforce inter-queue priorities. For example, if you configure a low-priority queue with an r1m (1 minute CPU run queue length) scheduling threshold of 0.25 and an r1m suspending threshold of 1.75, this queue starts one job when the machine is idle. If the job is CPU intensive, it increases the run queue length from 0.25 to roughly 1.25. A high-priority queue configured with a scheduling threshold of 1.5 and an unlimited suspending threshold will send a second job to the same host, increasing the run queue to 2.25. This exceeds the suspending threshold for the low priority job, so it is stopped. The run queue length stays above 0.25 until the high priority job exits. After the high priority job exits the run queue index drops back to the idle level, so the low priority job is resumed.
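
A sketch of the two queues described above follows; the queue names and priority values are illustrative.

Begin Queue
QUEUE_NAME = low
PRIORITY   = 20
r1m        = 0.25/1.75    # schedule when the run queue is below 0.25, suspend above 1.75
End Queue

Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
r1m        = 1.5/         # schedule below 1.5, never suspend
End Queue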

Controlling Fairshare

By default, LSF Batch schedules user jobs according to the First-Come-First-Served (FCFS) principle. If your site has many users contending for limited resources, the FCFS policy is not enough. For example, a user could submit 1000 long jobs in one morning and occupy all the resources for a whole week, while other users' urgent jobs wait in queues.

LSF Batch provides fairshare scheduling to give you control over how resources are shared by competing users. Fairshare can be configured so that LSF Batch schedules jobs according to each user's or user group's configured shares. When fairshare is configured, each user or user group is assigned a dynamic priority based on the shares assigned to them and the processing resources (such as cumulative CPU time) their jobs have already consumed.

If a user or group has used less than their share of the processing resources, their pending jobs (if any) are scheduled first, jumping ahead of other jobs in the batch queues. The CPU times used for fairshare scheduling are not normalised for the host CPU speed factors.

The special user names others and default can also be assigned shares. The name others refers to all users not explicitly listed in the USER_SHARES parameter. The name default refers to each user not explicitly named in the USER_SHARES parameter. Note that default represents a single user name while others represents a user group name. The special host name all can be used to refer to all batch server hosts in the cluster.

Fairshare affects job scheduling only if there is resource contention among users; in that case, users with more shares run more jobs than users with fewer shares. If only one user has jobs to run, fairshare has no effect on job scheduling.

Fairshare in LSF Batch can be configured at either the queue level or the host level. At the queue level, the shares apply to all users who submit jobs to the queue and to all hosts configured as hosts for the queue. Several queues may share some hosts as servers, but each queue can have its own fairshare policy.

Queue level fairshare is defined using the keyword FAIRSHARE.

If you want strict resource allocation control on some hosts for all workloads, configure fairshare at the host level. Host-level fairshare is configured as a host partition, a configuration option that allows a group of server hosts to be shared by users according to configured shares. In a host partition, each user or group of users is assigned a share. The bhpart command displays the current cumulative CPU usage and scheduling priority for each user or group in a host partition.

Below are some examples of configuring fairshare at both queue level and host level. Details of the configuration syntax are described in 'Host Partitions' and 'Scheduling Policy'.

Note
Do not define fairshare at both the host level and the queue level if the queue uses some or all of the hosts belonging to the host partition; this creates policy conflicts and results in undefined scheduling behaviour.

Favouring Critical Users

If you have a queue that is shared by critical and non-critical users, you can configure fairshare so that, as long as there are jobs from key users waiting for resources, non-critical users' jobs are not dispatched.

First, define a user group key_users in the lsb.users file. Then define FAIRSHARE in your queue:

Begin Queue
QUEUE_NAME = production 
FAIRSHARE = USER_SHARES[[key_users@, 2000] [others, 1]]
...
End Queue

With this configuration, the users in key_users each have 2000 shares, while all other users together have only 1 share. This makes it virtually impossible for other users' jobs to be dispatched unless no user in the key_users group has jobs waiting to run.

Note that a user group name followed by '@' refers to each user in that group, as if every user were listed separately with 2000 shares each; this also gives the members of key_users equal shares among themselves. If the '@' is not present, all users in the group collectively hold a single share allotment, and there is no fairshare among them.

You can also use a host partition to achieve a similar result if you want the same fairshare policy to apply to jobs from all queues.

Sharing Hosts Between Two Groups

Suppose two departments contributed to the purchase of a large system. The engineering department contributed 70 percent of the cost, and the accounting department 30 percent. Each department wants to get (roughly) their money's worth from the system.

Configure two user groups in the lsb.users file, one listing all the users in the engineering group, and one listing all the members in the accounting group:

Begin UserGroup
Group_Name   Group_Member
eng_users    (user6 user4)
acct_users   (user2 user5)
End UserGroup

Then configure a host partition for the host, listing the appropriate shares:

Begin HostPartition
HPART_NAME = big_servers
HOSTS = hostH
USER_SHARES = [eng_users, 7] [acct_users, 3]
End HostPartition

Note the difference in defining USER_SHARES in a queue and in a host partition. Alternatively, the shares can be configured for each member of a user group by appending an '@' to the group name:

USER_SHARES = [eng_users@, 7] [acct_users@, 3]

If a user is configured to belong to two user groups, the user can specify which group the job belongs to with the -P option to the bsub command.

Similarly, you can define the same policy at the queue level if you want to enforce it only within a queue.

Round-Robin Scheduling

Round-robin scheduling balances resource usage between users by running one job from each user in turn, independent of the order in which the jobs arrived. This can be configured by defining an equal share for everybody. For example:

Begin HostPartition
HPART_NAME = even_share
HOSTS = all
USER_SHARES = [default, 1]
End HostPartition

Dispatch and Run Windows

The concepts of dispatch and run windows for LSF Batch are described in 'How LSF Batch Schedules Jobs'.

Time-based control of hosts and queues is achieved by configuring dispatch windows for hosts in the lsb.hosts file, and run windows and dispatch windows for queues in the lsb.queues file.

Dispatch windows in the lsb.hosts file cause batch server hosts to be closed unless the current time is inside the time windows. When a host is closed by a time window, no new jobs are sent to it, but existing jobs running on it remain running. Details of this parameter are described in 'Host Section'.

Dispatch and run windows defined in lsb.queues limit when a queue can dispatch new jobs and when jobs from a queue are allowed to run. A run window differs from a dispatch window in that when a run window closes, jobs that are already running are suspended rather than left running. Details of these two parameters are described in 'The lsb.queues File'.
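
The following sketch shows a host dispatch window in lsb.hosts and a queue run window in lsb.queues; the host name, queue name, and times are illustrative, and the exact syntax is given in 'Host Section' and 'The lsb.queues File'.

# In lsb.hosts: hostA accepts new jobs only overnight
Begin Host
HOST_NAME    MXJ    DISPATCH_WINDOW
hostA         2     (19:00-08:00)
End Host

# In lsb.queues: jobs in this queue may run only overnight
Begin Queue
QUEUE_NAME = offhours
RUN_WINDOW = 19:00-08:00
...
End Queue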

Controlling Job Slot Limits

By defining different job slot limits for hosts, queues, and users, you can control the batch job processing capacity of your cluster, hosts, and users. For example, by limiting the maximum number of job slots on each of your hosts, you can make sure that your system operates at optimal performance. By defining a job slot limit for some users, you can prevent them from using up all the job slots in the system at one time. There are a variety of job slot limits that can be used for very different purposes; see 'Job Slot Limits' for more concepts and descriptions of job slot limits. Configuration parameters for job slot limits are described in 'LSF Batch Configuration Reference'.
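
As a sketch, queue-level job slot limits might be configured as follows; the queue name and values are illustrative, and the parameters shown also appear in the example queues later in this chapter.

Begin Queue
QUEUE_NAME = normal
QJOB_LIMIT = 50      # at most 50 job slots used by this queue at once
UJOB_LIMIT = 5       # at most 5 job slots per user
PJOB_LIMIT = 1       # at most 1 job slot per processor on each host
...
End Queue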

Resource Reservation

The concept of resource reservation was discussed in 'Resource Reservation'.

The resource reservation feature at the queue level allows the cluster administrator to specify the amount of resources the system should reserve for jobs in the queue. It also serves as the upper limit on resource reservation if a user specifies a reservation when submitting a job.

The resource reservation requirement can be configured at the queue level as part of the queue level resource requirements. For example:

Begin Queue
.
RES_REQ = select[type==any] rusage[swap=100:mem=40:duration=60]
.
End Queue

will allow a job to be scheduled on any host that the queue is configured to use and will reserve 100 megabytes of swap and 40 megabytes of memory for a duration of 60 minutes. See 'Queue-Level Resource Reservation' for detailed configuration syntax for this parameter.

Processor Reservation

The concept of processor reservation was described in 'Processor Reservation'. You may want to configure this feature if your cluster has a lot of sequential jobs that compete for resources with parallel jobs.

See 'Processor Reservation for Parallel Jobs' for configuration options for this feature.

Controlling Job Execution Environment

Understanding Job Execution Environment

When LSF Batch runs your jobs, it tries to make execution as transparent to the user as possible. By default, the execution environment is maintained as close to the submission environment as possible. LSF Batch copies the environment from the submission host to the execution host, and also sets the umask and the current working directory.

Since a network can be heterogeneous, it is often impossible or undesirable to reproduce the submission host's environment exactly on the execution host. For example, if the user's home directory is not shared between the submission and execution hosts, LSF Batch runs the job in /tmp on the execution host. If the DISPLAY environment variable is something like 'Unix:0.0' or ':0.0', it must be processed before it can be used on the execution host. These cases are handled automatically by LSF Batch.

Users can change this default behaviour by using a job starter, or by using the '-L' option of the bsub command to change the default execution environment. See 'Using A Job Starter' for details of job starters.

For resource control purposes, LSF Batch also changes some aspects of the job execution environment, such as nice values and resource limits; other aspects of the environment can be changed by configuring a job starter.

In addition to the environment variables inherited from the user, LSF Batch sets several additional environment variables for batch jobs.

NICE Value

Many LSF tools, such as lsrun, lsmake, lstcsh, and lsgrun, use the LSF Remote Execution Server (RES) to run jobs. You can control the execution priority of jobs started via RES by modifying your LIM configuration file, lsf.cluster.cluster. This is done by defining the REXPRI parameter for individual hosts. See 'Descriptive Fields' for details of this parameter.

LSF Batch jobs can be run with a nice value as defined in your lsb.queues file. Each queue can have a different nice value. See 'NICE' for details of this parameter.

Resource Limits

Resource limits control how much of a resource can be consumed by jobs. By defining such limits, the cluster administrator has better control of resource usage. For example, by defining a high-priority short queue, you can allow short jobs to be scheduled earlier than long jobs. To prevent users from submitting long jobs to this short queue, you can set a CPU limit for the queue so that no job submitted to the queue can run for longer than that limit.

Details of resource limit configuration are described in 'Resource Limits'.

Pre-execution and Post-execution Commands

Batch jobs can be accompanied by a pre-execution and a post-execution command. These can be used for many purposes, for example, creating and deleting scratch directories, or checking for necessary conditions before running the real job. Details of these concepts are described in 'Pre- and Post-execution Commands'.

The pre-execution and post-execution commands can be configured at the queue level as described in 'Queue-Level Pre-/Post-Execution Commands'.
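
A sketch of such a configuration, assuming the queue-level keywords PRE_EXEC and POST_EXEC described in 'Queue-Level Pre-/Post-Execution Commands'; the queue name and script paths are illustrative.

Begin Queue
QUEUE_NAME = scratch
PRE_EXEC   = /usr/local/lsf/scripts/mkscratch    # create a scratch directory before the job runs
POST_EXEC  = /usr/local/lsf/scripts/rmscratch    # clean it up after the job finishes
...
End Queue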

Using A Job Starter

Some jobs have to be started under particular shells or require certain setup steps to be performed before the actual job is executed. This is often handled by writing wrapper scripts around the job. The LSF job starter feature allows you to specify an executable which performs the actual execution of the job, doing any necessary setup beforehand. One typical use of this feature is to customize LSF for use with the Atria ClearCase environment. See 'Support for Atria ClearCase'.

The job starter can be specified at the queue level using the JOB_STARTER parameter in the lsb.queues file. This allows the LSF Batch queue to control the job startup. For example, the following might be defined in a queue:

Begin Queue
.
JOB_STARTER = xterm -e
.
End Queue

This way, all jobs submitted to this queue are run inside an xterm.

Other uses of a job starter are possible as well, such as invoking the job through a particular shell or performing site-specific setup before the job begins.

A job starter is configured at the queue level. See 'Job Starter' for details.

Using Licensed Software with LSF Batch

Many applications have restricted access based on the number of software licenses purchased. LSF can help manage licensed software by automatically forwarding jobs to licensed hosts, or by holding jobs in batch queues until licenses are available.

There are three main types of software license: host locked, host locked counted, and network floating.

Host Locked Licenses

Host locked software licenses allow users to run an unlimited number of copies of the product on each of the hosts that has a license. You can configure a boolean resource to represent the software license, and configure your application to require the license resource. When users run the application, LSF chooses the best host from the set of licensed hosts.

See 'Customizing Host Resources' for instructions on configuring boolean resources, and 'The lsf.task and lsf.task.cluster Files' for instructions on configuring resource requirements for an application.
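
For example, if the licensed hosts are tagged with a hypothetical boolean resource named app_lic, a user might submit the application with:

% bsub -R "app_lic" licensed_application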

Host Locked Counted Licenses

Host locked counted licenses are available only on specific licensed hosts, and also place a limit on the maximum number of copies available on each host. If an external LIM (ELIM) can obtain the number of licenses currently available, you can configure an external load index, licenses, giving the number of free licenses on each host. By specifying licenses>=1 in the resource requirements for the application, you can restrict the application to run only on hosts with available licenses.

See 'Changing LIM Configuration' for instructions on writing and using an ELIM, and 'The lsf.task and lsf.task.cluster Files' for instructions on configuring resource requirements for an application.

If a shell script check_license can check license availability and acquire a license if one is available, another solution is to use this script as a pre-execution command when submitting the licensed job:

% bsub -m licensed_hosts -E check_license licensed_job

An alternative is to configure the check_license script as a queue level pre-execution command (see 'Queue-Level Pre-/Post-Execution Commands' for more details).

It is possible that the license becomes unavailable between the time the check_license script is run, and when the job is actually run. To handle this case, the LSF administrator can configure a queue so that jobs in this queue will be requeued if they exit with value(s) indicating that the license was not successfully obtained (see 'Automatic Job Requeue').

Floating Licenses

A floating license allows up to a fixed number of machines or users to run the product at the same time, without restricting which host the software can run on. Floating licenses can be thought of as 'cluster resources'; rather than belonging to a specific host, they belong to all hosts in the cluster.

You can also use the resource reservation feature to control floating licenses. To do this, configure an external load index and write an ELIM that always reports a static number N, where N is the total number of licenses. Configure queue level resource requirements such that the rusage section specifies the reservation requirement of one license for the duration of the job execution. This way, LSF Batch keeps track of the counter and will not over-commit licenses by always running no more than N jobs at the same time. Details for configuring a queue level resource requirement are described in 'Queue-Level Resource Requirement'.
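
A sketch of such a queue, assuming a hypothetical external load index named app_lic reported by the ELIM; see 'Queue-Level Resource Requirement' for the exact syntax.

Begin Queue
QUEUE_NAME = licensed
RES_REQ    = rusage[app_lic=1]    # reserve one floating license for the life of each job
...
End Queue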

Alternatively, a pre-execution command can be configured so that LSF Batch periodically checks for the availability of a license, and keeps the job pending in the queue until a license becomes available (and a suitable execution host can be found). Pre-execution conditions are described in 'Queue-Level Pre-/Post-Execution Commands'.

As another alternative, a site can configure requeue exit values so that a job will be requeued if it fails to get a license (see 'Automatic Job Requeue').

Using LSF Batch to run licensed software can improve the utilization of the licenses - the licenses can be kept in use 24 hours a day, 7 days a week. For expensive licenses, this increases their value to the users. Also, productivity can be increased, as users do not have to wait around for a license to become available.

Example LSF Batch Configuration Files

Example Queues

There are numerous ways to build queues. This section gives some examples.

Idle Queue

You want to dispatch large batch jobs only to those hosts that are idle. These jobs should be suspended as soon as an interactive user begins to use the machine. You can (arbitrarily) define a host to be idle if there has been no terminal activity for at least 5 minutes and the 1 minute average run queue is no more than 0.3. The idle queue does not start more than one job per processor.

Begin Queue
QUEUE_NAME  = idle
NICE        = 20
RES_REQ     = it>5 && r1m<0.3
STOP_COND   = it==0
RESUME_COND = it>10
PJOB_LIMIT  = 1
End Queue

Owners Queue

If a department buys some fast servers with its own budget, it may want to restrict the use of these machines to users in the department. The owners queue includes a USERS parameter defining the list of users and user groups that are allowed to use these machines. This queue also defines a fairshare policy so that these users share the resources equally.

Begin Queue
QUEUE_NAME = owners
PRIORITY   = 40
r1m        = 1.0/3.0
FAIRSHARE  = USER_SHARES[[default, 1]]
USERS      = server_owners
HOSTS      = server1 server2 server3
End Queue

Night Queue

On the other hand, the department may want to allow other people to use its machines during off hours so that the machine cycles are not wasted. The night queue only schedules jobs after 7 p.m. and kills jobs around 8 a.m. every day. Jobs are also allowed to run over the weekend.

To ensure jobs in the night queue do not hold up resources after the run window is closed, TERMINATE_WHEN is defined as WINDOW so that when the run window is closed, jobs that have been started but have not finished will be killed.

Because no USERS parameter is given, all users can submit jobs to this queue. The HOSTS parameter still contains the server host names. By setting MEMLIMIT for this queue, jobs that use a lot of real memory automatically have their time-sharing priority reduced on hosts that support the RLIMIT_RSS resource limit.

This queue also reserves 40 MB of swap memory for each job; the reservation decreases to zero over the 20 minutes after the job starts.

Begin Queue
QUEUE_NAME     = night
RUN_WINDOW     = 5:19:00-1:08:00 19:00-08:00
PRIORITY       = 5
RES_REQ        = ut<0.5 && swp>50 rusage[swp=40:duration=20:decay=1]
r1m            = 0.5/3.0
MEMLIMIT       = 5000
TERMINATE_WHEN = WINDOW
HOSTS          = server1 server2 server3
DESCRIPTION    = Low priority queue for overnight jobs
End Queue

License Queue

Some software packages have fixed licenses and must be run on certain hosts. Suppose a package is licensed to run on only a few hosts, which are tagged with the product resource. Also suppose that on each of these hosts, only one license is available.

To ensure that the correct hosts are chosen to run jobs, a queue-level resource requirement 'type==any && product' is defined. To ensure that a job gets a license when it starts, HJOB_LIMIT is defined to limit the queue to one job per host. Since software licenses are expensive resources that should not be under-utilized, the priority of this queue is defined to be higher than that of any other queue, so that jobs in this queue are considered for scheduling first. The queue also has a small nice value so that more CPU time is allocated to its jobs.

Begin Queue
QUEUE_NAME  = license
NICE        = 0
PRIORITY    = 80
HJOB_LIMIT  = 1
RES_REQ     = type==any && product
r1m         = 2.0/4.0
DESCRIPTION = Licensed software queue
End Queue

Short Queue

The short queue can be used to give faster turnaround time for short jobs by running them before longer jobs.

Jobs from this queue should always be dispatched first, so this queue has the highest PRIORITY value. The r1m scheduling threshold of 2 and no suspending threshold mean that jobs are dispatched even when the host is being used and are never suspended. The CPULIMIT value of 15 minutes prevents users from abusing this queue; jobs running more than 15 minutes are killed.

Because the short queue runs at a high priority, each user is only allowed to run one job at a time.

Begin Queue
QUEUE_NAME  = short
PRIORITY    = 50
r1m         = 2/
CPULIMIT    = 15
UJOB_LIMIT  = 1
DESCRIPTION = For jobs running less than 15 minutes
End Queue

Because the short queue starts jobs even when the load on a host is high, it can preempt jobs from other queues that are already running on a host. The extra load created by the short job can make some load indices exceed the suspending threshold for other queues, so that jobs from those other queues are suspended. When the short queue job completes, the load goes down and the preempted job is resumed.

Front End Queue

Some special-purpose computers are accessed through front end hosts. You can configure the front end host in lsb.hosts so that it accepts only one job at a time, and then define a queue that dispatches jobs to the front end host with no scheduling constraints.

Suppose hostD is a front end host:

Begin Queue
QUEUE_NAME  = front
PRIORITY    = 50
HOSTS       = hostD
JOB_STARTER = pload 
DESCRIPTION = Jobs are queued at hostD and started with pload command
End Queue

NQS Forward Queue

To interoperate with NQS, you must configure one or more LSF Batch queues to forward jobs to remote NQS hosts. An NQS forward queue is an LSF Batch queue with the parameter NQS_QUEUES defined. The following queue forwards jobs to the NQS queue named pipe on host cray001:

Begin Queue
QUEUE_NAME  = nqsUse
PRIORITY    = 30
NICE        = 15
QJOB_LIMIT  = 5
CPULIMIT    = 15
NQS_QUEUES  = pipe@cray001
DESCRIPTION = Jobs submitted to this queue are forwarded to NQS_QUEUES
USERS       = all
End Queue

Example lsb.hosts file

The lsb.hosts file defines batch server host attributes, which also affect the scheduling decisions of LSF Batch. By default, LSF Batch uses all server hosts configured in the LIM configuration files; in that case you do not have to list every host in the Host section. For example:

Begin Host
HOST_NAME    MXJ    JL/U     swp     # This line is keyword(s)
default       2      1        20
End Host

The virtual host name default refers to each host configured by LIM that is not explicitly mentioned in the Host section of the lsb.hosts file. This file defines a total job slot limit of 2 and a per-user job slot limit of 1 for every batch server host. It also defines a scheduling load threshold of 20 MB of swap memory.

In most cases your cluster is heterogeneous in some way, so you may have different controls for different machines. For example:

Begin Host
HOST_NAME    MXJ    JL/U     swp     # This line is keyword(s)
hostA         8      2        ()
hppa          2     ()        ()
default       2      1        20
End Host

In this file, the host type hppa is added in the HOST_NAME column. This entry covers all server hosts from the LIM configuration that are of host type hppa and are not explicitly listed in the Host section of this file. You can also use a host model name for this purpose. Note the '()' in some of the columns: it indicates that the parameter is undefined and serves as a place-holder for that column.

The lsb.hosts file can also be used to define host groups and host partitions, as shown in 'Sharing Hosts Between Two Groups'.

Managing LSF Cluster Using xlsadmin

xlsadmin is a GUI tool for managing your LSF cluster. It allows you to perform the LSF management work described so far in this chapter, as well as the tasks described in 'Managing LSF Base'.

xlsadmin has two operation modes: management and configuration. In management mode, xlsadmin lets you monitor and control cluster hosts, batch server hosts, and queues, performing the operations described earlier in this chapter. In configuration mode, xlsadmin lets you edit the LSF Base and LSF Batch configuration files through a graphical interface.

Figure 5 shows the xlsadmin management main window. The upper area displays all cluster hosts defined by the LIM configuration. The middle contains two areas listing the batch queues and batch server hosts. The bottom area is a message window that displays responses to the operations performed.

Figure 5. xlsadmin Management Window


To view the status of a host or queue, double-click on the host or queue and a popup window appears. Figure 6 shows the batch server host popup window displayed when you double-click hostB in the Batch Server Hosts area. Figure 7 shows the batch queue popup window displayed when you double-click night in the Batch Queues area.

To perform a control action on a host or queue, select the host or queue in the main management window and choose an operation from the Manage pull-down menu.

Figure 6. Batch Server Host Popup Window


Figure 7. Batch Queue Popup Window


By clicking on the Config tab in the management main window, you switch to configuration mode, and the main window changes to the configuration main window shown in Figure 8.

Figure 8. Configuration Main Window


The configuration main window contains all the areas of the management main window, with the addition of definition areas shown as icons. The definition areas are for defining global names used by host or queue configurations, and for global parameters.

The icons in the upper area are used for defining host types, host models, resource names, the task resource list, and external load indices, as you can otherwise do by editing the lsf.shared file. The icons in the middle area allow you to define host groups and host partitions (otherwise done by editing the lsb.hosts file), user parameters and user groups (otherwise done by editing the lsb.users file), and the parameters defined in the lsb.params file.

Clicking on an icon brings up a popup window for editing that parameter. Figure 9 shows the resource name editing window displayed when you click on the Resource icon.

Figure 9. Resource Name Editing Window


Double-clicking on a host or queue brings up a popup window that allows you to modify the configuration parameters of that host or queue. You can also add or delete hosts or queues by using the Configure pull-down menu and choosing the appropriate configuration options.

Figure 10 shows the host editing window for LIM configuration, displayed by double-clicking hostD in the Cluster Hosts area. This window modifies the host attributes of hostD, as you can otherwise do by editing the lsf.cluster.cluster file.

Figure 10. Host Editing Window for LIM Configuration


Figure 11 shows the queue editing window for creating a new queue.

Figure 11. Queue Definition Window


After you have made all the configuration changes, you can save them to the files by using the File pull-down menu and choosing Save To Files. You can verify the correctness of your configuration by choosing Check from the File pull-down menu before you choose Commit, which is equivalent to running lsadmin reconfig and badmin reconfig.



doc@platform.com

Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.