

Chapter 7. Managing LSF JobScheduler


Production job scheduling has been an integral part of mainframe data processing operations for decades. With the emergence of distributed computing, along with UNIX and Windows NT workstations and file servers, system architecture has changed drastically, calling for a new approach to production job scheduling.

LSF JobScheduler is a distributed production job scheduling product and a separately licensed component of the LSF Version 3.0 suite. LSF JobScheduler integrates heterogeneous servers into a virtual mainframe to deliver high availability, robustness, and ease of use. It provides the functions of a traditional mainframe job scheduler with transparent operation across a network of heterogeneous UNIX and NT systems. LSF JobScheduler offers GUI input tools in addition to the standard command line interface.

Most features that can be configured for LSF Batch can also be configured for LSF JobScheduler. In addition, LSF JobScheduler supports features of its own, such as calendars, file status events, and site-defined events. See the LSF JobScheduler User's Guide for full details of these features.

The configuration and management tasks for LSF JobScheduler are the same as those for LSF Batch, so you should read all previous chapters before doing administration tasks. This chapter discusses additional tasks specific to LSF JobScheduler: system calendars, external event management, and file event handling.

System Calendars

Calendars are normally created by users with the bcaladd command or the xbcal GUI. Calendars that are commonly used may be defined as system calendars, which can be referenced by all users. System calendars are defined in the lsb.calendars configuration file in the LSB_CONFDIR/cluster/configdir directory.

The lsb.calendars file consists of multiple Calendar sections, where each section corresponds to one calendar. Each Calendar section requires the NAME and TIME_EVENTS parameters and can optionally contain a DESCRIPTION parameter. The Calendar section is of the form:

Begin Calendar
NAME=<name>
TIME_EVENTS=<time events>
DESCRIPTION=<description>
End Calendar 

The syntax of the TIME_EVENTS parameter is described in the man page bcaladd(1). Also see 'Time Expression' in the LSF JobScheduler User's Guide.

The following is a sample lsb.calendars file:

Begin Calendar
NAME=Daily
TIME_EVENTS=*:*:*:8,6:00
DESCRIPTION=Daily morning and evening runs
End Calendar

Begin Calendar
NAME=Holiday
TIME_EVENTS=*:Dec:25:00:00%1D *:Jan:1:00:00%1D *:Jul:4:00:00%1D
DESCRIPTION=Holidays
End Calendar
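
As an illustration of the section layout, the following Python sketch splits lsb.calendars-style text into Calendar sections and checks that each one has the required NAME and TIME_EVENTS parameters. It is not part of LSF: the function name parse_calendars is hypothetical, it handles only the simple form shown in this chapter (no comments or continuation lines), and it does not validate the time expression syntax, which is described in bcaladd(1).

```python
def parse_calendars(text):
    """Return a list of dicts, one per Begin Calendar ... End Calendar section."""
    calendars = []
    current = None  # dict being filled while inside a section, else None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line == "Begin Calendar":
            if current is not None:
                raise ValueError(f"line {lineno}: nested Begin Calendar")
            current = {}
        elif line == "End Calendar":
            if current is None:
                raise ValueError(f"line {lineno}: End Calendar without Begin")
            for required in ("NAME", "TIME_EVENTS"):
                if required not in current:
                    raise ValueError(f"line {lineno}: missing {required}")
            calendars.append(current)
            current = None
        elif current is not None and "=" in line:
            # Split on the first '=' only; time expressions may contain more.
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
        else:
            raise ValueError(f"line {lineno}: unexpected text {line!r}")
    if current is not None:
        raise ValueError("unterminated Calendar section")
    return calendars


SAMPLE = """\
Begin Calendar
NAME=Daily
TIME_EVENTS=*:*:*:8,6:00
DESCRIPTION=Daily morning and evening runs
End Calendar

Begin Calendar
NAME=Holiday
TIME_EVENTS=*:Dec:25:00:00%1D *:Jan:1:00:00%1D *:Jul:4:00:00%1D
DESCRIPTION=Holidays
End Calendar
"""

if __name__ == "__main__":
    for cal in parse_calendars(SAMPLE):
        print(cal["NAME"], "->", cal["TIME_EVENTS"])
```

A check of this kind could be run on an edited file before the configuration is put into service, so that a missing NAME or TIME_EVENTS line is caught early.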

System calendars are owned by the virtual user SYS and can be viewed by everybody. The bcal command displays the system calendars:

% bcal
CALENDAR_NAME      OWNER      STATUS    DURATION         NEXT_EVENT_TIME
Daily              SYS       inactive       -        Wed Dec 25 06:00:00 1996
Holiday            SYS       inactive       -        Wed Dec 25 00:00:00 1996
hourly             user1      active        7        Tue Dec 24 16:00:00 1996
complex            user1     inactive       -        Wed Dec 25 17:00:00 1996

System calendars cannot be created with the bcaladd command and they cannot be deleted with the bcaldel command. When a system calendar is defined, its name becomes a reserved calendar name in the cluster. Consequently, users cannot create a calendar with the same name as a system calendar.

External Event Management

LSF JobScheduler supports the scheduling of jobs based on external site-specific events. A typical use of this feature in a data processing environment is to trigger jobs based on the arrival of data or the availability of tapes. Sites that use storage management systems, for example, can coordinate the dispatch of jobs with the staging of data from hierarchical storage onto disk.

The scheduling daemon (mbatchd) can start up and communicate with an external event daemon (EEVENTD) to detect the occurrence of events. The EEVENTD is implemented as an executable called eeventd, which resides in LSF_SERVERDIR. Users can submit jobs specifying dependencies on any logical combination of external events using the -w option of the bsub command. External event dependencies can be combined with job, file, and calendar events.

A protocol is defined which allows mbatchd to indicate to the EEVENTD that a job is waiting on a particular event. The EEVENTD will monitor the event and possibly take actions to trigger it. When the event occurs, the EEVENTD informs mbatchd, which will then consider the job as eligible for dispatch provided appropriate hosts are available.

LSF JobScheduler comes with an EEVENTD for file event detection. If you want to monitor additional site events, you can simply add event detection functions into the existing EEVENTD. The source code of the default EEVENTD is also included in the release.

The EEVENTD Protocol

The protocol between the external event daemon, EEVENTD, and mbatchd consists of a sequence of ASCII messages that are exchanged over a socket pair. The startup sequence and message format for the protocol are described in the man page eeventd(8).

Each event is identified by an event name. The event name is an arbitrary, site-specific string. A user specifies a job dependency on an external event with the event keyword in the -w option of the bsub command. For example:

% bsub -w 'event(tapeXYZ)' myjob

LSF JobScheduler considers the job to be waiting on an event with the name 'tapeXYZ'. LSF JobScheduler does not check the syntax of the event name. The EEVENTD can reject an event if the syntax is incorrect, which prevents the job from being dispatched until the user either modifies the event or removes the job. Alternatively, a site may write a wrapper submission script that checks the syntax of the event name before the job is submitted to LSF JobScheduler.
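
A wrapper of the kind mentioned above might look like the following Python sketch. The naming rule (letters, digits, and underscores only) and the function name build_bsub_argv are assumptions made for illustration; each site defines its own event name syntax. Only the bsub -w 'event(...)' form is taken from this chapter. The sketch builds the argument list and leaves the actual submission to the caller.

```python
import re
import shlex

# Hypothetical site policy: event names are letters, digits, and underscores.
EVENT_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def build_bsub_argv(event_name, job_command):
    """Return the bsub argument list, or raise ValueError for a bad event name."""
    if not EVENT_NAME_RE.fullmatch(event_name):
        raise ValueError(f"invalid event name: {event_name!r}")
    # Passing a list avoids shell quoting problems with the parentheses.
    return ["bsub", "-w", f"event({event_name})"] + list(job_command)


if __name__ == "__main__":
    argv = build_bsub_argv("tapeXYZ", ["myjob"])
    # Show the equivalent shell command line instead of running it.
    print(shlex.join(argv))
```

A real wrapper would hand the list to subprocess.run once the check passes. Rejecting a bad name at submission time means the user sees the error immediately instead of finding a rejected event later with bevents.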

The following messages are sent from mbatchd to the EEVENTD:

SUB event_name
Subscribe to the event given by event_name. Whenever a job is submitted with a new event name that mbatchd has not seen before, a subscribe request is sent to the EEVENTD. The EEVENTD is expected to monitor the event and, if necessary, to take any actions required for the event to occur.
UNSUB event_name
Unsubscribe from the given event. This message is sent when no jobs depend on the event any longer, and it should cause the EEVENTD to stop monitoring the event.
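
On the EEVENTD side, handling these two requests amounts to maintaining a set of monitored event names. The Python sketch below illustrates that bookkeeping only. It assumes each request arrives as one line of text with the verb and the event name separated by white space; the actual framing and startup sequence are defined in eeventd(8), and the function name handle_request is hypothetical.

```python
def handle_request(line, subscribed):
    """Apply one SUB or UNSUB request to the set of monitored event names."""
    parts = line.strip().split(None, 1)  # verb, then the rest as the event name
    if len(parts) != 2:
        raise ValueError(f"malformed request: {line!r}")
    verb, event_name = parts
    if verb == "SUB":
        subscribed.add(event_name)       # start monitoring this event
    elif verb == "UNSUB":
        subscribed.discard(event_name)   # stop monitoring; harmless if unknown
    else:
        raise ValueError(f"unknown request: {verb!r}")
    return subscribed


if __name__ == "__main__":
    monitored = set()
    for request in ("SUB tapeXYZ", "SUB dataReady", "UNSUB tapeXYZ"):
        handle_request(request, monitored)
    print(sorted(monitored))
```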

The following messages are sent from the EEVENTD to mbatchd:

START event_name event_type [event_attrib]
Tells mbatchd to make the event active. The event_type field should be one of latched, pulse, pulseAll, or exclusive. The different event types control when mbatchd will inactivate an event as follows:
latched
Not automatically inactivated until an explicit END message is received.
pulse
Automatically inactivated when one job is dispatched. Each subsequent START message on the same event pulses the event again and can cause one more job to be dispatched.
pulseAll
Automatically inactivated after it is received. For pulseAll events, each job will maintain its own copy of the event state. When a pulseAll event is triggered, all jobs currently waiting on the event will have their copy of the event state marked as active and will be eligible for dispatch. Subsequently submitted jobs will view the event as inactive.
exclusive
Automatically inactivated when one job is dispatched and kept inactive until the job completes. Subsequent attempts by the EEVENTD to activate the event are ignored until job completion.
The event_attrib is an optional attribute string that can be associated with the event. The event attribute is not interpreted by the system and is passed to a job when it starts via the LSB_EVENT_ATTRIB environment variable. It can be used to communicate information between the event daemon and the job. It is also displayed by the bevents command.
END event_name
Causes the event to be put in the inactive state. If the event is already inactive, this has no effect.
REJECT event_name [event_attrib]
Causes the event to be put in the reject state. This can be used to indicate a syntax error in the event name. Rejected events are considered to be inactive, so jobs waiting on them are not dispatched. The optional event_attrib can be used to give more information about why the event is rejected. This information is displayed by the bevents command.
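
The differences between the four event types are easier to compare side by side in a small model. The following Python sketch is one reading of the descriptions above, not the mbatchd implementation; the class and method names are hypothetical. Where the text is silent (for example, whether END clears the per-job copies of a pulseAll event), the sketch simply leaves the state unchanged.

```python
class Event:
    """Toy model of how an event's state changes for each event type."""

    KINDS = ("latched", "pulse", "pulseAll", "exclusive")

    def __init__(self, name):
        self.name = name
        self.kind = None
        self.active = False
        self.locked = False      # exclusive: a dispatched job has not completed
        self.armed_jobs = set()  # pulseAll: jobs whose own copy is active

    def start(self, kind, waiting_jobs=()):
        """Handle START; waiting_jobs are the jobs waiting at this moment."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown event type: {kind!r}")
        self.kind = kind
        if kind == "pulseAll":
            # Only jobs waiting now see the event; the event itself goes
            # inactive at once, so later submissions see it as inactive.
            self.armed_jobs.update(waiting_jobs)
            self.active = False
        elif kind == "exclusive" and self.locked:
            return  # ignored until the running job completes
        else:
            self.active = True

    def end(self):
        """Handle END; no effect if the event is already inactive."""
        self.active = False

    def is_active_for(self, job):
        if self.kind == "pulseAll":
            return job in self.armed_jobs
        return self.active

    def dispatched(self, job):
        """Called when a job waiting on this event is dispatched."""
        if self.kind == "pulse":
            self.active = False
        elif self.kind == "exclusive":
            self.active = False
            self.locked = True
        elif self.kind == "pulseAll":
            self.armed_jobs.discard(job)
        # latched: stays active until an explicit END

    def completed(self, job):
        """Called when a dispatched job finishes."""
        if self.kind == "exclusive":
            self.locked = False


if __name__ == "__main__":
    ev = Event("exclusiveTape")
    ev.start("exclusive")
    ev.dispatched("job1")
    ev.start("exclusive")             # ignored: job1 is still running
    print(ev.is_active_for("job2"))
    ev.completed("job1")
    ev.start("exclusive")
    print(ev.is_active_for("job2"))
```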

The sequence of interactions between mbatchd and the EEVENTD is shown in Figure 12.

Figure 12. mbatchd and EEVENTD Interactions


Step 1.
User submits a job specifying dependency on the external event eventX
Step 2.
mbatchd scans the event table to see if eventX already exists. If not, it creates the event and sends a subscribe message to the EEVENTD. The EEVENTD recognizes eventX and begins monitoring it. If the EEVENTD does not recognize eventX, it returns a REJECT message to mbatchd.
Step 3.
The EEVENTD detects an occurrence of eventX and sends a START message telling mbatchd to try to schedule any jobs waiting on the event. Since eventX is latched, mbatchd will consider the event as active indefinitely. If a job also has a dependency on a calendar, it can be run multiple times while eventX is still active.
Step 4.
The EEVENTD detects that eventX is no longer occurring and sends an END message. mbatchd considers eventX as inactive and stops scheduling jobs waiting on the event.
Step 3 and step 4 can be repeated multiple times.
Step 5.
The user deletes the job waiting on eventX.
Step 6.
If there are no jobs in the system waiting on eventX, an UNSUB message is sent to the EEVENTD. This should cause the EEVENTD to stop monitoring eventX.
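
The subscribe and unsubscribe steps (2, 5, and 6) can be modelled as a table that maps each event name to the jobs waiting on it. The Python sketch below illustrates that bookkeeping only; it is not the mbatchd implementation, the class name SubscriptionTable is hypothetical, and messages are collected in a list instead of being written to the EEVENTD socket.

```python
class SubscriptionTable:
    """Tracks which jobs wait on which events and records SUB/UNSUB messages."""

    def __init__(self):
        self.waiting = {}  # event name -> set of job ids waiting on it
        self.sent = []     # messages that would be sent to the EEVENTD

    def add_job(self, job_id, event_name):
        # Step 2: subscribe only the first time an event name is seen.
        if event_name not in self.waiting:
            self.waiting[event_name] = set()
            self.sent.append(f"SUB {event_name}")
        self.waiting[event_name].add(job_id)

    def remove_job(self, job_id, event_name):
        # Steps 5 and 6: unsubscribe when the last waiting job is gone.
        jobs = self.waiting.get(event_name)
        if jobs is None:
            return
        jobs.discard(job_id)
        if not jobs:
            del self.waiting[event_name]
            self.sent.append(f"UNSUB {event_name}")


if __name__ == "__main__":
    table = SubscriptionTable()
    table.add_job(101, "eventX")
    table.add_job(102, "eventX")     # no second SUB for the same event
    table.remove_job(101, "eventX")  # job 102 still waits, so no UNSUB yet
    table.remove_job(102, "eventX")
    print(table.sent)
```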

The external event daemon given in examples/eevent/eevent.c provides an example of a simple event daemon. It receives requests from mbatchd to subscribe to and unsubscribe from events. Periodically, it scans the list of subscribed events and toggles the state of each event between active and inactive. The type of the event is chosen based on the event name: event names beginning with the string 'exclusive' or 'pulse' are treated as exclusive events or pulse events, respectively. Otherwise, the event is treated as a latched event.
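
The naming convention used by the example daemon can be expressed in a few lines. The Python sketch below follows the description in the preceding paragraph, not the C source itself, so details may differ from eevent.c; the function names are hypothetical. It also formats the START message in the field order given earlier in this chapter.

```python
def event_type_for(name):
    """Choose the event type from the event name prefix."""
    if name.startswith("exclusive"):
        return "exclusive"
    if name.startswith("pulse"):
        return "pulse"
    return "latched"  # default when no known prefix matches


def start_message(name, attrib=None):
    """Build a START message: START event_name event_type [event_attrib]."""
    message = f"START {name} {event_type_for(name)}"
    if attrib:
        message += f" {attrib}"
    return message


if __name__ == "__main__":
    for event in ("exclusiveTape1", "pulseNightly", "tapeXYZ"):
        print(start_message(event))
```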

File Event Handling

The handling of file events is implemented using the default external event daemon. The installation scripts automatically install the event daemon that handles file events.

Since only one external event daemon can run on a system, sites requiring file event handling in addition to site-specific events must modify the existing file event daemon. The source is provided in examples/eevent/fevent.c in the distribution directory.

You can monitor all external events using the bevents command. See the LSF JobScheduler User's Guide for details.



doc@platform.com

Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.