

Chapter 2. Getting Started


This chapter describes how to use some of the basic features of LSF. After following the examples in this chapter, you should be able to use LSF for most everyday tasks.

Configuration options shown in the following examples, such as host types and model names, host CPU factors (representing relative processor speed), and resource names are examples only; your system likely has different values for these settings.

Getting Cluster Information

Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, etc.

Displaying the Cluster and Master Names

LSF provides tools for users to get information about the system. The first command to try when you are learning LSF is lsid. This command reports the version of LSF, the name of your LSF cluster, and the current master host.

% lsid
LSF 3.0, Dec 10, 1996
Copyright 1992-1996 Platform Computing Corporation

My cluster name is test_cluster
My master name is hostA
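
Scripts often need the cluster or master name as a plain string. The following is a minimal sketch of one way to pull those values out of lsid output; lsid_field is a hypothetical helper name, and the sketch assumes the 'My cluster name is' and 'My master name is' lines keep the wording shown above.

```shell
# lsid_field FIELD: read lsid-style output on stdin and print the
# cluster name (FIELD=cluster) or the master host name (FIELD=master).
# Hypothetical helper, not part of LSF.
lsid_field() {
    awk -v want="$1" '
        $1 == "My" && $2 == want && $3 == "name" && $4 == "is" { print $5 }'
}

# Sample output copied from the example above, so that the sketch
# runs without an LSF installation.
sample='LSF 3.0, Dec 10, 1996
Copyright 1992-1996 Platform Computing Corporation

My cluster name is test_cluster
My master name is hostA'

printf '%s\n' "$sample" | lsid_field cluster
printf '%s\n' "$sample" | lsid_field master
```

On a live cluster the same function would be fed from the command itself, for example lsid | lsid_field master.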

To find out who your cluster administrator is and to see a summary of your cluster, run the lsclusters command:

% lsclusters 
CLUSTER_NAME   STATUS   MASTER_HOST               ADMIN    HOSTS  SERVERS
test_cluster    ok      hostA                     lsf      6      6

If you are using the LSF MultiCluster product, the output of lsclusters contains one line for each cluster that your local cluster is connected to.

Displaying Available Resources

The lsinfo command lists all the resources available in the cluster.

% lsinfo
RESOURCE_NAME   TYPE   ORDER  DESCRIPTION
r15s          Numeric   Inc   15-second CPU run queue length
r1m           Numeric   Inc   1-minute CPU run queue length (alias: cpu)
r15m          Numeric   Inc   15-minute CPU run queue length
ut            Numeric   Inc   1-minute CPU utilization (0.0 to 1.0)
pg            Numeric   Inc   Paging rate (pages/second)
io            Numeric   Inc   Disk IO rate (Kbytes/second)
ls            Numeric   Inc   Number of login sessions (alias: login)
it            Numeric   Dec   Idle time (minutes) (alias: idle)
tmp           Numeric   Dec   Disk space in /tmp (Mbytes)
swp           Numeric   Dec   Available swap space (Mbytes) (alias: swap)
mem           Numeric   Dec   Available memory (Mbytes)
ncpus         Numeric   Dec   Number of CPUs
ndisks        Numeric   Dec   Number of local disks
maxmem        Numeric   Dec   Maximum memory (Mbytes)
maxswp        Numeric   Dec   Maximum swap space (Mbytes)
maxtmp        Numeric   Dec   Maximum /tmp space (Mbytes)
cpuf          Numeric   Dec   CPU factor
rexpri        Numeric   N/A   Remote execution priority
server        Boolean   N/A   LSF server host
irix          Boolean   N/A   IRIX UNIX
hpux          Boolean   N/A   HP_UX
solaris       Boolean   N/A   SunSolaris
cserver       Boolean   N/A   Compute Server
fserver       Boolean   N/A   File server
aix           Boolean   N/A   AIX UNIX
type           String   N/A   Host type
model          String   N/A   Host model
status         String   N/A   Host status
hname          String   N/A   Host name

TYPE_NAME
HPPA
SGI6
ALPHA
SUNSOL
RS6K
NTX86

MODEL_NAME             CPU_FACTOR
DEC3000                  10.00
R10K                     14.00
PENT200                   6.00
IBM350                    7.00
SunSparc                  6.00
HP735                     9.00
HP715                     6.00

The lsinfo command displays three lists of information: the available resources, the host types, and the host models.

The resources listed by lsinfo include built-in resources maintained by the LIM and site specific resources configured by the LSF administrator. For a complete description of how LSF manages resources, see 'Resources'.

The host types and host models are defined by the LSF administrator. Host types represent binary compatible hosts; all hosts of the same type can run the same executables. Host models give the relative CPU performance of different processors. In this example, your LSF cluster treats an R10K processor as being twice as fast as an IBM350 processor (see footnote 1).
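
The 'twice as fast' figure is simply the ratio of the two CPU factors (14.00 / 7.00). As a rough sketch, the same arithmetic can be done for any pair of models; cpu_ratio is a hypothetical helper, and it assumes its input has already been reduced to the 'MODEL_NAME CPU_FACTOR' rows from the lsinfo output.

```shell
# cpu_ratio MODEL_A MODEL_B: read "MODEL_NAME CPU_FACTOR" rows on stdin
# and print how many times faster MODEL_A is than MODEL_B.
# Exits with status 1 if either model is missing from the input.
# Hypothetical helper, not part of LSF.
cpu_ratio() {
    awk -v a="$1" -v b="$2" '
        $1 == a { fa = $2 }
        $1 == b { fb = $2 }
        END {
            if (fa == "" || fb == "" || fb + 0 == 0) exit 1
            printf "%.2f\n", fa / fb
        }'
}

# Model rows copied from the lsinfo example above.
models='DEC3000                  10.00
R10K                     14.00
PENT200                   6.00
IBM350                    7.00
SunSparc                  6.00
HP735                     9.00
HP715                     6.00'

printf '%s\n' "$models" | cpu_ratio R10K IBM350    # 14.00 / 7.00
```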

Getting Host Information

LSF keeps information about all hosts in the cluster. Some information is static and some is dynamic. Static information is either configured by the LSF administrator, or is a fixed property of the system. An example of static host information is the amount of RAM available to users on a host.

Dynamic host information, or load indices, is determined by the LSF system, and updated regularly. Dynamic information represents the changing resources available on the host. Examples of dynamic host information are the current CPU load and the currently available temporary file space.

Displaying Static Host Information

A load sharing cluster may consist of hosts of differing architecture and speed. The lshosts command displays configuration information about hosts. All these parameters are defined by the LSF administrator in the LSF configuration files, or determined by the LIM directly from the system.

% lshosts
HOST_NAME      type    model cpuf ncpus maxmem maxswp server  RESOURCES
hostD         SUNSOL SunSparc 6.0    1   64M    112M    Yes (solaris cserver)
hostB         ALPHA  DEC3000 10.0    1   94M    168M    Yes (alpha cserver)
hostM          RS6K   IBM350  7.0    1   64M    124M    Yes (cserver aix)
hostC         SGI6     R10K  14.0   16 1024M    1896M   Yes (irix cserver)
hostA         HPPA     HP715  6.0    1   98M    200M    Yes (hpux fserver)

In this example, the host type SUNSOL represents Sun SPARC systems running Solaris, and ALPHA represents a Digital Alpha server running Digital Unix.

See 'Listing Hosts' for a complete description of the lshosts command.
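
Because the RESOURCES column is a parenthesized list, it is easy to filter hosts by a Boolean resource. The sketch below assumes the column layout shown in the example above (a header line, then one host per line with the resource list in parentheses); hosts_with_resource is a hypothetical helper name.

```shell
# hosts_with_resource RES: read lshosts-style output on stdin and print
# the name of every host whose RESOURCES list contains RES.
# Hypothetical helper, not part of LSF.
hosts_with_resource() {
    awk -v res="$1" '
        NR == 1 { next }    # skip the column header
        {
            lp = index($0, "(")
            rp = index($0, ")")
            if (lp == 0 || rp < lp) next
            n = split(substr($0, lp + 1, rp - lp - 1), r, " ")
            for (i = 1; i <= n; i++)
                if (r[i] == res) { print $1; break }
        }'
}

# Sample output copied from the lshosts example above.
sample='HOST_NAME      type    model cpuf ncpus maxmem maxswp server  RESOURCES
hostD         SUNSOL SunSparc 6.0    1   64M    112M    Yes (solaris cserver)
hostB         ALPHA  DEC3000 10.0    1   94M    168M    Yes (alpha cserver)
hostM          RS6K   IBM350  7.0    1   64M    124M    Yes (cserver aix)
hostC         SGI6     R10K  14.0   16 1024M    1896M   Yes (irix cserver)
hostA         HPPA     HP715  6.0    1   98M    200M    Yes (hpux fserver)'

printf '%s\n' "$sample" | hosts_with_resource cserver
```

With the sample data this lists hostD, hostB, hostM and hostC; on a live cluster the input would come from lshosts.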

Displaying Load Information

The lsload command prints out current load information.

% lsload
HOST_NAME   status  r15s  r1m r15m  ut    pg   ls  it   tmp  swp  mem
hostD           ok   0.1  0.0  0.1   2%   0.0   5   3   81M  82M  45M
hostC           ok   0.7  1.2  0.5  50%   1.1  11   0  322M 337M 252M 
hostM           ok   0.8  2.2  1.4  60%  15.4   0  136  62M  57M  45M
hostA         busy  *5.2  3.6  2.6  99% *34.4   4   0   70M  34M  18M
hostB        lockU   1.0  1.0  1.5  99%   0.8   5  33   12M  24M  23M

The first line lists the load index names, and each following line gives the load levels for one host. The r15s, r1m and r15m fields give the CPU load, averaged over different time intervals. The ut field gives the percentage of time the CPU is in use. pg is the paging rate, ls is the number of login sessions, it is the idle time (the time since the last interactive user activity), swp is the available swap space in megabytes, mem is the available RAM in megabytes, and tmp is the available temporary disk space in megabytes.

The status column gives the load status of the host. A host is busy if any load index is beyond its configured threshold. When a load index is beyond its threshold, it is printed with an asterisk '*'. In the above example, hostA is busy because load indices r15s and pg are too high. The lshosts -l command shows the load thresholds.
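
The asterisk convention makes it possible to report why a host is busy without knowing the thresholds. The following sketch reads lsload-style output and names the load indices that are flagged for each host. It assumes the asterisk is attached directly to the value, as in the example above; over_threshold is a hypothetical helper name.

```shell
# over_threshold: read lsload-style output on stdin and, for each host
# with at least one flagged value, print "host: index index ...".
# Hypothetical helper, not part of LSF.
over_threshold() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        {
            list = ""
            for (i = 3; i <= NF; i++)
                if (substr($i, 1, 1) == "*")
                    list = list (list == "" ? "" : " ") name[i]
            if (list != "") print $1 ": " list
        }'
}

# Sample output copied from the lsload example above.
sample='HOST_NAME   status  r15s  r1m r15m  ut    pg   ls  it   tmp  swp  mem
hostD           ok   0.1  0.0  0.1   2%   0.0   5   3   81M  82M  45M
hostC           ok   0.7  1.2  0.5  50%   1.1  11   0  322M 337M 252M
hostM           ok   0.8  2.2  1.4  60%  15.4   0  136  62M  57M  45M
hostA         busy  *5.2  3.6  2.6  99% *34.4   4   0   70M  34M  18M
hostB        lockU   1.0  1.0  1.5  99%   0.8   5  33   12M  24M  23M'

printf '%s\n' "$sample" | over_threshold
```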

Hosts with ok status are listed first. The ok hosts are sorted based on CPU and memory load, with the best host listed first.

The lsload command reports more load indices if the -l option is given.

The lsmon command provides an updating display of load information. The xlsmon command is an X-windows graphical display of host status and load levels in your LSF cluster.

See the lsload(1), lsmon(1), and xlsmon(1) manual pages for more information. Also see 'Displaying the Load'.

Running Jobs

LSF supports transparent execution of jobs on all server hosts in the cluster. You can run your program on the best available host and interact with it just as if it were running directly on your workstation. Keyboard signals such as CTRL-Z and CTRL-C work as expected.

Running Jobs on Remote Hosts

There are different ways to run jobs on a remote host. To run myjob on the best available host, enter:

% lsrun myjob

LSF automatically selects the best host that is of the same type as the local host.

If you want to run myjob on a host with specific resources, you must specify the resource requirements. For example,

% lsrun -R 'cserver && swp>100' myjob

runs myjob on a host that has the resource 'cserver' (see 'Displaying Available Resources') and has more than 100 megabytes of swap space available.
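
When the requirement is assembled by a script, the terms can be joined with '&&' before being passed to -R. This is only a sketch: res_and is a hypothetical helper, and it handles nothing beyond the simple 'term && term' form used in the example above.

```shell
# res_and TERM...: join resource requirement terms with " && ".
# Hypothetical helper, not part of LSF.
res_and() {
    out=""
    for term in "$@"; do
        if [ -z "$out" ]; then
            out="$term"
        else
            out="$out && $term"
        fi
    done
    printf '%s\n' "$out"
}

# Builds the same requirement string as the example above.
res_and cserver 'swp>100'
```

The result could then be used as lsrun -R "$(res_and cserver 'swp>100')" myjob.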

If you want to run your job on a particular host, use the -m option:

% lsrun -m hostD myjob

When you run an interactive job on a remote host, most job control operations work as if the job were running locally. If your shell supports job control, you can suspend and resume the job and move it between the background and foreground as if it were a local job. For a complete description, see the lsrun(1) manual page.

You can also write one-line shell scripts or csh aliases to hide the remote execution. For example:

#! /bin/sh
# Script to remote execute myjob
exec lsrun -m hostD myjob

or

% alias myjob "lsrun -m hostD myjob"
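
A wrapper like this can also be written so that it still works on a machine where LSF is not installed. The following sketch is one possible design, not something LSF provides: run_remote is a hypothetical function that uses lsrun when it is found on the PATH and otherwise runs the command locally.

```shell
# run_remote COMMAND [ARG...]: run COMMAND through lsrun when lsrun is
# available, otherwise run it on the local host.
# Hypothetical helper, not part of LSF.
run_remote() {
    if command -v lsrun >/dev/null 2>&1; then
        lsrun "$@"
    else
        "$@"
    fi
}

run_remote echo hello
```

The local fallback is a convenience for shared scripts; omit it if a job must never run on the submitting workstation.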

Load Sharing Commands With lstcsh

The lstcsh shell is a load-sharing version of the tcsh command interpreter. It is compatible with csh and supports many useful extensions. csh and tcsh users can use lstcsh to send jobs to other hosts in the cluster without needing to learn any new commands. You can run lstcsh from the command line, or use the chsh command to set it as your login shell. Refer to 'Using lstcsh' for a more detailed description.

Parallel Processing With lsmake

lsmake is a load-sharing, parallel version of GNU make. It is compatible with makefiles for most versions of make. lsmake uses the LSF load information to choose the best group of hosts for your make job. Targets in the makefile are processed in parallel on the chosen hosts using the LSF remote execution facilities. You do not need to modify your makefile to use lsmake. By default, lsmake chooses hosts that are all of the same type.

The following example uses the lsmake -V and -j 3 options to run on three hosts and produce verbose output:

% lsmake -V -j 3
[hostA] [hostD] [hostK]
<< Execute on local host >>
cc -O -c arg.c -o arg.o
<< Execute on remote host hostA >>
cc -O -c dev.c -o dev.o
<< Execute on remote host hostK >>
cc -O -c main.c -o main.o
<< Execute on remote host hostD >>
cc -O arg.o dev.o main.o

lsmake includes control over parallelism for recursive makes, which are often used for source code trees that are organized into subdirectories. Parallelism can also be controlled by the load on the NFS file server, so that parallel makes do not overload the server and slow everyone else down. See 'Using lsmake' for details.

Batch Processing

Listing Hosts

LSF Batch uses some (or all) of the hosts in an LSF cluster as batch server hosts. The host list is configured by the LSF administrator. The bhosts command displays information about these hosts.

% bhosts
HOST_NAME     STATUS    JL/U  MAX   NJOBS  RUN  SSUSP USUSP  RSV
hostA           ok       -     2     1     1     0     0     0
hostB           ok       -     3     2     1     0     0     1
hostC           ok       -     32   10     9     0     1     0
hostD           ok       -     32   10     9     0     1     0
hostM        unavail     -     3     3     1     1     1     0

STATUS gives the status of sbatchd. If a host is down or its sbatchd is not up, its STATUS is 'unavail'. The JL/U column shows the maximum number of job slots a single user can use on each host at one time. MAX gives the maximum number of job slots that are configured for each host. The RUN, SSUSP, and USUSP columns display the number of job slots in use by jobs that are running, suspended by the system, and suspended by the user, respectively. The RSV column shows the number of job slots that LSF Batch has reserved for jobs. The NJOBS column is the sum of the RUN, SSUSP, USUSP, and RSV columns.
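
Since MAX is the configured number of job slots and NJOBS is the number in use or reserved, the difference gives the slots still free on each host. The sketch below computes this from bhosts-style output, assuming the column order shown above; free_slots is a hypothetical helper, and it skips hosts that are not 'ok' or have no numeric MAX.

```shell
# free_slots: read bhosts-style output on stdin and print
# "host free" for every host with STATUS ok, where free = MAX - NJOBS.
# Hypothetical helper, not part of LSF.
free_slots() {
    awk '
        NR == 1 { next }    # skip the column header
        $2 == "ok" && $4 ~ /^[0-9]+$/ { print $1, $4 - $5 }'
}

# Sample output copied from the bhosts example above.
sample='HOST_NAME     STATUS    JL/U  MAX   NJOBS  RUN  SSUSP USUSP  RSV
hostA           ok       -     2     1     1     0     0     0
hostB           ok       -     3     2     1     0     0     1
hostC           ok       -     32   10     9     0     1     0
hostD           ok       -     32   10     9     0     1     0
hostM        unavail     -     3     3     1     1     1     0'

printf '%s\n' "$sample" | free_slots
```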

For a more detailed description of the bhosts command see 'Batch Hosts'.

Submitting a Job

To submit a job to the LSF Batch system, use the bsub command.

For example, submit the job sleep 30. This command does nothing, and takes 30 seconds to do it. The LSF administrator configures one queue to be the default job queue; if you submit a job without specifying a queue, the job goes to the default queue.

% bsub sleep 30
Job <1234> is submitted to default queue <normal>

In the above example, 1234 is the job ID assigned by LSF Batch to this job, and normal is the name of the default job queue.
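
A script that submits jobs usually needs the job ID for later commands. As a minimal sketch, the ID can be extracted from the submission message; job_id is a hypothetical helper, and it assumes the message has the 'Job <N> is submitted ...' form shown above.

```shell
# job_id: read a bsub submission message on stdin and print the job ID.
# Hypothetical helper, not part of LSF.
job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Message copied from the example above.
printf '%s\n' 'Job <1234> is submitted to default queue <normal>' | job_id
```

If the submission message is written to standard output on your system (an assumption worth checking), bsub sleep 30 | job_id would capture the ID directly.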

Your batch job remains pending until all conditions for its execution are met. Each batch queue has execution conditions that apply to all jobs in the queue, and you can specify additional conditions when you submit the job.

The -m "host1 host2 ..." option specifies that the job must run on one of the specified hosts. By specifying a single host, you can force your job to wait until that host is available and then run on that host.

For a detailed description of the bsub command see 'Submitting Batch Jobs'.

Selecting a Job Queue

Job queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Each job queue can use a configured subset of the server hosts in the LSF cluster; the default is to use all server hosts.

System administrators can configure job queues to control resource access by different users and types of application. Users select the job queue that best fits each job.

The bqueues command lists the available LSF Batch queues:

% bqueues
QUEUE_NAME     PRIO NICE    STATUS     MAX  JL/U JL/P NJOBS  PEND  RUN  SUSP
owners          49   10   Open:Active    -    -    -     1     0     1     0
priority        43   10   Open:Active   10    -    -     8     5     3     0
night           40   10  Open:Inactive   -    -    -    44    44     0     0
short           35   20   Open:Active   20    -    2     4     0     4     0
license         33   10   Open:Active   40    -    -     1     1     0     0
normal          30   20   Open:Active    -    2    -     0     0     0     0
idle            20   20   Open:Active    -    2    1     2     0     0     2

A dash '-' in any entry means that the column does not apply to the row. In this example some queues have no per-queue, per-user or per-processor job limits configured, so the MAX, JL/U and JL/P entries are '-'.

You can submit jobs to a queue as long as its STATUS is Open. However, jobs are not dispatched unless the queue is Active.
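
To see at a glance which queues will both accept and dispatch jobs, the STATUS column can be filtered for 'Open:Active'. The sketch below assumes the column order shown in the example above; active_queues is a hypothetical helper name.

```shell
# active_queues: read bqueues-style output on stdin and print the names
# of the queues whose STATUS is Open:Active.
# Hypothetical helper, not part of LSF.
active_queues() {
    awk 'NR > 1 && $4 == "Open:Active" { print $1 }'
}

# Sample output copied from the bqueues example above.
sample='QUEUE_NAME     PRIO NICE    STATUS     MAX  JL/U JL/P NJOBS  PEND  RUN  SUSP
owners          49   10   Open:Active    -    -    -     1     0     1     0
priority        43   10   Open:Active   10    -    -     8     5     3     0
night           40   10  Open:Inactive   -    -    -    44    44     0     0
short           35   20   Open:Active   20    -    2     4     0     4     0
license         33   10   Open:Active   40    -    -     1     1     0     0
normal          30   20   Open:Active    -    2    -     0     0     0     0
idle            20   20   Open:Active    -    2    1     2     0     0     2'

printf '%s\n' "$sample" | active_queues
```

With the sample data every queue except night is listed; night is Open, so its 44 jobs were accepted, but they stay pending while the queue is Inactive.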

Tracking Batch Jobs

The bjobs command reports the status of LSF Batch jobs. The -u all option specifies that jobs for all users should be listed; the default is to list only jobs you submitted. Running jobs are listed first. Pending jobs are listed in the order in which they will be scheduled. Jobs in high priority queues are listed before those in lower priority queues.

% bjobs -u all
JOBID USER  STAT  QUEUE    FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
1004  user7 RUN   short      hostA     hostA     myjob0   Dec 16 09:23
1235  user2 PEND  priority hostM               sleep 30 Dec 11 13:55
1234  user2 SSUSP normal   hostD     hostM     sleep 30 Dec 11 10:09
1250  user1 PEND  short    hostA               myjob2   Dec 11 13:59
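
For a quick summary of a long job list, the STAT column can be tallied. This is a rough sketch that assumes STAT is the third column, as in the example above; count_by_state is a hypothetical helper name.

```shell
# count_by_state: read bjobs-style output on stdin and print
# "STATE count" for each job state, sorted by state name.
# Hypothetical helper, not part of LSF.
count_by_state() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample output copied from the bjobs example above.
sample='JOBID USER  STAT  QUEUE    FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
1004  user7 RUN   short      hostA     hostA     myjob0   Dec 16 09:23
1235  user2 PEND  priority hostM               sleep 30 Dec 11 13:55
1234  user2 SSUSP normal   hostD     hostM     sleep 30 Dec 11 10:09
1250  user1 PEND  short    hostA               myjob2   Dec 11 13:59'

printf '%s\n' "$sample" | count_by_state
```

Only the first three columns are used, so the empty EXEC_HOST field of pending jobs and job names that contain spaces do not affect the count.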

If you also want to see jobs that finished recently, enter:

% bjobs -a

All your jobs that are still in the LSF Batch system and jobs finished recently are displayed.

The bjobs command has many other options. See 'Batch Jobs'. Also refer to the bjobs(1) manual page for a complete description.

xbsub and xlsbatch GUI Applications

You can submit your job to the LSF Batch system using the X-windows graphical user interface application xbsub as shown in Figure 3.

Figure 3. xbsub Job Submission Window

The xlsbatch command is another X-windows application for LSF Batch (Figure 4). You can use it to monitor host, job, and queue status, and control your jobs.

Figure 4. xlsbatch Main Window

Both xbsub and xlsbatch have extensive on-line help available through the Help menu of each application.

xbsub can be started either directly from the command line or from xlsbatch using the 'Submit' button.


1. These numbers were invented for the example, and do not necessarily correspond to the actual performance of these systems. These values can be changed by your LSF administrator.



doc@platform.com

Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.