

Chapter 9. LSF Base Configuration Reference


This chapter contains a detailed description of the contents of the LSF Base configuration files. These include the installation file lsf.conf; the LIM configuration files lsf.shared, lsf.cluster.cluster, lsf.task, and lsf.task.cluster; and the optional LSF hosts file for additional host name information.

The lsf.conf File

Installation and operation of LSF are controlled by the lsf.conf file. The lsf.conf file is created during installation and records all the settings chosen when LSF is installed. This information is used by LSF daemons and commands to locate other configuration files, executables, and network services.

lsf.conf contains LSF installation settings as well as some system-wide options. This file is initially created by the lsfsetup utility during LSF installation, and updated if necessary when you upgrade to a new version. Many of the parameters are set during installation. This file can also be expanded to include LSF application-specific parameters.

LSB_CONFDIR

LSF Batch configuration directories are installed under LSB_CONFDIR. Configuration files for each LSF cluster are stored in a subdirectory of LSB_CONFDIR. This subdirectory contains several files that define the LSF Batch user and host lists, operation parameters, and batch queues.

All files and directories under LSB_CONFDIR must be readable from all hosts in the cluster. LSB_CONFDIR/cluster/configdir must be owned by the LSF administrator.

Default: LSF_CONFDIR/lsbatch

You should not redefine this parameter once LSF has been installed. If you want to move these directories to another location, use the lsfsetup utility and choose the Component Install option to reinstall the configuration files.

LSB_DEBUG

If this is defined, LSF Batch will run in single user mode. In this mode, no security checking is performed, so the LSF Batch daemons should not run as root. When LSB_DEBUG is defined, LSF Batch will not look in the system services database for port numbers. Instead, it uses port number 40000 for mbatchd and port number 40001 for sbatchd unless LSB_MBD_PORT/LSB_SBD_PORT are defined in the file lsf.conf. The valid values for LSB_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF Batch.

Default: undefined

LSB_MAILPROG

LSF Batch normally uses /usr/lib/sendmail as the mail transport agent to send mail to users. If your site does not use sendmail, configure LSB_MAILPROG with the name of a sendmail-compatible transport program. LSF Batch calls LSB_MAILPROG with the following arguments:

LSB_MAILPROG -F "LSF Batch system" -f Manager@host dest_addr

The -F "LSF Batch system" argument sets the full name of the sender; the -f Manager@host argument gives the return address for LSF Batch mail, which is the LSF administrator's mailbox. dest_addr is the destination address, generated by the rules given for LSB_MAILTO below.

LSB_MAILPROG must read the body of the mail message from the standard input. The end of the message is marked by end-of-file. Any program or shell script that accepts the arguments and input and delivers the mail correctly can be used. LSB_MAILPROG must be executable by any user.
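As noted above, any program or script that accepts these arguments and reads the message body from standard input can serve as LSB_MAILPROG. The following is a minimal sketch of such a wrapper; the positional-argument handling matches the invocation shown above, and the site mailer it mentions is a hypothetical example, not part of LSF:

```shell
#!/bin/sh
# Hypothetical LSB_MAILPROG wrapper (sketch only). LSF Batch invokes it as:
#   wrapper -F "LSF Batch system" -f Manager@host dest_addr
# with the message body on standard input, terminated by end-of-file.
fullname="$2"   # value following -F: full name of the sender
sender="$4"     # value following -f: return address for LSF Batch mail
dest="$5"       # destination address built from LSB_MAILTO

# A real wrapper would hand the body to a site-specific mailer, for example:
#   exec /usr/local/bin/mymailer -r "$sender" "$dest"
# For illustration, this sketch prepends a header and echoes the body.
printf 'To: %s\nFrom: %s (%s)\n\n' "$dest" "$sender" "$fullname"
cat
```

Install such a script executable by all users, and set LSB_MAILPROG to its full path.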

If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.

Default: /usr/lib/sendmail

LSB_MAILTO

LSF Batch sends electronic mail to users when their jobs complete or have errors, and to the LSF administrator in the case of critical errors in the LSF Batch system. The default is to send mail to the user who submitted the job, on the host where the daemon is running; this assumes that your electronic mail system forwards messages to a central mailbox.

The LSB_MAILTO parameter changes the mailing address used by LSF Batch. LSB_MAILTO is a format string that is used to build the mailing address. The substring !U, if found, is replaced with the user's account name; the substring !H is replaced with the name of the submission host. All other characters (including any other '!') are copied exactly. Common formats are:

!U
Mail is sent to the submitting user's account name on the local host.
!U@!H
Mail is sent to user@submission_hostname
!U@company_name.com
Mail is sent to user@company_name.com
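The expansion can be illustrated with a small shell sketch. LSF Batch performs this substitution internally; the sed command here only mimics the behaviour, and user1 and hostA are example values:

```shell
# Mimic LSF Batch's expansion of an LSB_MAILTO format string.
# '!U' becomes the submitting user's account name, '!H' the submission host.
LSB_MAILTO='!U@!H'
user=user1        # example submitting user
host=hostA        # example submission host
addr=$(printf '%s' "$LSB_MAILTO" | sed -e "s/!U/$user/g" -e "s/!H/$host/g")
echo "$addr"      # prints user1@hostA
```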

If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.

Default: !U

LSB_SHAREDIR

LSF Batch keeps job history and accounting log files for each cluster. These files are necessary for correct operation of the system. Like the organization under LSB_CONFDIR, there is one subdirectory for each cluster.

The LSB_SHAREDIR/cluster/logdir directory must be owned by the LSF administrator.

Default: LSF_INDEP/work

Note
All files and directories under LSB_SHAREDIR must allow read and write access from the LSF master host. See 'Fault Tolerance' and 'Using LSF without Shared File Systems'.

LSF_AFS_CELLNAME

This must be defined as the AFS cell name if the AFS file system is in use.

Default: undefined

LSF_AUTH

This is an optional definition. By default, LSF uses privileged ports for user authentication.

If LSF_AUTH is defined as ident, RES uses the RFC 1413 identification protocol to verify the identity of the remote user. RES is also compatible with the older RFC 931 authentication protocol. The name, ident, must be registered in the system services database. See 'Registering LSF Service Ports' for instructions on registering service names.

If LSF_AUTH is defined to be eauth, external user authentication is used. See 'External Authentication' for details.

If LSF_AUTH is not defined, LSF commands must be installed setuid to root to operate correctly. If the LSF commands are installed in an NFS mounted shared file system, the file system must be mounted with setuid execution allowed (that is, without the nosuid option). See the manual page for mount for more details.

If LSF_AUTH is defined, programs need not be setuid. In this case the installation mode should be 0755.
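The two installation modes can be sketched as follows, demonstrated on a scratch file. A real installation applies the mode to the commands in LSF_BINDIR, and in the setuid case the chmod must be done as root so that the files are owned by root:

```shell
# Demonstrate the two permission modes on a scratch file.
touch /tmp/lsf_cmd_demo
chmod 4755 /tmp/lsf_cmd_demo           # LSF_AUTH undefined: setuid required
ls -l /tmp/lsf_cmd_demo | cut -c1-10   # -rwsr-xr-x
chmod 0755 /tmp/lsf_cmd_demo           # LSF_AUTH=ident or eauth: 0755 suffices
ls -l /tmp/lsf_cmd_demo | cut -c1-10   # -rwxr-xr-x
rm /tmp/lsf_cmd_demo
```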

If this parameter is changed, all the LSF daemons must be shut down and restarted by running lsf_daemons start on each of the LSF server hosts so that the daemons will use the new authentication method.

Default: privileged port authentication

LSF_BINDIR

Directory where all user commands are installed.

Default: LSF_MACHDEP/bin

LSF_CONFDIR

The directory where all LIM configuration files are installed. These files are shared throughout the system and should be readable from any host. This directory can contain configuration files for more than one cluster.

Default: LSF_INDEP/conf

LSF_ENVDIR

LSF normally installs the lsf.conf file in the /etc directory. lsf.conf is installed by creating a shared copy in LSF_SERVERDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed in LSF_ENVDIR/lsf.conf.

Default: /etc

LSF_INCLUDEDIR

Directory under which the LSF API header file <lsf/lsf.h> is installed.

Default: LSF_INDEP/include

LSF_INDEP

Specifies the default top-level directory for all host type independent LSF files. This includes manual pages, configuration files, working directories, and examples. For example, defining LSF_INDEP as /usr/local/lsf places manual pages in /usr/local/lsf/man, configuration files in /usr/local/lsf/conf, and so on.

Default: /usr/local/lsf

LSF_LIBDIR

Directory where the LSF application programming interface library liblsf.a is installed.

Default: LSF_MACHDEP/lib

LSF_LICENSE_FILE

The full path name of the FLEXlm license file used by LSF. If this variable is not defined, the LIM looks for the license in /usr/local/flexlm/licenses/license.dat.

Default: LSF_CONFDIR/license.dat

LSF_LIM_DEBUG

If LSF_LIM_DEBUG is defined, the Load Information Manager (LIM) will operate in single user mode. No security checking is performed, so LIM should not run as root. LIM will not look in the services database for the LIM service port number. Instead, it uses port number 36000 unless LSF_LIM_PORT has been defined. The valid values for LSF_LIM_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF.

Default: undefined

LSF_LIM_PORT,
LSF_RES_PORT,
LSB_MBD_PORT,
LSB_SBD_PORT

Internet port numbers to use for communication with the LSF daemons. The port numbers are normally obtained by looking up the LSF service names in the /etc/services file or the services YP map. If it is not possible to modify the service database, these variables can be defined to set the port numbers.
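For example, all four port numbers could be fixed in lsf.conf as follows. The values shown match the debug-mode fallback ports mentioned elsewhere in this chapter; any unused port numbers at your site will do:

```shell
LSF_LIM_PORT=36000
LSF_RES_PORT=36002
LSB_MBD_PORT=40000
LSB_SBD_PORT=40001
```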

With careful use of these settings, along with the LSF_ENVDIR and PATH environment variables, it is possible to run two versions of the LSF software on a host: set the PATH environment variable to include the correct version of the commands, and set the LSF_ENVDIR environment variable to point to the directory containing the appropriate lsf.conf file.

Default: get port numbers from services database

LSF_LOGDIR

This is an optional definition.

If LSF_LOGDIR is defined, error messages from all servers are logged into files in this directory. If a server is unable to write in this directory, then the error logs are created in /tmp.

If LSF_LOGDIR is not defined, then syslog is used to log everything to the system log using the LOG_DAEMON facility. The syslog facility is available by default on most UNIX systems. The /etc/syslog.conf file controls the way messages are logged, and the files they are logged to. See the manual pages for the syslogd daemon and the syslog function for more information.

Default: log messages go to syslog

LSF_LOG_MASK

The syslog(3) log level for LSF daemons. This definition applies whether the LSF daemons are logging messages to syslog or to files. All messages logged at the specified level or higher are recorded; lower level messages are discarded. The log levels, in order from highest to lowest, are LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, and LOG_DEBUG.

Most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG level are only useful for debugging.

Default: LOG_WARNING

LSF_MACHDEP

Specifies the directory where host type dependent files are installed. In clusters with a single host type, LSF_MACHDEP is usually the same as LSF_INDEP. The machine dependent files are the user programs, daemons, and libraries.

Default: /usr/local/lsf

LSF_MANDIR

Directory under which all manual pages are installed. The manual pages are placed in the man1, man3, man5, and man8 subdirectories of the LSF_MANDIR directory.

Default: LSF_INDEP/man

Note
Manual pages are installed in a format suitable for BSD style man commands.

LSF_MISC

Directory where miscellaneous machine independent files such as LSF example source programs and scripts are installed.

Default: LSF_CONFDIR/misc

LSF_RES_ACCT

If defined, RES will log task information by default (see lsf.acct(5)). If this parameter is not defined, the LSF administrator must use the lsadmin command (see lsadmin(8)) to turn task logging on after the RES has started up. A CPU time in milliseconds can be specified as the value of this parameter; only tasks that have consumed more than the specified CPU time will be logged. If it is defined as LSF_RES_ACCT=0, all tasks will be logged.
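For example, the following lsf.conf line would log only tasks that consumed more than one second of CPU time; the threshold value is an arbitrary example:

```shell
# Log only tasks that used more than 1000 milliseconds of CPU time.
LSF_RES_ACCT=1000
```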

Default: undefined

LSF_RES_ACCTDIR

The directory where the RES task log file is stored. If LSF_RES_ACCTDIR is not defined, the log file is stored in the /tmp directory.

Default: /tmp

LSF_RES_DEBUG

If LSF_RES_DEBUG is defined, the Remote Execution Server (RES) will operate in single user mode. No security checking is performed, so RES should not run as root. RES will not look in the services database for the RES service port number. Instead, it uses port number 36002 unless LSF_RES_PORT has been defined. The valid values for LSF_RES_DEBUG are 1 and 2. You should always choose 1 unless you are testing RES.

Default: undefined

LSF_ROOT_REX

This is an optional definition.

If LSF_ROOT_REX is defined, RES accepts requests from the superuser (root) on remote hosts, subject to identification checking. If LSF_ROOT_REX is undefined, remote execution requests from user root are refused. Sites that have separate root accounts on different hosts within the cluster should not define LSF_ROOT_REX. Otherwise, this setting should be based on local security policies. If this parameter is defined as 'all', root remote execution across clusters is enabled; this applies to LSF MultiCluster only. Setting LSF_ROOT_REX to any other value enables root remote execution only within the local cluster.

Default: undefined - Root execution is not allowed

LSF_SERVERDIR

Directory where all server binaries are installed. These include lim, res, nios, sbatchd, mbatchd, and eeventd (for LSF JobScheduler only). If you use elim, eauth, eexec, esub, and so on, they should also be installed in this directory.

Default: LSF_MACHDEP/etc

LSF_SERVER_HOSTS

This defines one or more LSF server hosts that applications should contact to reach a Load Information Manager (LIM). It is used on client hosts where no LIM runs locally. LSF server hosts run the LSF daemons and provide load sharing services; client hosts run only LSF commands or applications and do not provide services to any hosts.

If LSF_SERVER_HOSTS is not defined, the application tries to contact the LIM on the local host. See 'Setting Up LSF Client Hosts' for more details about server and client hosts.

The host names in LSF_SERVER_HOSTS must be enclosed in quotes and separated by white space; for example:

LSF_SERVER_HOSTS="hostA hostD hostB"

Default: undefined

LSF_STRIP_DOMAIN

This is an optional definition.

If all the hosts in your cluster can be reached using short host names, you can configure LSF to use the short host names by specifying the portion of the domain name to remove. If your hosts are in more than one domain, or have more than one domain name, you can specify more than one domain suffix to remove, separated by a colon ':'.

For example, given this definition of LSF_STRIP_DOMAIN:

LSF_STRIP_DOMAIN=.foo.com:.bar.com

LSF accepts hostA, hostA.foo.com, and hostA.bar.com as names for host hostA, and uses the name hostA in all output. The leading period '.' is required.

Default: undefined

LSF_USE_HOSTEQUIV

This is an optional definition.

If LSF_USE_HOSTEQUIV is defined, RES and mbatchd call the ruserok(3) function to decide if a user is allowed to run remote jobs. If LSF_USE_HOSTEQUIV is not defined, all normal users in the cluster can execute remote jobs on any host. If LSF_ROOT_REX is set, root can also execute remote jobs with the same permission test as for normal users.
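For reference, ruserok(3) consults /etc/hosts.equiv and each user's $HOME/.rhosts file. A minimal hosts.equiv on an execution host might look like this (hostA and hostB are example names); it allows users with matching account names on the listed hosts to pass the check:

```
hostA
hostB
```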

Default: undefined

XLSF_APPDIR

The directory where X application defaults files for LSF products are installed. The LSF commands that use X look in this directory to find the application defaults. Users do not need to set environment variables to use the LSF X applications. The application defaults files are platform independent.

Default: LSF_INDEP/misc

XLSF_UIDDIR

The directory where Motif User Interface Definition files are stored. These files are platform specific.

Default: LSF_LIBDIR/uid

LSF_RES_RLIMIT_UNLIM

By default, the RES sets the hard limits for a remote task to be the same as the hard limits of the local process. This parameter specifies those hard limits which are to be set to unlimited, instead of inheriting those of the local process. Valid values are cpu, fsize, data, stack, core, and vmem, for cpu, file size, data size, stack, core size, and virtual memory limits, respectively.

For example:

LSF_RES_RLIMIT_UNLIM="cpu core stack"

will set the cpu, core size, and stack hard limits to be unlimited for all remote tasks.

Default: undefined

The lsf.shared File

The lsf.shared file contains definitions that are used by all load sharing clusters. This includes lists of cluster names, host types, host models, the special resources available, and external load indices.

Clusters

The mandatory Cluster section defines all cluster names recognized by the LSF system, with one line for each cluster.

The ClusterName keyword is mandatory. All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.

Begin Cluster
ClusterName
clus1
clus2
End Cluster

Host Types

The mandatory HostType section lists the valid host type names in the cluster. Each host is assigned a host type in the lsf.cluster.cluster file. All hosts that can run the same binary programs should have the same host type, even if they have different models of processor. LSF uses the host type as a default requirement for task placement. Unless specified otherwise, jobs are always run on hosts of the same type.

The TYPENAME keyword is mandatory. Host types are usually based on a combination of the hardware name and operating system. For example, a DECstation system runs the ULTRIX operating system and has a MIPS CPU, so you could assign it the host type UMIPS. If your site already has a system for naming host types, you can use the same names for LSF.

Begin HostType
TYPENAME
SUN41
UMIPS
ALPHA
HPPA
End HostType

Host Models

The mandatory HostModel section lists the various models of machines and gives the relative CPU speed for each model. LSF uses the relative CPU speed to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The MODELNAME and CPUFACTOR keywords are mandatory.

It is up to you to identify the different host models in your system, but generally you need to identify first the distinct host types, such as MIPS and SPARC, and then the machine models within each, such as SparcIPC, Sparc1, Sparc2, and Sparc10.

Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system, and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.

Begin HostModel
MODELNAME  CPUFACTOR
SparcIPC   1.0
Sparc10    2.0
End HostModel

The CPU factor affects the calculation of job execution time limits and accounting. Using large values for the CPU factor can cause confusing results when CPU time limits or accounting are used. See 'Resource Limits' for more information.

Resources

The optional Resource section contains a list of boolean resource names. Boolean resource names are character strings chosen by the LSF administrator. You can use any name other than the reserved resource names. The keywords RESOURCENAME and DESCRIPTION are mandatory.

For a more general discussion of boolean resources see the 'Resources' chapter of the LSF User's Guide.

Boolean resource names must be strings of numbers and letters, beginning with a letter and no more than 29 characters long. You can define up to 32 boolean resource names in lsf.shared.

This sample Resource section defines boolean resources to represent processor types, operating system versions, and software licenses:

Begin Resource
RESOURCENAME  DESCRIPTION
sparc         (Sparc CPU)
sunos4        (Running SunOS 4.x)
solaris       (Running Solaris 2.x)
frame         (FrameMaker license)
End Resource

External Load Indices

LSF supports hundreds of site-specific load indices, defined in an optional NewIndex section. The load levels for these indices are obtained from an External Load Information Manager process, or ELIM. External load indices and the ELIM are described in 'Changing LIM Configuration'. The four mandatory keywords in the NewIndex section are:

NAME
The name of the load index. The name is displayed by the lsinfo and lsload commands, and used in resource requirement strings and to define load thresholds. External load index names have the same restrictions as boolean resource names; the name is a string of letters and numbers, beginning with a letter. The maximum length of an external load index name is 29 characters.
This parameter is required.

Note
The name of the load index must not be one of the resource name aliases cpu, idle, logins, or swap. To override one of these indices, use its formal name: r1m, it, ls, or swp. Otherwise, use another name.

INTERVAL
The time interval (in seconds) between updates of this load index. The interval is reported by the ls_info() function and the lsinfo -l command. Note that the actual update interval is controlled by the external load information program. This number is for user information only.
This parameter is optional.
INCREASING
If INCREASING=Y, the load level is higher when the number is higher. When the load index is above the threshold, the host is considered busy. For example, r1m (1 minute average CPU run queue length) is an increasing load index.
If INCREASING=N, the load level is higher when the number is lower. When the load index is below the threshold, the host is considered busy. For example, tmp (free space on the /tmp file system) is a decreasing load index. The less free space, the busier the host.
This parameter is required.
DESCRIPTION
The description of the load index to be displayed by the lsinfo command.
This parameter is optional but is highly recommended.

The example NewIndex section below shows a user-defined index to report the space available on the /usr/tmp file system, in megabytes:

Begin NewIndex
NAME  INTERVAL  INCREASING  DESCRIPTION
usrtmp      90           N  (Disk space in /usr/tmp in megabytes)
End NewIndex

The lsf.cluster.cluster File

This is the load sharing cluster configuration file. There is one such file for each load sharing cluster in the system. The cluster suffix must agree with the name defined in the Cluster section of the lsf.shared file.

Parameters

The Parameters section is optional. This section contains miscellaneous parameters for the LIM.

FEATURES

The FEATURES line specifies which LSF component(s) will be enabled in the cluster. The FEATURES line can specify any combination of the strings 'lsf_base', 'lsf_batch', 'lsf_js', and 'lsf_mc' to enable the operation of LSF Base, LSF Batch, LSF JobScheduler, and LSF MultiCluster, respectively. If any of 'lsf_batch', 'lsf_js', or 'lsf_mc' is specified, then 'lsf_base' is automatically enabled as well. Specifying the FEATURES line enables the listed features for all hosts in the cluster. Individual hosts can be configured to run as LSF Batch servers or LSF JobScheduler servers within the same cluster. LSF MultiCluster operation is either enabled or disabled for the entire cluster.

The FEATURES line is created automatically by the installation program lsfsetup. For example:

Begin Parameters
FEATURES=lsf_base lsf_batch
End Parameters

If the FEATURES line is not specified, the default is to enable the operation of 'lsf_base' and 'lsf_batch'.

Note
The features defined by the FEATURES line must match the license file used to serve the cluster. A host will be unlicensed if no license is available for the component it was configured to run. For example, if you configure a cluster to run LSF JobScheduler on all hosts, and the license file does not contain the LSF JobScheduler feature, then the hosts will be unlicensed, even if there are licenses for LSF Base or LSF Batch.

Default: lsf_base lsf_batch

ELIMARGS

The ELIMARGS parameter specifies any necessary command line arguments for the external LIM. This parameter is ignored if no external load indices are configured.

Default: none

EXINTERVAL

The time interval in seconds at which the LIM daemons exchange load information. On extremely busy hosts or networks, load may interfere with the periodic communication between LIM daemons. Setting EXINTERVAL to a longer interval can reduce network load and slightly improve reliability, at the cost of slower reaction to dynamic load changes.

Default: 15 seconds

ELIM_POLL_INTERVAL

The time interval in seconds at which the LIM daemon samples load information. This parameter needs to be set only if an ELIM is being used to report information more frequently than every 5 seconds.

Default: 5 seconds.

HOST_INACTIVITY_LIMIT

An integer reflecting a multiple of EXINTERVAL. This parameter controls the maximum time a slave LIM will take to send its load information to the master LIM as well as the frequency at which the master LIM will send a heartbeat message to its slaves. A slave LIM can send its load information any time from EXINTERVAL to (HOST_INACTIVITY_LIMIT-2)*EXINTERVAL seconds. A master LIM will send a master announce to each host at least every EXINTERVAL*HOST_INACTIVITY_LIMIT seconds.

Default: 5

MASTER_INACTIVITY_LIMIT

An integer reflecting a multiple of EXINTERVAL.

A slave will attempt to become master if it does not hear from the previous master after (HOST_INACTIVITY_LIMIT+hostNo*MASTER_INACTIVITY_LIMIT)*EXINTERVAL seconds where hostNo is the position of the host in the lsf.cluster.cluster file.
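With the default values, the takeover time works out as follows; hostNo=3 is an example position in the host list:

```shell
# (HOST_INACTIVITY_LIMIT + hostNo * MASTER_INACTIVITY_LIMIT) * EXINTERVAL
HOST_INACTIVITY_LIMIT=5    # default
MASTER_INACTIVITY_LIMIT=2  # default
EXINTERVAL=15              # default, in seconds
hostNo=3                   # example: third host in lsf.cluster.cluster
echo $(( (HOST_INACTIVITY_LIMIT + hostNo * MASTER_INACTIVITY_LIMIT) * EXINTERVAL ))
# prints 165 (seconds)
```

Hosts later in the list wait longer before attempting a takeover, so only one slave at a time tries to become the master.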

Default: 2

PROBE_TIMEOUT

Before taking over as the master, a slave LIM will try to connect to the last known master via TCP. This parameter specifies the time-out in seconds to be used for the connect(2) system call.

Default: 2 seconds

RETRY_LIMIT

An integer reflecting a multiple of EXINTERVAL. This parameter controls the number of retries a master (slave) LIM makes before assuming the slave (master) is unavailable. If the master does not hear from a slave for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the slave for RETRY_LIMIT exchange intervals before it will declare the slave as unavailable. If a slave does not hear from the master for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the master for RETRY_LIMIT intervals before assuming the master is down.

Default: 2

LSF Administrators

The ClusterAdmins section defines the LSF administrator(s) for this cluster. Both UNIX user and group names may be specified with the ADMINISTRATORS keyword. The LIM will expand the definition of a group name using the getgrnam(3) call. The first administrator of the expanded list is considered the primary LSF administrator. The primary administrator is the owner of the LSF configuration files, as well as the working files under LSB_SHAREDIR/cluster. If the primary administrator is changed, make sure the owner of the configuration files and the files under LSB_SHAREDIR/cluster are changed as well. All LSF administrators have the same authority to perform actions on LSF daemons, jobs, queues, or hosts in the system.

For backwards compatibility, ClusterManager and Manager are synonyms for ClusterAdmins and ADMINISTRATORS respectively. It is possible to have both sections present in the same lsf.cluster.cluster file to allow daemons from different LSF versions to share the same file.

If this section is not present, the default LSF administrator is root. For flexibility, each cluster may have its own LSF administrator(s), identified by a user name, although the same administrator(s) can be responsible for several clusters.

The ADMINISTRATORS parameter is normally set during the installation procedure.

Use the -l option of the lsclusters command to display all the administrators within a cluster.

The following gives an example of a cluster with three LSF administrators. The user listed first, user2, is the primary administrator.

Begin ClusterAdmins
ADMINISTRATORS = user2 lsfgrp user7
End ClusterAdmins

Hosts

The Host section is the last section in lsf.cluster.cluster and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host.

The order in which the hosts are listed in this section is important. The LIM on the first host listed becomes the master LIM if that host is up; otherwise, the LIM on the second host becomes the master if its host is up, and so on.

Since the master LIM makes all placement decisions for the cluster, you want it on a fast machine. Also, to avoid the delays involved in switching masters if the first machine goes down, you want the master to be on a reliable machine. It is desirable to arrange the list such that the first few hosts in the list are always in the same subnet. This avoids the situation where the second host takes over the master when there are communication problems between subnets.

Configuration information is of two types. Some fields in a host entry simply describe the machine and its configuration. Other fields set thresholds for various resources. Both types are listed below.

Descriptive Fields

The HOSTNAME, model, type, and RESOURCES fields must be defined in the Host section. The server, nd, RUNWINDOW, and REXPRI fields are optional.

HOSTNAME
The official name of the host as returned by hostname(1). Must be listed in lsf.shared as belonging to this cluster.
model
Host model. Must be one of those defined in the lsf.shared file. This determines the CPU speed scaling factor applied in load and placement calculations.
type
A host type as defined in the HostType section of lsf.shared. The strings used for host types are decided by the system administrator. For example, SPARC, DEC, HPPA, etc. The host type is used to identify binary-compatible hosts.
The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request then the task is run on a host of the same type as the sending host.
Often one host type can be used for many machine models. For example, the host type name SUN41 might be used for any computer with a SPARC processor running SunOS 4.1. This would include many Sun models and quite a few from other vendors as well.
server
If server is set to 0, the host is an LSF client. Client hosts do not run the LSF daemons; they can submit interactive and batch jobs to an LSF cluster, but cannot execute jobs sent from other hosts. Set server to 1 if the host can receive jobs from other hosts. If this field is not defined, the default is 1.
nd
The number of local disks. This corresponds to the ndisks static resource. On most host types LSF automatically determines the number of disks, and the nd parameter is ignored.
nd should only count local disks with file systems on them. Do not count either disks used only for swapping or disks mounted with NFS.
Default: the number of disks determined by the LIM, or 1 if the LIM cannot determine this.
RESOURCES
Boolean resources available on this host. The resource names are strings defined in the Resource section of the file lsf.shared. You may list any number of resources, enclosed in parentheses and separated by blanks or tabs. For example: (fs frame hpux)
RUNWINDOW
Dispatch window during which the LIM recommends this host for task execution. When the host is not available for remote execution, the host status is lockW (locked by run window). LIM does not schedule interactive tasks on hosts locked by dispatch windows. Note that LSF Batch uses its own (optional) host dispatch windows to control batch job processing on batch server hosts.
A dispatch window consists of one or more time windows. See 'Time Windows' for a description of the format of time window specifications.
Default: always accept remote jobs.
REXPRI
The default execution priority for interactive remote jobs run under the RES. Range: -20 to 20. REXPRI corresponds to the BSD style nice value used for remote jobs. For hosts with System V style nice values with the range 0 - 39, a REXPRI of -20 corresponds to a nice value of 0 and +20 corresponds to 39. Higher values of REXPRI correspond to lower execution priority; -20 gives the highest priority, 0 is the default priority for login sessions, and +20 is the lowest priority.
Default: 0.

Threshold Fields

The LIM uses these thresholds in determining whether to place remote jobs on a host. If one or more LSF load indices exceeds the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy and LIM will not recommend jobs to that host.

Note
The CPU run queue length threshold values (r15s, r1m, and r15m) are taken as effective queue lengths as reported by lsload -E.

All of these fields are optional; you only need to configure thresholds for load indices you wish to use for determining whether hosts are busy. Fields that are not configured are not considered when determining host status.

Thresholds can be set for any load index supported internally by the LIM, and for any external load index (see 'External Load Indices').
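The busy test can be sketched as follows. This is an illustration of the rule, not the LIM's actual implementation; in particular, the split between indices that are busy when above the threshold and indices that are busy when below it is inferred from the parenthetical examples above ("too many users, not enough swap space").

```python
# Indices that measure free resources or idle time: a host is regarded
# as busy when these fall BELOW the threshold (e.g. "not enough swap
# space"). Membership of this set is an assumption for illustration.
FREE_RESOURCE_INDICES = {"tmp", "swp", "mem", "it"}

def host_is_busy(load, thresholds):
    """Return True if any configured threshold is violated.

    load       -- current load index values for the host
    thresholds -- configured threshold fields; indices that are not
                  configured are simply absent and therefore ignored
    """
    for name, limit in thresholds.items():
        value = load[name]
        if name in FREE_RESOURCE_INDICES:
            if value < limit:
                return True
        elif value > limit:
            return True
    return False
```

With the example Host section below, hostA would be considered busy whenever its r1m load rises above 3.5 or its paging rate rises above 15.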

This example Host section contains descriptive and threshold information for two hosts.

Begin Host
HOSTNAME model    type  server r1m pg tmp RESOURCES     RUNWINDOW
hostA    SparcIPC Sparc 1      3.5 15 0   (sunos)       ()
hostD    Sparc10  Sparc 1      3.5 15 0   (sunos frame) (18:00-08:00)
End Host

The lsf.task and lsf.task.cluster Files

These two files are optional, but at least one of them should be present. The lsf.task file applies to all load sharing clusters. The lsf.task.cluster file applies only to the named cluster. These files define the default resource requirements for commonly used commands.

Users can also have their own lists defined in the file .lsftask in their home directory. That file is also optional and uses a similar format.

LSF combines the lists from these files before using them; any task listed in any of these files is on the combined lists. In the event of conflict (for example, if a task is listed as remotely executable in one file but as local in another), the most specific file wins: cluster-specific data from lsf.task.cluster overrides system-wide data from lsf.task, and user-specific data from $HOME/.lsftask overrides both.
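The precedence rule can be sketched as follows. This is a simplified illustration of the merge described above, not LSF's actual parser; real remote task entries also carry resource requirements, which are omitted here.

```python
def merge_task_lists(system_tasks, cluster_tasks, user_tasks):
    """Combine task classifications from lsf.task, lsf.task.cluster,
    and $HOME/.lsftask. Each argument maps a task name to either
    "local" or "remote"; later (more specific) files win on conflict.
    """
    merged = dict(system_tasks)   # system-wide defaults
    merged.update(cluster_tasks)  # cluster-specific data overrides
    merged.update(user_tasks)     # user-specific data overrides both
    return merged
```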

Each file contains two lists of task names, LocalTasks and RemoteTasks, to indicate which tasks should be executed locally and remotely, respectively. The following sections describe the two lists:

Local Tasks

The local task list is typically short, containing tasks that depend on local resources (for example, ps and hostname) or tasks that are too small to repay the overhead of setting up remote connections (for example, date).

Begin LocalTask
who
uptime
date
ls
End LocalTask

Remote Tasks

The remote task list is more complex than the local list. It not only names tasks but also describes their resource requirements so that they may be properly placed.

Often this list is quite small when LSF is first installed. It may be, and usually is, expanded as you gain experience using load sharing.

Proper configuration of the remote task list is important for correct operation of LSF. For example, you do not want to send a compile job to the wrong host and produce a binary that does not run on the machine you need it on.

The remote task list also plays an important role in managing network resource accessibility. For example, suppose the program maple is available only on some of the Sun hosts. It can be transparently invoked from any host in the system if the following entry is added to the RemoteTasks section:

maple/mserver

This says that maple should be executed remotely on a machine that has the mserver resource configured in the lsf.cluster.cluster file. That file should list the resource only for the Sun hosts with maple installed. The resource name mserver must also be defined in the file lsf.shared for this to work.

Once the files lsf.task, lsf.cluster.cluster, and lsf.shared all have the correct entries, all the user needs to do is enter maple and the program automatically runs on an appropriate host.
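Concretely, the three files might contain fragments like the following. This is an illustrative sketch only; the exact column layout of the Resource section is covered in the discussion of lsf.shared, and the host entries involved are hypothetical. In lsf.shared, define the resource name:

Begin Resource
RESOURCENAME  DESCRIPTION
mserver       (Maple server)
End Resource

In the Host section of lsf.cluster.cluster, list mserver in the RESOURCES column for only those Sun hosts that have maple installed, for example (sunos mserver). Finally, add the task entry to lsf.task or lsf.task.cluster:

Begin RemoteTasks
maple/mserver
End RemoteTasks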

For details about the task file format, see lsrtasks(1). This example shows remote task list entries for four common commands:

Begin RemoteTasks
cc/order[cpu:mem]
"troff/type==any order[cpu]"
"compress/type==any order[cpu:mem]"
"latex/type==any order[cpu:mem]"
End RemoteTasks

The hosts File

If your LSF clusters include hosts that have more than one interface and are configured with more than one official host name, you must either modify the host name configuration or create a private hosts file for LSF to use. The LSF hosts file is stored in LSF_CONFDIR. The format of LSF_CONFDIR/hosts is the same as for the /etc/hosts file.

For every host that has more than one official name, you must duplicate the hosts database information except that all entries for the host should use the same official name. Configure all the other names for the host as aliases so that people can still refer to the host by any name. For example, if your /etc/hosts file contains:

AA.AA.AA.AA  host-AA host # first interface
BB.BB.BB.BB  host-BB      # second interface

then the LSF_CONFDIR/hosts file should contain:

AA.AA.AA.AA  host host-AA # first interface
BB.BB.BB.BB  host host-BB # second interface

The LSF hosts file should contain entries only for hosts with more than one official name. All other host names and addresses are resolved using the default method for your hosts. See 'Host Naming' for a detailed discussion of official host names.

The lsf.sudoers File

This file allows a list of permitted users to perform certain privileged operations in the LSF cluster as either the superuser or any other designated user. This file is optional.

The lsf.sudoers file must be located in /etc and it must be owned by root.

The format of this file is very similar to that of the lsf.conf file (see 'The lsf.conf File'). Each line of the file is a NAME=VALUE statement, where NAME describes an authorized operation and VALUE is a single string or multiple strings enclosed in quotes. Lines starting with '#' are comments and are ignored.

The currently recognized variables in this file include:

LSF_STARTUP_USERS
By default, the superuser is the only user who can start the LSF daemons as root. This parameter enables a list of specified users to start up LSF daemons as root using the LSF administrative commands lsadmin and badmin.
Note that lsadmin and badmin must be installed as setuid root programs for this to work. Possible values for this variable include:
all_admins
This allows all LSF administrators configured in the lsf.cluster.cluster file to start up LSF daemons as root by running the lsadmin and badmin commands.
user1 user2 ...
This allows the listed user(s) to perform the startup operations. If this list contains more than one user, it must be enclosed in quotes. For example:
LSF_STARTUP_USERS="user1 user2".

CAUTION!
Defining LSF_STARTUP_USERS as all_admins incurs some security risk because administrators can be configured by a primary LSF administrator who is not root. You should explicitly list the login names of all authorized administrators here so that you have full control of who can start daemons as root.

LSF_STARTUP_PATH
The absolute path name of the directory where the server binaries, namely lim, res, sbatchd, and mbatchd, are installed. This is normally LSF_SERVERDIR as defined in your lsf.conf file. LSF allows the users defined in LSF_STARTUP_USERS to start the daemons installed in the LSF_STARTUP_PATH directory as root.
Both LSF_STARTUP_USERS and LSF_STARTUP_PATH must be defined for this feature to work.
LSB_PRE_POST_EXEC_USER
This parameter defines the authorized user for the LSF Batch queue level pre-execution and post-execution commands. These commands can be configured at the queue level by the LSF administrator. If LSB_PRE_POST_EXEC_USER is defined, the queue level pre-execution and post-execution commands will be run as the user defined. If this parameter is not defined, the commands will be run as the user who submitted the job. In particular, you can define this variable if you need to run commands as root.
See 'Pre- and Post-execution Commands' for details of pre-execution and post-execution.
You can only define a single user name in this parameter.
LSF_EAUTH_USER
This defines the user name to run the external authentication executable, eauth. If this parameter is not defined, then eauth will be run as the primary LSF administrator. See 'External Authentication' for an explanation of external authentication.
LSF_EEXEC_USER
This defines the user name to run the external execution command, eexec. If this parameter is not defined, then eexec will be run as the user who submitted the job. See 'External Submission and Execution Executables' for an explanation of external execution.
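Putting these variables together, a complete lsf.sudoers file might look like the following. This is an illustrative sketch; the user names and the installation path are hypothetical.

# /etc/lsf.sudoers - must be owned by root
LSF_STARTUP_USERS="user1 user2"
LSF_STARTUP_PATH=/usr/local/lsf/etc
LSB_PRE_POST_EXEC_USER=root
LSF_EAUTH_USER=lsfadmin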



doc@platform.com

Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.