Configuring for end-to-end scheduling with fault tolerance capabilities in a SYSPLEX environment
In a configuration with a controller and no standby controllers, define the end-to-end server work directory in a file system mounted under a system-specific HFS or a system-specific zFS.
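As an illustration, a system-specific mount for the work directory could be defined in the BPXPRMxx parmlib member along these lines. The data set name and mount point are examples only; adapt them to your installation naming conventions:

```
MOUNT FILESYSTEM('OMVS.&SYSNAME..TWS.WRKDIR')  /* example data set name   */
      TYPE(ZFS)                                /* or HFS                  */
      MODE(RDWR)
      MOUNTPOINT('/&SYSNAME./var/TWS/inst')    /* system-specific path    */
```

The &SYSNAME. system symbol resolves differently on each system, which keeps the work directory local to the system where the end-to-end server runs.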
Then configure the Byte Range Lock Manager (BRLM) server in distributed form (see the BRLM considerations that follow). In this way the server is not affected by the failure of other systems in the sysplex.
Having a shared HFS or zFS in a sysplex configuration means that all file systems are available to all systems participating in shared HFS or zFS support. With shared HFS or zFS support there is no I/O performance reduction for file systems mounted read-only (R/O). However, the cross-system coupling facility (XCF) communication required for shared HFS or zFS might affect the response time of read/write (R/W) file systems shared in a sysplex. For example, assume that a user on system SYS1 issues a read request to a file system owned R/W by system SYS2. With shared HFS or zFS support, the read request is sent through an XCF messaging function. After SYS2 receives the message, it gathers the requested data from the file and returns the data using the same request message.
In many cases, when accessing data on the system that owns a file system, the file I/O time is only the path length through the buffer manager to retrieve the data from the cache. In contrast, file I/O to a shared HFS or zFS from a client system that does not own the mount requires additional path length, plus the time spent in the XCF messaging function. Increased XCF message traffic can contribute to performance degradation. For this reason, it is recommended that system files be owned by the system where the end-to-end server runs.
In a configuration with an active controller and several standby controllers, make sure that all the related end-to-end servers running on the different systems in the sysplex have access to the same work directory.
On z/OS® systems, the shared zFS capability is available: all file systems mounted by a system participating in shared zFS are available to all participating systems. When allocating the work directory in a shared zFS, you can define it either in a file system mounted under the system-specific zFS or in a file system mounted under the sysplex root. A system-specific file system becomes unreachable when its system is not active. To take advantage of the takeover process, define the work directory in a file system mounted under the sysplex root and defined as AUTOMOVE.
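For example, a work directory eligible for takeover could be mounted under the sysplex root with the AUTOMOVE attribute, so that ownership moves to another participating system if the owning system fails. The data set name and path are examples only:

```
MOUNT FILESYSTEM('OMVS.TWS.WRKDIR')   /* example data set name        */
      TYPE(ZFS)
      MODE(RDWR)
      MOUNTPOINT('/var/TWS/inst')     /* path under the sysplex root  */
      AUTOMOVE                        /* move ownership on failure    */
```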
The BRLM server can be configured in either of two ways:
- With a central BRLM server running on one member of the sysplex and managing locks for all processes running in the sysplex.
- In a distributed form, where each system in the sysplex has its own BRLM server responsible for handling lock requests for all regular files in file systems that are mounted and owned locally (see APARs OW48204 and OW52293).
Stop the end-to-end server before closing a system in either of the following cases:
- The work directory is owned by the system to be closed. The df -v command in OMVS displays the owners of the mounted file systems.
- The system hosts the central BRLM server. The console command DISPLAY OMVS,O displays the name of the system where the BRLM runs.
If the BRLM server becomes unavailable, distributed BRLM is implemented. In this case the end-to-end server needs to be stopped only if the system that owns the work directory is stopped.
To minimize the risk of filling up the IBM Workload Scheduler internal queues while the server is down, schedule the closure of the system when the workload is low.
A separate file system data set is recommended for each stdlist directory mounted in R/W on /var/TWS/inst/stdlist, where inst varies depending on your configuration.
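A minimal sketch of allocating and mounting a dedicated zFS for the stdlist directory follows. The aggregate name, size, and path are assumptions; on some z/OS levels the aggregate must also be formatted before it can be mounted:

```
zfsadm define -aggregate OMVS.TWS.STDLIST -cylinders 100 10
zfsadm format -aggregate OMVS.TWS.STDLIST
/usr/sbin/mount -t ZFS -f OMVS.TWS.STDLIST /var/TWS/inst/stdlist
```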
When you calculate the size of the file system, consider that you need 10 MB for each of the following files: Intercom.msg, Mailbox.msg, pobox/tomaster.msg, and pobox/CPUDOMAIN.msg.
You need 512 bytes for each record in the Symphony, Symold, Sinfonia, and Sinfold files. Count one record for each CPU, schedule, and job or recovery job.
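For example, with a hypothetical network of 100 CPUs, 500 schedules, and 2000 jobs (including recovery jobs), the sizing works out as follows:

```
(100 + 500 + 2000) records x 512 bytes          = 1,331,200 bytes, about 1.3 MB per file
4 plan files (Symphony, Symold, Sinfonia,
    Sinfold) x 1.3 MB                           = about 5.3 MB
4 message files x 10 MB (Intercom.msg,
    Mailbox.msg, pobox/tomaster.msg,
    pobox/CPUDOMAIN.msg)                        = 40 MB
Total                                           = about 45 MB, plus space for stdlist and trace files
```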
You can specify the number of days that trace files are kept on the file system by using the TRCDAYS parameter of the TOPOLOGY statement.
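For example, a TOPOLOGY statement that keeps trace files for 14 days might look like the following sketch. The directory values are placeholders for your installation:

```
TOPOLOGY BINDIR('/usr/lpp/TWS')      /* installation binaries (example) */
         WRKDIR('/var/TWS/inst')     /* end-to-end work directory       */
         TRCDAYS(14)                 /* keep trace files for 14 days    */
```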