End-to-end scheduling with fault tolerance capabilities architecture


End-to-end scheduling with fault tolerance capabilities is implemented by directly connecting one or more IBM Workload Scheduler agents or domain managers to IBM Z Workload Scheduler.

The domain managers function as the broker systems for the entire distributed network by resolving all dependencies for their subordinate managers and agents. They send their updates (in the form of events) to IBM Z Workload Scheduler so that it can update the plan accordingly.

IBM Z Workload Scheduler handles its own jobs and notifies the domain managers of all the status changes of the IBM Z Workload Scheduler jobs that involve the IBM Workload Scheduler plan.

In this configuration, the domain managers and all the distributed agents recognize IBM Z Workload Scheduler as the master domain manager and notify it of all the changes occurring in their own plan.

The agents are also not permitted to interfere with the IBM Z Workload Scheduler jobs, because these jobs are viewed as running on the master, which is the only node that manages them.

Figure 1 shows an example of a fault-tolerant end-to-end configuration. It also illustrates the distribution of the Symphony file from the master domain manager to directly connected agents and to domain managers and their subordinate agents.

Figure 1. Example of end-to-end with fault tolerance capabilities configuration

Before the start of a production period, the master domain manager creates a production control file, named Symphony. This file contains:
  • Jobs to be run on IBM Workload Scheduler distributed agents
  • z/OS® (mainframe) jobs that are predecessors to IBM Workload Scheduler distributed jobs
  • Job streams that have at least one job in the Symphony file
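The selection rules listed above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the `Job` class, the `select_symphony_contents` function, and all job and job stream names are hypothetical, and the sketch assumes that only direct z/OS predecessors are included, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified job record used only for this sketch."""
    name: str
    stream: str
    platform: str              # "distributed" or "zos"
    predecessors: tuple = ()   # names of jobs this job depends on


def select_symphony_contents(jobs):
    """Return (job names, job stream names) chosen by the three rules above."""
    by_name = {job.name: job for job in jobs}

    # Rule 1: jobs to be run on distributed agents.
    distributed = {job.name for job in jobs if job.platform == "distributed"}

    # Rule 2: z/OS jobs that are predecessors of distributed jobs
    # (assumption: direct predecessors only).
    zos_predecessors = {
        pred
        for job in jobs
        if job.platform == "distributed"
        for pred in job.predecessors
        if pred in by_name and by_name[pred].platform == "zos"
    }

    selected = distributed | zos_predecessors

    # Rule 3: job streams that have at least one selected job.
    streams = {by_name[name].stream for name in selected}
    return selected, streams


jobs = [
    Job("ZLOAD", "ZSTREAM", "zos"),
    Job("ZREPORT", "ZSTREAM", "zos"),
    Job("DXFER", "DSTREAM", "distributed", ("ZLOAD",)),
    Job("ZARCH", "ZONLY", "zos"),
]
selected, streams = select_symphony_contents(jobs)
print(sorted(selected))  # ['DXFER', 'ZLOAD']
print(sorted(streams))   # ['DSTREAM', 'ZSTREAM']
```

In this example, ZREPORT and ZARCH are left out because they are mainframe jobs on which no distributed job depends, and the ZONLY job stream is left out because none of its jobs was selected.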

IBM Workload Scheduler is then restarted in the network, and the master domain manager sends a copy of the new Symphony file to each of its agents and subordinate domain managers. The domain managers, in turn, send copies to their agents and subordinate domain managers. This enables the distributed agents throughout the network to continue processing even if the network connection to their domain managers is down.
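The hierarchical distribution described above can be pictured as a walk down the domain tree, where every node keeps its own copy of the plan file before forwarding it to its direct children. The following is a minimal sketch under stated assumptions: the `Node` class, the `distribute` function, and the workstation names are hypothetical, and the sequential depth-first order is chosen for simplicity and does not reflect how the product schedules the transfers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical workstation: master, domain manager, or agent."""
    name: str
    children: List["Node"] = field(default_factory=list)
    symphony: Optional[str] = None  # local copy of the plan file


def distribute(node, symphony, log=None):
    """Store a local copy on this node, then send copies to its direct children.

    Returns the list of (sender, receiver) transfers in the order performed.
    """
    if log is None:
        log = []
    node.symphony = symphony
    for child in node.children:
        log.append((node.name, child.name))
        distribute(child, symphony, log)
    return log


# Hypothetical topology: the master has one directly connected agent and
# one domain manager, which in turn has two subordinate agents.
master = Node("MASTER", [
    Node("FTA1"),
    Node("DM_A", [Node("FTA2"), Node("FTA3")]),
])

for sender, receiver in distribute(master, "Symphony for the new period"):
    print(f"{sender} -> {receiver}")
```

Because each node holds its own copy after the walk completes, an agent such as FTA2 still has the plan locally if its link to DM_A later goes down, which is the property the paragraph above relies on.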

Stopping the network is necessary to allow the new Symphony file to be generated and distributed. Optionally, you can configure the end-to-end environment with fault tolerance capabilities to stop the network gradually, which reduces the downtime of each agent to the time strictly necessary to receive the new plan. For details, see the POSTPONE keyword of the TOPOLOGY statement.

IBM Z Workload Scheduler views the end-to-end process as a subtask whose task ID is TWS. The subtask is activated by the TPLGYSRV keyword of the OPCOPTS statement and handles events to and from fault-tolerant workstations, using the IBM Z Workload Scheduler server.