Recovery procedure on a fault-tolerant agent or lower domain manager
If the Symphony file is corrupt on a lower-level domain manager or on a fault-tolerant agent, it can be replaced.
Complete removal and replacement of the Symphony file causes some loss of data. The following procedure minimizes that loss and indicates what is lost.
The procedure involves two agents: the agent where the Symphony file is corrupt, and its domain manager.
Note: Where the agent is a top-level domain manager (below the master), or a fault-tolerant agent in the master domain, the manager is the master domain manager.
The procedure is as follows:
1. On the domain manager, unlink the agent which is having the Symphony file problem.
2. On the agent do the following:
   a. Stop the agent if it has not yet failed. You do not need to shut it down.
   b. Delete the Symphony and the Sinfonia files from the agent workstation. Alternatively you can move them to a different location on the agent workstation, or rename them.
3. On the domain manager do the following:
   a. Back up the Sinfonia file if you want to be able to restore the original situation after completion. This is not an obligatory step, and no problems have been reported from not performing it.
   b. Ensure that no agent is linking with the domain manager, optionally stopping the domain manager agent.
   c. Copy the Symphony file on the domain manager to the Sinfonia file, replacing the existing version.
   d. Restart the domain manager agent if necessary.
   e. Link the agent and wait for the Symphony file to copy from the domain manager to the agent. The agent automatically starts.
   f. Optionally restore the Sinfonia file from the backup you took in step 3.a. This restores the original situation, but with the agent now having an uncorrupted Symphony file. This is not an obligatory step, and no problems have been reported from not performing it.
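The file-handling parts of the procedure (setting aside the agent's Symphony and Sinfonia files, backing up Sinfonia on the domain manager, copying Symphony over Sinfonia, and the optional restore) can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `agent_dir` and `manager_dir` parameters, and the holding and backup directories are hypothetical, and the directory that actually holds the two files depends on your installation. The sketch does not unlink, stop, restart, or link any workstation; those steps are still performed with the product's own commands.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def set_aside_plan_files(agent_dir: Path, holding_dir: Path) -> List[Path]:
    """Agent side: move Symphony and Sinfonia out of the way instead of deleting them."""
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in ("Symphony", "Sinfonia"):
        src = agent_dir / name
        if src.exists():
            # shutil.move returns the destination path as a string
            moved.append(Path(shutil.move(str(src), str(holding_dir / name))))
    return moved


def replace_sinfonia(manager_dir: Path, backup_dir: Optional[Path] = None) -> Optional[Path]:
    """Domain manager side: optionally back up Sinfonia, then overwrite it with Symphony."""
    symphony = manager_dir / "Symphony"
    sinfonia = manager_dir / "Sinfonia"
    backup = None
    if backup_dir is not None and sinfonia.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        backup = Path(shutil.copy2(sinfonia, backup_dir / "Sinfonia"))
    # Replace the existing Sinfonia with a copy of the current Symphony
    shutil.copy2(symphony, sinfonia)
    return backup


def restore_sinfonia(manager_dir: Path, backup: Path) -> None:
    """Domain manager side: optionally put the backed-up Sinfonia back."""
    shutil.copy2(backup, manager_dir / "Sinfonia")


if __name__ == "__main__":
    # Demonstration on throwaway files only; no real installation is touched.
    root = Path(tempfile.mkdtemp())
    manager = root / "manager"
    manager.mkdir()
    (manager / "Symphony").write_text("current plan")
    (manager / "Sinfonia").write_text("older plan copy")
    saved = replace_sinfonia(manager, root / "backup")
    print("Sinfonia replaced; backup kept at", saved)
```

Moving the agent's files rather than deleting them keeps the corrupt copies available for later diagnosis, which the procedure allows as an alternative to deletion.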
Following this procedure some information is lost, in particular the contents of the Mailbox.msg and tomaster.msg message queues. If those queues contained state information about a job, and the Symphony file on the domain manager had not been updated with it by the time the Sinfonia file is replaced (step 3.c), that job is rerun. To avoid this, add these steps to the procedure immediately before step 3.a:
- Make a list of jobs that ran recently on the agent.
- At the domain manager, change their states to either SUCC or ABEND, or cancel them. Note: If you set the states of jobs to SUCC, or cancel them, any successor jobs are triggered to start. Ensure that this is acceptable before performing this action.
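As a rough aid for the second of these extra steps, the following sketch turns a list of recently run jobs into command lines for an operator to review. It assumes that the conman `confirm` command (with `succ` or `abend`) and the `cancel job` command are the appropriate way to change job states in your environment. Whether `confirm` applies depends on the job's current state, so check the conman reference for your version before running anything. The function name and the job identifier in the example are hypothetical. The sketch only builds strings and executes nothing.

```python
from typing import Iterable, List


def state_change_commands(job_ids: Iterable[str], action: str = "succ") -> List[str]:
    """Build conman command lines for review; nothing is executed here.

    job_ids: job selections written as the operator would type them
             (hypothetical example: "AGENT1#PAYROLL.EXTRACT").
    action:  "succ" or "abend" to confirm a state, or "cancel" to cancel the job.
    """
    if action not in ("succ", "abend", "cancel"):
        raise ValueError("action must be 'succ', 'abend', or 'cancel'")
    commands = []
    for job in job_ids:
        if action == "cancel":
            commands.append(f'conman "cancel job {job}"')
        else:
            commands.append(f'conman "confirm {job};{action}"')
    return commands


if __name__ == "__main__":
    # Hypothetical job name, used purely for illustration.
    for line in state_change_commands(["AGENT1#PAYROLL.EXTRACT"], "abend"):
        print(line)
```

Printing the commands instead of running them leaves room to check each job against the note about successor jobs before any state is changed.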