Runtime Fault Handling with the Fault Management Framework

Fault handling allows a SOA Suite component to handle error situations caused by external web services. These can be business faults (e.g. an invalid data value) or runtime faults (e.g. a service that is unavailable). I aim to handle business faults inside the composite as much as possible (with a catch), while handling runtime faults outside the composite.
In the remainder of this blog I will describe an implementation of the Fault Management Framework to handle runtime faults.

I have implemented the following policy:
1) RemoteFault (invocation of a service fails)

  • Start a retry cycle
    default retryCount:             5
    default retryInterval:          300 (seconds)
    default exponentialBackoff:     2
    The retries take place after 5, 10, 20, 40 and 80 minutes.
  • If it still fails, start a human intervention

2) All other unhandled (runtime) faults

  • Start a human intervention
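
The retry schedule in policy 1 can be checked with a short calculation. This is only a sketch, and it assumes the wait starts at retryInterval and doubles after every failed attempt, which is what the 5, 10, 20, 40 and 80 minute schedule above implies:

```latex
% Wait before retry k, assuming the interval doubles after each failed attempt
\[
  t_k = 300 \cdot 2^{\,k-1}\ \text{s}, \qquad k = 1, \dots, 5
  \quad\Longrightarrow\quad 300,\ 600,\ 1200,\ 2400,\ 4800\ \text{s}
\]
% Total time spent in the retry cycle before human intervention starts
\[
  \sum_{k=1}^{5} t_k = 300\,(2^{5} - 1) = 9300\ \text{s} = 155\ \text{min}
\]
```

Under that assumption, an instance that keeps failing spends a little over two and a half hours in the retry cycle before it is handed over for human intervention.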

The specification of this fault policy is located in the fault policy files.
The fault policy files are loaded at startup, so a server restart is required whenever they are changed. The two fault policy files we’re using are stored in the MDS. To use them, the composite.xml file must reference them. For this, add the following properties:

image1
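
In text form, the composite.xml properties look roughly like the fragment below. The property names oracle.composite.faultPolicyFile and oracle.composite.faultBindingFile are the standard ones in SOA Suite 11g. The MDS folder and the file names are assumptions for illustration, so adjust them to wherever your files are actually stored.

```xml
<!-- Fragment of composite.xml: both properties go directly under the <composite> element. -->
<!-- Assumed for illustration: the oramds:/apps/faultPolicyFiles/ folder and the file names. -->
<property name="oracle.composite.faultPolicyFile">oramds:/apps/faultPolicyFiles/fault-policies.xml</property>
<property name="oracle.composite.faultBindingFile">oramds:/apps/faultPolicyFiles/fault-bindings.xml</property>
```

Referencing the files through oramds: lets several composites share one set of policy files instead of each carrying its own copy.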

Policy Files

The content of the policy files is shown below:

Fault-binding.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<faultPolicyBindings version="1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://schemas.oracle.com/bpel/faultpolicy xsd/fault-bindings.xsd"
    xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <composite faultPolicy="MOA-FaultPolicy"/>
</faultPolicyBindings>

Fault-policies.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<faultPolicies
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://schemas.oracle.com/bpel/faultpolicy xsd/fault-policies.xsd"
    xmlns="http://schemas.oracle.com/bpel/faultpolicy">

    <faultPolicy version="1.0.0" id="MOA-FaultPolicy"
                 xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
                 xmlns="http://schemas.oracle.com/bpel/faultpolicy"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

        <Conditions>
            <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension" name="bpelx:remoteFault">
                <condition>
                    <action ref="ora-retry"/>
                </condition>
            </faultName>
            <faultName>
                <condition>
                    <action ref="ora-human-intervention"/>
                </condition>
            </faultName>
        </Conditions>

        <Actions>
            <Action id="ora-retry">
               <retry>
                  <retryCount>5</retryCount>
                  <retryInterval>300</retryInterval>
                  <exponentialBackoff>2</exponentialBackoff>
                  <retryFailureAction ref="ora-human-intervention"/>
               </retry>
            </Action>      
            <Action id="ora-human-intervention">
                <humanIntervention/>
            </Action>
        </Actions>

    </faultPolicy>
</faultPolicies>

Human Intervention with the Enterprise Manager

The following screenshot shows a Medewerker (Employee) process instance that invokes the unavailable AccountService.

image2

After three retries, the instance is waiting for human intervention.
The recovery flag is set next to the first error message at the top of the screen. Click the message and then the Recover Now button, or click the recovery flag directly.

image3

After selecting the message, the following screen appears.

image4

In this screen you can select a recovery action, change process variables, and invoke the recover operation.

When a service was down and has been brought up again, the usual choice is a retry without changing any content.

image5

After recovery the process continues normally.

Also check Greg Mally’s blog post ‘Fault Management Framework by Example’ on Aug 22, 2011 to get a good impression of the functionality of the Framework (https://blogs.oracle.com/ateamsoab2b/entry/fault_management_framework_by_example).
