Imagine a business process that invokes several service endpoints, one of which behaves unstably. The unstable service is regularly unavailable, usually for a short time but sometimes for longer. As a result, a large number of instances may end up in the error hospital. While the endpoint is unavailable, resources are spent processing instances that cannot complete successfully. The failure resiliency solution helps to avoid this situation by suspending the upstream inbound adapters, EDN subscriptions, or web services. In the case of web services, clients receive an error right away. In the case of EDN and adapters, messages wait in the directory, queue, topic, and so on until the endpoint is resumed.
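To make the difference between the inbound types concrete, here is a minimal Python sketch of what happens to a message while an upstream endpoint is suspended. It is only an illustration of the behavior described above, not Oracle SOA Suite code: the `UpstreamEndpoint` class, its `kind` values, and the in-memory `backlog` (standing in for the directory, queue, or topic) are all hypothetical names.

```python
from collections import deque


class UpstreamEndpoint:
    """Hypothetical model of an inbound endpoint; not an Oracle SOA Suite API."""

    def __init__(self, kind):
        self.kind = kind          # assumed values: "web_service", "edn", "adapter"
        self.suspended = False
        self.backlog = deque()    # stands in for the directory/queue/topic
        self.processed = []

    def receive(self, message):
        if not self.suspended:
            self.processed.append(message)
            return "processed"
        if self.kind == "web_service":
            # A web service client cannot be parked, so it gets an error right away.
            raise RuntimeError("endpoint suspended")
        # EDN and adapter messages simply wait at their source until resume.
        self.backlog.append(message)
        return "waiting"

    def resume(self):
        # On resume, the waiting messages are picked up in arrival order.
        self.suspended = False
        while self.backlog:
            self.processed.append(self.backlog.popleft())
```

The point of the sketch is that nothing is lost for EDN and adapters: the messages are delayed rather than sent into a process that is bound to fail.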
IWS can also help in this situation (see also my IWS blog). Use IWS to examine what happened at various points in the business process and to analyze its behavior.
By default, failure resiliency is switched off, so the first step is to enable it. You can enable it globally (as shown below). Each downstream endpoint inherits this configuration, but you can override it for an individual endpoint.
You can set the failure rate (the number of failures in a given time period) at which the upstream endpoint is suspended. Once the upstream endpoint is suspended, one or a few messages are let in periodically (a trickle feed), and if the downstream endpoint is detected to be fine, the upstream endpoint resumes automatically. Set the Retry Interval to the periodic interval at which this trickle feed is enabled for the upstream endpoint.
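The interplay between the failure rate and the retry interval can be sketched as a small state machine. The Python below is my own illustration of the idea, not the product's implementation: the `ResiliencyMonitor` class, its parameter names, and the example values in the usage note are assumptions, and time is passed in explicitly so the logic is easy to follow.

```python
from collections import deque


class ResiliencyMonitor:
    """Hypothetical sketch of the suspend / trickle feed / resume cycle."""

    def __init__(self, failure_threshold, window_seconds, retry_interval):
        self.failure_threshold = failure_threshold  # failures allowed per window
        self.window_seconds = window_seconds        # length of the time period
        self.retry_interval = retry_interval        # seconds between trickle feeds
        self.failures = deque()                     # timestamps of recent failures
        self.suspended = False
        self.next_probe_at = None

    def record_failure(self, now):
        self.failures.append(now)
        # Forget failures that fell outside the time window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if self.suspended:
            # A trickle-feed message failed: stay suspended, try again later.
            self.next_probe_at = now + self.retry_interval
        elif len(self.failures) >= self.failure_threshold:
            self.suspended = True
            self.next_probe_at = now + self.retry_interval

    def record_success(self, now):
        # A successful trickle-feed message resumes the upstream endpoint.
        if self.suspended:
            self.suspended = False
            self.next_probe_at = None
            self.failures.clear()

    def allow_message(self, now):
        if not self.suspended:
            return True
        if now >= self.next_probe_at:
            # Let one message through and schedule the next trickle feed.
            self.next_probe_at = now + self.retry_interval
            return True
        return False
```

With, say, `ResiliencyMonitor(failure_threshold=3, window_seconds=60, retry_interval=30)`, three failures within a minute suspend the upstream endpoint, after which a single message is let through every 30 seconds until one succeeds. A shorter retry interval notices recovery sooner, at the cost of sending more messages to an endpoint that may still be down.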
If you want to be notified when upstream endpoints are suspended or resumed, you can enter the notification details.
During the OFM Summer Camps in Lisbon last August, we did an exercise with resiliency. In a sample application we took down an endpoint and saw failure resiliency kick in and suspend the upstream endpoints. We also saw the upstream endpoint resume automatically when the downstream endpoint came back up. The exercise proved the value of failure resiliency for both upstream EDN and adapters. We didn’t have time to find out all the nitty-gritty details of resiliency, but from what I have seen so far, this kind of functionality really makes me happy.