Learning Recovery Strategies for Dynamic
Self-healing in Reactive Systems

Mateo Sanabria, Ivana Dusparic, Nicolás Cardozo

SEAMS 2024, Lisbon, Portugal

Traditional self-healing systems automatically detect, diagnose, and recover from failures

Require predefined knowledge of faults

Require predefined healing strategies

Why settle for a rigid system when a flexible, self-healing one can adapt and autonomously heal using atomic actions?

First example

Suppose we have a basic system with a simple GUI in which the user moves the pointer (through a movement method) in four directions:

Up, Down, Left, Right

The tessellation of this GUI defines two states:

White region: Correct state
Blue region: Failure state
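
A minimal sketch of this example, assuming a rectangular grid of cells. The grid size, the cells that make up the blue region, and the names `Move`, `movement`, and `is_failure` are all hypothetical and chosen only for illustration; the slides do not specify them.

```python
from enum import Enum

class Move(Enum):
    # (dx, dy) offsets; y grows downwards, as in most GUI toolkits
    UP = (0, -1)
    DOWN = (0, 1)
    LEFT = (-1, 0)
    RIGHT = (1, 0)

WIDTH, HEIGHT = 5, 5  # assumed grid size
BLUE_REGION = {(3, 0), (4, 0), (3, 1), (4, 1)}  # assumed failure cells

def movement(pos, move):
    """Move the pointer one cell, clamped to the grid borders."""
    dx, dy = move.value
    x = min(max(pos[0] + dx, 0), WIDTH - 1)
    y = min(max(pos[1] + dy, 0), HEIGHT - 1)
    return (x, y)

def is_failure(pos):
    """True when the pointer is in the blue (failure) region."""
    return pos in BLUE_REGION
```

Here the four atomic actions are the four moves, and the state is the pointer's cell together with the region it falls in.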

For a traditional self-healing system in this context

The detection of failure states requires the explicit definition of the blue region
What about the definition of healing strategies?
What happens if the tessellation changes?

We propose a novel self-healing framework that learns recovery strategies for fine-grained system healing behavior at runtime.

The framework has three main components

These are instantiated as needed in the application.

Fault detection

Learning model

The monitor uses the information given by the fired predicate to detect the atomic action that triggered the activation.
The monitor uses a Q-Learning agent to establish the reward for the action taken in the current state, based on the predicate's current value.
The learning process maps states to high-value atomic actions.
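
The learning step can be sketched with the standard tabular Q-Learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', a') − Q(s, a)). This is a minimal sketch and not the framework's actual implementation: the class name `QAgent`, the hyperparameters `alpha` and `gamma`, and the reward passed to `update` are assumptions, since the slides do not give them.

```python
from collections import defaultdict

class QAgent:
    """Tabular Q-Learning over (state, atomic action) pairs."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def update(self, state, action, reward, next_state):
        # Reward would be derived from the predicate's current value.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def best_action(self, state):
        # The highest-value atomic action learned for this state.
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

After enough updates, `best_action` gives the state-to-action map that the learning model hands over to the rest of the framework.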

Variation Manager

The learning model provides a map from states to individual actions, while the variation manager constructs healing strategies for each failure state from these individual actions.
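
One way to picture this composition is to chain the learned actions from a failure state until a correct state is reached. The sketch below assumes a `policy` dictionary (state to best action), a `step` function that predicts the next state, and an `is_failure` check; all three names and the `max_len` bound are hypothetical, and the actual variation manager may build strategies differently.

```python
def build_strategy(start, policy, step, is_failure, max_len=10):
    """Chain learned atomic actions from a failure state to a correct one."""
    strategy, state = [], start
    while is_failure(state) and len(strategy) < max_len:
        action = policy.get(state)
        if action is None:  # nothing learned yet for this state
            break
        strategy.append(action)
        state = step(state, action)
    return strategy
```

The `max_len` bound guards against policies that cycle between failure states without ever leaving them.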

DeltaIoT Exemplar

DeltaIoT is a communication network composed of 25 IoT motes deployed in different physical locations.
DeltaIoT is used as an exemplar to evaluate different self-adaptation strategies that manage the motes' tradeoff between energy consumption and packet loss.

https://people.cs.kuleuven.be/~danny.weyns/software/DeltaIoT/

We create a predicate, analyzeLinkSettings, measuring links' state with respect to the two metrics.
When the predicate is satisfied, it triggers the deployment of the variations generated by the learned healing strategies.
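
A rough sketch of such a predicate follows. It is not the DeltaIoT API: the `LinkState` fields, the two thresholds, and the `monitor` helper are assumptions made for illustration, and the predicate is written in Python as `analyze_link_settings`.

```python
from dataclasses import dataclass

@dataclass
class LinkState:
    energy: float       # energy consumption (assumed unit)
    packet_loss: float  # fraction of packets lost, 0.0 to 1.0

MAX_ENERGY = 10.0       # assumed threshold
MAX_PACKET_LOSS = 0.1   # assumed threshold

def analyze_link_settings(link):
    """Satisfied when either metric leaves its acceptable range."""
    return link.energy > MAX_ENERGY or link.packet_loss > MAX_PACKET_LOSS

def monitor(link, deploy):
    """Deploy the learned variations when the predicate fires."""
    if analyze_link_settings(link):
        deploy(link)
        return True
    return False
```

Passing `deploy` as a callback keeps the detection logic separate from whatever mechanism applies the variations.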

Simulation steps

Conclusion

Our framework leverages monitors for flexible fault detection and automatic healing strategy generation at runtime.
The framework eliminates the need for predefined knowledge of faults and healing strategies, simplifying development effort and reducing complexity.
The approach is validated on a reactive application and the DeltaIoT exemplar, demonstrating effectiveness in learning self-healing behavior.
