Learning Recovery Strategies for Dynamic
Self-healing in Reactive Systems
Mateo Sanabria, Ivana Dusparic, Nicolás Cardozo
SEAMS 2024, Lisbon, Portugal
Traditional self-healing systems automatically detect, diagnose, and recover from failures, but they:
- Require predefined knowledge of faults
- Require predefined healing strategies
Why settle for a rigid system when a flexible, self-healing one can adapt and autonomously heal using atomic actions?
First example
Suppose we have a basic system built around a simple GUI, in which the user moves the pointer (through the movement method):
- Up, Down, Left, Right
The tessellation of this GUI defines two states:
- White region: Correct state
- Blue region: Failure state
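The example above can be sketched in a few lines of Python. This is only an illustration: the 4x4 grid size, the coordinates of the blue cells, and the names `MOVES`, `BLUE`, and `is_failure` are assumptions, not taken from the presented system; only the `movement` method and the four directions come from the slide.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]

# The four atomic actions; the (dx, dy) deltas are an assumed encoding.
MOVES: Dict[str, Cell] = {
    "Up": (0, -1),
    "Down": (0, 1),
    "Left": (-1, 0),
    "Right": (1, 0),
}

# Hypothetical 4x4 tessellation; these cells form the blue (failure) region.
WIDTH, HEIGHT = 4, 4
BLUE: Set[Cell] = {(2, 0), (2, 1), (3, 1)}


def movement(pos: Cell, action: str) -> Cell:
    """Move the pointer one cell, clamping it to the grid borders."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0), WIDTH - 1)
    y = min(max(pos[1] + dy, 0), HEIGHT - 1)
    return (x, y)


def is_failure(pos: Cell) -> bool:
    """White cells are correct states; blue cells are failure states."""
    return pos in BLUE
```

Here the failure region is written down explicitly, which is exactly what a traditional self-healing system would need.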
For a traditional self-healing system in this context
- The detection of failure states requires the explicit definition of the blue region
- What about the definition of healing strategies?
- What happens if the tessellation changes?
We propose a novel self-healing framework that learns recovery strategies for fine-grained system healing behavior at run time.
The framework has three main components
These are instantiated as needed in the application.
Fault detection
Learning model
- The monitor uses the information given by the fired predicate to detect the atomic action that triggered the activation.
- The monitor uses a Q-Learning agent to establish the reward for the action taken at the current state, based on the predicate's current value.
- The learning process maps states to high-value atomic actions.
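As a rough illustration of the learning step, the following is a minimal tabular Q-Learning sketch. The hyperparameters `ALPHA` and `GAMMA`, the reward values, and the names `q_table`, `reward`, `update`, and `best_action` are assumptions made for this example; the slides do not specify how the framework shapes rewards.

```python
from collections import defaultdict

ACTIONS = ["Up", "Down", "Left", "Right"]
ALPHA, GAMMA = 0.5, 0.9  # hypothetical learning rate and discount factor

# Q-table: state -> {action: estimated value}, every entry starting at 0.0.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def reward(predicate_fired: bool) -> float:
    """Assumed reward shaping: penalize landing in a state where the
    predicate fires (failure), reward landing in a correct state."""
    return -1.0 if predicate_fired else 1.0


def update(state, action, next_state, predicate_fired: bool) -> None:
    """Standard Q-Learning update for one observed transition."""
    best_next = max(q_table[next_state].values())
    target = reward(predicate_fired) + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])


def best_action(state) -> str:
    """The highest-valued atomic action learned so far for this state."""
    return max(q_table[state], key=q_table[state].get)
```

After enough observed transitions, `best_action` gives the state-to-action map described above.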
Variation Manager
The learning model provides a map from states to individual actions, while the variation manager constructs healing strategies for each failure state based on these individual actions.
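One possible way to build a strategy from individual actions is to chain the learned action of each state until a correct state is reached. The sketch below assumes this greedy chaining and assumes a known transition function; `build_strategy`, `policy`, `step`, and `max_len` are hypothetical names, and the framework's actual construction may differ.

```python
from typing import Callable, Dict, Hashable, List


def build_strategy(
    start: Hashable,
    policy: Dict[Hashable, str],
    step: Callable[[Hashable, str], Hashable],
    is_failure: Callable[[Hashable], bool],
    max_len: int = 10,
) -> List[str]:
    """Chain learned atomic actions from a failure state until a correct
    state is reached, the policy has no entry, or max_len is hit."""
    strategy: List[str] = []
    state = start
    while is_failure(state) and len(strategy) < max_len:
        action = policy.get(state)
        if action is None:  # nothing learned for this state yet
            break
        strategy.append(action)
        state = step(state, action)
    return strategy
```

The `max_len` bound guards against policies that cycle between failure states.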
DeltaIoT Exemplar
- DeltaIoT is a communication network composed of 25 IoT motes deployed in different physical locations.
- DeltaIoT is used as an exemplar to evaluate different self-adaptation strategies for managing the motes’ tradeoff between energy consumption and packet loss.
https://people.cs.kuleuven.be/~danny.weyns/software/DeltaIoT/
- We create a predicate, analyzeLinkSettings, measuring links’ state with respect to the two metrics.
- When the predicate is satisfied, it triggers the deployment of the variations generated by the learned healing strategies.
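A predicate of this kind could look like the sketch below. The `LinkState` fields, the two threshold constants, and their values are assumptions chosen for illustration; the slides do not give the actual condition checked by analyzeLinkSettings, and `analyze_link_settings` is only a Python-style stand-in for it.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    packet_loss: float  # assumed to be a fraction in [0, 1]
    energy: float       # assumed energy consumption, arbitrary units


# Hypothetical thresholds, not taken from DeltaIoT.
MAX_PACKET_LOSS = 0.10
MAX_ENERGY = 13.0


def analyze_link_settings(link: LinkState) -> bool:
    """Satisfied when either metric exceeds its assumed threshold,
    marking the link as needing healing."""
    return link.packet_loss > MAX_PACKET_LOSS or link.energy > MAX_ENERGY
```

When the predicate returns `True` for a link, the monitor would deploy the variation learned for that state.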
Simulation steps
Conclusion
- Our framework leverages monitors for flexible fault detection and automatic healing strategy generation at run time.
- The framework eliminates the need for predefined knowledge of faults and healing strategies, reducing development effort and complexity.
- The approach is validated on a reactive application and the DeltaIoT exemplar, demonstrating effectiveness in learning self-healing behavior.
Learning Recovery Strategies for Dynamic Self-Healing In Reactive Systems
By Mateo Sanabria Ardila