Learning Recovery Strategies for Dynamic
Self-healing in Reactive Systems

Mateo Sanabria, Ivana Dusparic, Nicolás Cardozo

SEAMS 2024, Lisbon, Portugal

Traditional self-healing systems automatically detect, diagnose, and recover from failures, but they:

  • Require predefined knowledge of faults
  • Require predefined healing strategies

Why settle for a rigid system when a flexible, self-healing one can adapt and autonomously heal using atomic actions?

First example

Suppose we have a basic system built around a simple GUI, in which the user moves the pointer (through a movement method) in one of four directions:

  • Up, Down, Left, Right

The tessellation of this GUI defines two states:

  • White region: Correct state
  • Blue region: Failure state
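
The example can be sketched in a few lines of Python. This is only an illustration: the 5x5 layout, the `GRID` encoding, and the helper names `move` and `is_failure` are assumptions made here, not part of the original system.

```python
# Hypothetical 5x5 tessellation: 'W' = white (correct), 'B' = blue (failure).
GRID = [
    "WWWWW",
    "WWBBW",
    "WWBBW",
    "WWWWW",
    "WWWWW",
]

# The four atomic movement actions as (row, column) offsets.
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}


def move(pos, action):
    """Apply one movement, clamping the pointer to the grid bounds."""
    d_row, d_col = MOVES[action]
    row = min(max(pos[0] + d_row, 0), len(GRID) - 1)
    col = min(max(pos[1] + d_col, 0), len(GRID[0]) - 1)
    return (row, col)


def is_failure(pos):
    """True when the pointer sits on a blue (failure) cell."""
    return GRID[pos[0]][pos[1]] == "B"
```

Note that `is_failure` hard-codes the blue region through `GRID`, so any change to the tessellation means editing that definition by hand.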

For a traditional self-healing system in this context:

  • The detection of failure states requires the explicit definition of the blue region
  • What about the definition of healing strategies?
  • What happens if the tessellation changes?

We propose a novel self-healing framework that learns recovery strategies for fine-grained system healing behavior at runtime.

The framework has three main components: fault detection, a learning model, and a variation manager.

These are instantiated as needed in the application.

Fault detection

Learning model

  • The monitor uses the information given by the fired predicate to identify the atomic action that triggered the activation.
  • The monitor uses a Q-learning agent to establish the reward for the action taken in the current state, based on the predicate's current value.
  • The learning process maps states to high-value atomic actions.
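
The learning step above can be illustrated with a standard tabular Q-learning update. This is a minimal sketch, not the framework's implementation: the state labels, the reward values, and the constants `ALPHA` and `GAMMA` are assumptions chosen for illustration.

```python
from collections import defaultdict

ACTIONS = ["Up", "Down", "Left", "Right"]
ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor


def q_update(q, state, action, reward, next_state):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


def best_action(q, state):
    """Atomic action with the highest learned value in `state`."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = defaultdict(float)
# Hypothetical rewards: +1 when the predicate stops holding after the action,
# -1 when it still holds.
q_update(q, "blue", "Left", 1.0, "white")
q_update(q, "blue", "Right", -1.0, "blue")
```

After these two updates, `best_action(q, "blue")` returns `"Left"`, the only action that has been rewarded in that state.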

Variation manager

The learning model provides a map of individual actions, whereas the variation manager constructs a healing strategy for each failure state from these individual actions.
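
One plausible reading of this step is that a healing strategy is the sequence of learned actions leading from a failure state back to a correct one. The sketch below assumes exactly that. The function `build_strategy`, the one-dimensional state space, and the `policy` table are hypothetical and only illustrate the idea.

```python
def build_strategy(start, policy, step, is_failure, max_len=10):
    """Chain the learned per-state actions from `start` until a correct state is reached."""
    strategy, state = [], start
    while is_failure(state) and len(strategy) < max_len:
        action = policy[state]          # best atomic action learned for this state
        strategy.append(action)
        state = step(state, action)     # state expected after applying the action
    return strategy


# Hypothetical 1-D example: positions 0..4, where 2 and 3 are failure states.
FAILURE = {2, 3}
policy = {2: "Left", 3: "Left"}  # assumed output of the learning model


def step(state, action):
    """Assumed transition model: Left decrements the position, anything else increments it."""
    return state - 1 if action == "Left" else state + 1


def in_failure(state):
    return state in FAILURE


strategies = {s: build_strategy(s, policy, step, in_failure) for s in sorted(FAILURE)}
```

The `max_len` bound is a design choice of this sketch: it keeps the loop from running forever if the learned actions cycle between failure states.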

DeltaIoT exemplar

  • DeltaIoT is a communication network composed of 25 IoT motes deployed in different physical locations.
  • DeltaIoT is used as an exemplar to evaluate different self-adaptation strategies that manage the motes' tradeoff between energy consumption and packet loss.

https://people.cs.kuleuven.be/~danny.weyns/software/DeltaIoT/

  • We create a predicate, analyzeLinkSettings, that measures each link's state with respect to the two metrics (energy consumption and packet loss).
  • When the predicate is satisfied, it triggers the deployment of the variations generated by the learned healing strategies.
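
A predicate of this kind might look like the sketch below. Only the name analyzeLinkSettings comes from the slides. The `LinkState` fields, the threshold values, and the "either metric out of bounds" rule are assumptions made for illustration, and the function is written in Python naming style as `analyze_link_settings`.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    packet_loss: float         # fraction of packets lost, 0.0 to 1.0
    energy_consumption: float  # energy used, in arbitrary units


# Hypothetical thresholds; the source does not state the actual values.
MAX_PACKET_LOSS = 0.10
MAX_ENERGY = 20.0


def analyze_link_settings(link):
    """Return True when either metric is out of bounds, i.e. the link needs healing."""
    return link.packet_loss > MAX_PACKET_LOSS or link.energy_consumption > MAX_ENERGY
```

For example, under these assumed thresholds `analyze_link_settings(LinkState(0.25, 12.0))` is satisfied because packet loss exceeds the limit, which would trigger deployment of the learned variations.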

[Plots omitted; x-axis: simulation steps]

Conclusion

  • Our framework leverages monitors for flexible fault detection and automatic healing strategy generation at runtime.
  • The framework eliminates the need for predefined knowledge of faults and healing strategies, simplifying development effort and reducing complexity.
  • The approach is validated on a reactive application and the DeltaIoT exemplar, demonstrating effectiveness in learning self-healing behavior.