Assessing Single-Objective Performance Convergence and Time Complexity for Refactoring Detection

by

D. Nader-Palacio, D. Rodriguez-Cardenas, J. Gomez

Universidad Nacional de Colombia

Research Group on Artificial Life (Alife)

GECCO 2018 Kyoto

Terminology

[Software Refactoring] consists of reconstructing the code design of a software system without affecting its functional behavior (Fowler & Beck, 1999)

[Refactoring & Reconstruction] Refactoring is a subset of Reconstructions (Mens & Tourwe, 2004)

[Refactorings] atomic refactoring operations (Fowler & Beck, 1999)

The refactoring process is still an issue (ACM October 2017)

State-of-the-art

Authors of existing approaches propose informal optimization models for the Refactoring Detection Problem (RDP), which makes the approaches difficult to compare and reproduce

A classical perspective of the Refactoring Detection Problem

A proposed perspective of the Refactoring Detection Problem

Proposed Technique

The Artificial Refactoring Generation (ARGen) is...

Everything starts with a very precise statement of metrics

[Theoretical Refactoring] Software Formalization

System Under Analysis (SUA) is an information system or program composed of classes, methods and attributes

A single class is a Cartesian product represented by

c_{\alpha} = str \times str^* \times str^*

where the three components are, in order, the identifier, the fld(s), and the mtd(s)
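The triple (identifier, fields, methods) could be modeled as a small immutable record. This is a minimal sketch, assuming plain strings for every name; `ClassModel`, `attributes`, and `methods` are illustrative names that do not come from the original formalization.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassModel:
    """A class c_alpha seen as the triple str x str^* x str^*."""
    identifier: str                     # the class name (str)
    attributes: Tuple[str, ...] = ()    # the fields, an element of str^*
    methods: Tuple[str, ...] = ()       # the methods, an element of str^*


# Hypothetical example class used only for illustration.
account = ClassModel(
    identifier="Account",
    attributes=("balance", "owner"),
    methods=("deposit", "withdraw"),
)
```

An empty tuple plays the role of the empty sequence (n = 0) in str^*.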

M = str^* = \bigcup_{n = 0}^{\infty} (str)^n

A = str^* = \bigcup_{n = 0}^{\infty} (str)^n

M \subseteq \mathbb{N}

A \subseteq \mathbb{N}

What about Methods and Fields?

R_{\delta}: \Omega \longrightarrow (Code Modification)

A Refactoring is a mathematical function that maps a Cartesian product set to a Code Modification

\delta: Specific Refactoring Operation

How to create the parameters?

From Source to Target Classes

R_{\delta}: c_s \times A_s \times M_s \times c_t \longrightarrow (Code Modification)
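One way to read this signature is as a function factory: fixing \delta yields a function over (c_s, A_s, M_s, c_t). The sketch below is an assumption-laden illustration, not the ARGen implementation; `CodeModification` and `make_refactoring` are hypothetical names, and the "code modification" is reduced to a descriptive record.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class CodeModification(NamedTuple):
    """Descriptive record standing in for an actual code change."""
    operation: str
    source: str
    attributes: Tuple[str, ...]
    methods: Tuple[str, ...]
    target: str


def make_refactoring(delta: str) -> Callable[..., CodeModification]:
    """Return R_delta for the specific refactoring operation `delta`."""
    def r_delta(c_s: str, a_s: Sequence[str], m_s: Sequence[str], c_t: str) -> CodeModification:
        # Parameters follow the order c_s x A_s x M_s x c_t.
        return CodeModification(delta, c_s, tuple(a_s), tuple(m_s), c_t)
    return r_delta


# Hypothetical usage: move the method "withdraw" from Account to Ledger.
move_method = make_refactoring("Move Method")
modification = move_method("Account", [], ["withdraw"], "Ledger")
```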

\eta_j:c_\alpha \longrightarrow \mathbb{R}

A Quality Metric is a function that receives a class and returns a real value

We can compute any specific metric (e.g., LOC, CYCLO, LCOM2)

H_\alpha \in \mathbb{R}^j

H_{\alpha}=\{\eta_{1}(c_\alpha),\eta_{2}(c_\alpha),\ldots,\eta_{j}(c_\alpha)\}
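The metric vector H_\alpha can be sketched as a list built by applying each metric function to the same class. The toy class representation (a mapping from method name to its lines of code) and the two metric functions `loc` and `number_of_methods` are assumptions made only for this illustration.

```python
from typing import Callable, Dict, List

# Hypothetical toy class: method name -> lines of code of that method.
ToyClass = Dict[str, int]
Metric = Callable[[ToyClass], float]


def loc(c: ToyClass) -> float:
    """Lines of code of the class, here the sum over its methods."""
    return float(sum(c.values()))


def number_of_methods(c: ToyClass) -> float:
    """Illustrative second metric: how many methods the class has."""
    return float(len(c))


def metric_vector(c: ToyClass, metrics: List[Metric]) -> List[float]:
    """Build H_alpha = [eta_1(c), eta_2(c), ..., eta_j(c)]."""
    return [eta(c) for eta in metrics]


h_alpha = metric_vector({"deposit": 12, "withdraw": 18}, [loc, number_of_methods])
# h_alpha == [30.0, 2.0]
```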

[Theoretical Refactoring] Software Formalization

The Refactoring Impact Prediction is a technique to estimate the value of a software metric after performing a refactoring operation (Chaparro, 2014)

Prediction_{\delta,j}(c_\alpha) = \tilde\eta_{\delta,j}(c_\alpha)

Lines of Code metric impacted by Move Method Refactoring Operation

LOC_p(c_s) = LOC_b(c_s) - LOC(m_k)

LOC_p(c_t) = LOC_b(c_t) + LOC(m_k)
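These two prediction formulas translate directly into code. The sketch below reads the subscript p as "predicted" and b as "before the refactoring", which is an assumption; the function name `predict_loc_move_method` is illustrative.

```python
from typing import Tuple


def predict_loc_move_method(loc_source: int, loc_target: int, loc_method: int) -> Tuple[int, int]:
    """Predict LOC of source and target classes after moving method m_k.

    Returns (LOC_p(c_s), LOC_p(c_t)).
    """
    if loc_method > loc_source:
        # The moved method cannot be larger than the class that contains it.
        raise ValueError("method LOC exceeds source class LOC")
    return loc_source - loc_method, loc_target + loc_method


# Hypothetical values: a 15-line method moves from a 120-line class to an 80-line class.
predicted_source, predicted_target = predict_loc_move_method(120, 80, 15)
# predicted_source == 105, predicted_target == 95
```

The total LOC of the two classes is unchanged by the prediction, since the same LOC(m_k) is subtracted from one class and added to the other.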

We use the concept of a typical metric, which represents the actual value, and of an impacted metric, which is an estimate obtained after accounting for the refactoring operation

Actual Metrics: \eta_j: c_\alpha \longrightarrow \mathbb{R}

Forecasted Metrics: \tilde\eta_{\delta,j}: c_\alpha \longrightarrow \mathbb{R}

[Theoretical Refactoring] Combinatorial Optimization

The search space has actionable regions (where the refactoring operation can be performed) and feasible regions (where the metric constraints are fulfilled)

ARGen is an NP-Complete Combinatorial Problem

Subset-sum problem \propto ARGen

E.g., for a system with 10 classes, 10 attributes, and 10 methods, the search space has a size of 10,000 when the sequence is composed of one refactoring

(C^2 * A * M)^r
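The size formula is easy to check numerically. The following sketch evaluates (C^2 * A * M)^r for the example above and for a sequence of two refactorings; the function name `search_space_size` is an illustrative choice.

```python
def search_space_size(classes: int, attributes: int, methods: int, sequence_length: int) -> int:
    """Size of the search space (C^2 * A * M)^r for a sequence of r refactorings."""
    if min(classes, attributes, methods, sequence_length) < 0:
        raise ValueError("all arguments must be non-negative")
    return (classes ** 2 * attributes * methods) ** sequence_length


one_refactoring = search_space_size(10, 10, 10, 1)   # 10,000 candidates
two_refactorings = search_space_size(10, 10, 10, 2)  # 100,000,000 candidates
```

The exponent r makes the space grow geometrically with the sequence length, which is why exhaustive enumeration is impractical for longer sequences.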

The objective function is a ratio that compares the Predicted System with the Actual System in terms of software metrics

Obj(\Phi)= \frac{\displaystyle\sum_{j=1}^{J} \left( w_j \frac{\Upsilon_{\tilde{H}}(\Phi_j)-\min(\Upsilon_{\tilde{H}}(\Phi_j))}{\max(\Upsilon_{\tilde{H}}(\Phi_j))-\min(\Upsilon_{\tilde{H}}(\Phi_j))}\right)}{\displaystyle\sum_{j=1}^{J} \left( w_j \frac{\Upsilon_H(\eta_j)-\min(\Upsilon_H(\eta_j))}{\max(\Upsilon_H(\eta_j))-\min(\Upsilon_H(\eta_j))}\right)} + \rho(\Phi)

The formula combines the following elements:

Set of refactoring operations

Min-max normalization

Developers' information

Actual Metrics

Estimated Metrics

Penalization
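The structure of Obj(\Phi) can be sketched as a weighted sum of min-max normalized estimated metrics divided by the same sum over actual metrics, plus a penalty. Several points are assumptions made for this illustration: the \Upsilon values are passed in already aggregated per metric, the min and max bounds are supplied by the caller, w_j is treated as a plain weight, and the penalty \rho(\Phi) is a precomputed number. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float]  # (min, max) used for min-max normalization


def min_max(value: float, low: float, high: float) -> float:
    """Min-max normalization; returns 0.0 when the range is degenerate."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def objective(
    estimated: Sequence[float],       # aggregated estimated metrics, one per j
    actual: Sequence[float],          # aggregated actual metrics, one per j
    weights: Sequence[float],         # w_j
    estimated_bounds: Sequence[Bounds],
    actual_bounds: Sequence[Bounds],
    penalty: float = 0.0,             # rho(Phi), assumed precomputed
) -> float:
    numerator = sum(
        w * min_max(v, lo, hi)
        for w, v, (lo, hi) in zip(weights, estimated, estimated_bounds)
    )
    denominator = sum(
        w * min_max(v, lo, hi)
        for w, v, (lo, hi) in zip(weights, actual, actual_bounds)
    )
    if denominator == 0:
        raise ZeroDivisionError("normalized actual metrics sum to zero")
    return numerator / denominator + penalty


# Hypothetical values for two metrics with equal weights.
bounds = [(0.0, 100.0), (0.0, 20.0)]
score = objective([40.0, 5.0], [50.0, 10.0], [0.5, 0.5], bounds, bounds)
```

With these made-up inputs the ratio is about 0.65. Under the assumption that lower metric values are preferable, a value below 1 would indicate that the predicted system improves on the actual one.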

The refactoring repair functions account for a catalog of 10 constraints, based on object-oriented guidelines, to perform repairs on the individuals

We designed 6 genetic operators for the Hybrid Optimization employed

[Empirical Refactoring] Computational Technique Design

Building a metaphor of the system

Compute "Actual Metrics"

Configure Individuals given Fowler's Catalog

Use unalcol

Use Estimation (RIPE) and Compute Fitness

Report in a JSON File

Technique Validation

Preliminary Experiment: Do-ability using a Shapiro-Wilk Test (non-normal distribution)

Algorithm	CCODEC[2000]	ACRA[60000]
Hill Climbing	0.0049	0.0015
Simulated Annealing	0.0222	0.0157
HaEa	0.0144	0.0340

Large Evaluations ACRA[60000] for Hill Climbing

Large Evaluations ACRA[60000] for Simulated Annealing

Large Evaluations ACRA[60000]

Large Evaluations ACRA[60000] for HaEa

Discussion

Key Findings

The Hybrid Evolutionary Approach yields lower values over a large number of evaluations (0.95 +/- 0.003 [60000])
Evolutionary Algorithms seem to work better than the baseline (p-value <= 0.0002)
The main inconvenience during execution was latency; we had to redeploy and test with new parameters

Strengths

Unified Mathematical Approximation
A definition, a development, and an evaluation of ARGen
Performance and complexity validation

Limitations

Future Work

The refactoring consistency metric is based on Archipelago (Zarras et al., 2015)

RCM = \frac{\pi}{\Pi + \frac{\lambda}{\Lambda} + \Theta}

Involving self-organization and artificial minds

Conclusion

Thank you! :)
