Assessing Single-Objective Performance Convergence and Time Complexity for Refactoring Detection

by

D. Nader-Palacio, D. Rodriguez-Cardenas, J. Gomez

Universidad Nacional de Colombia

Research Group on Artificial Life (Alife)

GECCO 2018 Kyoto

Terminology

[Software Refactoring] consists of restructuring the code design of a software system without altering its external behavior (functionality) (Fowler & Beck, 1999)

[Refactoring & Reconstruction] Refactoring is a subset of reconstruction (Mens & Tourwé, 2004)

[Refactorings] Atomic refactoring operations (Fowler & Beck, 1999)

The refactoring process is still an open problem (ACM, October 2017)

State-of-the-art

Existing approaches propose informal optimization models for the Refactoring Detection Problem (RDP), which makes them difficult to compare and reproduce

A classical perspective of the Refactoring Detection Problem

A proposed perspective of the Refactoring Detection Problem

Proposed Technique

The Artificial Refactoring Generation (ARGen) is... 

Everything starts with a very precise statement of metrics

[Theoretical Refactoring] Software Formalization

System Under Analysis (SUA) is an information system or program composed of classes, methods and attributes

A single class is a Cartesian product represented by

c_{\alpha} = str \times str^* \times str^*

where the first component is the class identifier, the second its field(s) (fld), and the third its method(s) (mtd)
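As a minimal sketch, the product c_α = str × str* × str* can be mirrored in Python as a frozen dataclass that holds an identifier plus tuples of field and method names. The `ClassModel` name and its attribute names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassModel:
    """Hypothetical model of c_alpha = str x str* x str*."""
    identifier: str                # str: the class identifier
    fields: Tuple[str, ...] = ()   # str*: zero or more field names (fld)
    methods: Tuple[str, ...] = ()  # str*: zero or more method names (mtd)


# A class with two fields and one method (illustrative values only).
account = ClassModel("Account", ("id", "balance"), ("deposit",))
print(account)
```

Using tuples keeps each component a finite sequence of strings, which matches the Kleene-star reading of str*.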

M = str^* = \bigcup_{n=0}^{\infty} (str)^n
A = str^* = \bigcup_{n=0}^{\infty} (str)^n
M \subseteq \mathbb{N}
A \subseteq \mathbb{N}

What about Methods and Fields?

R_{\delta}: \Omega \longrightarrow \text{(Code Modification)}

A refactoring is a mathematical function that maps an element of a Cartesian product set \Omega to a code modification

where \delta denotes the specific refactoring operation


How to create the parameters?


From Source to Target Classes

A refactoring takes a source class c_s, attributes A_s and methods M_s of that class, and a target class c_t:

R_{\delta}: c_s \times A_s \times M_s \times c_t \longrightarrow \text{(Code Modification)}
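A minimal sketch of R_δ for a move-style refactoring, assuming classes are modeled as (identifier, fields, methods) and the "code modification" is returned as the updated (source, target) pair. Here frozensets replace tuples to allow set arithmetic; `ClassModel` and `move_members` are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ClassModel:
    identifier: str
    fields: FrozenSet[str] = frozenset()
    methods: FrozenSet[str] = frozenset()


def move_members(c_s: ClassModel, a_s: FrozenSet[str], m_s: FrozenSet[str],
                 c_t: ClassModel) -> Tuple[ClassModel, ClassModel]:
    """Hypothetical R_delta: c_s x A_s x M_s x c_t -> (Code Modification).

    Moves the fields a_s and methods m_s from the source class c_s to the
    target class c_t and returns the modified (source, target) pair.
    """
    # A_s and M_s must be drawn from the source class.
    if not a_s <= c_s.fields or not m_s <= c_s.methods:
        raise ValueError("members to move must belong to the source class")
    new_source = replace(c_s, fields=c_s.fields - a_s, methods=c_s.methods - m_s)
    new_target = replace(c_t, fields=c_t.fields | a_s, methods=c_t.methods | m_s)
    return new_source, new_target


src = ClassModel("Order", frozenset({"total"}), frozenset({"print_invoice", "add_item"}))
tgt = ClassModel("Invoice")
# Move Method: A_s is empty, M_s holds a single method.
new_src, new_tgt = move_members(src, frozenset(), frozenset({"print_invoice"}), tgt)
```

With an empty A_s this behaves as Move Method; with an empty M_s it behaves as Move Field.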

A quality metric is a function that receives a class and returns a real value

\eta_j: c_\alpha \longrightarrow \mathbb{R}

We can compute any specific metric (e.g., LOC, CYCLO, LCOM2)

H_{\alpha} = (\eta_{1}(c_\alpha), \eta_{2}(c_\alpha), \ldots, \eta_{j}(c_\alpha)) \in \mathbb{R}^j
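The metric vector H_α can be sketched as a list of metric functions applied to one class. Computing LOC, CYCLO, or LCOM2 requires the source code, so this sketch assumes two simple stand-in metrics, number of attributes (`noa`) and number of methods (`nom`); all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ClassModel:
    identifier: str
    fields: Tuple[str, ...] = ()
    methods: Tuple[str, ...] = ()


# Each quality metric eta_j maps a class to a real number.
Metric = Callable[[ClassModel], float]


def noa(c: ClassModel) -> float:
    """Number of attributes (illustrative stand-in metric)."""
    return float(len(c.fields))


def nom(c: ClassModel) -> float:
    """Number of methods (illustrative stand-in metric)."""
    return float(len(c.methods))


def metric_vector(c: ClassModel, metrics: Sequence[Metric]) -> List[float]:
    """H_alpha = (eta_1(c), ..., eta_j(c)), a point in R^j."""
    return [eta(c) for eta in metrics]


h = metric_vector(ClassModel("Account", ("id", "balance"), ("deposit",)), [noa, nom])
```

Any other metric with the signature ClassModel -> float can be appended to the list without changing `metric_vector`.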


The Refactoring Impact Prediction is a technique to estimate the value of a software metric after performing a refactoring operation (Chaparro, 2014)

\text{Prediction}_{\delta,j}(c_\alpha) = \tilde{\eta}_{\delta,j}(c_\alpha)

Lines of Code (LOC) metric impacted by a Move Method refactoring that moves method m_k from the source class c_s to the target class c_t

LOC_p(c_s) = LOC_b(c_s) - LOC(m_k)
LOC_p(c_t) = LOC_b(c_t) + LOC(m_k)
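The two LOC update rules above translate directly into a small prediction function. This is a sketch, assuming the actual LOC values are available as a dictionary keyed by class name; `predict_loc_move_method` is a hypothetical name.

```python
from typing import Dict


def predict_loc_move_method(loc_before: Dict[str, int], source: str,
                            target: str, method_loc: int) -> Dict[str, int]:
    """Forecast LOC after moving a method of `method_loc` lines.

    LOC_p(c_s) = LOC_b(c_s) - LOC(m_k)
    LOC_p(c_t) = LOC_b(c_t) + LOC(m_k)
    All other classes keep their actual value.
    """
    predicted = dict(loc_before)  # never mutate the actual metrics
    predicted[source] -= method_loc
    predicted[target] += method_loc
    return predicted


before = {"Order": 120, "Invoice": 40, "Customer": 75}
after = predict_loc_move_method(before, "Order", "Invoice", method_loc=25)
```

Total LOC is preserved, which is a quick sanity check on the rule: the method's lines leave one class and arrive in another.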

We use the concept of a typical metric, which represents the actual value, and an impacted metric, which is an estimate of the value after accounting for the refactoring operation

Actual Metrics

\eta_j: c_\alpha \longrightarrow \mathbb{R}

Forecasted Metrics

\tilde{\eta}_{\delta,j}: c_\alpha \longrightarrow \mathbb{R}

[Theoretical Refactoring] Combinatorial Optimization

The search space has actionable regions (where the refactoring operation can be performed) and feasible regions (where the metric constraints are fulfilled)

ARGen is an NP-complete combinatorial problem, shown via a reduction from the subset-sum problem:

\text{Subset-Sum} \propto \text{ARGen}

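e.g., a system with 10 classes, 10 attributes, and 10 methods yields a search space of size 10,000 when the sequence consists of a single refactoring (C = number of classes, A = number of attributes, M = number of methods, r = number of refactorings in the sequence):

(C^2 \cdot A \cdot M)^r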
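The growth of the search space can be checked with a one-line function. The reading that each refactoring picks a source and a target class (C · C), one attribute (A), and one method (M) is an assumption based on the R_δ signature; `search_space_size` is a hypothetical name.

```python
def search_space_size(classes: int, attributes: int, methods: int, r: int) -> int:
    """(C^2 * A * M)^r candidate sequences of r refactorings.

    Assumed reading: each refactoring chooses a source class and a target
    class (C * C), an attribute (A), and a method (M).
    """
    return (classes ** 2 * attributes * methods) ** r


for r in (1, 2, 3):
    print(r, search_space_size(10, 10, 10, r))
```

The size multiplies by 10,000 with each extra refactoring in the sequence, which is why exhaustive search quickly becomes impractical.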

The objective function is a ratio that compares the predicted system with the actual system in terms of software metrics

Obj(\Phi)= \frac{\displaystyle\sum_{j=1}^{J} \left( w_j \frac{\Upsilon_{\tilde{H}}(\Phi_j)-\min(\Upsilon_{\tilde{H}}(\Phi_j))}{\max(\Upsilon_{\tilde{H}}(\Phi_j))-\min(\Upsilon_{\tilde{H}}(\Phi_j))}\right)}{\displaystyle\sum_{j=1}^{J} \left( w_j \frac{\Upsilon_H(\eta_j)-\min(\Upsilon_H(\eta_j))}{\max(\Upsilon_H(\eta_j))-\min(\Upsilon_H(\eta_j))}\right)} + \rho(\Phi)

where:

  • \Phi: set of refactoring operations
  • Numerator: estimated metrics (\Upsilon_{\tilde{H}}); denominator: actual metrics (\Upsilon_H)
  • Each fraction inside the sums: min-max normalization
  • w_j: developers' information
  • \rho(\Phi): penalization
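The objective can be sketched as a weighted ratio of min-max normalized values plus a penalty. How the min and max of each Υ term are obtained is not stated here, so this sketch assumes they are supplied as (min, max) bounds per metric, and that ρ(Φ) has already been computed; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float]  # assumed (min, max) pair for min-max normalization


def min_max(value: float, bounds: Bounds) -> float:
    lo, hi = bounds
    if hi == lo:
        return 0.0  # assumption: a degenerate range contributes nothing
    return (value - lo) / (hi - lo)


def objective(predicted: Sequence[float], predicted_bounds: Sequence[Bounds],
              actual: Sequence[float], actual_bounds: Sequence[Bounds],
              weights: Sequence[float], penalty: float = 0.0) -> float:
    """Obj(Phi): weighted normalized estimated metrics over weighted
    normalized actual metrics, plus the penalization rho(Phi)."""
    num = sum(w * min_max(v, b) for w, v, b in zip(weights, predicted, predicted_bounds))
    den = sum(w * min_max(v, b) for w, v, b in zip(weights, actual, actual_bounds))
    if den == 0:
        raise ZeroDivisionError("normalized actual metrics sum to zero")
    return num / den + penalty


score = objective([5, 2], [(0, 10), (0, 4)], [10, 4], [(0, 10), (0, 4)], [1, 1])
```

A value below 1 (before the penalty) means the predicted system scores lower than the actual one on the weighted, normalized metrics.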

The refactoring repair functions use a catalog of 10 constraints, based on object-oriented guidelines, to repair the individuals
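The catalog of 10 constraints is not listed here, so the following sketch uses two hypothetical constraints (source and target must differ; the moved method must currently belong to the source) and one possible repair strategy: dropping any Move Method operation that violates a constraint while replaying the valid ones. The authors' repair functions may work differently.

```python
from typing import Callable, Dict, List, Set, Tuple

Operation = Tuple[str, str, str]  # (method, source class, target class)
System = Dict[str, Set[str]]      # hypothetical view: class -> method names
Constraint = Callable[[System, Operation], bool]


def distinct_classes(system: System, op: Operation) -> bool:
    """Illustrative constraint: source and target classes must differ."""
    _, src, tgt = op
    return src != tgt


def method_in_source(system: System, op: Operation) -> bool:
    """Illustrative constraint: the method must currently live in the source."""
    mtd, src, tgt = op
    return tgt in system and mtd in system.get(src, set())


def repair(system: System, individual: List[Operation],
           constraints: List[Constraint]) -> List[Operation]:
    """Drop violating operations; apply valid ones so later checks see them."""
    state = {c: set(ms) for c, ms in system.items()}  # copy, original untouched
    repaired = []
    for op in individual:
        if all(check(state, op) for check in constraints):
            mtd, src, tgt = op
            state[src].discard(mtd)
            state[tgt].add(mtd)
            repaired.append(op)
    return repaired


system = {"A": {"m1", "m2"}, "B": {"m3"}}
individual = [("m1", "A", "A"), ("m1", "A", "B"), ("m1", "A", "B"), ("m9", "A", "B")]
fixed = repair(system, individual, [distinct_classes, method_in_source])
```

Replaying valid operations on a copy of the system lets a constraint reject the second, now-redundant move of `m1`.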

We designed 6 genetic operators for the hybrid optimization algorithm employed

[Empirical Refactoring] Computational Technique Design

  • Build a metaphor of the system
  • Compute "Actual Metrics"
  • Configure individuals given Fowler's catalog
  • Use unalcol
  • Use estimation (RIPE) and compute fitness
  • Report in a JSON file
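The pipeline can be sketched end to end in a few dozen lines. This is an illustration only: it substitutes a plain hill climber for unalcol and a hypothetical fitness (variance of class LOC, lower is more balanced) for the RIPE-based objective. The toy system, `apply_sequence`, `fitness`, `random_op`, and `hill_climb` are all hypothetical.

```python
import json
import random
from typing import Dict, List, Tuple

Operation = Tuple[str, str, str]   # (method, source class, target class)
System = Dict[str, Dict[str, int]]  # class -> {method: LOC}

# Hypothetical "actual" system model.
SYSTEM: System = {
    "Order": {"add_item": 30, "print_invoice": 25, "tax": 15},
    "Invoice": {"render": 10},
    "Customer": {},
}


def apply_sequence(system: System, seq: List[Operation]) -> System:
    """Apply Move Method operations in order, skipping non-actionable ones."""
    classes = {c: dict(ms) for c, ms in system.items()}
    for mtd, src, tgt in seq:
        if src != tgt and mtd in classes[src]:
            classes[tgt][mtd] = classes[src].pop(mtd)
    return classes


def fitness(classes: System) -> float:
    """Hypothetical fitness: variance of class LOC (lower = more balanced)."""
    locs = [sum(ms.values()) for ms in classes.values()]
    mean = sum(locs) / len(locs)
    return sum((x - mean) ** 2 for x in locs) / len(locs)


def random_op(system: System, rng: random.Random) -> Operation:
    """Configure a random Move Method operation (an individual's gene)."""
    methods = [(m, c) for c, ms in system.items() for m in ms]
    mtd, src = rng.choice(methods)
    return (mtd, src, rng.choice(sorted(system)))


def hill_climb(system: System, r: int = 2, evaluations: int = 500, seed: int = 0) -> dict:
    rng = random.Random(seed)
    current = [random_op(system, rng) for _ in range(r)]
    best = start = fitness(apply_sequence(system, current))
    for _ in range(evaluations):
        neighbor = list(current)
        neighbor[rng.randrange(r)] = random_op(system, rng)
        score = fitness(apply_sequence(system, neighbor))
        if score <= best:  # accept non-worsening moves
            current, best = neighbor, score
    return {"sequence": current, "start_fitness": start, "fitness": best}


report = json.dumps(hill_climb(SYSTEM), indent=2)
print(report)
```

Because only non-worsening neighbors are accepted, the reported fitness can never exceed the starting fitness; swapping in another metaheuristic only changes `hill_climb`.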

Technique Validation

Preliminary Experiment: do-ability assessed with a Shapiro-Wilk test (the results follow a non-normal distribution)

Algorithm              CCODEC[2000]   ACRA[60000]
Hill Climbing          0.0049         0.0015
Simulated Annealing    0.0222         0.0157
HaEa                   0.0144         0.0340

Figures: Large Evaluations ACRA[60000], one panel each for Hill Climbing, Simulated Annealing, and HaEa

Discussion

Key Findings

  • The Hybrid Evolutionary Approach yields lower values over large numbers of iterations (0.95 +/- 0.003 [60000])
  • Evolutionary algorithms appear to outperform the baseline (p-value <= 0.0002)
  • The main inconvenience during execution was latency; we had to redeploy and test with new parameters

Strengths 

  • Unified Mathematical Approximation
  • A definition, a development, and an evaluation of ARGen
  • Performance and complexity validation

Limitations

Future Work

The refactoring consistency metric is based on Archipelago (Zarras et al., 2015)

RCM = \frac{\pi}{\Pi + \frac{\lambda}{\Lambda} + \Theta}

Involving self-organization and artificial minds

Conclusion

Thank you! :)