Assessing Single-Objective Performance Convergence and Time Complexity for Refactoring Detection
by
D. Nader-Palacio, D. Rodriguez-Cardenas, J. Gomez
Universidad Nacional de Colombia
Research Group on Artificial Life (Alife)
GECCO 2018 Kyoto
Terminology
[Software Refactoring] consists of restructuring the code design of a software system without altering its functional behavior (Fowler & Beck, 1999)
[Refactoring & Reconstruction] Refactoring is a subset of Reconstructions (Mens & Tourwe, 2004)
[Refactorings] atomic refactoring operations (Fowler & Beck, 1999)
The refactoring process remains an open issue (ACM, October 2017)
State-of-the-art
Existing approaches propose informal optimization models for the Refactoring Detection Problem (RDP), which makes them difficult to compare and reproduce
A classical perspective of the Refactoring Detection Problem
A proposed perspective of the Refactoring Detection Problem
Proposed Technique
The Artificial Refactoring Generation (ARGen) is...
Everything starts with a very precise statement of metrics
[Theoretical Refactoring] Software Formalization
System Under Analysis (SUA) is an information system or program composed of classes, methods and attributes
A single class is a Cartesian product of its identifier, its field(s), and its method(s):
c_{\alpha} = str \times str^* \times str^*
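As a minimal sketch of this encoding, assuming plain strings for the identifier and for every field and method name, a class can be held as a triple whose last two components are variable-length tuples (elements of str*). The type name `SimpleClass` and the example class `Base64` are hypothetical, not taken from ARGen.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimpleClass:
    """Hypothetical encoding of c_alpha = str x str* x str*."""
    identifier: str                 # str: the class identifier
    fields: Tuple[str, ...] = ()    # str*: field names (possibly empty)
    methods: Tuple[str, ...] = ()   # str*: method names (possibly empty)


# Example (hypothetical class): one field and two methods
codec = SimpleClass("Base64", fields=("buffer",), methods=("encode", "decode"))
print(len(codec.methods))  # 2
```

A frozen dataclass keeps each class value immutable, which matches treating c_α as a mathematical tuple rather than a mutable object.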
M = str^* = \bigcup_{n=0}^{\infty} (str)^n
A = str^* = \bigcup_{n=0}^{\infty} (str)^n
M \subseteq \mathbb{N}
A \subseteq \mathbb{N}
What about Methods and Fields?
R_{\delta}: \Omega \longrightarrow \text{Code Modification}
A refactoring is a mathematical function that maps a Cartesian product \Omega to a code modification
\delta: specific refactoring operation
How are the parameters created? From source to target classes:
R_{\delta}: c_s \times A_s \times M_s \times c_t \longrightarrow \text{Code Modification}
where c_s is the source class, A_s and M_s are the source attributes and methods involved, and c_t is the target class
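The parameter tuple c_s × A_s × M_s × c_t can be sketched as a small record. The names `SimpleClass` and `RefactoringParams`, and the `is_well_formed` check (members must come from the source class, and source and target must differ), are illustrative assumptions rather than ARGen's actual rules.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimpleClass:
    identifier: str
    fields: Tuple[str, ...] = ()
    methods: Tuple[str, ...] = ()


@dataclass(frozen=True)
class RefactoringParams:
    """Hypothetical encoding of the arguments of R_delta: c_s x A_s x M_s x c_t."""
    delta: str                # refactoring type, e.g. "MoveMethod"
    source: SimpleClass       # c_s
    attrs: Tuple[str, ...]    # A_s: source attributes involved
    methods: Tuple[str, ...]  # M_s: source methods involved
    target: SimpleClass       # c_t

    def is_well_formed(self) -> bool:
        # Assumed sanity check: A_s and M_s belong to c_s, and c_s != c_t.
        return (set(self.attrs) <= set(self.source.fields)
                and set(self.methods) <= set(self.source.methods)
                and self.source.identifier != self.target.identifier)


src = SimpleClass("Parser", ("buffer",), ("parse", "format"))
dst = SimpleClass("Formatter")
op = RefactoringParams("MoveMethod", src, (), ("format",), dst)
print(op.is_well_formed())  # True
```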
\eta_j: c_\alpha \longrightarrow \mathbb{R}
A quality metric is a function that receives a class and returns a real value. Any specific metric can be computed this way (e.g., LOC, CYCLO, LCOM2)
H_\alpha \in \mathbb{R}^j
H_{\alpha} = (\eta_{1}(c_\alpha), \eta_{2}(c_\alpha), \dots, \eta_{j}(c_\alpha))
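The metric vector H_α simply applies each metric function to the same class. LOC, CYCLO, and LCOM2 need source code, so this sketch swaps in two structural counts, number of methods (NOM) and number of attributes (NOA), as stand-in metrics; the dictionary-based class record is a hypothetical simplification.

```python
from typing import Callable, Dict, List

# Hypothetical minimal class record: "fields" and "methods" name lists
ClassRecord = Dict[str, List[str]]
Metric = Callable[[ClassRecord], float]


def nom(c: ClassRecord) -> float:
    """Number of methods (stand-in metric eta_1)."""
    return float(len(c["methods"]))


def noa(c: ClassRecord) -> float:
    """Number of attributes (stand-in metric eta_2)."""
    return float(len(c["fields"]))


def metric_vector(c: ClassRecord, metrics: List[Metric]) -> List[float]:
    """H_alpha = (eta_1(c_alpha), ..., eta_j(c_alpha)), an element of R^j."""
    return [eta(c) for eta in metrics]


c_alpha = {"fields": ["buffer"], "methods": ["encode", "decode"]}
print(metric_vector(c_alpha, [nom, noa]))  # [2.0, 1.0]
```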
Refactoring Impact Prediction is a technique for estimating the value of a software metric after a refactoring operation is performed (Chaparro, 2014)
Prediction_{\delta,j}(c_\alpha) = \tilde{\eta}_{\delta,j}(c_\alpha)
Example: the Lines of Code (LOC) metric impacted by a Move Method refactoring that moves method m_k from the source class c_s to the target class c_t
LOC_p(c_s) = LOC_b(c_s) - LOC(m_k)
LOC_p(c_t) = LOC_b(c_t) + LOC(m_k)
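The two LOC equations translate directly into an update over a per-class LOC table: the moved method's lines leave c_s and arrive in c_t, and every other class is unchanged. The function name and the example class names are hypothetical.

```python
from typing import Dict


def predict_move_method_loc(loc_before: Dict[str, int], source: str,
                            target: str, method_loc: int) -> Dict[str, int]:
    """Predicted LOC after moving method m_k (method_loc lines) from c_s to c_t.

    LOC_p(c_s) = LOC_b(c_s) - LOC(m_k)
    LOC_p(c_t) = LOC_b(c_t) + LOC(m_k)
    """
    predicted = dict(loc_before)  # copy: the actual metrics stay untouched
    predicted[source] = loc_before[source] - method_loc
    predicted[target] = loc_before[target] + method_loc
    return predicted


before = {"Parser": 120, "Formatter": 40, "Main": 15}
after = predict_move_method_loc(before, "Parser", "Formatter", method_loc=25)
print(after)  # {'Parser': 95, 'Formatter': 65, 'Main': 15}
```

Note that the total LOC of the system is preserved, which is what this simple prediction model implies for a pure move.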
We use the concept of a typical metric, which represents the actual value, and an impacted metric, which is an estimate of the value after the refactoring operation is taken into account
Actual metrics: \eta_j: c_\alpha \longrightarrow \mathbb{R}
Forecasted metrics: \tilde{\eta}_{\delta,j}: c_\alpha \longrightarrow \mathbb{R}
[Theoretical Refactoring] Combinatorial Optimization
The search space has actionable regions (where the refactoring operation can be performed) and feasible regions (where the metric constraints are fulfilled)
ARGen is an NP-complete combinatorial problem, by reduction from the subset-sum problem:
\text{Subset-Sum} \propto \text{ARGen}
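The size of the search space is
(C^2 \cdot A \cdot M)^r
where C, A, and M are the numbers of classes, attributes, and methods, and r is the number of refactorings in the sequence. For example, a system with 10 classes, 10 attributes, and 10 methods has a search space of size 10,000 when the sequence is composed of one refactoring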
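A quick calculation makes the growth concrete: the C² term reflects choosing a source and a target class, and the exponent r compounds it for every refactoring in the sequence. The function name is illustrative.

```python
def search_space_size(classes: int, attributes: int, methods: int, r: int) -> int:
    """(C^2 * A * M)^r candidate sequences of r refactorings."""
    return (classes ** 2 * attributes * methods) ** r


print(search_space_size(10, 10, 10, 1))  # 10000
print(search_space_size(10, 10, 10, 3))  # 1000000000000
```

Even this toy system jumps from 10^4 to 10^12 candidates when the sequence grows from one to three refactorings, which motivates metaheuristic search over exhaustive enumeration.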
The objective function is a ratio that compares the predicted system with the actual system in terms of software metrics:
Obj(\Phi)= \frac{\displaystyle\sum_{j=1}^{J} \left( w_j \frac{\Upsilon_{\tilde{H}}(\Phi_j)-\min(\Upsilon_{\tilde{H}}(\Phi_j))}{\max(\Upsilon_{\tilde{H}}(\Phi_j))-\min(\Upsilon_{\tilde{H}}(\Phi_j))}\right)}{\displaystyle\sum_{j=1}^{J} \left( w_j \frac{\Upsilon_H(\eta_j)-\min(\Upsilon_H(\eta_j))}{\max(\Upsilon_H(\eta_j))-\min(\Upsilon_H(\eta_j))}\right)} + \rho(\Phi)
where:
- \Phi: set of refactoring operations
- w_j: developers' information (weight of metric j)
- inner fractions: min-max normalization
- \Upsilon_{\tilde{H}}(\Phi_j): estimated metrics (numerator)
- \Upsilon_H(\eta_j): actual metrics (denominator)
- \rho(\Phi): penalization
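The slides do not specify how Υ aggregates per-class values or over which population the min and max are taken, so this sketch assumes each metric has already been aggregated to a scalar and that the caller supplies separate normalization bounds for the estimated and actual sides; `penalty` stands in for ρ(Φ). All function names are hypothetical.

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float]  # (min, max) used for min-max normalization


def min_max(value: float, bounds: Bounds) -> float:
    """Min-max normalization of value into [0, 1]; assumes lo < hi."""
    lo, hi = bounds
    return (value - lo) / (hi - lo)


def weighted_normalized_sum(weights: Sequence[float], values: Sequence[float],
                            bounds: Sequence[Bounds]) -> float:
    """sum_j w_j * normalized(value_j)."""
    return sum(w * min_max(v, b) for w, v, b in zip(weights, values, bounds))


def objective(weights: Sequence[float],
              estimated: Sequence[float], est_bounds: Sequence[Bounds],
              actual: Sequence[float], act_bounds: Sequence[Bounds],
              penalty: float = 0.0) -> float:
    """Normalized estimated metrics over normalized actual metrics, plus rho(Phi)."""
    numerator = weighted_normalized_sum(weights, estimated, est_bounds)
    denominator = weighted_normalized_sum(weights, actual, act_bounds)
    return numerator / denominator + penalty


bounds = [(0.0, 10.0), (0.0, 20.0)]
score = objective([1.0, 1.0], [4.0, 8.0], bounds, [5.0, 10.0], bounds)
print(score)  # 0.8
```

Here both estimated metrics sit at 40% of their range and both actual metrics at 50%, so the ratio is 0.4 × 2 / (0.5 × 2) = 0.8 before any penalization.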
The refactoring repair functions rely on a catalog of 10 constraints, based on object-oriented guidelines, to repair the individuals
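A constraint-driven repair can be sketched as predicates checked against each operation of an individual. The catalog of 10 constraints is not listed here, so the two constraints below are hypothetical placeholders, and the repair strategy (dropping violating operations) is one simple option; ARGen's repair functions may instead modify the offending operations.

```python
from typing import Callable, List, NamedTuple


class Op(NamedTuple):
    """One refactoring in an individual (hypothetical encoding)."""
    kind: str     # refactoring type, e.g. "MoveMethod"
    source: str   # source class identifier
    target: str   # target class identifier
    member: str   # field or method being moved ("" if none)


Constraint = Callable[[Op], bool]


# Two illustrative constraints (hypothetical, not from the actual catalog)
def distinct_classes(op: Op) -> bool:
    return op.source != op.target


def has_member(op: Op) -> bool:
    return op.member != ""


def repair(individual: List[Op], constraints: List[Constraint]) -> List[Op]:
    """Keep only the operations that satisfy every constraint."""
    return [op for op in individual if all(check(op) for check in constraints)]


individual = [Op("MoveMethod", "Parser", "Formatter", "format"),
              Op("MoveMethod", "Parser", "Parser", "parse"),
              Op("MoveField", "Parser", "Formatter", "")]
print(repair(individual, [distinct_classes, has_member]))
```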
We designed 6 genetic operators for the hybrid optimization approach
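With several genetic operators available, an adaptive scheme in the style of HaEa can learn which ones pay off during evolution. This sketch assumes each individual carries one rate per operator, picks an operator by roulette wheel, and then rewards or punishes it by a random factor before renormalizing; the exact update rule and the function names are assumptions, and the six ARGen operators themselves are not shown.

```python
import random
from typing import List


def select_operator(rates: List[float], rng: random.Random) -> int:
    """Roulette-wheel choice of an operator index, proportional to its rate."""
    return rng.choices(range(len(rates)), weights=rates, k=1)[0]


def update_rates(rates: List[float], op: int, improved: bool,
                 rng: random.Random) -> List[float]:
    """Reward (or punish) the applied operator by a random factor,
    then renormalize so the rates sum to 1."""
    delta = rng.random()  # in [0, 1), so a punished rate stays positive
    new = list(rates)
    new[op] *= (1.0 + delta) if improved else (1.0 - delta)
    total = sum(new)
    return [r / total for r in new]


rng = random.Random(42)
rates = [1 / 6] * 6  # six operators, uniform initial rates
op = select_operator(rates, rng)
rates = update_rates(rates, op, improved=True, rng=rng)
print(op, [round(r, 3) for r in rates])
```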
[Empirical Refactoring] Computational Technique Design
Build a metaphor of the system
Compute "Actual Metrics"
Configure individuals given Fowler's catalog
Use unalcol
Use estimation (RIPE) and compute fitness
Report in a JSON file
Technique Validation
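Preliminary experiment: doability assessed with a Shapiro-Wilk test (the samples are not normally distributed). Values are Shapiro-Wilk p-values; bracketed numbers give the number of evaluations.
| Algorithm | CCODEC [2000] | ACRA [60000] |
|---|---|---|
| Hill Climbing | 0.0049 | 0.0015 |
| Simulated Annealing | 0.0222 | 0.0157 |
| HaEa | 0.0144 | 0.0340 |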
Figures: large-evaluation runs on ACRA [60000] for Hill Climbing, Simulated Annealing, and HaEa
Discussion
Key Findings
- The hybrid evolutionary approach reaches lower values in long runs (0.95 ± 0.003 [60000])
- Evolutionary algorithms seem to work better than the baseline (p-value <= 0.0002)
- The main inconvenience during execution was latency; we had to redeploy and test again with new parameters
Strengths
- Unified Mathematical Approximation
- A definition, a development, and an evaluation of ARGen
- Performance and complexity validation
Limitations
Future Work
The refactoring consistency metric is based on Archipelago (Zarras et al., 2015)
RCM = \frac{\pi}{\Pi + \frac{\lambda}{\Lambda} + \Theta}
Involving self-organization and artificial minds
Conclusion
Thank you! :)
Convergence_Refactoring
By David Nader Palacio