Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities
Martin White*, Michele Tufano*, Matías Martínez†, Martin Monperrus‡, Denys Poshyvanyk*
*College of William and Mary, Williamsburg, Virginia, USA
†Université Polytechnique Hauts-de-France, Valenciennes, France
‡KTH Royal Institute of Technology, Stockholm, Sweden
Keywords
- Software engineering research
- Failure
- Error
- Fault
- Suspicious elements
- Repair ingredients
- Fix space
- Abstract syntax tree
(Software) bugs are obnoxious.
Automated Program Repair
Transformation of an unacceptable behavior of a program execution into an acceptable one according to a specification.
Automated program repair is hard.
Automated program repair is really hard.
Background
Generate-and-validate repair techniques (generally) search for statement-level modifications and validate patches against the test suite.
Correct-by-construction repair techniques use program analysis and program synthesis to construct code with particular properties.
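The generate-and-validate idea can be illustrated with a toy loop: enumerate candidate programs and keep the first one that passes every test. This is a sketch under loud assumptions, not how any particular tool works: a "program" is modeled as an `IntUnaryOperator`, a test is an input/expected-output pair, and the names `GenerateAndValidate`, `TestCase`, and `firstPlausible` are hypothetical. Real generate-and-validate tools apply statement-level modifications to source code and recompile before running the test suite.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.IntUnaryOperator;

// Toy generate-and-validate loop (hypothetical names). A "program" is modeled as a
// function, and a test as an input/expected-output pair. Requires Java 16+ (records).
class GenerateAndValidate {

    record TestCase(int input, int expected) {}

    // Returns the first candidate that passes every test (a test-adequate patch), if any.
    static Optional<IntUnaryOperator> firstPlausible(List<IntUnaryOperator> candidates,
                                                     List<TestCase> tests) {
        for (IntUnaryOperator candidate : candidates) {      // generate
            boolean passesAll = true;
            for (TestCase t : tests) {                        // validate
                if (candidate.applyAsInt(t.input()) != t.expected()) {
                    passesAll = false;
                    break;
                }
            }
            if (passesAll) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();                              // no candidate survived
    }

    public static void main(String[] args) {
        // The test suite acts as the specification: absolute value.
        List<TestCase> tests = List.of(new TestCase(-3, 3), new TestCase(0, 0), new TestCase(5, 5));
        // Candidates: the buggy identity, a wrong negation, and Math::abs.
        List<IntUnaryOperator> candidates = List.of(x -> x, x -> -x, Math::abs);
        IntUnaryOperator patch = firstPlausible(candidates, tests).orElseThrow();
        System.out.println(patch.applyAsInt(-7)); // prints 7
    }
}
```

Because the loop returns the first test-adequate candidate, the order in which candidates are generated directly affects which patch is found and how quickly.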
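// Buggy version: the body does not follow the documented contract equals(x, y, 1)
public final class MathUtils {
public static boolean equals(double x, double y) {
return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
}
}
// Fixed version: delegates to the overload equals(double, double, int), not shown here
public final class MathUtils {
public static boolean equals(double x, double y) {
return equals(x, y, 1);
}
}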
if (max_range_endpoint < eol_range_start)
max_range_endpoint = eol_range_start;
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
if (max_range_endpoint < eol_range_start)
max_range_endpoint = eol_range_start;
if (1)
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
if (AAA)
max_range_endpoint = BBB;
if (CCC)
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
if (0)
max_range_endpoint = eol_range_start;
if (!(max_range_endpoint == 0))
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
S. Mechtaev, J. Yi, and A. Roychoudhury, Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis, ICSE 2016.
The Redundancy Assumption
Large programs contain the seeds of their own repair [Martinez'14, Barr'14]
- Line-level. Most redundancy is localized in the same file [Martinez'14]
- Token-level. Repairs need never invent a new token [Martinez'14]
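The token-level claim can be checked mechanically for any given patch: tokenize the patch and the program, then test whether every patch token already occurs in the program. A minimal sketch; the regex tokenizer is deliberately crude (an assumption: multi-character operators such as `==` are split into single characters), and `TokenRedundancy` with its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Checks the token-level redundancy assumption for a single patch (hypothetical helper).
class TokenRedundancy {

    // Crude tokenizer: identifiers, numeric literals, and single non-space symbols.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|[0-9][A-Za-z0-9_.]*|\\S");

    static Set<String> tokens(String code) {
        Set<String> result = new HashSet<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // True if the patch uses no token that is absent from the program.
    static boolean isRedundant(String patch, String program) {
        return tokens(program).containsAll(tokens(patch));
    }

    public static void main(String[] args) {
        String program = "return (Double.isNaN(x) && Double.isNaN(y)) || x == y;\n"
                + "return equals(x, y, 1) || FastMath.abs(y - x) <= eps;";
        System.out.println(isRedundant("return equals(x, y, 1);", program));        // prints true
        System.out.println(isRedundant("return Precision.equals(x, y);", program)); // prints false
    }
}
```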
To navigate complex fix spaces, we use code similarities to intelligently select and adapt program repair ingredients.
Why?
Patches that use novel expressions are unattainable with existing redundancy-based repair techniques.
Technical Approach
- Recognition
- Learning
- Repair
Recognition
- Build source model
- Build corpora
- Normalize corpora
package org.apache.commons.math.util;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import org.apache.commons.math.MathRuntimeException;
import org.apache.commons.math.exception.util.Localizable;
import org.apache.commons.math.exception.util.LocalizedFormats;
import org.apache.commons.math.exception.NonMonotonousSequenceException;
/**
* Some useful additions to the built-in functions in {@link Math}.
* @version $Revision$ $Date$
*/
public final class MathUtils {
/** Smallest positive number such that 1 - EPSILON is not numerically equal to 1. */
public static final double EPSILON = 0x1.0p-53;
/** Safe minimum, such that 1 / SAFE_MIN does not overflow.
* <p>In IEEE 754 arithmetic, this is also the smallest normalized
* number 2<sup>-1022</sup>.</p>
*/
public static final double SAFE_MIN = 0x1.0p-1022;
// ... (remaining members omitted)
}
MathUtils.java
Learning
- Train language model
- Encode fragments
- Cluster identifiers
Repair
- Core repair loop
- Sorting ingredients
- Transforming ingredients
org apache commons math ode events public EventHandler int STOP 0 int RESET_STATE 1 int RESET_DERIVATIVES 2
org apache commons math ode nonstiff public MidpointIntegrator RungeKuttaIntegrator private static final double STATIC_C
org apache commons math distribution org apache commons math MathException public ContinuousDistribution Distribution
org apache commons math distribution org apache commons math MathException public HasDensity P double density P x
org apache commons math genetics java util List public PermutationChromosome T List T decode List T sequence
org apache commons math optimization java io Serializable public GoalType Serializable MAXIMIZE MINIMIZE
org apache commons math linear public AnyMatrix boolean isSquare int getRowDimension int getColumnDimension
org apache commons math stat ranking public TiesStrategy SEQUENTIAL MINIMUM MAXIMUM AVERAGE RANDOM
org apache commons math genetics public CrossoverPolicy ChromosomePair crossover Chromosome first Chromosome second
org apache commons math distribution public DiscreteDistribution Distribution double probability double x
org apache commons math stat ranking public NaNStrategy MINIMAL MAXIMAL REMOVED FIXED
org apache commons math stat ranking public RankingAlgorithm double rank double data
org apache commons math genetics public SelectionPolicy ChromosomePair select Population population
org apache commons math genetics public StoppingCondition boolean isSatisfied Population population
org apache commons math genetics public MutationPolicy Chromosome mutate Chromosome original
org apache commons math public Field T T getZero T getOne
org apache commons math optimization general public ConjugateGradientFormula FLETCHER_REEVES POLAK_RIBIERE
org apache commons math random public RandomVectorGenerator double nextVector
org apache commons math random public NormalizedRandomGenerator double nextNormalizedDouble
org apache commons math genetics public Fitness double fitness
File-level corpus
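One way to approximate a line of the file-level corpus from raw source is to strip comments, keep identifiers, keywords, and numeric literals, and drop punctuation. This is a hypothetical sketch inferred from the excerpt above: the set of dropped keywords (`package`, `import`, `class`, `interface`, and so on) is a guess that happens to reproduce the `NormalizedRandomGenerator` and `EventHandler` lines, and the actual corpora were built with Spoon, presumably from the AST rather than with regular expressions. The comment regex also ignores comment markers inside string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximates one line of a file-level corpus from raw Java source (hypothetical sketch).
class FileCorpus {

    // Block and line comments (naive: does not handle markers inside string literals).
    private static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/|//[^\\n]*", Pattern.DOTALL);
    // Identifiers/keywords and numeric literals; all punctuation is dropped.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|[0-9][A-Za-z0-9_.]*");
    // Assumed structural keywords omitted from the corpus (guessed from the excerpt).
    private static final Set<String> DROPPED =
            Set.of("package", "import", "class", "interface", "enum", "extends", "implements");

    static String toCorpusLine(String source) {
        String noComments = COMMENT.matcher(source).replaceAll(" ");
        List<String> kept = new ArrayList<>();
        Matcher m = TOKEN.matcher(noComments);
        while (m.find()) {
            if (!DROPPED.contains(m.group())) {
                kept.add(m.group());
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String src = "package org.apache.commons.math.random;\n"
                + "/** Random generator. */\n"
                + "public interface NormalizedRandomGenerator {\n"
                + "    double nextNormalizedDouble();\n"
                + "}\n";
        System.out.println(toCorpusLine(src));
        // prints: org apache commons math random public NormalizedRandomGenerator double nextNormalizedDouble
    }
}
```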
org apache commons math ode events public EventHandler int STOP <INT> int RESET_STATE <INT> int RESET_DERIVATIVES <INT>
org apache commons math ode nonstiff public MidpointIntegrator RungeKuttaIntegrator private static final double STATIC_C
org apache commons math distribution org apache commons math MathException public ContinuousDistribution Distribution
org apache commons math distribution org apache commons math MathException public HasDensity P double density P x
org apache commons math genetics java util List public PermutationChromosome T List T decode List T sequence
org apache commons math optimization java io Serializable public GoalType Serializable MAXIMIZE MINIMIZE
org apache commons math linear public AnyMatrix boolean isSquare int getRowDimension int getColumnDimension
org apache commons math stat ranking public TiesStrategy SEQUENTIAL MINIMUM MAXIMUM AVERAGE RANDOM
org apache commons math genetics public CrossoverPolicy ChromosomePair crossover Chromosome first Chromosome second
org apache commons math distribution public DiscreteDistribution Distribution double probability double x
org apache commons math stat ranking public NaNStrategy MINIMAL MAXIMAL REMOVED FIXED
org apache commons math stat ranking public RankingAlgorithm double rank double data
org apache commons math genetics public SelectionPolicy ChromosomePair select Population population
org apache commons math genetics public StoppingCondition boolean isSatisfied Population population
org apache commons math genetics public MutationPolicy Chromosome mutate Chromosome original
org apache commons math public Field T T getZero T getOne
org apache commons math optimization general public ConjugateGradientFormula FLETCHER_REEVES POLAK_RIBIERE
org apache commons math random public RandomVectorGenerator double nextVector
org apache commons math random public NormalizedRandomGenerator double nextNormalizedDouble
org apache commons math genetics public Fitness double fitness
Normalized file-level corpus
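The normalized corpus differs from the raw one only in its literals: `0`, `1`, and `2` became `<INT>`. The data collection slide says chars, floats, ints, and strings were normalized; the sketch below maps each literal token to a placeholder. Only `<INT>` is attested in the slides; `<FLOAT>`, `<CHAR>`, and `<STRING>` are hypothetical placeholder names, and the sketch assumes every literal is already a single space-delimited token.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Replaces literal tokens with type placeholders. Only <INT> is visible in the slides;
// <FLOAT>, <CHAR>, and <STRING> are hypothetical placeholder names.
class CorpusNormalizer {

    private static final Pattern INT = Pattern.compile("(0[xX][0-9a-fA-F]+|[0-9]+)[lL]?");
    private static final Pattern FLOAT =
            Pattern.compile("([0-9]+\\.[0-9]*|\\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?[fFdD]?");
    private static final Pattern CHAR = Pattern.compile("'(\\\\.|[^'\\\\])'");
    private static final Pattern STRING = Pattern.compile("\"(\\\\.|[^\"\\\\])*\"");

    static String normalizeToken(String token) {
        if (INT.matcher(token).matches()) return "<INT>";      // test ints before floats
        if (FLOAT.matcher(token).matches()) return "<FLOAT>";
        if (CHAR.matcher(token).matches()) return "<CHAR>";
        if (STRING.matcher(token).matches()) return "<STRING>";
        return token;                                          // identifiers, keywords, etc.
    }

    // Assumes tokens are separated by single spaces and each literal is one token.
    static String normalizeLine(String line) {
        return Arrays.stream(line.split(" "))
                .map(CorpusNormalizer::normalizeToken)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(normalizeLine(
                "org apache commons math ode events public EventHandler int STOP 0 int RESET_STATE 1 int RESET_DERIVATIVES 2"));
        // prints: org apache commons math ode events public EventHandler int STOP <INT> int RESET_STATE <INT> int RESET_DERIVATIVES <INT>
    }
}
```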
Neural network language model
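The language model is trained with word2vec (see the data collection slide). Full training does not fit in a short example, but the training examples of word2vec's skip-gram variant are simple to generate: each token is paired with its neighbors within a window. Whether DeepRepair used skip-gram or CBOW, and with what window size, is not stated in the slides; the window below is arbitrary and `SkipGramPairs` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Generates (center, context) pairs, the training examples of a skip-gram model.
// Hypothetical sketch; requires Java 16+ (records).
class SkipGramPairs {

    record Pair(String center, String context) {}

    static List<Pair> pairs(List<String> tokens, int window) {
        List<Pair> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(tokens.size() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j != i) {                       // a token is not its own context
                    out.add(new Pair(tokens.get(i), tokens.get(j)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> line = List.of("int", "STOP", "<INT>");
        for (Pair p : pairs(line, 1)) {
            System.out.println(p.center() + " -> " + p.context());
        }
        // prints int -> STOP, STOP -> int, STOP -> <INT>, <INT> -> STOP (one per line)
    }
}
```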
return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
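Fragments such as the statement above are encoded with recursive autoencoders (per the data collection slide), which compose two child vectors into a parent of the same size and are trained to reconstruct the children from the parent. The sketch below shows only the forward computations with random, untrained weights; the left-branching tree over token vectors, the dimension, the seed, and the class name `RecursiveAutoencoder` are assumptions, not details from the slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Forward pass of an untrained recursive autoencoder (hypothetical sketch).
// Weights are random; training would minimize the reconstruction error below.
class RecursiveAutoencoder {

    private final double[][] encW;  // d x 2d
    private final double[] encB;    // d
    private final double[][] decW;  // 2d x d
    private final double[] decB;    // 2d

    RecursiveAutoencoder(int d, long seed) {
        Random rnd = new Random(seed);
        encW = randomMatrix(d, 2 * d, rnd);
        encB = new double[d];
        decW = randomMatrix(2 * d, d, rnd);
        decB = new double[2 * d];
    }

    private static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) {
                row[j] = (rnd.nextDouble() - 0.5) * 0.2;   // small values in [-0.1, 0.1)
            }
        }
        return m;
    }

    // One tanh layer: tanh(W x + b).
    private static double[] layer(double[][] w, double[] b, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = b[i];
            for (int j = 0; j < x.length; j++) {
                s += w[i][j] * x[j];
            }
            y[i] = Math.tanh(s);
        }
        return y;
    }

    private static double[] concat(double[] a, double[] b) {
        double[] c = new double[a.length + b.length];
        System.arraycopy(a, 0, c, 0, a.length);
        System.arraycopy(b, 0, c, a.length, b.length);
        return c;
    }

    // Encoder: parent = tanh(W_e [c1; c2] + b_e), same dimension as each child.
    double[] encode(double[] c1, double[] c2) {
        return layer(encW, encB, concat(c1, c2));
    }

    // Squared error between [c1; c2] and its reconstruction tanh(W_d parent + b_d).
    double reconstructionError(double[] c1, double[] c2) {
        double[] input = concat(c1, c2);
        double[] recon = layer(decW, decB, encode(c1, c2));
        double err = 0;
        for (int i = 0; i < input.length; i++) {
            double diff = input[i] - recon[i];
            err += diff * diff;
        }
        return err;
    }

    // Folds token vectors left to right, i.e., encodes over a left-branching binary tree.
    double[] encodeFragment(List<double[]> tokenVectors) {
        double[] h = tokenVectors.get(0);
        for (int i = 1; i < tokenVectors.size(); i++) {
            h = encode(h, tokenVectors.get(i));
        }
        return h;
    }

    public static void main(String[] args) {
        RecursiveAutoencoder rae = new RecursiveAutoencoder(3, 42L);
        List<double[]> tokens = List.of(   // made-up token embeddings
                new double[] {0.1, 0.2, 0.3}, new double[] {-0.1, 0.0, 0.5}, new double[] {0.4, 0.4, 0.4});
        System.out.println(Arrays.toString(rae.encodeFragment(tokens)));
    }
}
```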
Math-63 Identifiers' Embeddings
(Figure; identifiers labeled in the plot: vecAbsoluteTolerance, vecRelativeTolerance, maxStep, minStep, nSteps, scalRelativeTolerance, scalAbsoluteTolerance, blockColumn, blockEndRow, blockStartColumn, columnsShift, iRow, jColumn, blockRow, absAsinh, cosaa, defaultMaximalIterationCount, tolerance, y3, x3, cosab, sinb, absAtanh, dstWidth, srcEndRow, pBlock, srcBlock, srcWidth, absoluteAccuracy, functionValueAccuracy, yMin, relativeAccuracy, oldt, oldx, oldDelta, delta, tol1, steadyStateThreshold, maxDenominator, upperBounds, SAFE_MIN, MIN_VALUE, stop, NEGATIVE_INFINITY, DEFAULT_EPSILON, accuracy, maxAbsoluteValue, tol, stepEnd, dstPos, srcPos, mIndex, srcRow, srcStartRow, cosa, sina, cotanFlag, cosb, lastTime, blockEndColumn, blockStartRow, nextGeneration, population, populationLimit, rln10b, rln10a, absSinh, endIndex, rowsShift, maxColSum, minRatioPositions, errfac, stopTime, eps, iterationCount, chromosomes, maxDegree, outBlock, totalEvaluations)
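The slides list k-means and simulated annealing for clustering identifiers. Below is a sketch of plain k-means (Lloyd's algorithm) over embedding vectors only; how simulated annealing is combined with it is not described here, and the initialization (first k points), the 2-D vectors, and the identifier-to-vector mapping in `main` are made up.

```java
import java.util.Arrays;

// Plain k-means (Lloyd's algorithm) over embedding vectors; hypothetical sketch.
class KMeans {

    // Returns the cluster index of each point. Centroids start at the first k points (an assumption).
    static int[] cluster(double[][] points, int k, int maxIters) {
        int n = points.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assign = new int[n];
        Arrays.fill(assign, -1);

        for (int iter = 0; iter < maxIters; iter++) {
            // Assignment step: nearest centroid by squared Euclidean distance.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break;   // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assign[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {   // an empty cluster keeps its old centroid
                    for (int j = 0; j < dim; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        String[] names = {"eps", "tol", "blockRow", "iRow"};
        double[][] vectors = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};   // made-up 2-D embeddings
        int[] labels = cluster(vectors, 2, 100);
        System.out.println(Arrays.toString(labels));   // prints [0, 0, 1, 1]
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " -> cluster " + labels[i]);
        }
    }
}
```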
Spectrum matrix (figure): entities Stmt 1, Stmt 2, Stmt 3 against test cases T1 to T5, with Pass/Fail outcomes T1 P, T2 F, T3 P, T4 F, T5 P
Fault Localization
Repair Operators
- InsertOp
- RemoveOp
- ReplaceOp
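Fault localization ranks statements by suspiciousness; the data collection slide names GZoltar with the Ochiai formula, susp(s) = ef / sqrt(totalFailed * (ef + ep)), where ef and ep count the failing and passing tests that execute s. The sketch computes Ochiai from a coverage matrix. The test outcomes follow the spectrum figure (T1 to T5: P, F, P, F, P), but the coverage matrix is invented because the figure's coverage marks are not recoverable; `Ochiai.scores` is a hypothetical name and this is not GZoltar's API.

```java
import java.util.Locale;

// Ochiai suspiciousness from a test-by-statement coverage matrix (hypothetical sketch, not GZoltar's API).
class Ochiai {

    // covered[t][s]: test t executes statement s; failed[t]: test t fails.
    // susp(s) = ef / sqrt(totalFailed * (ef + ep)), with ef/ep = failing/passing tests executing s.
    static double[] scores(boolean[][] covered, boolean[] failed) {
        int statements = covered[0].length;
        int totalFailed = 0;
        for (boolean f : failed) {
            if (f) {
                totalFailed++;
            }
        }
        double[] result = new double[statements];
        for (int s = 0; s < statements; s++) {
            int ef = 0;
            int ep = 0;
            for (int t = 0; t < covered.length; t++) {
                if (covered[t][s]) {
                    if (failed[t]) {
                        ef++;
                    } else {
                        ep++;
                    }
                }
            }
            double denom = Math.sqrt((double) totalFailed * (ef + ep));
            result[s] = denom == 0 ? 0.0 : ef / denom;   // unexecuted statements score 0
        }
        return result;
    }

    public static void main(String[] args) {
        // Outcomes follow the figure (T1..T5: P, F, P, F, P); the coverage itself is invented.
        boolean[][] covered = {
            {true, false, true},   // T1 passes
            {true, true, false},   // T2 fails
            {true, false, true},   // T3 passes
            {true, true, false},   // T4 fails
            {true, false, true},   // T5 passes
        };
        boolean[] failed = {false, true, false, true, false};
        double[] susp = scores(covered, failed);
        for (int s = 0; s < susp.length; s++) {
            System.out.printf(Locale.ROOT, "Stmt %d: %.3f%n", s + 1, susp[s]);
        }
        // prints Stmt 1: 0.632, Stmt 2: 1.000, Stmt 3: 0.000 (one per line)
    }
}
```

Statements with the highest scores are the ones the repair operators (InsertOp, RemoveOp, ReplaceOp) would target first.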
MathUtils::equals(double, double)
public final class MathUtils {
/** Safe minimum, such that 1 / SAFE_MIN does not overflow.
* <p>In IEEE 754 arithmetic, this is also the smallest normalized
* number 2<sup>-1022</sup>.</p>
*/
public static final double SAFE_MIN = 0x1.0p-1022;
/**
* Returns true iff they are equal as defined by
* {@link #equals(double,double,int) equals(x, y, 1)}.
*
* @param x first value
* @param y second value
* @return {@code true} if the values are equal.
*/
public static boolean equals(double x, double y) {
return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
}
// equals(double, double, int) and FastMath are not shown in this excerpt
public static boolean equals(double x, double y, double eps) {
return equals(x, y, 1) || FastMath.abs(y - x) <= eps;
}
}
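Sorting ingredients relies on code similarities. A plausible reading, sketched here as an assumption, is to rank candidate statements (such as the `equals(double, double, double)` body above) by cosine similarity between their fragment vectors and the vector of the suspicious statement in `MathUtils::equals(double, double)`. The vectors and the `return x;` candidate are made up, `IngredientSorter` is a hypothetical name, and DeepRepair's exact scoring may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Ranks candidate ingredients by cosine similarity to the suspicious statement.
// Hypothetical sketch; requires Java 16+ (records).
class IngredientSorter {

    record Ingredient(String code, double[] vector) {}

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns a new list, most similar ingredient first.
    static List<Ingredient> sort(double[] suspicious, List<Ingredient> candidates) {
        List<Ingredient> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (Ingredient ing) -> cosine(suspicious, ing.vector())).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up fragment vectors; real ones would come from the learned encoder.
        double[] suspicious = {1.0, 0.2};
        List<Ingredient> candidates = List.of(
                new Ingredient("return x;", new double[] {0.1, 0.9}),   // hypothetical candidate
                new Ingredient("return equals(x, y, 1) || FastMath.abs(y - x) <= eps;", new double[] {0.8, 0.3}));
        for (Ingredient ing : sort(suspicious, candidates)) {
            System.out.println(ing.code());
        }
        // prints the equals(...) ingredient first, then "return x;"
    }
}
```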
DeepRepair Patch
--- a/src/main/java/org/apache/commons/math/util/MathUtils.java
+++ b/src/main/java/org/apache/commons/math/util/MathUtils.java
@@ -181,7 +181,7 @@
}
public static boolean equals(double x, double y) {
- return ((Double.isNaN(x)) && (Double.isNaN(y))) || (x == y);
+ return (equals(x, y, 1)) || ((FastMath.abs((y - x))) <= (SAFE_MIN));
}
public static boolean equalsIncludingNaN(double x, double y) {
Human-written Patch
--- a/src/main/java/org/apache/commons/math/util/MathUtils.java
+++ b/src/main/java/org/apache/commons/math/util/MathUtils.java
@@ -414,7 +414,7 @@ public final class MathUtils {
* @return {@code true} if the values are equal.
*/
public static boolean equals(double x, double y) {
- return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
+ return equals(x, y, 1);
}
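Comparing the DeepRepair patch with the ingredient it reused suggests what transforming does: `FastMath.abs(y - x) <= eps` comes from `equals(double, double, double)`, where `eps` is a parameter, but `eps` is out of scope inside `equals(double, double)`, and the patch uses `SAFE_MIN` instead. A minimal sketch of such a rewrite, assuming identifier embeddings are available: every out-of-scope identifier that has an embedding is replaced by the in-scope identifier with the highest cosine similarity. The embeddings, the scope list, and the names are hypothetical; DeepRepair's actual selection rule may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Replaces out-of-scope identifiers in an ingredient with the most similar in-scope identifier.
// Hypothetical sketch; embeddings are made up.
class IngredientTransformer {

    private static final Pattern IDENT = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static String transform(String ingredient, List<String> inScope, Map<String, double[]> embeddings) {
        Matcher m = IDENT.matcher(ingredient);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String id = m.group();
            String replacement = id;
            double[] idVec = embeddings.get(id);
            // Only out-of-scope identifiers that have an embedding are replaced.
            if (!inScope.contains(id) && idVec != null) {
                double best = Double.NEGATIVE_INFINITY;
                for (String candidate : inScope) {
                    double[] v = embeddings.get(candidate);
                    if (v != null && cosine(idVec, v) > best) {
                        best = cosine(idVec, v);
                        replacement = candidate;
                    }
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);   // copy the text after the last identifier
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, double[]> emb = Map.of(
                "eps", new double[] {1.0, 0.0},
                "SAFE_MIN", new double[] {0.9, 0.1},
                "x", new double[] {0.0, 1.0},
                "y", new double[] {0.1, 1.0});
        List<String> scope = List.of("x", "y", "SAFE_MIN");
        System.out.println(transform("FastMath.abs(y - x) <= eps", scope, emb));
        // prints: FastMath.abs(y - x) <= SAFE_MIN
    }
}
```

Identifiers without an embedding (here `FastMath` and `abs`) are left untouched, which keeps library calls intact.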
Empirical Validation
- Research questions (see paper for specifics)
- RQ1. Evaluated sorting in isolation.
- RQ2. Evaluated transforming in isolation.
- RQ3. Evaluated sorting with the ability to transform.
- RQ4. Conducted a quality study.
- Data collection procedure
- Analysis procedure
Data Collection Procedure
- Recognition
- Spoon
- File-, type-, and executable-level corpora
- Normalized chars, floats, ints, and strings
- Learning
- word2vec
- Recursive autoencoders
- k-means and simulated annealing
- Repair
- Defects4J: 6 Java projects including 374 buggy program revisions
- GZoltar (Ochiai); Astor 3-hour evolutionary loop
- 20,196 trials (374 revisions, 6 strategies, 3 scopes, 3 seeds)
- 2,616 days (62,784 hours) of computation time
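The trial count and the compute time can be cross-checked with a little arithmetic: the factors in parentheses multiply out to the stated number of trials, and dividing the total hours by the number of trials gives the average cost of one trial, slightly above the 3-hour repair loop budget (the slides do not say what accounts for the difference).

```latex
\begin{aligned}
374 \times 6 \times 3 \times 3 &= 20{,}196 \ \text{trials} \\
2{,}616 \ \text{days} \times 24 \ \text{h/day} &= 62{,}784 \ \text{hours} \\
62{,}784 \ \text{hours} / 20{,}196 \ \text{trials} &\approx 3.11 \ \text{hours per trial}
\end{aligned}
```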
Analysis Procedure
- Quantitative (Effectiveness)
- Compare the number of test-adequate patches using Wilcoxon tests with Bonferroni correction
- Compute the difference between sets of test-adequate patches
- Compare the number of attempts needed to generate test-adequate patches using Mann-Whitney tests with Bonferroni correction
- Compute the number of attempts needed to generate a compilable ingredient
- Qualitative (Correctness)
- Correctness
- Confidence
- Readability
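The effectiveness comparisons rely on rank-based tests with Bonferroni correction. The sketch below computes the Mann-Whitney U statistic (with average ranks for ties) and the Bonferroni-adjusted significance level; computing p-values, for example via a normal approximation, is omitted, and the sample data and the number of comparisons in `main` are hypothetical.

```java
import java.util.Arrays;

// Mann-Whitney U statistic (average ranks for ties) and a Bonferroni-adjusted alpha (sketch).
class MannWhitney {

    // U for sample a: R_a - n_a (n_a + 1) / 2, where R_a is the rank sum of a in the pooled data.
    static double uStatistic(double[] a, double[] b) {
        int n = a.length + b.length;
        double[] pooled = new double[n];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);

        // Sort indices by value so that tied values end up adjacent.
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        Arrays.sort(order, (p, q) -> Double.compare(pooled[p], pooled[q]));

        // Assign 1-based ranks; each run of ties shares the average of its ranks.
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && pooled[order[j + 1]] == pooled[order[i]]) {
                j++;
            }
            double averageRank = (i + j) / 2.0 + 1;
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = averageRank;
            }
            i = j + 1;
        }

        double rankSumA = 0;
        for (int k = 0; k < a.length; k++) {
            rankSumA += ranks[k];   // the first a.length pooled entries belong to a
        }
        return rankSumA - a.length * (a.length + 1) / 2.0;
    }

    // Bonferroni correction: with m comparisons, test each one at alpha / m.
    static double bonferroniAlpha(double alpha, int m) {
        return alpha / m;
    }

    public static void main(String[] args) {
        double[] deepRepairAttempts = {3, 5, 8, 2, 7};    // hypothetical data
        double[] jGenProgAttempts = {4, 6, 9, 3, 10};     // hypothetical data
        System.out.println("U = " + uStatistic(deepRepairAttempts, jGenProgAttempts));
        System.out.println("alpha per comparison = " + bonferroniAlpha(0.05, 3));
    }
}
```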
Empirical Results
- Six bugs were unlocked by DeepRepair configurations
- DeepRepair finds compilable ingredients faster than jGenProg
- But it does not yield test-adequate patches in fewer attempts (on average)
- Nor does it find significantly more patches than jGenProg
- Notable differences between DeepRepair and jGenProg patches
- No significant difference in quality
Conclusion
- Patches that use novel expressions are unattainable with existing redundancy-based repair techniques.
- We use code similarities to intelligently select and adapt ingredients.
- Key results
- DeepRepair finds patches that cannot be found by existing redundancy-based repair techniques.
- We conducted a computationally intensive empirical study that introduced new metrics.
Backups
deeprepair, by martingwhite
SANER 2019