Martin White advised by Denys Poshyvanyk
Searching for patterns in data
Analysis, design, implementation, maintenance, and evolution of software systems
How I want to represent...
What I want to represent...
How can we improve representation power?
Martin White advised by Denys Poshyvanyk
Artifacts that are produced and archived during the software development lifecycle
Improve representation power
Toward deep learning software repositories
Deep learning code fragments for code clone detection
Sorting and transforming program repair ingredients via deep learning
Automated Software Engineering (ASE)
Empirical Software Engineering (EMSE)
Foundations of Software Engineering (FSE)
International Conference on Program Comprehension (ICPC)
International Conference on Software Engineering (ICSE)
International Conference on Software Maintenance and Evolution (ICSME)
Mining Software Repositories (MSR)
Transactions on Software Engineering (TSE)
Transactions on Software Engineering and Methodology (TOSEM)
Exploring the use of deep learning for feature location, ICSME'15
Combining deep learning with information retrieval to localize buggy files for bug reports, ASE'15
Automatically learning semantic features for defect prediction, ICSE'16
Guided code synthesis using deep neural networks, FSE'16
DeepSoft: A vision for a deep model of software, FSE'16
Deep API learning, FSE'16
SimilarTech: Automatically recommend analogical libraries across different programming languages, ASE'16
Learning a dual-language vector space for domain-specific cross-lingual question retrieval, ASE'16
Deep learning code fragments for code clone detection, ASE'16
Predicting semantically linkable knowledge in developer online forums via convolutional neural network, ASE'16
Replicating parser behavior using neural machine translation, ICPC'17
Bug localization with combination of deep learning and information retrieval, ICPC'17
Semantically enhanced software traceability using deep learning techniques, ICSE'17
Domain adaptation for test report classification in crowdsourced testing, ICSE-SEIP'17
Easy over hard: A case study on deep learning, FSE'17
Are deep learning models the best for source code?, FSE'17
Towards accurate duplicate bug retrieval using deep learning, ICSME'17
CCLearner: A deep learning-based clone detection approach, ICSME'17
Understanding diverse usage patterns from large-scale appstore-service profiles, TSE'17
Predicting the delay of issues with due dates in software projects, EMSE'17
modality
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
Neural networks comprise neuron-like processing units
Keywords. Deep Architecture, Deep Learning
Model sequences of terms in a source code corpus
static File ClassLoaderHelper
static File mapAlternativeName
static File _localctx
static File predindex
static File _sharedContextCache
RQ1. Do deep learners significantly outperform state-of-the-practice models at code suggestion on our dataset?
RQ2. Are there any distinguishing characteristics of the test documents on which the deep learners achieve considerably better performance?
@Extension
public class CustomMatrixListener extends ... {
public CustomMatrixListener() {
...
@Override
public void onCompleted ... {
Model | Top-1 | Top-5 | Top-10 |
---|---|---|---|
Interpolated 8-gram | 49.7 | 71.3 | 78.1 |
Interpolated 8-gram 100-cache | 04.8 | 69.5 | 78.5 |
Static (400, 5) | 61.1 | 78.4 | 81.4 |
Dynamic (300, 20) | 72.2 | 88.4 | 92.0 |
Toward deep learning software repositories
Deep learning code fragments for code clone detection
Sorting and transforming program repair ingredients via deep learning
A code fragment is a contiguous segment of source code. Code clones are two or more fragments that are similar with respect to a clone type.
Identical up to variations in comments, whitespace, or layout [Roy'07]
if (a >= b) {
c = d + b; // Comment1
d = d + 1;}
else
c = d - a; //Comment2
if (a>=b) {
// Comment1'
c=d+b;
d=d+1;}
else // Comment2'
c=d-a;
if (a>=b)
{ // Comment1''
c=d+b;
d=d+1;
}
else // Comment2''
c=d-a;
Identical up to variations in names and values, comments, etc. [Roy'07]
if (a >= b) {
c = d + b; // Comment1
d = d + 1;}
else
c = d - a; //Comment2
if (m >= n)
{ // Comment1'
y = x + n;
x = x + 5; //Comment3
}
else
y = x - m; //Comment2'
A parameterized clone for this fragment is
Modifications include statement(s) changed, added, or deleted [Roy'07]
public int getSoLinger() throws SocketException {
Object o = impl.getOption(SocketOptions.SO_LINGER);
if (o instanceof Integer) {
return((Integer) o).intValue();
}
else return -1;
}
public synchronized int getSoTimeout() // This statement is changed
throws SocketException {
Object o = impl.getOption(SocketOptions.SO_TIMEOUT);
if (o instanceof Integer) {
return((Integer) o).intValue();
}
else return -0;
}
Syntactically dissimilar fragments with similar functionality [Roy'07]
int i, j=1;
for (i=1; i<=VALUE; i++)
j=j*i;
int factorial(int n) {
if (n == 0) return 1 ;
else return n * factorial(n-1) ;
}
Now consider a recursive function that calculates the factorial
Techniques can be classified by their source code representation
public int getSoLinger() throws SocketException {
Object o = impl.getOption(SocketOptions.SO_LINGER);
if (o instanceof Integer) {
return((Integer) o).intValue();
}
else return -1;
Our new set of techniques fuse and use
Couples deep learners to front end compiler stages
We use a recurrent neural network to map terms to embeddings
What we would like to have is not only embeddings for terms but also embeddings for fragments
Generalize recurrent neural networks by modeling structures
We use a recursive autoencoder to encode sequences of embeddings
AST-based encoding
Greedy encoding
System | Files | LOC | Tokens | Vocab. |
ANTLR 4 | 1,514 | 104,225 | 1,701,807 | 15,826 |
Apache Ant 1.9.6 | 1,218 | 136,352 | 1,888,424 | 16,029 |
ArgoUML 0.34 | 1,908 | 177,493 | 1,172,058 | 17,205 |
CAROL 2.0.5 | 1,184 | 112,022 | 1,180,947 | 12,210 |
dnsjava 2.0.0 | 1,196 | 124,660 | 1,169,219 | 13,012 |
Hibernate 2 | 1,555 | 151,499 | 1,365,256 | 15,850 |
JDK 1.4.2 | 4,129 | 562,120 | 3,512,807 | 45,107 |
JHotDraw 6 | 1,984 | 158,130 | 1,377,652 | 14,803 |
System | Files | LOC | Tokens | Vocab. |
ANTLR 4 | 1,514 | 104,225 | 1,701,807 | 15,826 |
Apache Ant 1.9.6 | 1,218 | 136,352 | 1,888,424 | 16,029 |
ArgoUML 0.34 | 1,908 | 177,493 | 1,172,058 | 17,205 |
CAROL 2.0.5 | 1,184 | 112,022 | 1,180,947 | 12,210 |
dnsjava 2.0.0 | 1,196 | 124,660 | 1,169,219 | 13,012 |
Hibernate 2 | 1,555 | 151,499 | 1,365,256 | 15,850 |
JDK 1.4.2 | 4,129 | 562,120 | 3,512,807 | 45,107 |
JHotDraw 6 | 1,984 | 158,130 | 1,377,652 | 14,803 |
System | Files | LOC | Tokens | Vocab. |
ANTLR 4 | 1,514 | 104,225 | 1,701,807 | 15,826 |
Apache Ant 1.9.6 | 1,218 | 136,352 | 1,888,424 | 16,029 |
ArgoUML 0.34 | 1,908 | 177,493 | 1,172,058 | 17,205 |
CAROL 2.0.5 | 1,184 | 112,022 | 1,180,947 | 12,210 |
dnsjava 2.0.0 | 1,196 | 124,660 | 1,169,219 | 13,012 |
Hibernate 2 | 1,555 | 151,499 | 1,365,256 | 15,850 |
JDK 1.4.2 | 4,129 | 562,120 | 3,512,807 | 45,107 |
JHotDraw 6 | 1,984 | 158,130 | 1,377,652 | 14,803 |
Toward deep learning software repositories
Deep learning code fragments for code clone detection
Sorting and transforming program repair ingredients via deep learning
Transformation of an unacceptable behavior of a program execution into an acceptable one according to a specification.
Our learning-based approach couples learners to front-end compiler stages
Automated program repair is hard.
Automated program repair is really hard.
Generate-and-validate repair techniques (generally) search for statement-level modifications and validate patches against the test suit
Correct-by-construction repair techniques use program analysis and program synthesis to construct code with particular properties
public final class MathUtils {
public static boolean equals(double x, double y) {
return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
}
}
public final class MathUtils {
public static boolean equals(double x, double y) {
return equals(x, y, 1);
}
}
if (max_range_endpoint < eol_range_start)
max_range_endpoint = eol_range_start;
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
if (max_range_endpoint < eol_range_start)
max_range_endpoint = eol_range_start;
if (1)
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
if (AAA)
max_range_endpoint = BBB;
if (CCC)
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
if (0)
max_range_endpoint = eol_range_start;
if (!(max_range_endpoint == 0))
printable_field = xzalloc(max_range_endpoint/CHAR_BIT+1);
S. Mechtaev, J. Yi, and A. Roychoudhury, Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis, ICSE 2016.
Large programs contain the seeds of their own repair [Martinez'14,Barr'14]
On the problem of navigating complex fix spaces, we use code similarities to intelligently select and adapt program repair ingredients
Recognition
Learning
Repair
Build source code model
Build corpora
Normalize corpora
package org.apache.commons.math.util;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import org.apache.commons.math.MathRuntimeException;
import org.apache.commons.math.exception.util.Localizable;
import org.apache.commons.math.exception.util.LocalizedFormats;
import org.apache.commons.math.exception.NonMonotonousSequenceException;
/**
* Some useful additions to the built-in functions in {@link Math}.
* @version $Revision$ $Date$
*/
public final class MathUtils {
/** Smallest positive number such that 1 - EPSILON is not numerically equal to 1. */
public static final double EPSILON = 0x1.0p-53;
/** Safe minimum, such that 1 / SAFE_MIN does not overflow.
* <p>In IEEE 754 arithmetic, this is also the smallest normalized
* number 2<sup>-1022</sup>.</p>
*/
public static final double SAFE_MIN = 0x1.0p-1022;
Build source code model
Build corpora
Normalize corpora
org apache commons math ode events public EventHandler int STOP 0 int RESET_STATE 1 int RESET_DERIVATIVES 2
org apache commons math ode nonstiff public MidpointIntegrator RungeKuttaIntegrator private static final double STATIC_C
org apache commons math distribution org apache commons math MathException public ContinuousDistribution Distribution
org apache commons math distribution org apache commons math MathException public HasDensity P double density P x
org apache commons math genetics java util List public PermutationChromosome T List T decode List T sequence
org apache commons math optimization java io Serializable public GoalType Serializable MAXIMIZE MINIMIZE
org apache commons math linear public AnyMatrix boolean isSquare int getRowDimension int getColumnDimension
org apache commons math stat ranking public TiesStrategy SEQUENTIAL MINIMUM MAXIMUM AVERAGE RANDOM
org apache commons math genetics public CrossoverPolicy ChromosomePair crossover Chromosome first Chromosome second
org apache commons math distribution public DiscreteDistribution Distribution double probability double x
org apache commons math stat ranking public NaNStrategy MINIMAL MAXIMAL REMOVED FIXED
org apache commons math stat ranking public RankingAlgorithm double rank double data
org apache commons math genetics public SelectionPolicy ChromosomePair select Population population
org apache commons math genetics public StoppingCondition boolean isSatisfied Population population
org apache commons math genetics public MutationPolicy Chromosome mutate Chromosome original
org apache commons math public Field T T getZero T getOne
org apache commons math optimization general public ConjugateGradientFormula FLETCHER_REEVES POLAK_RIBIERE
org apache commons math random public RandomVectorGenerator double nextVector
org apache commons math random public NormalizedRandomGenerator double nextNormalizedDouble
org apache commons math genetics public Fitness double fitness
Build source code model
Build corpora
Normalize corpora
org apache commons math ode events public EventHandler int STOP <INT> int RESET_STATE <INT> int RESET_DERIVATIVES <INT>
org apache commons math ode nonstiff public MidpointIntegrator RungeKuttaIntegrator private static final double STATIC_C
org apache commons math distribution org apache commons math MathException public ContinuousDistribution Distribution
org apache commons math distribution org apache commons math MathException public HasDensity P double density P x
org apache commons math genetics java util List public PermutationChromosome T List T decode List T sequence
org apache commons math optimization java io Serializable public GoalType Serializable MAXIMIZE MINIMIZE
org apache commons math linear public AnyMatrix boolean isSquare int getRowDimension int getColumnDimension
org apache commons math stat ranking public TiesStrategy SEQUENTIAL MINIMUM MAXIMUM AVERAGE RANDOM
org apache commons math genetics public CrossoverPolicy ChromosomePair crossover Chromosome first Chromosome second
org apache commons math distribution public DiscreteDistribution Distribution double probability double x
org apache commons math stat ranking public NaNStrategy MINIMAL MAXIMAL REMOVED FIXED
org apache commons math stat ranking public RankingAlgorithm double rank double data
org apache commons math genetics public SelectionPolicy ChromosomePair select Population population
org apache commons math genetics public StoppingCondition boolean isSatisfied Population population
org apache commons math genetics public MutationPolicy Chromosome mutate Chromosome original
org apache commons math public Field T T getZero T getOne
org apache commons math optimization general public ConjugateGradientFormula FLETCHER_REEVES POLAK_RIBIERE
org apache commons math random public RandomVectorGenerator double nextVector
org apache commons math random public NormalizedRandomGenerator double nextNormalizedDouble
org apache commons math genetics public Fitness double fitness
Train language model
Encode fragments
Cluster identifiers
Train language model
Encode fragments
Cluster identifiers
return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
Train language model
Encode fragments
Cluster identifiers
vecAbsoluteTolerance
vecRelativeTolerance
maxStep
minStep
nSteps
scalRelativeTolerance
scalAbsoluteTolerance
blockColumn
blockEndRow
blockStartColumn
columnsShift
iRow
jColumn
blockRow
absAsinh
cosaa
defaultMaximalIterationCount
tolerance
y3
x3
cosab
sinb
absAtanh
dstWidth
srcEndRow
pBlock
srcBlock
srcWidth
absoluteAccuracy
functionValueAccuracy
yMin
relativeAccuracy
oldt
oldx
oldDelta
delta
tol1
steadyStateThreshold
maxDenominator
upperBounds
SAFE_MIN
MIN_VALUE
stop
NEGATIVE_INFINITY
DEFAULT_EPSILON
accuracy
maxAbsoluteValue
tol
stepEnd
dstPos
srcPos
mIndex
srcRow
srcStartRow
cosa
sina
cotanFlag
cosb
lastTime
blockEndColumn
blockStartRow
nextGeneration
population
populationLimit
rln10b
rln10a
absSinh
endIndex
rowsShift
maxColSum
minRatioPositions
errfac
stopTime
eps
iterationCount
chromosomes
maxDegree
outBlock
totalEvaluations
Core repair loop
Sorting ingredients
Transforming ingredients
Stmt 1
Stmt 2
Stmt 3
Pass/Fail
Entity
T
1
T
2
T
3
T
4
T
5
P
F
P
F
P
Test Cases
Fault Localization
Repair Operators
Core repair loop
Sorting ingredients
Transforming ingredients
MathUtils::equals(double, double)
public final class MathUtils {
/** Safe minimum, such that 1 / SAFE_MIN does not overflow.
* <p>In IEEE 754 arithmetic, this is also the smallest normalized
* number 2<sup>-1022</sup>.</p>
*/
public static final double SAFE_MIN = 0x1.0p-1022;
/**
* Returns true iff they are equal as defined by
* {@link #equals(double,double,int) equals(x, y, 1)}.
*
* @param x first value
* @param y second value
* @return {@code true} if the values are equal.
*/
public static boolean equals(double x, double y) {
return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
}
public static boolean equals(double x, double y, double eps) {
return equals(x, y, 1) || FastMath.abs(y - x) <= eps;
}
}
Core repair loop
Sorting ingredients
Transforming ingredients
DeepRepair Patch
--- a/src/main/java/org/apache/commons/math/util/MathUtils.java
+++ b/src/main/java/org/apache/commons/math/util/MathUtils.java
@@ -181,7 +181,7 @@
}
public static boolean equals(double x, double y) {
- return ((Double.isNaN(x)) && (Double.isNaN(y))) || (x == y);
+ return (equals(x, y, 1)) || ((FastMath.abs((y - x))) <= (SAFE_MIN));
}
public static boolean equalsIncludingNaN(double x, double y) {
Human-written Patch
--- a/src/main/java/org/apache/commons/math/util/MathUtils.java
+++ b/src/main/java/org/apache/commons/math/util/MathUtils.java
@@ -414,7 +414,7 @@ public final class MathUtils {
* @return {@code true} if the values are equal.
*/
public static boolean equals(double x, double y) {
+ return equals(x, y, 1);
- return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
}
Improve representation power
Martin White advised by Denys Poshyvanyk
Toward deep learning software repositories
Deep learning code fragments for code clone detection
Sorting and transforming program repair ingredients via deep learning
Deep Learning Software Repositories, NSF SHF: Small.
M. White, C. Vendome, M. Linares-Vasquez, and D. Poshyvanyk. Toward Deep Learning Software Repositories. Proceedings of the 12th Working Conference on Mining Software Repositories (MSR'15).
M. White. Deep Representations for Software Engineering. Proceedings of the 37th International Conference on Software Engineering (ICSE'15) - Volume 2.
C. Corley, K. Damevski, N. Kraft, Exploring the use of deep learning for feature location, ICSME'15
A. Lam, A. Nguyen, H. Nguyen, T. Nguyen, Combining deep learning with information retrieval to localize buggy files for bug reports, ASE'15
S. Wang, T. Liu, L. Tan, Automatically learning semantic features for defect prediction, ICSE'16
C. Alexandru, Guided code synthesis using deep neural networks, FSE'16
H. Dam, T. Tran, J. Grundy, A. Ghose, DeepSoft: A vision for a deep model of software, FSE'16
X. Gu, H. Zhang, D. Zhang, S. Kim, Deep API learning, FSE'16
C. Chen, Z. Xing, SimilarTech: Automatically recommend analogical libraries across different programming languages, ASE'16
G. Chen, C. Chen, Z. Xing, B. Xu, Learning a dual-language vector space for domain-specific cross-lingual question retrieval, ASE'16
M. White, M. Tufano, C. Vendome, D. Poshyvanyk, Deep learning code fragments for code clone detection, ASE'16
B. Xu, D. Ye, Z. Xing, X. Xia, G. Chen, S. Li, Predicting semantically linkable knowledge in developer online forums via convolutional neural network, ASE'16
C. Alexandru, S. Panichella, H. Gall, Replicating parser behavior using neural machine translation, ICPC'17
A. Lam, A. Nguyen, H. Nguyen, T. Nguyen, Bug localization with combination of deep learning and information retrieval, ICPC'17
J. Guo, J. Cheng, J. Cleland-Huang, Semantically enhanced software traceability using deep learning techniques, ICSE'17
A. Sankaran, R. Aralikatte, S. Mani, S. Khare, N. Panwar, N. Gantayat, DARVIZ: Deep abstract representation, visualization, and verification of deep learning models, ICSE-NIER'17
J. Wang, Q. Cui, S. Wang, Q. Wang, Domain adaptation for test report classification in crowdsourced testing, ICSE-SEIP'17
X. Liu, X. Lu, H. Li, T. Xie, Q. Mei, H. Mei, F. Feng, Understanding diverse usage patterns from large-scale appstore-service profiles, TSE'17
M. Choetkiertikul, H. Dam, T. Tran, A. Ghose, Predicting the delay of issues with due dates in software projects, EMSE'17
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
statistical language models are probability distributions over sentences
Smoothing adjusts MLEs, e.g., hallucinate data
Reconsider the example using this new distribution
Back-off and interpolation are two methods for redistributing mass
toward deep learning software deep
toward deep learning software learning
toward deep learning software expressive
toward deep learning software represent
toward deep learning software neural
toward deep learning software model
toward deep learning software artifacts
toward deep learning software language
toward deep learning software repositories
toward deep learning software engineering
toward deep learning software network
toward deep learning software context
toward deep learning software recurrent
toward deep learning software semantics
toward deep learning software probability
toward deep learning software deep
toward deep learning software learning
toward deep learning software expressive
toward deep learning software represent
toward deep learning software neural
toward deep learning software model
toward deep learning software artifacts
toward deep learning software language
toward deep learning software repositories
toward deep learning software engineering
toward deep learning software network
toward deep learning software context
toward deep learning software recurrent
toward deep learning software semantics
toward deep learning software probability
The average branching factor is a general proxy for quality
\data\
ngram 1=18
ngram 2=32
ngram 3=34
\1-grams:
-1.092146 </s>
-99 <s> -0.157377
-1.092146 ( -0.454680
-1.304235 ) -0.120574
-1.304235 + -0.120574
-1.304235 - -0.120574
-1.304235 1 -0.120574
-1.092146 2 -0.830539
-1.092146 ; -0.830539
-1.304235 < -0.120574
-1.304235 else -0.120574
-1.304235 fib
-1.304235 if -0.120574
-1.092146 int -0.120574
-1.304235 n -0.120574
-1.304235 return -0.120574
-1.304235 {
-1.304235 } -0.120574
\2-grams:
-1.188102 <s> if -0.076840
-1.062487 <s> int -0.076840
-0.606226 <s> return -0.076840
-1.188102 <s> { -0.076840
-1.461612 <s> } -0.024134
-0.961783 ( int -0.076840
-0.232397 ( n -0.024134
-0.914066 ) </s>
-1.007861 ) + -0.076840
-0.914066 ) ; -0.076840
-1.007861 ) { -0.076840
-0.552804 + fib -0.076840
-0.799116 - 1 -0.076840
-0.738769 - 2 -0.076840
-0.552804 1 ) -0.076840
-0.065701 2 ) -0.076840
-0.063375 ; </s>
-0.517557 < 2 -0.076840
-0.552804 else { -0.076840
-1.092146 fib ( -0.024134
-0.517557 if ( -0.076840
-0.799116 int fib -0.076840
-0.799116 int n -0.076840
-1.007861 n ) -0.076840
-1.007861 n - -0.076840
-0.914066 n ; -0.076840
-1.007861 n < -0.076840
-0.799116 return fib -0.076840
-0.799116 return n -0.076840
-1.092146 { </s>
-0.738769 } </s>
-0.799116 } else -0.076840
\3-grams:
-0.380268 <s> if (
-0.529852 <s> int fib
-0.669302 <s> return fib
-0.669302 <s> return n
-0.638407 <s> { </s>
-0.762903 <s> } </s>
-0.689770 <s> } else
-0.529852 ( int n
-1.031994 ( n -
-0.832825 ( n <
-0.401453 ) + fib
-0.052449 ) ; </s>
-0.638407 ) { </s>
-0.638407 + fib (
-0.401453 - 1 )
-0.054348 - 2 )
-0.611822 1 ) +
-0.737081 2 ) ;
-0.786849 2 ) {
-0.054348 < 2 )
-0.638407 else { </s>
-0.803133 fib ( int
-0.256530 fib ( n
-0.185218 if ( n
-0.638407 int fib (
-0.611822 int n )
-0.577938 n ) </s>
-0.669302 n - 1
-0.630830 n - 2
-0.052449 n ; </s>
-0.380268 n < 2
-0.638407 return fib (
-0.577938 return n ;
-0.401453 } else {
\end\
Improving capacity improves representation power. Improving representation power may improve the performance at real tasks.
Toward deep learning software repositories
Deep learning code fragments for code clone detection
Sorting and transforming program repair ingredients via deep learning
M. White, M. Tufano, C. Vendome, and Denys Poshyvanyk. Deep Learning Code Fragments for Code Clone Detection. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE'16).
Case I
Case II
Use a grammar to handle nodes with degree greater than two
Establish a precedence to handle nodes with degree one
System | AST-based | Greedy | AST-based | Greedy |
ANTLR | 197 | 100 | 100 | 100 |
Apache Ant | 192 | 193 | 100 | 100 |
ArgoUML | 190 | 100 | 100 | 100 |
CAROL | 100 | 100 | 100 | 100 |
dnsjava | 147 | 100 | 173 | 187 |
Hibernate | 100 | 100 | 153 | 170 |
JDK | 190 | 100 | 100 | 100 |
JHotDraw | 100 | 100 | 100 | 100 |
File-level
Method-level
Precision Results (%)
Toward deep learning software repositories
Deep learning code fragments for code clone detection
Sorting and transforming program repair ingredients via deep learning
M. White, M. Tufano, M. Martinez, M. Monperrus, and D. Poshyvanyk. Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities. Under Review.
Automated program repair is hard.
Automated program repair is really hard.
Why did I pick automated program repair?
Core Repair Loop
Sorting Ingredients
Transforming Ingredients
Fault Localization
J. Xuan and M. Monperrus, Learning to Combine Multiple Ranking Metrics for Fault Localization, ICSME 2014.
ED
(E)xecutable-level similarity ingredient sorting
(D)efault ingredient application (no ingredient transformation)
(T)ype-level similarity ingredient sorting
(D)efault ingredient application (no ingredient transformation)
TD
(R)andom ingredient sorting
(E)mbeddings-based ingredient transformation
RE
(E)xecutable-level similarity ingredient sorting
(E)mbeddings-based ingredient transformation
EE
(T)ype-level similarity ingredient sorting
(E)mbeddings-based ingredient transformation
TE
ED
(E)xecutable-level similarity ingredient sorting
(D)efault ingredient application (no ingredient transformation)
(T)ype-level similarity ingredient sorting
(D)efault ingredient application (no ingredient transformation)
TD
(R)andom ingredient sorting
(E)mbeddings-based ingredient transformation
RE
(E)xecutable-level similarity ingredient sorting
(E)mbeddings-based ingredient transformation
EE
(T)ype-level similarity ingredient sorting
(E)mbeddings-based ingredient transformation
TE