David N. Palacio
[Figure: The architecture team is assigned a new Webex requirement; it inspects and checks the repos and implements the requirement, which affects a set of artifacts.]
Source Code
Test Cases
Bug Reports
Requirements
Software Team Assessment
How deep are the artifacts correlated?
[Figure: a Source File, a Requirement (Issue) File, and a Test File]
[Figure: trace link (similarity) values in [0, 1] between source artifacts (i.e., requirement files) and target artifacts (i.e., source code files)]
[Figure: research roadmap split into Research and Dev tracks, covering IR on Security Req, SecureReqNet, COMET, Deep Unsupervised Traceability, Why-Trace, T-Miner, and CI T-Miner]
[Figure: SecureReqNet classifies issues from an Issue Tracker as Security Related or non-Security Related; variants shown: (Shallow) SecureReqNet and α-SecureReqNet]
Learning Process: Use the model to predict the outcomes for new data points
Statistical Inference Methods: Use the model to learn about the data generation process
Likelihood P(O | H): a fitted distribution for the IR outcomes (observations) O, given H, the hypothesis that the link exists.
Prior P(H): the prior probability of H. It can be drawn from the factors that influence traceability: transitive links, other observations of IR values, or developers' feedback.
Marginal likelihood P(O), or "model evidence": a normalizing constant that does not depend on the hypothesis H.
Posterior P(H | O): the probability that a trace link exists given O; it can be interpreted as the impact of an observation O on the probability of H.
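Putting the four terms together gives Bayes' rule for a candidate trace link. The expansion of the evidence over H and its complement ¬H is the standard law of total probability, added here for illustration rather than taken from the slides.

```latex
P(H \mid O) = \frac{P(O \mid H)\,P(H)}{P(O)},
\qquad
P(O) = P(O \mid H)\,P(H) + P(O \mid \neg H)\,P(\neg H)
```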
[COMET]
| Model | Observation | Linked? |
|---|---|---|
| VSM | 0.085 | 0 |
| JS | 0.446 | 1 |
| LDA | 0.01 | 0 |
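Each row is one IR observation of the same candidate link. A Beta distribution can be fitted to such observations; the sketch below uses the method of moments, which is an assumption for illustration (the slides do not say which estimator COMET uses). `fit_beta_moments` is a hypothetical helper.

```python
from statistics import fmean, pvariance

def fit_beta_moments(observations):
    """Method-of-moments Beta(alpha, beta) fit to similarity values in (0, 1).

    Illustrative estimator only; assumed, not taken from COMET.
    """
    m = fmean(observations)
    v = pvariance(observations, mu=m)
    # A Beta distribution needs 0 < mean < 1 and 0 < variance < mean * (1 - mean).
    if not 0 < m < 1 or v <= 0 or v >= m * (1 - m):
        raise ValueError("observations cannot be matched by a Beta distribution")
    common = m * (1 - m) / v - 1  # equals alpha + beta
    return m * common, (1 - m) * common

# Observations of one candidate link from the VSM, JS, and LDA rows of the table.
alpha, beta = fit_beta_moments([0.085, 0.446, 0.01])
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

With only three observations the fit is crude; in practice more IR techniques (or more samples) would be needed for a stable estimate.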
Textual Similarities: a Beta distribution is fitted from distinct observations of IR techniques.
Developers' Feedback: a different Beta distribution is fitted from distinct observations of developers' feedback on the link under study.
Transitive Links: a Beta mixture model is employed to model all transitive (probabilistic) links.
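A Beta mixture combines several Beta components, for example one per transitive path. The sketch below only evaluates such a mixture density; the component weights and parameters are hypothetical, and fitting them (e.g., with EM) is out of scope.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mixture_pdf(x, components):
    """Density of a finite Beta mixture; components is a list of (weight, a, b)."""
    total_weight = sum(w for w, _, _ in components)
    return sum(w * beta_pdf(x, a, b) for w, a, b in components) / total_weight

# Hypothetical components, e.g., two transitive paths through different test files.
components = [(0.6, 2.0, 5.0), (0.4, 6.0, 2.0)]
print(beta_mixture_pdf(0.5, components))
```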
How do we compute a posterior probability given the traceability hyperpriors?
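One simple way to combine a hyperprior with new evidence, assuming developers' feedback arrives as binary confirm/reject votes, is the conjugate Beta-Bernoulli update: a Beta(a, b) prior becomes Beta(a + k, b + n - k) after k confirmations in n votes. This is a textbook sketch, not necessarily how COMET performs inference; the prior values are hypothetical.

```python
def beta_bernoulli_posterior(prior_a, prior_b, feedback):
    """Conjugate update of a Beta(prior_a, prior_b) belief about a link's probability.

    feedback: iterable of 1 (developer confirms the link) or 0 (rejects it).
    Returns the posterior Beta parameters and the posterior mean.
    """
    feedback = list(feedback)
    if any(f not in (0, 1) for f in feedback):
        raise ValueError("feedback must contain only 0s and 1s")
    k, n = sum(feedback), len(feedback)
    post_a, post_b = prior_a + k, prior_b + (n - k)
    return post_a, post_b, post_a / (post_a + post_b)

# Hypothetical prior (e.g., a Beta fitted to IR similarities) and three developer votes.
a, b, mean = beta_bernoulli_posterior(0.56, 2.52, [1, 1, 0])
print(f"posterior Beta({a:.2f}, {b:.2f}), mean={mean:.3f}")
```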
[Figure: LSTM-based, IRs, COMET]
Bounding the effectiveness of Unsupervised Software Traceability with Information Decomposition
We need more datasets to perform an empirical evaluation that supports our claims
Paragraph Vector and Embeddings
Word Neural Models (skip-gram)
Paragraph Neural Models (doc2vec)
[Figure: skip-gram network: word and context Input Layers (samples: 1 and 20/100) feed an Embedding Layer; the two embeddings are combined by Merge (dot), reshaped, and passed through a Sigmoid.]
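The architecture above (word and context embeddings, a dot-product merge, and a sigmoid) is the skip-gram with negative sampling objective. Below is a dependency-free sketch of that objective trained with plain SGD; the toy corpus, dimensions, and learning rate are hypothetical, and negatives are drawn uniformly rather than from the smoothed unigram distribution that word2vec uses.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """(word, context) index pairs for every neighbour inside a symmetric window."""
    pairs = []
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sgns(tokens, vocab_size, dim=8, window=2, negatives=5,
               epochs=50, lr=0.05, seed=0):
    """Skip-gram with negative sampling; returns word vectors and mean loss per epoch."""
    rng = random.Random(seed)
    word_emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    ctx_emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    pairs = skipgram_pairs(tokens, window)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for word, context in pairs:
            # One observed context (label 1) plus uniformly drawn negatives (label 0).
            samples = [(context, 1.0)]
            samples += [(rng.randrange(vocab_size), 0.0) for _ in range(negatives)]
            w = word_emb[word]
            grad_w = [0.0] * dim
            for ctx, label in samples:
                c = ctx_emb[ctx]
                p = sigmoid(sum(wi * ci for wi, ci in zip(w, c)))  # Merge (dot) -> Sigmoid
                total -= math.log(max(p if label else 1.0 - p, 1e-12))
                g = p - label  # gradient of the logistic loss w.r.t. the score
                for d in range(dim):
                    grad_w[d] += g * c[d]
                    c[d] -= lr * g * w[d]
            for d in range(dim):
                w[d] -= lr * grad_w[d]
        losses.append(total / len(pairs))
    return word_emb, losses

# Hypothetical toy corpus of stemmed security terms.
words = ("attack exploit network unauthor access attack network exploit "
         "success code execut inform special code execut").split() * 3
vocab = sorted(set(words))
index = {t: i for i, t in enumerate(vocab)}
vectors, losses = train_sgns([index[t] for t in words], len(vocab))
```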
Unsupervised Embedding
{'attack': ['network', 'exploit', 'unauthor'], 'code': ['execut', 'inform', 'special'], 'exploit': ['success', 'network', 'attack']}
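The dictionary reads like nearest neighbours of (stemmed) terms in the learned embedding space. A sketch of that lookup, assuming cosine similarity as the neighbourhood measure (the slide does not name the metric); the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbours(vectors, word, k=3):
    """Return the k words whose vectors are most cosine-similar to `word`."""
    query = vectors[word]
    scored = [(cosine(query, vec), other) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]

# Hypothetical 3-dimensional embeddings.
vectors = {
    "attack": [1.0, 0.9, 0.0],
    "exploit": [0.9, 1.0, 0.1],
    "network": [0.8, 0.7, 0.2],
    "code": [0.0, 0.1, 1.0],
    "execut": [0.1, 0.0, 0.9],
}
print(nearest_neighbours(vectors, "attack", k=2))
```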
Word Vectors (or skip-gram): AUC = 0.66
Paragraph Vectors: AUC = 0.62
Word Vectors (or skip-gram): AUPRG = 0.38
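The AUC values above can be read as the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link. A minimal sketch of that pairwise (Mann-Whitney) formulation; AUPRG, the area under the precision-recall-gain curve, is not covered here.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random true link outranks a random non-link.

    Ties count as half a correct ordering (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one linked and one non-linked pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 3 of 4 pairs ordered correctly
```

The quadratic loop is fine for illustration; for large link sets a rank-based computation is preferable.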
Information Analysis, Transmitted Information, and Clustering
Entropy
Extropy
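Entropy measures the uncertainty of a distribution; extropy, its complementary dual, sums the same quantity over the complements 1 - p. A minimal sketch of both for a discrete distribution, in bits by default:

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum p log p (terms with p = 0 contribute 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def extropy(probs, base=2.0):
    """Extropy J(X) = -sum (1 - p) log(1 - p) (terms with p = 1 contribute 0)."""
    return -sum((1 - p) * math.log(1 - p, base) for p in probs if p < 1)

print(entropy([1 / 3] * 3), extropy([1 / 3] * 3))
```

For a two-outcome distribution the two measures coincide; with more outcomes extropy is smaller than entropy, as the uniform three-outcome example shows.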
COMET + SecureReqNet + Interpretability
Adapting IR/ML Approaches
Introducing the Probabilistic Nature of the Traceability Problem
Using Information Science Theory to Understand Traceability Models
Mining Software Artifacts for Continuous Integration
[Figure: relative-frequency histograms of the scores for Word Vectors (or skip-gram) with Word Mover's Distance (WMD) and Paragraph Vectors with cosine similarity (COS)]
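Histograms like these show the relative frequency of score values per bin. A sketch, assuming the scores have already been scaled into [0, 1] (a raw WMD is a distance, not bounded by 1); the sample scores are hypothetical.

```python
def relative_frequencies(values, bins=10, lo=0.0, hi=1.0):
    """Relative frequency of values in `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} outside [{lo}, {hi}]")
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes into the last bin
        counts[idx] += 1
    n = len(values)
    return [c / n for c in counts] if n else counts

freqs = relative_frequencies([0.05, 0.15, 0.15, 1.0])
print(freqs)
```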
REQUIREMENT 4: OBTAINING CA CERTIFICATES:
The EST client can request a copy of the current EST CA certificate(s) from the EST server. The EST client is assumed to perform this operation before performing other operations.
I am a third-year PhD student in Computer Science at William and Mary.
I was born in Bogota, Colombia. I did my undergraduate degree in Computer Engineering at the National University of Colombia (UNAL). My master's degree in CS was split between the Technical University of Munich (TUM) and UNAL.
Research interests: Deep Learning for SE, Natural Computation, Causal Inference for SE, Code Generation and Representation
Hobbies: Kayaking, Hiking, Movies
Mentor: Chris Shenefiel
Manager: Jim Warren
UNAL'17
W&M'17
The link recovery problem is the fundamental problem in software traceability: automatically establishing the relationships among artifacts while taking into account the evolution of the system and the nature of the data.
The link recovery problem: how would you compute theta, the trace link value?
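One classic answer is the Vector Space Model (VSM): represent each artifact as a TF-IDF vector and take theta as the cosine similarity, which stays in [0, 1] because the weights are non-negative. A sketch under simplifying assumptions (whitespace tokenization, no identifier splitting or stemming, smoothed IDF); the artifacts are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw term frequency x smoothed log IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]

def cosine_sparse(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical artifacts: one requirement (source) and two code files (targets).
requirement = "the client requests the current ca certificate from the server"
code_files = ["def request_ca_certificate client server", "def render_button color"]
vecs = tfidf_vectors([requirement] + code_files)
thetas = [cosine_sparse(vecs[0], v) for v in vecs[1:]]
print(thetas)
```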
How do we enhance link recovery with recent IR/ML approaches?
[Figure: a trace link (similarity) value in [0, 1] from source artifacts (i.e., requirement files) to target artifacts (i.e., source code files), a trace link from a Requirement to a Test Case, and an execution trace from Source Code to a Test Case]
[Figure: a Source File, a Requirement File, and a Test File]
What if we compute a second theta for the requirement-to-test-case (Req to TC) link? Is the initial theta affected?
And what if we add more information?
[Figure: a Source File, a Requirement File, and additional Test Files]