Faizan Ahmad
I'm an undergraduate student in the computer science department, FAST NUCES.
(Code Similarity Check)
GROUP MEMBERS:
Ali Ghulam (P17-6009)
Faizan Ahmad (P17-6020)
Muhammad Hafeez Ullah (P17-6144)
SUPERVISOR:
Shoaib Muhammad Khan
Assistant Professor
FAST NUCES, Peshawar Campus
slides.com/faizanf33/code-similarity-check-02
Distance Metric for Source Code
Abstract Syntax Trees
Replicating or altering someone else's code is unethical.
The original creator of the source code is hard to identify.
Students' coding ability drops.
Find similarities in one specific language.
Generate similarity reports for student code submissions by computing a distance metric over their abstract syntax trees.
Reference | Basic Idea | Method | Results | Limitations |
---|---|---|---|---|
[1] | [BASE PAPER] Winnowing: Local Algorithms for Document Fingerprinting (MOSS) | Uses the winnowing algorithm to detect the shortest match | Records fingerprints and the position of the fingerprints in the document | Assumes the sequence of hashes generated by hashing k-grams is independent and uniformly random |
[2] | Comparing Python Programs Using Abstract Syntax Trees | Produces reports on the basis of a similarity index | The model can detect code similarity using sub-tree (partial) indexing | Works on the Python language only |
[3] | Design pattern detection based on the graph theory | Detects design patterns using a semantic graph | The model can detect similar patterns with high accuracy and efficiency | |
[1] Schleimer, Saul, Daniel S. Wilkerson, and Alex Aiken. "Winnowing: local algorithms for document fingerprinting." Proceedings of the 2003 ACM SIGMOD international conference on Management of data. 2003.
[2] Salazar Paredes, Pedro. Comparing python programs using abstract syntax trees. BS thesis. Uniandes, 2020.
[3] Mayvan, Bahareh Bafandeh, and Abbas Rasoolzadegan. "Design pattern detection based on the graph theory." Knowledge-Based Systems (2017).
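The fingerprinting idea behind [1] can be sketched as follows. This is a minimal illustration, not the MOSS implementation: the function names `kgram_hashes` and `winnow`, the use of MD5 as the hash, and the 32-bit truncation are all assumptions made for the example. Each window of `w` consecutive k-gram hashes contributes its minimum hash (rightmost on ties), recorded together with its position.

```python
import hashlib

def kgram_hashes(text, k):
    """Hash every k-gram (substring of length k) of the text.
    MD5 truncated to 32 bits is an arbitrary choice for this sketch."""
    return [
        int(hashlib.md5(text[i:i + k].encode()).hexdigest(), 16) % (1 << 32)
        for i in range(len(text) - k + 1)
    ]

def winnow(hashes, w):
    """Pick the minimum hash of each window of w hashes.
    Ties go to the rightmost minimum; each position is recorded once."""
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_val = min(window)
        # Rightmost occurrence of the minimum inside this window.
        offset = max(i for i, h in enumerate(window) if h == min_val)
        pos = start + offset
        if pos != last_pos:
            fingerprints.append((min_val, pos))
            last_pos = pos
    return fingerprints

# A short sequence of example hashes, winnowed with window size 4.
example = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
print(winnow(example, 4))
```

Two documents can then be compared by intersecting the hash values of their fingerprint lists; the recorded positions show where the matching text sits.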
Reference | Basic Idea | Method | Results | Limitations |
---|---|---|---|---|
[4] | Using Latent Semantic Analysis to Identify Similarities in Source Code to Support Program Understanding | Singular Value Decomposition (SVD) of a matrix derived from a corpus of natural text | Captures significant portions of the meaning not only of individual words but also of whole passages | |
[5] | Euclidean Distance Matrices: Essential Theory, Algorithms and Applications | Design algorithms for completing and denoising distance data | Position calibration, room reconstruction from echoes, and phase retrieval | |
[4] Maletic, Jonathan I., and Andrian Marcus. "Using latent semantic analysis to identify similarities in source code to support program understanding." Proceedings 12th IEEE internationals conference on tools with artificial intelligence. ICTAI 2000. IEEE, 2000.
[5] Dokmanic, Ivan, et al. "Euclidean distance matrices: essential theory, algorithms, and applications." IEEE Signal Processing Magazine 32.6 (2015): 12-30.
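As a small illustration of the object studied in [5], the sketch below builds a Euclidean distance matrix whose entry (i, j) is the squared distance between points i and j. The function name and the sample points are hypothetical; in this project the points could be feature vectors extracted from submissions, which is an assumption rather than something the slides specify.

```python
def euclidean_distance_matrix(points):
    """Return the matrix of squared Euclidean distances between all points."""
    n = len(points)
    return [
        [sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 2-D points standing in for per-submission feature vectors.
points = [(0, 0), (3, 4), (6, 8)]
edm = euclidean_distance_matrix(points)
for row in edm:
    print(row)
```

The matrix is symmetric with a zero diagonal, so only the upper triangle needs to be stored when many submissions are compared.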
Upload high-level source code
Generate AST during syntax analysis
Create distance metric
Find similarity index using metric
Generate report
Analyse report
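The workflow above can be sketched end to end for Python submissions using the standard `ast` module. Everything specific here is an assumption made for illustration: the feature vector (counts of AST node types), the Euclidean distance, the mapping `1 / (1 + d)` to a similarity index, and the names `node_type_counts`, `similarity_index`, and `similarity_report`. The slides do not fix any of these choices.

```python
import ast
import math
from collections import Counter
from itertools import combinations

def node_type_counts(source):
    """Parse Python source and count how often each AST node type occurs."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def distance(a, b):
    """Euclidean distance between two node-type count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def similarity_index(src_a, src_b):
    """Map the distance to (0, 1]; 1.0 means identical node-type profiles."""
    d = distance(node_type_counts(src_a), node_type_counts(src_b))
    return 1.0 / (1.0 + d)

def similarity_report(submissions):
    """Return (name_a, name_b, score) for every pair, most similar first."""
    rows = [
        (a, b, similarity_index(submissions[a], submissions[b]))
        for a, b in combinations(sorted(submissions), 2)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical submissions: the first two differ only in identifier names.
submissions = {
    "student_a": "area = PI * (radius ** 2)",
    "student_b": "a = pi * (r ** 2)",
    "student_c": "for i in range(3):\n    print(i)",
}
for name_a, name_b, score in similarity_report(submissions):
    print(f"{name_a} vs {name_b}: {score:.2f}")
```

Because identifier names do not change node types, renaming variables leaves the score at 1.0, which is the kind of alteration a text-level comparison would miss.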
var area = PI * (radius ** 2);
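To make the tree for this statement concrete, the sketch below parses a Python equivalent with the standard `ast` module. Rewriting the `var` declaration as the Python assignment `area = PI * (radius ** 2)` is an assumption made so that a standard-library parser can be used; the structure of the expression is the same.

```python
import ast

# Python equivalent of the statement above (the `var` keyword is dropped).
tree = ast.parse("area = PI * (radius ** 2)")
assign = tree.body[0]

# The right-hand side is a multiplication whose right operand is a power.
product = assign.value
power = product.right

print(type(assign).__name__, "->", assign.targets[0].id)
print(type(product.op).__name__, "left:", product.left.id)
print(type(power.op).__name__, "base:", power.left.id, "exponent:", power.right.value)

# Full structural dump; the exact text varies between Python versions.
print(ast.dump(assign))
```

The parentheses in the source do not appear as nodes: grouping is encoded purely by which `BinOp` is nested inside the other.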
Disassemble code
Generate abstract syntax tree
Create adjacency matrix
Calculate distance
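One possible way to turn an AST into an adjacency matrix is sketched below for Python source. The function name `ast_adjacency_matrix`, the pre-order numbering of nodes, the undirected (symmetric) encoding, and the decision to skip `Load`/`Store` context markers are all assumptions of this example, not details given in the slides.

```python
import ast

def ast_adjacency_matrix(source):
    """Return (labels, matrix) for the AST of the given Python source.
    Nodes are numbered in pre-order; matrix[i][j] == 1 when i and j are
    parent and child (stored symmetrically)."""
    labels, edges = [], []

    def visit(node, parent_index):
        index = len(labels)
        labels.append(type(node).__name__)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store markers, they carry no structure
            visit(child, index)

    visit(ast.parse(source), None)
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for parent, child in edges:
        matrix[parent][child] = 1
        matrix[child][parent] = 1
    return labels, matrix

labels, matrix = ast_adjacency_matrix("area = PI * (radius ** 2)")
print(labels)
for row in matrix:
    print(row)
```

Since the AST is a tree, a matrix over n nodes always has exactly n - 1 parent-child pairs, which is a quick sanity check on the construction.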
Benefit: insert, delete, and substitute operations are allowed when calculating the distance.
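A distance that allows insert, delete, and substitute operations can be sketched as a Levenshtein edit distance over the pre-order sequence of AST node types. Note the substitution: this is a sequence edit distance, which is simpler than a true tree edit distance (for example Zhang-Shasha) and is used here only to illustrate the three operations. The names `node_sequence` and `edit_distance` are hypothetical.

```python
import ast

def node_sequence(source):
    """Pre-order list of AST node type names, ignoring Load/Store markers."""
    sequence = []

    def visit(node):
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.expr_context):
                visit(child)

    visit(ast.parse(source))
    return sequence

def edit_distance(a, b):
    """Minimum number of inserts, deletes, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free on match)
            ))
        prev = curr
    return prev[-1]

original = node_sequence("area = PI * (radius ** 2)")
renamed = node_sequence("a = pi * (r ** 2)")
rewritten = node_sequence("area = PI * radius * radius")

print(edit_distance(original, renamed))
print(edit_distance(original, rewritten))
```

Renaming identifiers leaves the node sequence unchanged, so the first distance is 0, while restructuring the expression yields a positive distance.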
Thank you for your precious time.
Any Suggestions?
By Faizan Ahmad
Using graph theory and program disassembly to create abstract syntax trees from code. These will be used to generate similarity reports for student code submissions in different languages including Python, Java, and C++.