Code Similarity Check

GROUP MEMBERS:

Ali Ghulam (21078969)

SUPERVISOR:

Kufreh Sampson

Assistant Professor

Hertfordshire University 

Introduction

  • Copying or altering someone else's code is unethical.
  • Who is the original author of a piece of source code?
  • Students' coding ability suffers.
  • Similarities must be found across different programming languages.

Motivation

  • MOSS (A System for Detecting Software Similarity) [1]
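MOSS is based on winnowing. The text is split into overlapping k-grams, each k-gram is hashed, and the minimum hash in every window of w consecutive hashes is kept as a fingerprint. Documents are then compared by the fingerprints they share. The sketch below is a simplified, character-level illustration of that idea, not MOSS itself. The function names, MD5 truncated to 32 bits, the defaults k = 5 and w = 4, and the Jaccard score over fingerprints are all assumptions made for this example. The real tool also applies language-aware normalization that is not reproduced here.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of the text after removing whitespace and lower-casing."""
    cleaned = "".join(text.split()).lower()
    hashes = []
    for i in range(len(cleaned) - k + 1):
        digest = hashlib.md5(cleaned[i:i + k].encode()).hexdigest()
        hashes.append(int(digest[:8], 16))  # keep 32 bits of the digest
    return hashes


def winnow(hashes: list[int], w: int) -> set[tuple[int, int]]:
    """Pick the minimum hash in every window of w hashes (rightmost on ties).

    Fingerprints are (hash, position) pairs, so a minimum shared by
    overlapping windows is recorded only once.
    """
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)  # rightmost minimum
        fingerprints.add((smallest, start + offset))
    return fingerprints


def winnow_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of the two documents' fingerprint hashes (assumed score)."""
    fa = {h for h, _ in winnow(kgram_hashes(a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(b, k), w)}
    if not fa or not fb:
        return 0.0  # too short to produce any fingerprint
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    original = "int fib(int n) { if (n <= 1) return n; return fib(n-1) + fib(n-2); }"
    spaced = original.replace(" ", "   ")
    print(winnow_similarity(original, spaced))  # 1.0: whitespace is ignored
```

Because a fingerprint is selected in every window, any shared substring of at least w + k - 1 characters is guaranteed to produce at least one common fingerprint.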

  • The analogy between fraud and plagiarism in the context of the Fraud Triangle [1].

Example: the same recursive Fibonacci function written in C++ and in Python.
#include <iostream>
using namespace std;

// Return the n-th Fibonacci number using naive recursion
int fib(int n) {
    if (n <= 1)
        return n;
    
    return fib(n-1) + fib(n-2);
}
 
int main() {
    int n = 9;
    cout << fib(n) << endl;
    return 0;
}

# Return the n-th Fibonacci number using naive recursion
def fib(n: int) -> int:
    if n <= 1:
        return n
    
    return fib(n-1) + fib(n-2)
 

if __name__ == '__main__':
    n = 9
    print(fib(n))
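The two snippets above compute the same function but differ in keywords, type declarations and punctuation. One simple way to expose their shared structure is to normalize both into a language-neutral token stream before comparing them. This is a sketch under assumptions: the regular expression, the small `KEYWORDS` set, dropping braces, colons and semicolons, and the helper names are hypothetical choices, and `difflib.SequenceMatcher` stands in for a real similarity metric.

```python
import re
from difflib import SequenceMatcher

# Hypothetical choices for this sketch: a tiny keyword list and a token
# pattern that covers only the constructs used in the Fibonacci example.
KEYWORDS = {"def", "if", "else", "return", "int", "for", "while"}
SKIP = {"{", "}", ":", ";"}  # block/statement punctuation that differs by language
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|<=|>=|==|!=|[-+*/<>(){}:;,=]")
COMMENT_RE = re.compile(r"//[^\n]*|#[^\n]*")  # C++ // comments, Python comments, #include


def normalize(source: str) -> list[str]:
    """Turn C++ or Python source into a language-neutral token list."""
    tokens = []
    for tok in TOKEN_RE.findall(COMMENT_RE.sub("", source)):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok in SKIP:
            continue
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")  # every identifier looks the same
        elif tok.isdigit():
            tokens.append("NUM")  # every integer literal looks the same
        else:
            tokens.append(tok)
    return tokens


def token_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching normalized tokens."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


CPP_FIB = """int fib(int n) {
    if (n <= 1)
        return n;
    return fib(n-1) + fib(n-2);
}"""

PY_FIB = """def fib(n: int) -> int:
    if (n <= 1):
        return n
    return fib(n-1) + fib(n-2)"""

if __name__ == "__main__":
    print(round(token_similarity(CPP_FIB, PY_FIB), 2))
```

After normalization both function bodies reduce to the same token run (`if ( ID <= NUM ) return ID return ID ( ID - NUM ) + ...`); only the function headers still differ.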

Literature Review

Comparing Python Programs Using Abstract Syntax Trees [2]
  • Produces reports based on a similarity index.
  • Detects code similarity using sub-tree (partial) indexing.
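A minimal sketch of comparing Python programs through their abstract syntax trees, using only the standard library. It assumes that a program's structure can be summarized by the pre-order sequence of AST node types. Identifiers and literal values are dropped, so renaming variables does not change the result. `difflib.SequenceMatcher.ratio()` serves as a stand-in similarity index. Neither choice is taken from [2], and the function names are hypothetical.

```python
import ast
from difflib import SequenceMatcher


def node_types(source: str) -> list[str]:
    """Pre-order list of AST node type names; names and literal values are ignored."""
    def visit(node: ast.AST) -> list[str]:
        names = [type(node).__name__]
        for child in ast.iter_child_nodes(node):
            names.extend(visit(child))
        return names

    return visit(ast.parse(source))


def ast_similarity_index(a: str, b: str) -> float:
    """Similarity in [0, 1] between the node-type sequences of two programs."""
    return SequenceMatcher(None, node_types(a), node_types(b), autojunk=False).ratio()


if __name__ == "__main__":
    original = "def fib(n):\n    if n <= 1:\n        return n\n    return fib(n - 1) + fib(n - 2)\n"
    renamed = "def f(k):\n    if k <= 1:\n        return k\n    return f(k - 1) + f(k - 2)\n"
    print(ast_similarity_index(original, renamed))  # 1.0: only names differ
```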
Design Pattern Detection Based on the Graph Theory [3]
  • Detects design patterns using a semantic graph.
  • Reports high accuracy and efficiency in detecting similar patterns.
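The approach in [3] matches design patterns against a semantic graph of the program, which is too large for a short example. The sketch below is a much simpler illustration of treating a program as a graph. It extracts a call graph from Python source with the standard `ast` module. It then compares two programs by the multiset of (out-degree, in-degree) pairs of their functions, a measure that ignores function names. The helper names and the degree-profile score are assumptions for illustration. Different graphs can share a degree profile, so this is a coarse signal and not a pattern detector.

```python
import ast
from collections import Counter


def call_graph(source: str) -> tuple[set[str], set[tuple[str, str]]]:
    """Top-level functions and (caller, callee) edges between them."""
    tree = ast.parse(source)
    funcs = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
    names = {func.name for func in funcs}
    edges = set()
    for func in funcs:
        for node in ast.walk(func):
            # Only direct calls to other top-level functions become edges.
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id in names):
                edges.add((func.name, node.func.id))
    return names, edges


def degree_profile(names: set[str], edges: set[tuple[str, str]]) -> Counter:
    """Multiset of (out-degree, in-degree) pairs; a recursive call counts both ways."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return Counter((out_deg[n], in_deg[n]) for n in names)


def graph_similarity(a: str, b: str) -> float:
    """Fraction of functions whose degree pair is matched in the other program."""
    pa = degree_profile(*call_graph(a))
    pb = degree_profile(*call_graph(b))
    total = max(sum(pa.values()), sum(pb.values()))
    if total == 0:
        return 1.0  # neither program defines any function
    return sum((pa & pb).values()) / total
```

For example, a recursive `fib` called from `main` has the same profile as a copy in which both functions have been renamed.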
Syntax Tree Fingerprinting for Source Code Similarity Detection [4]
  • Uses hashing to fingerprint the subtrees of a syntax tree.
  • Indexes the AST representation efficiently and reduces false-positive collisions.
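A minimal sketch of syntax-tree fingerprinting in the spirit of [4], assuming Python input. Each subtree is hashed bottom-up from its node type and its children's hashes, so identical structures get identical fingerprints regardless of names. An inverted index then maps every fingerprint to the submissions that contain it. SHA-1, the `min_size` cut-off that skips tiny subtrees, and all function names are assumptions of this example, not details taken from the paper.

```python
import ast
import hashlib
from collections import defaultdict


def subtree_fingerprints(source: str, min_size: int = 3) -> dict[str, int]:
    """Map the hash of every subtree with at least min_size nodes to its size."""
    fingerprints: dict[str, int] = {}

    def visit(node: ast.AST) -> tuple[str, int]:
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        size = 1 + sum(child_size for _, child_size in children)
        # Node type plus child hashes: identifiers and literal values are ignored.
        payload = type(node).__name__ + "(" + ",".join(h for h, _ in children) + ")"
        digest = hashlib.sha1(payload.encode()).hexdigest()
        if size >= min_size:
            fingerprints[digest] = size
        return digest, size

    visit(ast.parse(source))
    return fingerprints


def build_index(submissions: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: fingerprint -> names of submissions containing that subtree."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, source in submissions.items():
        for digest in subtree_fingerprints(source):
            index[digest].add(name)
    return index


def shared_subtrees(index: dict[str, set[str]], a: str, b: str) -> int:
    """Number of distinct fingerprints that submissions a and b have in common."""
    return sum(1 for names in index.values() if a in names and b in names)
```

Building the index once and looking up shared fingerprints avoids re-comparing whole trees for every pair of submissions.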

Problem Statement

  • Generate similarity reports for student code submissions in different languages
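One possible shape for such a report is sketched below: every pair of submissions is scored, and pairs at or above a threshold are flagged for review. The tokenizer, the `difflib` ratio used as the score, the 0.8 threshold and the function names are placeholder assumptions. In the proposed system the score would come from the AST-based similarity index instead.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def tokens(source: str) -> list[str]:
    """Split source into words and single punctuation characters (language-agnostic)."""
    return re.findall(r"\w+|[^\w\s]", source)


def similarity_report(
    submissions: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float, bool]]:
    """Score every pair of submissions; rows are (a, b, score, flagged), highest first."""
    rows = []
    for (name_a, src_a), (name_b, src_b) in combinations(sorted(submissions.items()), 2):
        score = SequenceMatcher(None, tokens(src_a), tokens(src_b), autojunk=False).ratio()
        rows.append((name_a, name_b, round(score, 3), score >= threshold))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    submissions = {
        "alice.py": "x = 1\nprint(x)\n",
        "bob.py": "x = 1\nprint(x)\n",
        "carol.cpp": "int main() { return 0; }\n",
    }
    for a, b, score, flagged in similarity_report(submissions):
        print(f"{a:10} {b:10} {score:.3f} {'FLAG' if flagged else ''}")
```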

Proposed Methodology

  • Disassemble code
  • Generate abstract syntax tree
  • Find similarity index
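A minimal end-to-end sketch of these three steps, assuming Python submissions so that the standard `dis` and `ast` modules can stand in for the disassembler and the parser. C++ or Java code would need a native or JVM bytecode disassembler and a language-specific parser, which are not shown. The similarity index is an assumption of this example: the mean of the Jaccard similarities of opcode bigrams and of AST node-type bigrams. Opcode names vary between Python versions, so both programs must be compiled by the same interpreter.

```python
import ast
import dis
import types


def opcode_names(code: types.CodeType) -> list[str]:
    """Step 1: disassemble a code object (and nested functions) into opcode names."""
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(opcode_names(const))
    return names


def ast_node_names(source: str) -> list[str]:
    """Step 2: parse into an AST and list node type names in ast.walk order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def bigram_jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of the sets of adjacent pairs in two sequences."""
    sa, sb = set(zip(a, a[1:])), set(zip(b, b[1:]))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity_index(src_a: str, src_b: str) -> float:
    """Step 3: combine bytecode-level and AST-level similarity into one score."""
    ops_a = opcode_names(compile(src_a, "<a>", "exec"))
    ops_b = opcode_names(compile(src_b, "<b>", "exec"))
    bytecode_score = bigram_jaccard(ops_a, ops_b)
    ast_score = bigram_jaccard(ast_node_names(src_a), ast_node_names(src_b))
    return (bytecode_score + ast_score) / 2
```

Renaming variables or functions changes neither opcode names nor AST node types, so a renamed copy of a submission scores 1.0.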

Objectives

  • Automate code plagiarism checks for teachers and instructors.
  • Improve students’ coding skills by discouraging copied code.
  • Contribute to research in the code analysis area.

Gantt Chart

  • FYP-1:

Gantt Chart

  • FYP-2:

Flow Chart

References

[2] Salazar Paredes, Pedro. "Comparing Python programs using abstract syntax trees." BS thesis, Uniandes, 2020.

[3] Bafandeh Mayvan, Bahareh, and Abbas Rasoolzadegan. "Design pattern detection based on the graph theory." Knowledge-Based Systems, 2017.

[4] Chilowicz, Michel, Etienne Duris, and Gilles Roussel. "Syntax tree fingerprinting for source code similarity detection." 2009 IEEE 17th International Conference on Program Comprehension. IEEE, 2009.

Thank you for your time.

Any suggestions?

Code Similarity Check

By Faizan Ahmad


This project uses graph theory and program disassembly to build abstract syntax trees from source code. These trees will be used to generate similarity reports for student code submissions in different languages, including Python, Java, and C++.
