Copy/Paste detector

How to create a copy/paste detector

Andrey Kucherenko

Areas to apply

Why?

Analogues

phpcpd

  • Reporters: json, pmd-xml, xslt, console
  • Blame the authors of copy-pasted code
  • Skip comments
  • Extensions developed by other authors: gulp-jscpd, html-reporter, grunt-jscpd, etc.
jscpd --help
Usage:
  jscpd [OPTIONS]

Options:
  -l, --min-lines NUMBER min size of duplication in code lines
  -t, --min-tokens NUMBER min size of duplication in code tokens
  -c, --config FILE      path to config file
  -f, --files STRING     glob pattern for find code
  -e, --exclude STRING   directory to ignore
      --skip-comments    skip comments in code
  -b, --blame BOOLEAN    blame authors of duplications (get information
                         about authors from git)
      --languages-exts STRING list of languages with file extensions
                              (language:ext1,ext2;language:ext3)
  -g, --languages STRING list of languages which scan for duplicates,
                         separated with comma
  -o, --output PATH      path to report file
  -r, --reporter STRING  reporter to use
  -x, --xsl-href STRING  path to xsl for include to xml report
      --verbose          show full info about copies
  -d, --debug            show debug information (options list and selected
                         files)
  -p, --path PATH        path to code
      --limit NUMBER     limit of allowed duplications; if the real duplication
                         percentage is more than the limit, jscpd exits with an error
  -v, --version          Display the current version
  -h, --help             Display help and usage details
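The `--limit` option turns the duplication percentage into a pass/fail check. A minimal sketch of that arithmetic follows; it assumes the percentage is measured in lines (the help text does not say whether lines or tokens are counted), and `exceedsLimit` is a hypothetical helper, not part of jscpd.

```javascript
// Hypothetical helper mirroring the --limit description: fail only when the
// duplication percentage is strictly greater than the limit.
// Assumption: the percentage is computed over lines of code.
function exceedsLimit(duplicatedLines, totalLines, limitPercent) {
  if (totalLines === 0) return false; // nothing scanned, nothing duplicated
  // Multiply before dividing so integer inputs give exact percentages.
  const percent = (duplicatedLines * 100) / totalLines;
  return percent > limitPercent;
}

// A CLI would typically signal failure via its exit code, e.g.
//   if (exceedsLimit(dup, total, limit)) process.exitCode = 1;
console.log(exceedsLimit(30, 200, 10));
```

For example, 30 duplicated lines out of 200 is 15%, which exceeds a limit of 10, while 20 out of 200 is exactly 10% and passes, because the check is "more than" the limit.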

TODO

  • New reporters (HTML)
  • Cross-project detection
  • Flexible API
  • Improve performance
  • Support different storage backends
  • Build reports for a time period
  • Integrate with IDEs
  • ...

How?

Rabin-Karp
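Rabin-Karp hashes every window of a fixed number of consecutive tokens in constant time per step by updating a rolling polynomial hash instead of rehashing the whole window; equal hashes only flag candidates, which are then confirmed token by token. The JavaScript sketch that follows illustrates the idea on a plain array of token strings. The names `findDuplicateWindows` and `sameWindow`, the base 131 and the modulus 1000000007 are assumptions for illustration, not jscpd's implementation.

```javascript
// Rabin-Karp sketch over a token sequence (illustrative, not jscpd's code).
// Reports pairs [firstStart, laterStart] of identical windows of windowSize tokens.
const BASE = 131;       // polynomial base (assumed value)
const MOD = 1000000007; // large prime modulus keeps hashes small

function sameWindow(codes, a, b, len) {
  for (let i = 0; i < len; i++) {
    if (codes[a + i] !== codes[b + i]) return false;
  }
  return true;
}

function findDuplicateWindows(tokens, windowSize) {
  const duplicates = [];
  if (windowSize <= 0 || tokens.length < windowSize) return duplicates;

  // Give every distinct token a small positive integer code.
  const ids = new Map();
  const codes = tokens.map((t) => {
    if (!ids.has(t)) ids.set(t, ids.size + 1);
    return ids.get(t);
  });

  // BASE^(windowSize - 1) mod MOD: weight of the token leaving the window.
  let highPow = 1;
  for (let i = 0; i < windowSize - 1; i++) highPow = (highPow * BASE) % MOD;

  // Hash of the first window.
  let hash = 0;
  for (let i = 0; i < windowSize; i++) hash = (hash * BASE + codes[i]) % MOD;

  const seen = new Map(); // hash -> start indices of distinct windows with that hash
  for (let start = 0; ; start++) {
    const starts = seen.get(hash) || [];
    // Equal hashes are only candidates: confirm token by token.
    const match = starts.find((prev) => sameWindow(codes, prev, start, windowSize));
    if (match !== undefined) {
      duplicates.push([match, start]);
    } else {
      starts.push(start);
      seen.set(hash, starts);
    }
    if (start + windowSize >= codes.length) break;
    // Roll the hash: remove codes[start], append codes[start + windowSize].
    hash = (hash - ((codes[start] * highPow) % MOD) + MOD) % MOD;
    hash = (hash * BASE + codes[start + windowSize]) % MOD;
  }
  return duplicates;
}

console.log(findDuplicateWindows(['a', 'b', 'c', 'x', 'a', 'b', 'c'], 3));
```

With the tokens `a b c x a b c` and a window of 3, the only repeated window is `a b c`, so the call returns `[[0, 4]]`. The window size plays a role roughly analogous to a minimum duplicate size such as `--min-tokens`.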

/**
* Hello Copy Pasted World
*/
const name = 'jscpd';

if (name === 'pmd') {
    throw new Error('Use jscpd');
}
[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "name"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "String",
        "value": "'jscpd'"
    },
    {
        "type": "Punctuator",
        "value": ";"
    },
    {
        "type": "Keyword",
        "value": "if"
    },
    {
        "type": "Punctuator",
        "value": "("
    },
    {
        "type": "Identifier",
        "value": "name"
    },
    {
        "type": "Punctuator",
        "value": "==="
    },
    {
        "type": "String",
        "value": "'pmd'"
    },
    {
        "type": "Punctuator",
        "value": ")"
    },
    {
        "type": "Punctuator",
        "value": "{"
    },
    {
        "type": "Keyword",
        "value": "throw"
    },
    {
        "type": "Keyword",
        "value": "new"
    },
    {
        "type": "Identifier",
        "value": "Error"
    },
    {
        "type": "Punctuator",
        "value": "("
    },
    {
        "type": "String",
        "value": "'Use jscpd'"
    },
    {
        "type": "Punctuator",
        "value": ")"
    },
    {
        "type": "Punctuator",
        "value": ";"
    },
    {
        "type": "Punctuator",
        "value": "}"
    }
]

Esprima
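The token list above has the shape produced by Esprima's `tokenize` function: an array of `{ type, value }` objects, with comments left out unless the `comment` option is set (Esprima then reports them as `LineComment` and `BlockComment` tokens). Before hashing, a detector needs a flat sequence of comparable items. A minimal sketch, using a hypothetical helper named `tokensToFingerprints`, turns each token into a `type:value` string and drops comment tokens when comments are skipped, in the spirit of the `--skip-comments` option. The sample comment value is illustrative.

```javascript
// Hypothetical helper: flatten Esprima-style tokens ({ type, value }) into
// strings a detector can hash. Prefixing the type means tokens of different
// types never produce equal strings.
const COMMENT_TYPES = new Set(['LineComment', 'BlockComment']);

function tokensToFingerprints(tokens, { skipComments = true } = {}) {
  return tokens
    .filter((token) => !(skipComments && COMMENT_TYPES.has(token.type)))
    .map((token) => `${token.type}:${token.value}`);
}

// Sample input: the leading comment and first statement of the example above.
const sampleTokens = [
  { type: 'BlockComment', value: '*\n* Hello Copy Pasted World\n' },
  { type: 'Keyword', value: 'const' },
  { type: 'Identifier', value: 'name' },
  { type: 'Punctuator', value: '=' },
  { type: 'String', value: "'jscpd'" },
  { type: 'Punctuator', value: ';' },
];

console.log(tokensToFingerprints(sampleTokens));
```

With comments skipped, the sample yields five strings starting with `Keyword:const`; passing `{ skipComments: false }` keeps the comment as a sixth entry.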

[Diagram: Input Service, Tokens Service, Hash Service, Detector, Store, Blamer Service, Reporter Service]

JSCPD Architecture
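The diagram names the services the detector is split into. As a rough, dependency-free sketch of how such services could be wired together: every function here is a hypothetical stand-in rather than jscpd's API, the whitespace tokenizer replaces a real lexer such as Esprima, and the Blamer Service is left out because it needs git.

```javascript
// Hypothetical wiring of the services from the architecture diagram.
// All names are illustrative stand-ins, not jscpd's real API.

// Input Service: provide source files (here from an in-memory object).
function inputService(files) {
  return Object.entries(files).map(([path, content]) => ({ path, content }));
}

// Tokens Service: whitespace tokenizer standing in for a real lexer.
function tokensService(file) {
  return file.content.split(/\s+/).filter(Boolean);
}

// Hash Service: 32-bit djb2-style hash of a token window.
function hashService(windowText) {
  let h = 5381;
  for (let i = 0; i < windowText.length; i++) {
    h = ((h * 33) ^ windowText.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Store: hash -> windows already seen with that hash.
function createStore() {
  const byHash = new Map();
  return {
    find(hash, text) {
      return (byHash.get(hash) || []).find((entry) => entry.text === text);
    },
    add(hash, entry) {
      if (!byHash.has(hash)) byHash.set(hash, []);
      byHash.get(hash).push(entry);
    },
  };
}

// Detector: compare every window of minTokens tokens against the Store.
// Text equality is checked, so hash collisions cannot cause false clones.
function detector(files, minTokens, store = createStore()) {
  const clones = [];
  for (const file of files) {
    const tokens = tokensService(file);
    for (let i = 0; i + minTokens <= tokens.length; i++) {
      const text = tokens.slice(i, i + minTokens).join('\u0000');
      const hash = hashService(text);
      const location = { path: file.path, token: i };
      const seen = store.find(hash, text);
      if (seen) clones.push({ first: seen.location, second: location });
      else store.add(hash, { text, location });
    }
  }
  return clones;
}

// Reporter Service: one console line per clone.
function reporterService(clones) {
  return clones.map(
    (c) => `${c.first.path}#${c.first.token} = ${c.second.path}#${c.second.token}`,
  );
}

const files = inputService({
  'a.js': 'const name = 1 ; if ( name ) { run ( ) ; }',
  'b.js': 'let x = 2 ; if ( name ) { run ( ) ; }',
});
console.log(reporterService(detector(files, 6)).join('\n'));
```

With `minTokens` set to 6, the two files share an 11-token run starting at token 4, so the sketch reports six overlapping windows, from `a.js#4 = b.js#4` to `a.js#9 = b.js#9`; a production detector would typically merge overlapping windows into a single clone before reporting.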

Questions? Ideas?
