
Semantic analysis and plagiarism search
Strange project
Check the text for uniqueness

Who needs this solution?
- paper mills
- news feeds
- publishers
- universities

Main problems
- semantic analysis
- plagiarism search
- texts in different languages

Impasses

- internet searching
- semantic analysis
- plagiarism search algorithms
- lack of useful information

Internet searching. Tasks
- choose a search engine
- extract keywords to search for
- send async requests
- save search results to local storage
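
The tasks above could be sketched roughly as follows. This is only an illustration: `fake_search` is a hypothetical stand-in for a real search-engine request, and the JSON file layout is an assumption, not something the deck specifies.

```python
import asyncio
import json
from pathlib import Path


async def fake_search(keyword):
    # Hypothetical stand-in for a real search-engine request.
    await asyncio.sleep(0)
    return [f"https://example.com/{keyword}/{i}" for i in range(2)]


async def search_all(keywords, search=fake_search):
    # Start all requests concurrently and wait until every one has finished.
    results = await asyncio.gather(*(search(k) for k in keywords))
    return dict(zip(keywords, results))


def save_results(results, path):
    # Persist the search results to local storage as JSON (assumed format).
    Path(path).write_text(json.dumps(results, indent=2), encoding="utf-8")


def run(keywords, path):
    results = asyncio.run(search_all(keywords))
    save_results(results, path)
    return results
```

Passing a real coroutine as `search` would replace the fake one without changing the rest of the flow.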

Semantic analysis
- TF-IDF ranking (term frequency - inverse document frequency)
- latent semantic analysis
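
A minimal TF-IDF sketch, assuming the documents are already tokenized and using the plain variant: tf = count / document length, idf = log(N / document frequency). The function name `tf_idf` is illustrative; real systems often use smoothed variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    # documents: list of token lists. Returns one {term: weight} dict per document.
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document gets weight 0, so only distinctive terms contribute to the ranking.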


Latent semantic analysis

- like a simple neural network
- singular value decomposition
Some math about SVD
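
The slide does not give the formulas, so here is the standard statement of SVD as it is commonly used in latent semantic analysis. The symbols are assumptions for illustration: $A$ is an $m \times n$ term-document matrix of rank $r$, and $k < r$ is the number of latent dimensions kept.

```latex
A = U \Sigma V^{T}, \qquad
U \in \mathbb{R}^{m \times r},\;
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\;
V \in \mathbb{R}^{n \times r},\;
\sigma_1 \ge \dots \ge \sigma_r > 0

A_k = U_k \Sigma_k V_k^{T}
```

Here $U_k$ and $V_k$ keep the first $k$ columns of $U$ and $V$, and $\Sigma_k$ keeps the $k$ largest singular values. $A_k$ is the best rank-$k$ approximation of $A$ in the Frobenius norm, and documents can be compared by the cosine similarity of the columns of $\Sigma_k V_k^{T}$.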

Plagiarism search algorithms
- shingles algorithm
- NCD (normalized compression distance)

Shingles algorithm
- normalize the text
- divide the text into shingles
- calculate a hash for each shingle
- take a random sample of the hashes
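
The four steps can be sketched like this. Several choices are assumptions rather than the deck's method: word-level shingles of 3 words, MD5 as the hash, and similarity measured as the share of sampled shingle hashes from the first text that also occur in the second.

```python
import hashlib
import random
import re


def normalize(text):
    # Lowercase the text and keep only word tokens.
    return re.findall(r"\w+", text.lower())


def shingle_hashes(text, size=3):
    # Split the normalized text into overlapping word shingles and hash each one.
    words = normalize(text)
    shingles = (" ".join(words[i:i + size]) for i in range(len(words) - size + 1))
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}


def similarity(text_a, text_b, size=3, sample=None, seed=0):
    a, b = shingle_hashes(text_a, size), shingle_hashes(text_b, size)
    if sample is not None and len(a) > sample:
        # Compare only a random sample of the hashes to reduce the work.
        a = set(random.Random(seed).sample(sorted(a), sample))
    if not a:
        return 0.0
    return len(a & b) / len(a)
```

Texts that differ only in case or punctuation produce identical shingle sets after normalization, so they score 1.0.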
Normalized compression distance
The only thing you have to choose is the compression program!

How does it work?


Compress two merged files.
Easy enough
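
A minimal sketch of the idea, assuming zlib as the compression program (the deck leaves that choice open). It uses the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size and xy is the two inputs merged.

```python
import zlib


def compressed_size(data):
    # Size in bytes of the zlib-compressed input at maximum compression level.
    return len(zlib.compress(data, 9))


def ncd(x, y):
    # x and y are bytes objects; x + y is the "two merged files" step.
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 suggest the inputs share most of their content; values near 1 suggest they have little in common. With real compressors the result is approximate and can slightly exceed 1.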
Copy of deck
By Serg Vashchilin