Data-Tag
Developed By:
Akshay Pratap Singh (2011ecs01)
Rishabh Shukla (2011ecs13)
Sunny Kumar (2011ecs43)
Guided By:
Dr. Sunanda Gupta
Data-Tag is a Natural Language Processing based text classification system. It uses open data from Wikipedia together with a Word-Sense Disambiguation (WSD) algorithm to measure semantic similarity and outputs tags accordingly.
What problem does it solve?
- Provides an alternative to keyword-based classification
- Tags data according to its semantic meaning
- Makes text classifiers more artificially intelligent
What's new in Data-Tag?
- Semantic Nature
- Uses Wikipedia data as the gloss for finding semantic similarity
- Can tag plain text as well as data from web pages
- Extends the capability of WSD to multiple classes
- No database requirements - uses existing data from the internet
User Interface/Demo
Technologies/Frameworks Used
- Python - Flask
- JavaScript - AngularJS
- Natural Language Toolkit
- Wikipedia API
- Redis - NoSQL Database
- Pattern Framework
- grequests
- urllib3
Sequence Flow
- Send data from Angular Controller to Flask Back-end
- Tokenize text
- Apply Part-of-Speech Tagging
- Extract Nouns
- Get pages from Wikipedia based on extracted nouns
- Apply Word Sense Disambiguation
- Get matching scores
- Return top 3 tags with highest scores
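The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the glosses are a small hard-coded dictionary standing in for pages fetched from Wikipedia, a stop-word filter stands in for part-of-speech tagging and noun extraction, and the score is a simplified Lesk-style word overlap rather than whatever disambiguation routine Data-Tag uses. All names (SAMPLE_GLOSSES, content_words, overlap_score, top_tags) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia page text, keyed by candidate tag.
SAMPLE_GLOSSES = {
    "Python (programming language)":
        "python is a high level programming language used to write software code",
    "Pythonidae":
        "pythons are a family of nonvenomous snakes found in africa asia and australia",
    "Software":
        "software is a collection of programs and code that run on a computer",
    "Snake":
        "snakes are elongated limbless reptiles",
}

STOPWORDS = {"a", "an", "and", "are", "for", "i", "in", "is", "my",
             "of", "on", "that", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_words(tokens):
    """Crude stand-in for POS tagging + noun extraction: drop stop words."""
    return [t for t in tokens if t not in STOPWORDS]


def overlap_score(context, gloss):
    """Simplified Lesk-style score: count context words that occur in the gloss."""
    context_counts = Counter(context)
    gloss_words = set(content_words(tokenize(gloss)))
    return sum(n for word, n in context_counts.items() if word in gloss_words)


def top_tags(text, glosses, k=3):
    """Score every candidate tag and return up to k tags with a non-zero score."""
    context = content_words(tokenize(text))
    scores = {tag: overlap_score(context, gloss) for tag, gloss in glosses.items()}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(tag, score) for tag, score in ranked[:k] if score > 0]


if __name__ == "__main__":
    sample = "I write software code in the Python programming language"
    for tag, score in top_tags(sample, SAMPLE_GLOSSES):
        print(tag, score)
```

In a fuller version, the stop-word filter would be replaced by a real part-of-speech tagger and the dictionary by live Wikipedia lookups; the ranking step would stay the same.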
Sequence Diagram
Shortcomings
- Slower processing on slower networks
- May provide inaccurate results for a larger number of tags
- No Memoization
- Supports only "English" language
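As an illustration of what memoization could look like here, the sketch below caches page lookups so that a repeated noun does not trigger a second network request. It is an assumption-laden example, not part of Data-Tag: fetch_page_uncached is a hypothetical stand-in that only simulates a Wikipedia lookup, and the cache is Python's in-process functools.lru_cache.

```python
from functools import lru_cache

fetch_count = 0  # counts how often the simulated network is hit


def fetch_page_uncached(title):
    """Hypothetical stand-in for a Wikipedia lookup (a real one would go over the network)."""
    global fetch_count
    fetch_count += 1
    return f"page text for {title}"


@lru_cache(maxsize=1024)
def fetch_page(title):
    """Memoized wrapper: each distinct title is fetched at most once per process."""
    return fetch_page_uncached(title)


for noun in ["python", "snake", "python", "python"]:
    fetch_page(noun)

print(fetch_count)  # 2: only the two distinct nouns reached the fetcher
```

An in-process cache is the simplest option; a shared store would be needed if results had to survive across requests or worker processes.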
Scope/Future Work
- Can be integrated with a web crawler to semantically tag web pages
- Can be combined with machine learning techniques to provide more accurate results
- May act as a basis for NLP query formulation from web data
Conclusion
Systems like Data-Tag point toward the future of semantic text classification on the internet. Data-Tag aims to replace age-old keyword-based classification systems with more artificially intelligent ones. It also provides a basic structure for developing larger systems that use similar concepts to classify textual documents and data from across the internet.
Thank you.