Data-Tag

Developed By:

Akshay Pratap Singh (2011ecs01)

Rishabh Shukla (2011ecs13)

Sunny Kumar (2011ecs43)

Guided By:

Dr. Sunanda Gupta

Data-Tag is a Natural Language Processing (NLP) based text classification system. It uses open data from Wikipedia together with a Word-Sense Disambiguation (WSD) algorithm to measure semantic similarity, and outputs tags accordingly.

What problem does it solve?

  • Provides an alternative to keyword-based classification
  • Tags data according to its semantic meaning
  • Moves text classifiers from keyword matching toward more artificially intelligent, meaning-aware tagging

What's new in Data-Tag?

  • Semantic in nature
  • Uses Wikipedia data as the gloss for finding semantic similarity
  • Can tag plain text as well as data from web pages
  • Extends the capability of WSD to multiple classes
  • No database requirement: uses existing data from the internet
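
The idea of using a gloss to measure semantic similarity can be illustrated with a simplified Lesk-style overlap count. This is only a sketch under stated assumptions: the slides do not say which WSD algorithm Data-Tag uses, and the stopword list, the function names, and the two short glosses below are hypothetical stand-ins for real Wikipedia page text.

```python
import re

# Small illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "and",
             "or", "to", "for", "that", "with", "by", "as", "it"}

def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def gloss_overlap(text, gloss):
    """Simplified Lesk score: number of content words shared by text and gloss."""
    return len(content_words(text) & content_words(gloss))

# Hypothetical glosses standing in for Wikipedia page text.
glosses = {
    "Bank (finance)": "A bank is a financial institution that accepts deposits and makes loans.",
    "Bank (geography)": "A bank is the land alongside a river or lake.",
}

sentence = "She went to the bank to ask about loans and deposits."
scores = {tag: gloss_overlap(sentence, gloss) for tag, gloss in glosses.items()}
best = max(scores, key=scores.get)  # the gloss sharing the most words with the sentence
```

Here the finance gloss shares "bank", "loans" and "deposits" with the sentence, while the geography gloss shares only "bank", so the finance sense is preferred.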

User Interface/Demo

Technologies/Frameworks Used

  • Python - Flask
  • JavaScript - AngularJS
  • Natural Language Toolkit
  • Wikipedia API
  • Redis - NoSQL database
  • Pattern Framework
  • grequests
  • urllib3

Sequence Flow

  • Send data from Angular Controller to Flask Back-end
  • Tokenize text
  • Apply Part-of-Speech Tagging
  • Extract Nouns
  • Get pages from Wikipedia based on extracted nouns
  • Apply Word Sense Disambiguation
  • Get matching scores
  • Return top 3 tags with highest scores
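
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the project's actual code: part-of-speech tagging is replaced by a lookup in a supplied noun set, Wikipedia requests are replaced by a hypothetical in-memory `fake_fetch` function, and the matching score is a naive word-overlap count. All function and variable names are assumptions made for the example.

```python
import heapq
import re

def tokenize(text):
    """Step 2: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_nouns(tokens, known_nouns):
    """Steps 3-4 stand-in: keep tokens found in a supplied noun set.

    A real system would run a part-of-speech tagger here instead.
    """
    seen = []
    for tok in tokens:
        if tok in known_nouns and tok not in seen:
            seen.append(tok)
    return seen

def tag_text(text, known_nouns, fetch_pages, top_n=3):
    """Return up to top_n (tag, score) pairs, highest score first."""
    tokens = tokenize(text)
    context = set(tokens)
    scores = {}
    for noun in extract_nouns(tokens, known_nouns):
        # Step 5: fetch_pages(noun) returns {page_title: page_text}.
        for title, page_text in fetch_pages(noun).items():
            # Steps 6-7: score each page by word overlap with the input text.
            overlap = len(context & set(tokenize(page_text)))
            scores[title] = max(scores.get(title, 0), overlap)
    # Step 8: keep the highest-scoring tags.
    return heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])

# Hypothetical in-memory pages used in place of live Wikipedia requests.
FAKE_PAGES = {
    "python": {
        "Python (programming language)": "python is a programming language used to write code",
        "Pythonidae": "python is a family of snakes found in africa and asia",
    },
    "code": {
        "Source code": "source code is text written in a programming language",
    },
}

def fake_fetch(noun):
    """Return the stored pages for a noun, or an empty dict if none exist."""
    return FAKE_PAGES.get(noun, {})

tags = tag_text("I write Python code in a programming course",
                {"python", "code", "course"}, fake_fetch)
```

With this toy data, the programming-language page ranks first, followed by "Source code" and then "Pythonidae". Passing the fetcher in as a parameter keeps the scoring logic testable without network access.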

Sequence Diagram

Shortcomings

  • Slower processing on slower networks
  • May provide inaccurate results for a larger number of tags
  • No memoization
  • Supports only the English language
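
One possible way to address the missing memoization is to cache page lookups so that a repeated noun does not trigger a second request. The sketch below uses Python's standard `functools.lru_cache`; the `fetch_page` function is a hypothetical stand-in for a slow Wikipedia lookup, and the slides do not describe how (or whether) caching would actually be added.

```python
from functools import lru_cache

call_count = 0  # counts how many times the slow lookup actually runs

@lru_cache(maxsize=256)
def fetch_page(title):
    """Hypothetical stand-in for a slow Wikipedia lookup."""
    global call_count
    call_count += 1
    return f"page text for {title}"

first = fetch_page("Python")
second = fetch_page("Python")  # served from the cache, no second lookup
third = fetch_page("Redis")    # new title, so the lookup runs again
```

An in-process cache like this is lost when the server restarts; a shared store would be needed to keep results across processes.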

Scope/Future Work

  • Can be incorporated with a web crawler for semantically tagging web pages
  • Can be used with machine learning techniques to provide more accurate results
  • May act as a basis for NLP query formulation from web data

Conclusion

Systems like Data-Tag point to the future of semantic text processing over the internet. Data-Tag aims to replace age-old keyword-based classification systems and welcomes the advent of more artificially intelligent ones. It also provides a basic structure for developing larger systems that use similar concepts to classify data across the internet and in textual documents.

Thank you.
