Text mining: The state of the art and the challenges

Source : 2009 PAKDD

Author : Ah-Hwee Tan

Advisor : Tien-Chi Huang

Presenter : Chieh Hsu

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Outline

  1. Introduction

  2. A Framework of text mining

  3. A Survey of text mining products

  4. Open problems and future directions

Introduction ( 1 / 4 )

  • Text mining, also known as

    • Text data mining

    • Knowledge discovery from textual databases

  • Text mining

    • Extracting interesting and non-trivial patterns or knowledge from unstructured text documents

  • Extension of data mining

Introduction ( 2 / 4 )

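  • Text is the most natural form of storing information

  • Complex task

    • Text data is inherently unstructured and fuzzy

  • Commercial potential believed to be higher than that of data mining

  • 80% of a company's information is contained in text documents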

Introduction ( 3 / 4 )

  • Text Mining is a multidisciplinary field

    • Involving information retrieval

    • Text analysis

    • Information extraction

    • Clustering, Categorization

    • Visualization

    • Database technology

    • Machine learning

    • Data mining

Introduction ( 4 / 4 )

  • Text mining consists of two components

    • Text refining

    • Knowledge distillation

  • Products and applications are aligned by

    • Text refining

    • Knowledge distillation

    • Intermediate form

A Framework of text mining ( 1 / 3 )

  • Consisting of two phases

    • Text refining

    • Knowledge distillation

  • Intermediate form

    • Semi-structured

    • Structured

  • Intermediate form

    • Document-based

    • Concept-based
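The text refining phase above can be illustrated with a minimal sketch, assuming a simple bag-of-words count as the document-based intermediate form and a small keyword lexicon as the concept-based one. The function names, the `CONCEPT_LEXICON` table, and the sample documents are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical concept lexicon, invented for illustration only.
CONCEPT_LEXICON = {
    "acme": "Company:Acme",
    "globex": "Company:Globex",
    "merger": "Event:Merger",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def document_based_form(documents):
    """Document-based form: one term-frequency count per document."""
    return [Counter(tokenize(doc)) for doc in documents]

def concept_based_form(documents, lexicon=CONCEPT_LEXICON):
    """Concept-based form: each concept maps to the documents mentioning it."""
    concepts = {}
    for index, doc in enumerate(documents):
        for token in tokenize(doc):
            if token in lexicon:
                concepts.setdefault(lexicon[token], set()).add(index)
    return concepts

docs = [
    "Acme announces merger with Globex",
    "Globex shares rise after merger news",
]
doc_form = document_based_form(docs)
concept_form = concept_based_form(docs)
```

In this sketch each entity of the document-based form is a document, while each entity of the concept-based form is an object or concept drawn from the documents.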

A Framework of text mining ( 2 / 3 )

  • Document-based

    • Patterns and relationships across documents

    • Clustering/visualization and categorization

  • Concept-based

    • Derives patterns and relationships across objects or concepts

    • Predictive modeling and associative discovery
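The two kinds of knowledge distillation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: cosine similarity between term counts stands in for the comparison step behind document clustering and categorization, and pairwise co-occurrence counting stands in for associative discovery over concepts. All function names and sample data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def concept_cooccurrence(concepts_per_document):
    """Count how many documents mention each pair of concepts together."""
    counts = Counter()
    for concepts in concepts_per_document:
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts

# Document-based: similar documents would end up in the same cluster.
doc_a = Counter({"text": 2, "mining": 1})
doc_b = Counter({"text": 1, "mining": 1})
doc_c = Counter({"stock": 1})
sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)

# Concept-based: frequently co-occurring concepts suggest an association.
pairs = concept_cooccurrence([
    ["Acme", "Merger"],
    ["Acme", "Merger", "Globex"],
    ["Globex"],
])
```

Here `sim_ab` is close to 1 while `sim_ac` is 0, and the pair `("Acme", "Merger")` has the highest co-occurrence count, which is the kind of signal a real clustering or association-mining step would build on.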

A Framework of text mining ( 3 / 3 )

A Survey of text mining products ( 1 / 3 )

  • Information retrieval
  • Information Extraction
  • Categorization
  • Summarization
  • Document organization
  • Visualization
  • Navigation

A Survey of text mining products ( 2 / 3 )

3.1. Document visualization

A Survey of text mining products ( 3 / 3 )

3.2. Text analysis and understanding

Open problems and future directions ( 1 / 4 )

4.1. Intermediate form

  • Varying degrees of complexity
  • A sufficiently rich representation is needed to capture the relationships between objects or concepts
  • Semantic analysis methods are computationally expensive
    and often operate

Open problems and future directions ( 2 / 4 )

4.2. Multilingual text refining

  • Whereas data mining is largely language independent, text
    mining involves a significant language component.
  • Text refining algorithms are needed that process multilingual documents
    and produce language-independent intermediate forms
  • Most text mining tools focus on processing English documents

Open problems and future directions ( 3 / 4 )

4.3. Domain knowledge integration

  • Domain knowledge
  • Text mining
  • Knowledge distillation

Open problems and future directions ( 4 / 4 )

4.4. Personalized autonomous mining

  • Future text mining tools, as part of the
    knowledge management systems
  • Text mining tools
  • Interpret natural language queries and automatically
    perform the appropriate mining operations.

Thanks for listening
