Text mining: The state of the art and the challenges
Source : 2009 PAKDD
Author : Ah-Hwee Tan
Advisor : Tien-Chi Huang
Presenter : Chieh Hsu
Pacific-Asia Conference on Knowledge Discovery and Data Mining
Outline
- Introduction
- A Framework of text mining
- A Survey of text mining products
- Open problems and future directions
Introduction ( 1 / 4 )
- Text mining, also known as
  - Text data mining
  - Knowledge discovery
- Text mining
  - Extracting interesting and non-trivial patterns
  - Knowledge from unstructured documents
- Extension of data mining
Introduction ( 2 / 4 )
- Text
  - Complex task
  - Inherently unstructured and fuzzy
- Commercial potential higher
  - 80% of a company's information
Introduction ( 3 / 4 )
- Text mining is a multidisciplinary field, involving
  - Information retrieval
  - Text analysis
  - Information extraction
  - Clustering, categorization
  - Visualization
  - Database technology
  - Machine learning
  - Data mining
Introduction ( 4 / 4 )
- Text mining consists of two components
  - Text refining
  - Knowledge distillation
- Products and applications
  - Text refining
  - Knowledge distillation
  - Intermediate form
A Framework of text mining ( 1 / 3 )
- Consisting of two phases
  - Text refining
  - Knowledge distillation
- Intermediate form
  - Semi-structured
  - Structured
- Intermediate form
  - Document-based
  - Concept-based
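The two kinds of intermediate form can be illustrated with a minimal sketch. The toy corpus, the naive tokenizer, and the function names `document_based_if` and `concept_based_if` are assumptions made for illustration only; they are not taken from the paper or from any product.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus (illustration only, not from the source).
DOCS = {
    "d1": "Text mining extracts patterns from text documents.",
    "d2": "Data mining extracts patterns from structured databases.",
}


def tokenize(text):
    """Deliberately naive text refining: lowercase, keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def document_based_if(docs):
    """Document-based IF: each entity is a document (doc id -> term counts)."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def concept_based_if(docs, concepts):
    """Concept-based IF: each entity is a concept (concept -> docs mentioning it)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = set(tokenize(text))
        for concept in concepts:
            if concept in tokens:
                index[concept].add(doc_id)
    return dict(index)


doc_if = document_based_if(DOCS)
concept_if = concept_based_if(DOCS, concepts=["mining", "text", "databases"])
```

Here the same refining step feeds both forms; only the choice of entity (document or concept) differs.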
A Framework of text mining ( 2 / 3 )
- Document-based
  - Patterns and relationships across documents
  - Clustering/visualization and categorization
- Concept-based
  - Derives patterns and relationships across objects or concepts
  - Predictive modeling and associative discovery
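As a rough sketch of knowledge distillation on each kind of intermediate form: cosine similarity stands in for the document-based case (the basis of many clustering methods), and simple co-occurrence counting stands in for associative discovery in the concept-based case. Both techniques, and all data and names below, are assumptions chosen for illustration; the slides do not prescribe a specific algorithm.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical document-based IF: doc id -> term counts.
DOC_IF = {
    "d1": Counter({"text": 2, "mining": 1}),
    "d2": Counter({"text": 1, "mining": 2}),
    "d3": Counter({"stock": 3, "market": 1}),
}

# Hypothetical concept-based input: doc id -> concepts extracted from it.
CONCEPTS_PER_DOC = {
    "d1": {"acquisition", "bank"},
    "d2": {"acquisition", "bank", "lawsuit"},
    "d3": {"lawsuit"},
}


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_pair(doc_if):
    """Document-based distillation: find the closest pair of documents."""
    pairs = combinations(sorted(doc_if), 2)
    return max(pairs, key=lambda p: cosine(doc_if[p[0]], doc_if[p[1]]))


def cooccurrence(concepts_per_doc):
    """Concept-based distillation: count how often two concepts share a document."""
    counts = Counter()
    for concepts in concepts_per_doc.values():
        counts.update(combinations(sorted(concepts), 2))
    return counts


closest = most_similar_pair(DOC_IF)
associations = cooccurrence(CONCEPTS_PER_DOC)
```

A real system would add thresholds (for example minimum support) before reporting an association; that step is omitted here for brevity.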
A Framework of text mining ( 3 / 3 )

A Survey of text mining products ( 1 / 3 )
- Information retrieval
- Information Extraction
- Categorization
- Summarization
- Document organization
- Visualization
- Navigation
A Survey of text mining products ( 2 / 3 )
3.1. Document visualization

A Survey of text mining products ( 3 / 3 )
3.2. Text analysis and understanding
Open problems and future directions ( 1 / 4 )
4.1. Intermediate form
- Varying degrees of complexity
- Sufficiently rich representation to capture the relationship
- Semantic analysis methods are computationally expensive
and often operate
Open problems and future directions ( 2 / 4 )
4.2. Multilingual text refining
- Whereas data mining is largely language independent, text mining involves a significant language component.
- Text refining algorithms
- Most text mining tools focus on processing English documents
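To illustrate why text refining is language dependent, the sketch below applies a whitespace tokenizer, which is adequate for English, to a Chinese phrase written without spaces. Character bigrams are shown as one simple language-independent fallback; this is a technique swapped in for illustration, not one named in the slides, and the function names are hypothetical.

```python
def whitespace_tokens(text):
    """Tokenizer that assumes words are separated by spaces."""
    return text.split()


def char_bigrams(text):
    """Overlapping character pairs, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


english = "text mining tools"
chinese = "文本挖掘工具"  # "text mining tools", written without spaces

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)   # the whole phrase stays one token
chinese_bigrams = char_bigrams(chinese)
```

The whitespace tokenizer returns the entire Chinese phrase as a single token, so a refining step built around it would need a language-specific segmenter or a fallback such as the bigrams above.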
Open problems and future directions ( 3 / 4 )
4.3. Domain knowledge integration
- Domain knowledge
- Text mining
- Knowledge distillation
Open problems and future directions ( 4 / 4 )
4.4. Personalized autonomous mining
- Future text mining tools, as part of the knowledge management systems
- Text mining tools
- Interpret natural language queries and automatically perform the appropriate mining operations.
Thanks for listening