Karl Ho
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Knowledge is experience. Everything else is just information.
- Albert Einstein
Knowledge can be viewed as a collection of experience, appropriate information, and skilled insight
that offers a framework for evaluating and integrating new experiences and information.
Because of its dynamic nature and close relationship with experience and information, knowledge goes beyond mere facts: it incorporates understanding, context, and the ability to apply information.
Type of Knowledge | Description | Example |
---|---|---|
Propositional Knowledge (knowledge-that) | Theoretical knowledge of facts that can be expressed in declarative sentences | "The Earth orbits the Sun" |
Procedural Knowledge (knowledge-how) | Practical skills or abilities | Knowing how to ride a bicycle |
Knowledge by Acquaintance | Direct familiarity or awareness gained through experience | Knowing what the taste of chocolate is like |
Logical Knowledge | Understanding of logical principles and relations | Knowing that if A implies B, and B implies C, then A implies C |
Semantic Knowledge | Knowledge of the meanings of words and concepts | Understanding what the word "democracy" means |
Empirical Knowledge | Knowledge derived from sensory experience and observation | Knowing that water boils at 100°C at sea level |
Source: Wei, Chih-Ping, Selwyn Piramuthu, and Michael J. Shaw. "Knowledge discovery and data mining." In Handbook on Knowledge Management, pp. 157-189. Springer, Berlin, Heidelberg, 2003.
Bonferroni's principle: (roughly) if you look in more places for interesting patterns than your amount of data will support, you are bound to find crap.
- Rajaraman A., Leskovec J. and Ullman J. Mining of Massive Datasets
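The quote describes what is often called Bonferroni's principle. A minimal simulation sketch, assuming pure-noise data (so every "pattern" is false by construction): it correlates one random outcome with many random features, counts how many pass a 0.05 significance test, and then repeats the count with the Bonferroni-corrected threshold alpha / (number of tests). The function names, sample sizes, and the Fisher z approximation for p-values are illustrative choices, not taken from the source.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_spurious_hits(n_obs=50, n_features=200, alpha=0.05, seed=1):
    """Correlate a pure-noise outcome with many pure-noise features (hypothetical helper).

    Nothing is truly related, so every 'discovery' is a false positive.
    Returns (hits at alpha, hits at the Bonferroni threshold alpha / n_features).
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    p_values = []
    for _ in range(n_features):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        p_values.append(approx_p_value(pearson_r(x, y), n_obs))
    naive = sum(p < alpha for p in p_values)
    corrected = sum(p < alpha / n_features for p in p_values)
    return naive, corrected


naive, corrected = count_spurious_hits()
print(f"'significant' at 0.05: {naive}; after Bonferroni correction: {corrected}")
```

With 200 tests at the 0.05 level, about ten spurious hits are expected by chance alone; dividing alpha by the number of tests is the simplest guard against such false discoveries.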
Today we live in a data rich, information driven, knowledge strained, and wisdom scant world.
- Graham Williams 2021
Source: Rajaraman A., Leskovec J. and Ullman J. Mining of Massive Datasets
Data mining overlaps with:
Different cultures:
Data Mining and Machine Learning
Data Mining methods
Field of study that gives computers the ability to learn without being explicitly programmed.
- Arthur Samuel 1959
A computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program.
Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.
The ultimate goal of data modeling is to explain and predict the variable of interest using data. Machine learning pursues this goal with computer algorithms, particularly for making predictions and solving problems.
Source: Tom Mitchell website
According to Carnegie Mellon Computer Science professor Tom M. Mitchell,
"Machine learning is the study of computer algorithms that allow computer programs to automatically improve through experience."
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
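Mitchell's E/T/P definition can be made concrete with a small experiment. In this hypothetical sketch, the task T is labeling 2-D points as class 0 or 1, the performance measure P is accuracy on a fixed held-out set, and the experience E is the number of labeled training examples. The learner is a simple nearest-centroid classifier chosen only for illustration, and the data are simulated.

```python
import random


def make_data(n_per_class, rng):
    """Simulated data: class 0 centred at (-1, 0), class 1 at (1, 0), noise sd 1.5."""
    data = []
    for label, cx in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.5), rng.gauss(0.0, 1.5)), label))
    return data


def fit_centroids(train):
    """'Learning' step: store the mean point of each class."""
    sums = {}
    for (x1, x2), label in train:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += x1
        s[1] += x2
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(centroids, point):
    """Task T: assign the label of the nearest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: (point[0] - centroids[lab][0]) ** 2
        + (point[1] - centroids[lab][1]) ** 2,
    )


def accuracy(centroids, test):
    """Performance measure P: share of test points classified correctly."""
    return sum(predict(centroids, p) == y for p, y in test) / len(test)


rng = random.Random(0)
test_set = make_data(1000, rng)          # fixed held-out set used to measure P
for n in (1, 5, 50, 500):                # growing experience E
    model = fit_centroids(make_data(n, rng))
    print(f"E = {2 * n:4d} examples -> P = {accuracy(model, test_set):.3f}")
```

If P tends to rise as E grows (small samples can still fluctuate), the program "learns" in Mitchell's sense.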
Study the past if you would define the future.
– Confucius
Tom Mitchell. 1997. Machine Learning, McGraw Hill.
Leo Breiman (1928 – 2005)
One assumes that the data are generated by a given stochastic data model.
The other uses algorithmic models and treats the data mechanism as unknown.
Culture | Typical data |
---|---|
Data Model | Small data |
Algorithmic Model | Complex, big data |
Data are generated in many ways. Picture a box, which we call nature for now: the independent variables x go in one side, and the dependent variable y comes out the other side.
The analysis in the data modeling culture starts by assuming a stochastic data model for the inside of the black box. For example, a common data model is that data are generated by independent draws from
response variables = f(predictor variables, random noise, parameters)
That is, the response variable is a function of a set of predictor (independent) variables, random noise (often assumed to be normally distributed errors), and parameters.
The values of the parameters are estimated from the data, and the model is then used for information and/or prediction.
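A minimal sketch of the data modeling culture, assuming a linear data model with normally distributed noise, y = beta0 + beta1 * x + e: the analyst posits the model, estimates its parameters from the data by ordinary least squares, and then uses the fitted model for prediction. The parameter values and function names are hypothetical.

```python
import random


def simulate(n, beta0, beta1, sigma, rng):
    """Assumed data model: y = beta0 + beta1 * x + e, with e ~ Normal(0, sigma)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def ols(xs, ys):
    """Ordinary least squares estimates (b0, b1) for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


rng = random.Random(2024)
xs, ys = simulate(500, beta0=2.0, beta1=0.5, sigma=1.0, rng=rng)
b0, b1 = ols(xs, ys)                              # should land near the true 2.0 and 0.5
print(f"estimated b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"prediction at x = 4: {b0 + b1 * 4:.2f}")  # the fitted model used for prediction
```

The estimates are interpretable as properties of the assumed mechanism, which is the appeal of this culture; the risk is that conclusions are only as good as the assumed model.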
The analysis in the algorithmic modeling culture considers the inside of the box complex and unknown. Its approach is to find a function f(x), an algorithm that operates on x to predict the responses y.
The goal is to find an algorithm that accurately predicts y.
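By contrast, a sketch of the algorithmic modeling culture: the `nature` function below stands in for the unknown mechanism (its form is an arbitrary assumption that the analyst never uses), and k-nearest-neighbor regression is judged only by its prediction error on held-out data against a baseline that ignores x. The choice k = 5, the simulated data, and the function names are illustrative.

```python
import math
import random


def nature(x, rng):
    """The 'black box': the analyst only observes (x, y) pairs it produces."""
    return math.sin(x) + 0.1 * x ** 2 + rng.gauss(0, 0.3)


def knn_predict(train, x, k=5):
    """k-nearest-neighbor regression: average y of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(predict, test):
    """Mean squared prediction error on held-out data, the yardstick of this culture."""
    return sum((predict(x) - y) ** 2 for x, y in test) / len(test)


rng = random.Random(7)
xs = [rng.uniform(-3, 3) for _ in range(400)]
points = [(x, nature(x, rng)) for x in xs]
train, test = points[:300], points[300:]

mean_y = sum(y for _, y in train) / len(train)
baseline = mse(lambda x: mean_y, test)             # predicts the same value for every x
knn = mse(lambda x: knn_predict(train, x), test)
print(f"holdout MSE: baseline {baseline:.3f}, 5-NN {knn:.3f}")
```

No parameter here describes the mechanism; the algorithm is accepted or rejected purely on how well it predicts y for new x.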
Source: https://www.mathworks.com
Source: Attewell, Paul A. & Monaghan, David B. 2015. Data Mining for the Social Sciences: an Introduction, Table 2.1, p. 27
Learning Type | Task | Example Methods |
---|---|---|
Supervised Learning | Regression | Linear regression, Polynomial regression, Support vector regression, Ridge regression, Lasso, ElasticNet, Decision tree, Random forest |
Supervised Learning | Classification | Logistic regression, K-Nearest Neighbors, Support Vector Machines, Kernel Support Vector Machines, Naïve Bayes, Decision tree, Random forest |
Unsupervised Learning | Clustering | K-Means Clustering, Hierarchical Clustering |
Unsupervised Learning | Dimensionality Reduction | Principal Component Analysis, Linear Discriminant Analysis, Kernel PCA |
Reinforcement Learning | Q-Learning | State-Action-Reward-State-Action (SARSA), Deep Q-Network, Markov Decision Processes, Deep Deterministic Policy Gradient (DDPG) |
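As one concrete instance from the unsupervised rows, here is a minimal sketch of k-means clustering (Lloyd's algorithm) on 2-D points. Unlike the supervised methods, it receives no labels y; it alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The function name, iteration cap, and random initialization are illustrative assumptions; library implementations add refinements such as k-means++ initialization.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means on 2-D points (hypothetical helper, no libraries)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct data points
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[j].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # centers no longer move, so assignments are stable
            break
        centers = new_centers
    return centers, clusters


rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
centers, clusters = kmeans(blob_a + blob_b, k=2)  # no labels are passed in
print(sorted((round(x, 1), round(y, 1)) for x, y in centers))
```

The algorithm only ever sees coordinates; whatever grouping it recovers comes from the structure of the data itself, which is the defining feature of unsupervised learning.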
Aspect | Traditional Data Mining | AI Knowledge Mining |
---|---|---|
Data Handling | Primarily works with structured data, requiring preprocessing for unstructured formats. | Excels at processing both structured and unstructured data (e.g., text, images) using AI techniques like NLP and OCR[1][2]. |
Techniques | Relies on statistical methods such as clustering, classification, and regression. | Employs advanced AI techniques like machine learning, deep learning, and neural networks for nuanced insights[1][4]. |
Adaptability | Static models that do not evolve after deployment. | Dynamic models that adapt and improve over time through machine learning[3][5]. |
Complexity of Insights | Identifies straightforward patterns and correlations in data. | Uncovers complex relationships and hidden patterns, offering deeper contextual understanding[1][4]. |
Automation | Requires significant manual intervention in data preparation and analysis. | Automates many tasks, including knowledge extraction and modeling, reducing human effort[2][1]. |
Interpretability | Outputs are often easier to interpret due to simpler methodologies. | Outputs can be harder to interpret due to the “black-box” nature of AI models[1][2]. |
Aspect | Data Mining | Machine Learning | Knowledge Mining |
---|---|---|---|
Purpose | Extracts patterns and insights from large datasets | Develops algorithms that learn from and make predictions on data | Synthesizes and contextualizes information to generate actionable insights |
Scope | Focuses on discovering patterns and correlations | Emphasizes algorithm development and model training | Goes beyond pattern discovery to represent and apply knowledge |
Techniques | Uses statistical models and database management tools | Employs various algorithms like neural networks and decision trees | Incorporates AI, NLP, and semantic technologies |
Data Handling | Works primarily with structured, historical data | Can handle both structured and unstructured data, including real-time | Excels at processing unstructured data and integrating diverse sources |
Automation | Requires significant human intervention for interpretation | Can automate decision-making processes with minimal oversight | Combines automation with human expertise for knowledge creation |
Adaptability | Static process following pre-set rules | Algorithms can adapt and improve with new data | Dynamically updates and refines knowledge representations |
Outcomes | Produces patterns, trends, and correlations | Generates predictive models and classifications | Delivers contextualized insights and strategic intelligence |
Applications | Business intelligence, market analysis | Predictive analytics, image recognition | Strategic decision-making, innovation support |
Feature | Deep Research | DeepSeek |
---|---|---|
Real-time internet search | Can search and interpret information in real-time | Hybrid search engine and knowledge base |
Processing time | 5 to 30 minutes | Efficient processing speed |
Accuracy | 26.6% accuracy on 'Humanity's Last Exam' | Excels in mathematical and technical tasks |
Citation and documentation | Clear citations and thinking process summary | Can self-correct and research sources |
Accessibility | Limited to specific paid tiers with usage caps | Free and open-source, no prompt limitations |