Computer science
Machine learning
Game development
Algorithms
Data access
Games
Testing
Branch
Node
Root node
Leaf
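The four terms above can be made concrete with a small sketch. The `Node` class and its `label` and `children` fields are illustrative names, not something the slides define: the root is the node everything hangs from, a branch is a link from a node to one of its children, and a leaf is a node without children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a tree; `label` and `children` are illustrative names."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf is a node with no outgoing branches.
        return not self.children


def count_leaves(node: Node) -> int:
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def count_branches(node: Node) -> int:
    # Every child is reached through exactly one branch.
    return len(node.children) + sum(count_branches(c) for c in node.children)


# A tiny example tree: the root has two children, and one of them has two leaves.
root = Node("root", [Node("A", [Node("A1"), Node("A2")]), Node("B")])
print(count_leaves(root))
print(count_branches(root))
```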
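Do you have an umbrella?
  No: don't hold an umbrella
  Yes: Is it raining?
    No: don't hold an umbrella
    Yes: Is the rain heavy?
      No: don't hold an umbrella
      Yes: hold an umbrella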
def decide(have, rain, heavy):
    # Nested version: each else branch asks the next question in the tree.
    if not have:
        return "don't hold an umbrella"
    else:
        if not rain:
            return "don't hold an umbrella"
        else:
            if not heavy:
                return "don't hold an umbrella"
            else:
                return "hold an umbrella"

data = [[False, False, False], [True, False, True], [True, True, True], [True, True, False]]
for i in data:
    print(decide(i[0], i[1], i[2]))

def decide(have, rain, heavy):
    # Flattened version: elif expresses the same tree without nesting.
    if not have:
        return "don't hold an umbrella"
    elif not rain:
        return "don't hold an umbrella"
    elif not heavy:
        return "don't hold an umbrella"
    else:
        return "hold an umbrella"

data = [[False, False, False], [True, False, True], [True, True, True], [True, True, False]]
for i in data:
    print(decide(i[0], i[1], i[2]))

Output:
don't hold an umbrella
don't hold an umbrella
hold an umbrella
don't hold an umbrella
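The same umbrella tree can also be stored as data instead of being hard-coded as if/else statements. The sketch below is one possible encoding, not the slides' own: the names `UMBRELLA_TREE` and `walk` and the `"question"`/`"yes"`/`"no"` keys are hypothetical. Internal nodes are dictionaries and leaves are plain strings.

```python
# Internal nodes are dicts; leaves are plain strings (a hypothetical encoding).
UMBRELLA_TREE = {
    "question": "have",
    "no": "don't hold an umbrella",
    "yes": {
        "question": "rain",
        "no": "don't hold an umbrella",
        "yes": {
            "question": "heavy",
            "no": "don't hold an umbrella",
            "yes": "hold an umbrella",
        },
    },
}


def walk(tree, answers):
    """Follow branches from the root until a leaf (a string) is reached."""
    node = tree
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


data = [[False, False, False], [True, False, True], [True, True, True], [True, True, False]]
results = []
for have, rain, heavy in data:
    results.append(walk(UMBRELLA_TREE, {"have": have, "rain": rain, "heavy": heavy}))
    print(results[-1])
```

Keeping the tree as data is what allows a program to build it automatically, which is the step a learning algorithm takes.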
Let the computer learn rules or patterns from the data on its own
Observe the data
Find patterns or rules
Predict
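The three steps above can be sketched with the simplest possible learner: a single threshold rule chosen by trying every candidate and keeping the one with the fewest mistakes. The rainfall numbers are made up for illustration, and `fit_threshold` and `predict` are hypothetical helper names.

```python
def fit_threshold(values, labels):
    """Observe the data and return the threshold with the fewest mistakes.

    The learned rule is: predict True when value > threshold.
    """
    best_threshold, best_errors = None, len(values) + 1
    for threshold in sorted(set(values)):
        errors = sum((v > threshold) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, value):
    # Apply the learned rule to a new observation.
    return value > threshold


# Made-up training data: rainfall in mm and whether an umbrella was held.
rainfall_mm = [0, 1, 2, 5, 8, 12]
umbrella = [False, False, False, True, True, True]

threshold = fit_threshold(rainfall_mm, umbrella)
print(threshold)
print(predict(threshold, 6))
```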
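Models and simplifies the way humans think
Handles both classification and regression problems
Classification predicts a category (e.g. yes/no); regression predicts a number (e.g. 1 to 100)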
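The difference between the two problem types shows up in what a leaf returns. The sketch below assumes the usual convention, which the slides do not spell out: a classification leaf answers with the most common category among its training samples, and a regression leaf answers with their average. Both function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    # Classification: answer with the most common category in the leaf.
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    # Regression: answer with the average of the numbers in the leaf.
    return mean(values)


print(classification_leaf(["yes", "no", "yes"]))
print(regression_leaf([60, 70, 80]))
```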
Advantages
Relatively fast to train
Easy to understand
Disadvantages
| Weather | Humidity | Temperature (°C) | Cloud cover |
|---|---|---|---|
| Rain | 70% | 20 | High |
| Sunny | 60% | 24 | Low |
| Rain | 90% | 27 | High |
| Rain | 45% | 18 | High |
| Sunny | 20% | 23 | Low |
| Sunny | 30% | 28 | High |
| Rain | 80% | 30 | High |
Humidity > 50%
Cloud cover
Temperature > 23
Temperature > 25
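To see what each candidate question does, it can be applied to the seven rows of the table and the weather labels counted on each side. This is only a sketch: the labels are written in English, and the cloud cover question is assumed to mean "is cloud cover high", since the slide names the column without a condition.

```python
from collections import Counter

# Rows copied from the table: (weather, humidity %, temperature, cloud cover).
ROWS = [
    ("Rain", 70, 20, "High"),
    ("Sunny", 60, 24, "Low"),
    ("Rain", 90, 27, "High"),
    ("Rain", 45, 18, "High"),
    ("Sunny", 20, 23, "Low"),
    ("Sunny", 30, 28, "High"),
    ("Rain", 80, 30, "High"),
]

# Each candidate question maps a row to True or False.
QUESTIONS = {
    "Humidity > 50": lambda r: r[1] > 50,
    "Cloud cover is high": lambda r: r[3] == "High",  # assumed condition
    "Temperature > 23": lambda r: r[2] > 23,
    "Temperature > 25": lambda r: r[2] > 25,
}


def split_counts(rows, question):
    """Count weather labels on the True side and the False side."""
    true_side = Counter(r[0] for r in rows if question(r))
    false_side = Counter(r[0] for r in rows if not question(r))
    return dict(true_side), dict(false_side)


for name, question in QUESTIONS.items():
    print(name, split_counts(ROWS, question))
```

A question is more useful the more each side is dominated by a single label.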
Initial state
Humidity > 50%
T
F
Initial state
Humidity > 50%
Humidity <= 50%
p(parent)
c(child)
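Assuming p and c refer to the parent node and its child nodes in an information-gain calculation (the slides do not show the formula), the sketch below scores the split Humidity > 50 on the weather table. It uses Shannon entropy, in line with the `criterion = "entropy"` setting in the scikit-learn code; the function names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Counts are [rain, sunny], taken from the table for the split Humidity > 50.
parent = [4, 3]
children = [[3, 1], [1, 2]]

print(round(entropy(parent), 3))
print(round(information_gain(parent, children), 3))
```

A larger gain means the children are purer than the parent, so the question with the highest gain is the natural choice for that node.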
Installation
pip install scikit-learn
pip install matplotlib
pip install pandas
Using the built-in dataset (Iris)
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris dataset and wrap the measurements in a DataFrame.
iris = load_iris()
dataset = pd.DataFrame(data = iris["data"], columns = iris["feature_names"])
print(dataset)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Features are the four measurements; the target is the species index (0, 1, 2).
x = dataset.copy()
y = iris["target"]
# Hold out 33% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33)
# Split on entropy; ccp_alpha > 0 turns on cost-complexity pruning.
classifier = DecisionTreeClassifier(criterion = "entropy", ccp_alpha = 0.05)
classifier = classifier.fit(x_train, y_train)

# prediction = classifier.predict_proba(x_test)  # per-class probabilities instead of labels
prediction = classifier.predict(x_test)
print(prediction)

from sklearn.metrics import accuracy_score, confusion_matrix
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))

feature_importance = pd.DataFrame(classifier.feature_importances_, index = x.columns)
print(feature_importance)

from sklearn.tree import plot_tree
from matplotlib import pyplot as plt
plt.figure(figsize = (20, 12))
# class_names is a list ordered by class index: 0 = setosa, 1 = versicolor, 2 = virginica.
plot_tree(classifier, feature_names = list(x.columns), class_names = ["Setosa", "Versicolour", "Virginica"], filled = True, fontsize = 12)
plt.show()

# Full script: the steps above combined into one file.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix
from matplotlib import pyplot as plt

# Load the data.
iris = load_iris()
dataset = pd.DataFrame(data = iris["data"], columns = iris["feature_names"])
print(dataset)

# Split into training and test sets, then train the tree.
x = dataset.copy()
y = iris["target"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33)
classifier = DecisionTreeClassifier(criterion = "entropy", ccp_alpha = 0.05)
classifier = classifier.fit(x_train, y_train)

# Predict and evaluate.
# prediction = classifier.predict_proba(x_test)
prediction = classifier.predict(x_test)
print(prediction)
print(accuracy_score(y_test, prediction))
print(confusion_matrix(y_test, prediction))
feature_importance = pd.DataFrame(classifier.feature_importances_, index = x.columns)
print(feature_importance)

# Draw the tree; class_names is a list ordered by class index (0 = setosa, 1 = versicolor, 2 = virginica).
plt.figure(figsize = (20, 12))
plot_tree(classifier, feature_names = list(x.columns), class_names = ["Setosa", "Versicolour", "Virginica"], filled = True, fontsize = 12)
plt.show()
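Besides the plot, the fitted tree can be printed as indented text rules, which reads much like the hand-written if/else code at the start. The sketch below uses scikit-learn's `export_text`. As assumptions for brevity and repeatability, it trains on the whole dataset rather than a train/test split and fixes `random_state`, neither of which the original script does.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# random_state is fixed here only so that repeated runs print the same tree.
classifier = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.05, random_state=0)
classifier.fit(iris["data"], iris["target"])

# export_text renders the fitted tree as indented threshold rules.
rules = export_text(classifier, feature_names=list(iris["feature_names"]))
print(rules)
```

Each leaf line shows the predicted class index, which maps to `iris["target_names"]`.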