Sonny不讀不行: Applied Machine Learning in Python 10

Tuesday, September 12, 2017

Applied Machine Learning in Python 10 - Decision Trees

Informative split

A decision tree searches for the best way to split the data. "Best" here means that the split at a node separates some class out completely.

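For example, in the figure above, the root node splits on petal length <= 2.35 cm. The True path then isolates the setosa class completely (the resulting node is homogeneous), which makes this a very good split.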
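
To make "homogeneous" concrete, here is a minimal sketch that applies the same petal length <= 2.35 cm threshold to scikit-learn's bundled iris data and measures how mixed each side of the split is. Gini impurity is my choice of measure here (it is scikit-learn's default criterion, but the post does not name one), and the helper name gini is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
petal_length = X[:, 2]  # column 2 is "petal length (cm)"


def gini(labels):
    """Gini impurity: 0.0 means every sample in the node has the same class."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


# Apply the root split from the post: petal length <= 2.35 cm.
left = y[petal_length <= 2.35]   # the True path
right = y[petal_length > 2.35]   # the False path

print("classes on the True path:", [iris.target_names[c] for c in np.unique(left)])
print("impurity before split:", gini(y))
print("impurity on the True path:", gini(left))
print("impurity on the False path:", gini(right))
```

An impurity of zero on the True path means that the branch needs no further splitting, while the False path still mixes the two remaining classes and has to be split again.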

Pruning to overcome Overfitting problem

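Decision trees overfit easily. As long as some feature can tell one sample apart from the rest, the tree may keep splitting until a single sample ends up as its own leaf node. The telltale sign is a very deep tree.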

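One remedy is to keep the tree from growing too deep, which is called pruning. There are two kinds: pre-pruning limits the depth of the tree from the start, and post-pruning grows the tree fully and then prunes it back. At the time of writing, scikit-learn implemented only pre-pruning (later releases added cost-complexity post-pruning through the ccp_alpha parameter).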
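
As a rough illustration of pre-pruning, the sketch below grows one unrestricted tree and one tree capped with max_depth, then compares them. The iris data, the train/test split, and max_depth=2 are my own choices for the example, not settings taken from the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unrestricted tree: keeps splitting until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: growth stops once the tree is 2 levels deep.
pruned_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "leaves:", tree.get_n_leaves(),
        "train acc:", tree.score(X_train, y_train),
        "test acc:", tree.score(X_test, y_test),
    )
```

Other pre-pruning controls on DecisionTreeClassifier include max_leaf_nodes and min_samples_leaf. A large gap between training and test accuracy on the unrestricted tree is the usual hint that it has overfit.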

Even so, decision trees still tend to overfit. That is inherent to the algorithm.

Gradient-boosted Decision trees

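This is an ensemble method, but it does not rely on randomness. Instead, it builds a series of shallow trees (weak learners), where each later tree is built to improve on the ones before it. How strongly each new tree corrects its predecessors, and therefore how complex the model becomes, is controlled by the learning rate parameter.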

This is among the best-performing supervised learning methods.
Its strengths and weaknesses are much the same as those of random forests.
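
The idea of many shallow trees tied together by a learning rate can be sketched with scikit-learn's GradientBoostingClassifier. The dataset and the values n_estimators=50, learning_rate=0.1, and max_depth=2 are assumptions picked for illustration, not recommendations from the course.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingClassifier(
    n_estimators=50,    # number of boosting rounds (trees are added one after another)
    learning_rate=0.1,  # how strongly each round corrects the previous ones
    max_depth=2,        # keep every individual tree shallow (a weak learner)
    random_state=0,
)
gbdt.fit(X_train, y_train)

train_accuracy = gbdt.score(X_train, y_train)
test_accuracy = gbdt.score(X_test, y_test)

# estimators_ holds the fitted regression trees, one row per boosting round.
print("trees per round:", gbdt.estimators_.shape[1])
print("deepest tree:", max(tree.get_depth() for tree in gbdt.estimators_.ravel()))
print("train acc:", train_accuracy, "test acc:", test_accuracy)
```

Lowering learning_rate generally calls for more trees to reach the same fit, so the two parameters are usually tuned together.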
