Sonny不讀不行: Applied Machine Learning in Python 3

Monday, September 4, 2017

Applied Machine Learning in Python 3 - KNN regression

KNN regression

KNN can also be used for regression problems, for example on a 2-D dataset with one input feature (x-axis) and one target value (y-axis).

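With k=1, the prediction for a test point is the target value of the training point (the green points in the left plot) whose input feature value (x-axis) is closest to it. The blue triangles mark the target value that the regressor maps a given x to.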

With k=3, the prediction is the mean of the target values of the 3 training points nearest in input feature, or some other aggregate, such as a mean weighted by feature distance.
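The k=1 and k=3 cases above can be sketched in a few lines of plain Python. This is only an illustration of the mechanics for a single 1-D feature: the function name `knn_predict`, the `weighted` flag, and the toy data are hypothetical and do not come from the course material.

```python
def knn_predict(x_train, y_train, x_query, k=1, weighted=False):
    """Predict the target for one 1-D query point from its k nearest neighbors."""
    # Sort training points by distance to the query along the feature axis
    # and keep the k closest ones.
    pairs = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))
    neighbors = pairs[:k]
    if not weighted:
        # Plain mean of the neighbors' target values.
        return sum(y for _, y in neighbors) / len(neighbors)
    # If the query coincides with a training point, return that point's target
    # (this also avoids dividing by a zero distance).
    exact = [y for x, y in neighbors if x == x_query]
    if exact:
        return sum(exact) / len(exact)
    # Inverse-distance weighting: closer neighbors count more.
    weights = [1.0 / abs(x - x_query) for x, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


# Hypothetical toy data: one input feature, one target value.
x_train = [1.0, 2.0, 4.0, 7.0]
y_train = [10.0, 20.0, 40.0, 70.0]

pred_k1 = knn_predict(x_train, y_train, 2.5, k=1)                   # nearest point is x=2.0
pred_k3 = knn_predict(x_train, y_train, 2.5, k=3)                   # mean of 3 neighbors
pred_k3_w = knn_predict(x_train, y_train, 2.5, k=3, weighted=True)  # distance-weighted mean
print(pred_k1, pred_k3, pred_k3_w)
```

With k=1 the prediction simply copies the nearest training target. With k=3 the weighted variant is pulled toward the closest neighbor compared with the plain mean.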

Important Parameters for KNN

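The first is of course k, which controls model complexity. If k is too small, the fitted prediction curve (the post's "decision boundary") becomes very jagged and the model overfits; if k is too large, the model underfits.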

The second is the choice of distance metric. Euclidean distance is used here and is not discussed further.
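As a rough illustration of where these two parameters live in scikit-learn, the sketch below sets them on `KNeighborsRegressor`. It assumes scikit-learn is the library in use. The two-feature toy data is made up, and the Manhattan variant (`p=1`) is included only to show that the metric can change which neighbor is nearest.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data with two input features.
X = np.array([[3.0, 0.0], [2.0, 2.0], [6.0, 6.0]])
y = np.array([10.0, 20.0, 30.0])
query = np.array([[0.0, 0.0]])

# n_neighbors is k; metric="minkowski" with p=2 is Euclidean distance
# (these are also the library defaults for metric and p).
euclid = KNeighborsRegressor(n_neighbors=1, metric="minkowski", p=2).fit(X, y)
euclid_pred = euclid.predict(query)[0]

# p=1 switches the same Minkowski metric to Manhattan distance.
manhattan = KNeighborsRegressor(n_neighbors=1, metric="minkowski", p=1).fit(X, y)
manhattan_pred = manhattan.predict(query)[0]

print(euclid_pred, manhattan_pred)
```

Under Euclidean distance the point (2, 2) is nearest to the origin, while under Manhattan distance (3, 0) is, so the two models return different targets for the same query.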

Evaluating a regressor with R-squared (coefficient of determination)

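R-squared usually falls between 0 and 1. A score of 0 corresponds to a regressor that always predicts the mean of the target values, which means it has no predictive power. A score of 1 means the regressor fits the data perfectly. The score can also be negative when a model does worse than predicting the mean.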
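To make the definition concrete, here is a minimal plain-Python sketch of the standard formula R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The helper name `r_squared` and the sample values are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Sum of squared deviations of the true values from their own mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

r2_perfect = r_squared(y_true, y_true)                 # predictions equal the targets
r2_mean = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])      # always predict the mean
r2_worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])     # worse than predicting the mean
print(r2_perfect, r2_mean, r2_worse)
```

The three calls cover the perfect fit, the mean-only baseline, and a model bad enough to score below zero.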

With k=1 the model overfits: the training set R^2 is 1.0, but the test set R^2 is very low.

With k=15 the test set R^2 is reasonably good, and the prediction curve captures the overall trend of the data.

With k=55 the model underfits instead: both R^2 values drop, and the prediction curve becomes too flat.
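A comparison like this can be reproduced along the following lines. This is a sketch under assumptions: the synthetic dataset from `make_regression` and its parameters are stand-ins, not necessarily the data behind the post's plots, so the exact scores may differ from the ones discussed above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Assumed synthetic 1-feature regression data (not confirmed by the post).
X, y = make_regression(n_samples=100, n_features=1, n_informative=1,
                       bias=150.0, noise=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for k in (1, 15, 55):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    # .score() returns R^2 for regressors.
    scores[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"k={k:2d}  train R^2={scores[k][0]:.3f}  test R^2={scores[k][1]:.3f}")
```

A large gap between train and test R^2 at small k points to overfitting, while low scores on both at large k point to underfitting.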
