Scikit-learn Summary [Python] [Machine Learning]

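This post collects the basic sklearn operations. (Updated from time to time.)

It does not cover how the algorithms are implemented or the theory behind them.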

Data preprocessing

One-hot encode

To handle categorical data (columns made up of distinct category values) as numbers, use a one-hot encoder to reshape the data frame.

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

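categorical_features = ["<Feature1>", "<Feature2>", "<Feature3>"]  # names of the columns to encode
# scikit-learn 1.2+ uses sparse_output; older versions use sparse=False instead
encoder = OneHotEncoder(sparse_output=False, dtype=int, handle_unknown='ignore')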

tf = ColumnTransformer([("one_hot",
                        encoder,
                        categorical_features)],
                        remainder='passthrough')

X = tf.fit_transform(X)
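To make the result of the transformation concrete, here is a minimal sketch on a made-up data frame. The column names `color` and `doors` and the `toy_` variables are hypothetical and exist only for this illustration. The `try`/`except` is one way to cope with the `sparse` argument being renamed to `sparse_output` in newer scikit-learn versions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column and one numeric column
toy_df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "doors": [2, 4, 4, 2],
})

# Newer scikit-learn versions accept sparse_output; older ones only accept sparse
try:
    toy_encoder = OneHotEncoder(sparse_output=False, dtype=int, handle_unknown="ignore")
except TypeError:
    toy_encoder = OneHotEncoder(sparse=False, dtype=int, handle_unknown="ignore")

toy_tf = ColumnTransformer(
    [("one_hot", toy_encoder, ["color"])],
    remainder="passthrough",  # keep the non-encoded columns as they are
)

toy_X = toy_tf.fit_transform(toy_df)

# Three one-hot columns (blue, green, red) followed by the untouched "doors" column
print(toy_X.shape)
print(toy_X)
```

The encoded columns come first, in sorted category order, and the passthrough columns are appended after them.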

Checking for missing data

df.isna().sum()

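You can also drop missing data with pandas, but if the source is a CSV file, cleaning it up there is not much work, so I don't think you need to go out of your way to do it in pandas.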
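For reference, the pandas route mentioned above might look like the following sketch. The data frame and its `age` and `city` columns are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
toy_df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "city": ["Tokyo", "Osaka", None],
})

# Count missing values per column
missing_counts = toy_df.isna().sum()
print(missing_counts)

# Drop every row that contains at least one missing value
cleaned = toy_df.dropna()
print(len(cleaned))
```

Only the first row has no missing values, so it is the only one that survives `dropna()`.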

Random forest (classification model)

Creating the model

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

model = RandomForestClassifier()

From the DataFrame, choose the explanatory variables and the dependent variable, and assign them to two variables, X and y

X = df.drop("<Dependent Variable>", axis=1)
y = df["<Dependent Variable>"]

Split into training data and test data

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.5)
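The split is random, so the scores change on every run. A small sketch of a reproducible split follows, assuming you want fixed results. The `_demo` arrays are made up, and `random_state` and `stratify` are optional arguments that the code above does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 2 features, balanced binary labels
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.5,     # same ratio as above
    random_state=0,    # fix the shuffle so the split is repeatable
    stratify=y_demo,   # keep the class ratio similar in both halves
)

print(X_tr.shape, X_te.shape)
```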

Train on the training data

model.fit(X_train,y_train)

Pass the test data for the explanatory variables to predict the dependent variable

# Prediction
y_preds = model.predict(X_test)
print(y_preds)

Check the prediction scores (for a classifier, score returns the mean accuracy)

print(model.score(X_train,y_train))
print(model.score(X_test,y_test))

Getting detailed prediction metrics

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(classification_report(y_test,y_preds))

print(confusion_matrix(y_test,y_preds))

print(accuracy_score(y_test,y_preds))
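To show how to read these outputs, here is a tiny worked example with hand-written labels. The `y_true_demo` and `y_pred_demo` lists are invented for the illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions
y_true_demo = [0, 0, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 0]

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# Fraction of predictions that match the true label (3 out of 5 here)
acc = accuracy_score(y_true_demo, y_pred_demo)
print(acc)
```

The diagonal of the confusion matrix counts the correct predictions, so accuracy is the sum of the diagonal divided by the number of samples.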

Model tuning

Print the test score for each value of the n_estimators parameter.

for i in range(10,100,10):
    model = RandomForestClassifier(n_estimators=i)
    model.fit(X_train,y_train)
    print(model.score(X_test,y_test))
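The loop above chooses the parameter by looking at the test score, which can make the final test score look better than it really is. As one alternative (not the method used in this post), the same search can be done with cross-validation on the training data using `GridSearchCV`. The sketch below uses synthetic data from `make_classification`, and the `_demo` names are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 90]},  # candidate values to try
    cv=3,                                       # 3-fold cross-validation
)
search.fit(X_demo, y_demo)

# Best candidate and its mean cross-validated accuracy
print(search.best_params_, search.best_score_)
```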

Exporting the model with pickle

Export any trained model.

import pickle

# Export model (the with block closes the file automatically)
with open('RFM_1.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load model
with open('RFM_1.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
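A quick way to check that the export worked is to compare predictions before and after the round trip. This sketch trains a small model on synthetic data and writes to a temporary directory. The file name `demo_model.pkl` and the `demo_` variables are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and a small model, just for the round-trip check
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
demo_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(demo_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give exactly the same predictions
same = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Keep in mind that a pickle file should only be loaded from a source you trust, and that loading it with a different scikit-learn version than the one that wrote it is not guaranteed to work.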

Random forest (regression model)

Creating the model

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()

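Fitting

# transformed_X is assumed to be the preprocessed feature matrix (e.g. the one-hot encoded output)
X_train, X_test, y_train, y_test = train_test_split(transformed_X, y, test_size=0.2)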

model.fit(X_train,y_train)

print(model.score(X_test, y_test))  # for a regressor, score returns R^2
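For a regressor, `score` returns the coefficient of determination (R^2), not accuracy. The sketch below, on synthetic data from `make_regression`, shows that it matches `r2_score` and adds mean absolute error as a second metric that is easier to read in the units of the target. All `_demo` and `reg` names are placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
preds = reg.predict(X_te)

r2 = r2_score(y_te, preds)              # same value as reg.score(X_te, y_te)
mae = mean_absolute_error(y_te, preds)  # average absolute error in target units
print(r2, mae)
```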
