Stack multiple processing steps into a single scikit-learn estimator. The sample code below comes from this example.
```python
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=150, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)
```
Difference between `Pipeline` and `make_pipeline`:

- `Pipeline`: you name the steps yourself.
- `make_pipeline`: no need to name the steps; pass the estimators directly and the names are generated automatically.
```python
make_pipeline(PCA(), SVC())
```
```python
Pipeline(steps=[
    ('principal_component_analysis', PCA()),
    ('support_vector_machine', SVC())
])
```
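`make_pipeline` derives the step names automatically from the lowercased class names; the `svc__` prefix used in the grid search below relies on this. A minimal check (just a sketch, no fitting involved):

```python
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline, Pipeline

# make_pipeline: names come from the lowercased class names
auto = make_pipeline(PCA(), SVC())
print(list(auto.named_steps))  # → ['pca', 'svc']

# Pipeline: you choose the names yourself
named = Pipeline(steps=[
    ('principal_component_analysis', PCA()),
    ('support_vector_machine', SVC())
])
print(list(named.named_steps))
```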
```python
# Using with GridSearchCV (to choose the best parameters)
from sklearn.model_selection import GridSearchCV

param_grid = {'svc__C': [1, 5, 10, 50],  # "svc": step name, "C": parameter of SVC
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid, cv=5, verbose=1, n_jobs=-1)

grid_result = grid.fit(X, y)
best_params = grid_result.best_params_

# predict with the best params
grid.predict(X_test)
```
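The snippet above assumes `X`, `y`, and `X_test` already exist. For a self-contained run, here is a sketch on synthetic data (the `make_classification` dataset and the smaller grids are assumptions for illustration, not part of the original):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data standing in for the real X, y
X, y = make_classification(n_samples=200, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = make_pipeline(PCA(n_components=10), SVC(kernel='rbf'))
param_grid = {'svc__C': [1, 10], 'svc__gamma': [0.001, 0.01]}
grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)      # keys follow the 'step__param' convention
preds = grid.predict(X_test)  # predicts with the refit best estimator
```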
In case you want to use `best_params`:

```python
best_params['svc__C']
best_params['svc__gamma']
```
Be careful with cross validation (it can take a long time to run!):
```python
import numpy as np
from sklearn.model_selection import cross_val_score

cv_scores = cross_val_score(grid, X, y)
print('Accuracy scores:', cv_scores)
print('Mean of scores:', np.mean(cv_scores))
print('Variance of scores:', np.var(cv_scores))
```