This is documentation for an old release of Scikit-learn (version 0.22). Try the latest stable release (version 1.6) or development (unstable) versions.
Pipeline Anova SVM
Simple usage of Pipeline that successively runs a univariate feature selection with ANOVA and then an SVM on the selected features.
Using a sub-pipeline, the fitted coefficients can be mapped back into the original feature space.
Out:
              precision    recall  f1-score   support

           0       0.75      0.50      0.60         6
           1       0.67      1.00      0.80         6
           2       0.67      0.80      0.73         5
           3       1.00      0.75      0.86         8

    accuracy                           0.76        25
   macro avg       0.77      0.76      0.75        25
weighted avg       0.79      0.76      0.76        25
[[-0.23912131  0.          0.          0.         -0.3236911   0.
   0.          0.          0.          0.          0.          0.
   0.10836648  0.          0.          0.          0.          0.
   0.          0.        ]
 [ 0.43878747  0.          0.          0.         -0.51415652  0.
   0.          0.          0.          0.          0.          0.
   0.04845652  0.          0.          0.          0.          0.
   0.          0.        ]
 [-0.65382998  0.          0.          0.          0.57962856  0.
   0.          0.          0.          0.          0.          0.
  -0.04736524  0.          0.          0.          0.          0.
   0.          0.        ]
 [ 0.54403412  0.          0.          0.          0.58478491  0.
   0.          0.          0.          0.          0.          0.
  -0.11344659  0.          0.          0.          0.          0.
   0.          0.        ]]
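The zero entries in the coefficient array come from the selector's inverse_transform, which puts the kept columns back in their original positions and fills every dropped column with zeros. The following is a minimal sketch of that behaviour on its own, using hypothetical toy data (not the data from this example) in which only the third column separates the two classes:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: 6 samples, 4 features.
# Only column 2 differs between the two classes.
X_toy = np.array([
    [1.0,  0.5,  0.0, 2.0],
    [2.0, -0.5,  0.1, 1.0],
    [3.0,  0.0, -0.1, 3.0],
    [1.0,  0.5,  5.0, 3.0],
    [2.0, -0.5,  5.1, 2.0],
    [3.0,  0.0,  4.9, 1.0],
])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Keep the single best feature according to the ANOVA F-test.
selector = SelectKBest(f_classif, k=1).fit(X_toy, y_toy)
print(selector.get_support())  # boolean mask over the 4 original columns

# A value living in the reduced 1-feature space (e.g. one coefficient)...
reduced = np.array([[0.7]])

# ...is placed back at the selected column; the other columns become 0.
restored = selector.inverse_transform(reduced)
print(restored)
```

The same mechanism is what turns a coefficient matrix with 3 columns into one with 20 columns in the output above.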
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
print(__doc__)
# Generate a synthetic 4-class problem with 20 features, 3 of them informative
X, y = make_classification(
    n_features=20, n_informative=3, n_redundant=0, n_classes=4,
    n_clusters_per_class=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# ANOVA SVM-C
# 1) ANOVA filter: keep the 3 best-ranked features.
#    f_classif is the ANOVA F-test for classification targets
#    (f_regression is intended for continuous targets).
anova_filter = SelectKBest(f_classif, k=3)
# 2) linear SVM classifier
clf = svm.LinearSVC()
anova_svm = make_pipeline(anova_filter, clf)
anova_svm.fit(X_train, y_train)
y_pred = anova_svm.predict(X_test)
print(classification_report(y_test, y_pred))
# Map the coefficients back to the original 20-feature space; columns of
# features dropped by the filter are filled with zeros.
coef = anova_svm[:-1].inverse_transform(anova_svm['linearsvc'].coef_)
print(coef)
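To check which of the original features the filter kept, the selector's boolean mask can be compared with the non-zero columns of the mapped-back coefficients. The sketch below is a self-contained variant of the script above, with two assumptions that are not part of the original: a fixed random_state=0 in make_classification so the run is repeatable, and f_classif as the score function. The step names 'selectkbest' and 'linearsvc' are the lowercase class names that make_pipeline assigns automatically.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Same synthetic problem as above, but seeded (assumption) for repeatability.
X, y = make_classification(
    n_features=20, n_informative=3, n_redundant=0, n_classes=4,
    n_clusters_per_class=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = make_pipeline(SelectKBest(f_classif, k=3), svm.LinearSVC())
pipe.fit(X_train, y_train)

# Boolean mask over the 20 original features: True where a feature was kept.
mask = pipe.named_steps['selectkbest'].get_support()
print("selected feature indices:", np.flatnonzero(mask))

# Coefficients in the reduced space: one row per class, one column per kept feature.
reduced_coef = pipe.named_steps['linearsvc'].coef_
print("reduced shape:", reduced_coef.shape)

# Map back to the original feature space through the sub-pipeline.
coef = pipe[:-1].inverse_transform(reduced_coef)
print("full shape:", coef.shape)

# Every column outside the mask is exactly zero after the mapping.
print("dropped columns all zero:", bool(np.all(coef[:, ~mask] == 0)))
```

Reading the mask directly is usually more robust than inferring the selected features from non-zero coefficients, since a kept feature can still receive a coefficient of zero.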