sklearn APIs are organized along the lines of our ML framework:
Training data
Model
Loss function
Optimization
Evaluation
Scikit-learn
ML Framework
sklearn APIs are well designed with the following principles:
Estimators: fit() method.
Predictors: predict() method that takes a dataset as input and returns predictions; score() method to measure the quality of predictions.
Transformers: transform() for transforming a dataset; fit() learns parameters; fit_transform() fits parameters and transforms the dataset.
Data Preprocessing
Training
Inference
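The estimator, predictor, and transformer methods listed above can be sketched on a tiny example. The toy arrays `X` and `y` are made up for illustration, and `StandardScaler` and `LogisticRegression` are picked only as convenient representatives of a transformer and a predictor.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 6 samples, 2 features, two classes.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Transformer: fit_transform() learns the per-feature mean and standard
# deviation, then applies the scaling in one call.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Estimator: fit() learns the model parameters from the training data.
clf = LogisticRegression()
clf.fit(X_scaled, y)

# Predictor: predict() returns labels, score() returns mean accuracy.
preds = clf.predict(X_scaled)
acc = clf.score(X_scaled, y)
print(preds, acc)
```

At inference time, new data would go through `scaler.transform()` (not `fit_transform()`) before `clf.predict()`, so that the parameters learned during training are reused.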
Provides functionality for loading, generating and preprocessing the training and test data.
| Module | Functionality |
|---|---|
| sklearn.datasets | Loading datasets: custom as well as popular reference datasets. |
| sklearn.preprocessing | Scaling, centering, normalization and binarization methods. |
| sklearn.impute | Filling missing values. |
| sklearn.feature_selection | Implements feature selection algorithms. |
| sklearn.feature_extraction | Implements feature extraction from raw data. |
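A minimal sketch of how several of these modules might be chained, assuming the bundled iris dataset as input. The artificially removed value and the specific classes (`SimpleImputer`, `MinMaxScaler`, `SelectKBest`) are illustrative choices, not the only options in each module.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# sklearn.datasets: load a popular reference dataset (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Knock out one value so there is something to impute (illustration only).
X_missing = X.copy()
X_missing[0, 0] = np.nan

# sklearn.impute: fill missing values with the column mean.
X_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)

# sklearn.preprocessing: rescale every feature to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X_filled)

# sklearn.feature_selection: keep the 2 features with the highest ANOVA F-score.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X_scaled, y)
print(X_selected.shape)
```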
Implements supervised and unsupervised models
Regression: sklearn.linear_model (linear, ridge, lasso models), sklearn.tree
Classification: sklearn.linear_model, sklearn.svm, sklearn.tree, sklearn.neighbors, sklearn.naive_bayes, sklearn.multiclass
sklearn.multioutput implements multi-output classification and regression.
sklearn.cluster implements many popular clustering algorithms
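As a rough illustration of the model modules, the sketch below fits one regressor, one classifier, and one clustering model on synthetic data. The data generators, sample sizes, and hyperparameter values are arbitrary assumptions chosen to keep the example small.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier

# Regression with sklearn.linear_model (ridge model).
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)
reg = Ridge(alpha=1.0).fit(X_reg, y_reg)

# Classification with sklearn.tree.
X_clf, y_clf = make_classification(n_samples=100, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_clf, y_clf)

# Unsupervised clustering with sklearn.cluster (no labels passed to fit()).
X_blobs, _ = make_blobs(n_samples=90, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blobs)

print(reg.coef_.shape, tree.predict(X_clf[:5]), km.cluster_centers_.shape)
```

All three follow the same `fit()` convention; only the supervised models take a target array.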
sklearn.metrics implements different metrics for model evaluation.
sklearn.model_selection implements various model selection strategies like cross-validation, tuning hyper-parameters and plotting learning curves.
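The evaluation and model selection modules could be combined roughly as follows. The dataset, the split ratio, and the candidate values for `C` are assumptions made for the sake of a short example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set; stratify keeps class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)

# Cross-validation: one accuracy score per fold on the training data.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Hyper-parameter tuning: try each value of C with 5-fold cross-validation.
search = GridSearchCV(model, param_grid={"C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# sklearn.metrics: evaluate the tuned model on the held-out test set.
test_acc = accuracy_score(y_test, search.predict(X_test))
print(cv_scores.mean(), search.best_params_, test_acc)
```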
sklearn.inspection includes tools for model inspection.
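One inspection tool, shown here as a sketch, is permutation importance from the `sklearn.inspection` module. The dataset and model are assumptions chosen for brevity, and the importances are computed on the training data only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Shuffle each feature in turn and record how much the score drops;
# a larger drop suggests the model relies more on that feature.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)
```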
from sklearn.linear_model import LogisticRegression
?LogisticRegression