Introduction to Scikit-Learn (sklearn)
sklearn APIs are organized along the lines of our ML framework:
- Training data and preprocessing
- Model (subsumes the loss function and the optimization procedure)
- Model selection and evaluation
- Model inspection
[Figure: components of the ML framework (training data, model, loss function, optimization, evaluation) mapped to scikit-learn]
API design principles
sklearn APIs are designed around the following principles:
- Consistency: All APIs share a simple and consistent interface.
- Inspection: The learnable parameters as well as the hyperparameters of all estimators are directly accessible via public instance variables.
- Nonproliferation of classes: Datasets are represented as NumPy arrays or SciPy sparse matrices instead of custom-designed classes.
- Composition: Existing building blocks are reused as much as possible.
- Sensible defaults: Default values are provided for parameters, which enables quick baseline building.
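The principles above can be seen in a few lines of code. The following is a minimal sketch, not part of the original slides; the toy data and the choice of `LinearRegression`, `Ridge`, `StandardScaler` and `make_pipeline` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Nonproliferation of classes: the dataset is a plain NumPy array.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # toy data generated from y = 2x + 1

# Consistency: different estimators are trained with the same fit(X, y) call.
linear = LinearRegression().fit(X, y)
ridge = Ridge().fit(X, y)  # sensible defaults: no arguments needed for a baseline

# Inspection: hyperparameters are public attributes (also via get_params()),
# and learned parameters carry a trailing underscore.
default_alpha = ridge.alpha
same_alpha = ridge.get_params()["alpha"]
slope, intercept = linear.coef_[0], linear.intercept_

# Composition: existing building blocks are chained into a new estimator.
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
prediction = pipe.predict(np.array([[4.0]]))
```

The pipeline itself exposes the same `fit()` and `predict()` interface as the estimators it is built from.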
Types of sklearn objects
Estimators
- Estimate model parameters based on training data and hyperparameters.
- fit() method
Predictors
- Make predictions on a dataset.
- predict() method takes a dataset as input and returns predictions.
- score() method measures the quality of predictions.
Transformers
- Transform a dataset.
- transform() transforms the dataset.
- fit() learns parameters.
- fit_transform() fits parameters and transforms the dataset.
[Figure: stages of the workflow: Data Preprocessing, Training, Inference]
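The three object types can be illustrated with one transformer and one predictor. This is a minimal sketch under assumed toy data; `StandardScaler` and `LogisticRegression` are chosen here only as examples and are not prescribed by the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy, linearly separable data (an assumption for illustration).
X_train = np.array([[1.0], [2.0], [8.0], [9.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[1.5], [8.5]])
y_test = np.array([0, 1])

# Transformer: fit_transform() learns the mean and scale, then transforms.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# transform() reuses the parameters learned on the training data.
X_test_scaled = scaler.transform(X_test)

# Estimator: fit() estimates the model parameters from the training data.
clf = LogisticRegression()
clf.fit(X_train_scaled, y_train)

# Predictor: predict() returns predictions, score() measures their quality
# (mean accuracy for classifiers).
y_pred = clf.predict(X_test_scaled)
accuracy = clf.score(X_test_scaled, y_test)
```

Note that the scaler is fitted on the training data only and then applied unchanged to the test data.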
sklearn API
Data API
Provides functionality for loading, generating and preprocessing the training and test data.
| Module | Functionality |
|---|---|
| sklearn.datasets | Loading datasets: custom as well as popular reference datasets. |
| sklearn.preprocessing | Scaling, centering, normalization and binarization methods. |
| sklearn.impute | Filling missing values. |
| sklearn.feature_selection | Implements feature selection algorithms. |
| sklearn.feature_extraction | Implements feature extraction from raw data. |
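Each module in the table can be tried with a single call. The sketch below is illustrative only: the iris dataset, the artificially injected missing value, and the specific classes (`SimpleImputer`, `MinMaxScaler`, `SelectKBest`, `CountVectorizer`) are assumptions, not choices made in the slides.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# sklearn.datasets: load a popular reference dataset as NumPy arrays.
X, y = load_iris(return_X_y=True)

# sklearn.impute: fill a missing value (injected here for the demo)
# with the column mean.
X_missing = X.copy()
X_missing[0, 0] = np.nan
X_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)

# sklearn.preprocessing: scale every feature to the range [0, 1].
X_scaled = MinMaxScaler().fit_transform(X_filled)

# sklearn.feature_selection: keep the 2 features with the highest F-score.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X_scaled, y)

# sklearn.feature_extraction: turn raw text into a sparse count matrix.
docs = ["sklearn is consistent", "sklearn is composable"]
counts = CountVectorizer().fit_transform(docs)
```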
Model API
Implements supervised and unsupervised models
Regression
- sklearn.linear_model (linear, ridge, lasso models)
- sklearn.tree
Classification
- sklearn.linear_model
- sklearn.svm
- sklearn.tree
- sklearn.neighbors
- sklearn.naive_bayes
- sklearn.multiclass
sklearn.multioutput implements multi-output classification and regression.
sklearn.cluster implements many popular clustering algorithms.
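Because all models share the same interface, a regressor, a classifier and a clustering algorithm are used in nearly the same way. The following is a minimal sketch on assumed toy data; the particular models (`Lasso`, `DecisionTreeClassifier`, `KMeans`) are picked only to cover the three cases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeClassifier

# Toy one-dimensional data with two well-separated groups (assumed).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_reg = 3.0 * X.ravel() + 2.0           # regression target
y_clf = np.array([0, 0, 0, 1, 1, 1])    # classification target

# Supervised regression: sklearn.linear_model
reg = Lasso(alpha=0.1).fit(X, y_reg)

# Supervised classification: sklearn.tree
clf = DecisionTreeClassifier(random_state=0).fit(X, y_clf)
class_pred = clf.predict(np.array([[0.5], [11.5]]))

# Unsupervised clustering: sklearn.cluster (fit takes no labels)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_
```

The cluster ids assigned by `KMeans` are arbitrary, so only the grouping is meaningful, not which group is called 0 or 1.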
Model evaluation API
sklearn.metrics implements different metrics for model evaluation.
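As a small illustration, metric functions take the true values first and the predictions second. The labels and values below are made up for the sketch.

```python
from sklearn.metrics import (
    accuracy_score,
    mean_squared_error,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)     # 4 of 6 predictions are correct
precision = precision_score(y_true, y_pred)   # 3 true positives / 4 predicted positives
recall = recall_score(y_true, y_pred)         # 3 true positives / 4 actual positives

# Hypothetical regression targets and predictions.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])  # (0.25 + 0 + 1) / 3
```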
Model selection API
sklearn.model_selection implements various model selection strategies like cross-validation, tuning hyper-parameters and plotting learning curves.
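A minimal sketch of cross-validation and hyperparameter tuning follows. The dataset (iris), the model (`LogisticRegression`) and the grid of `C` values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set; stratify keeps the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Cross-validation: one accuracy score per fold.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)

# Hyperparameter tuning: try each value of C with 5-fold cross-validation.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_train, y_train)

best_C = grid.best_params_["C"]
test_accuracy = grid.score(X_test, y_test)  # uses the refitted best model
```

After fitting, `GridSearchCV` behaves like a predictor itself, so `predict()` and `score()` work on it directly.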
Model inspection API
sklearn.inspection includes tools for model inspection.
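One inspection tool in `sklearn.inspection` is permutation importance, which measures how much the score drops when a feature's values are shuffled. The sketch below assumes the iris dataset and a `LogisticRegression` model purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Shuffle each feature 5 times and record the drop in the model's score.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# One mean importance per feature; larger means the model relies on it more.
mean_importances = result.importances_mean
all_importances = result.importances  # shape: (n_features, n_repeats)
```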
Practical advice
- It is not possible to remember each and every sklearn API.
- Use documentation for more information as follows (the ? prefix works in IPython and Jupyter):
    from sklearn.linear_model import LogisticRegression
    ?LogisticRegression
- Remember high-level modules and API design principles.
- Keep the following links handy:
  - API reference
  - sklearn user guide
  - Worked examples for reference implementations
Introduction to sklearn
By ashishtendulkar