.. _related_projects:

=====================================
Related Projects
=====================================

Projects implementing the scikit-learn estimator API are encouraged to use
the `scikit-learn-contrib template `_, which facilitates best practices for
testing and documenting estimators. The
`scikit-learn-contrib GitHub organisation `_ also accepts high-quality
contributions of repositories conforming to this template.

Below is a list of sister projects, extensions and domain-specific packages.

Interoperability and framework enhancements
-------------------------------------------

These tools adapt scikit-learn for use with other technologies or otherwise
enhance the functionality of scikit-learn's estimators.

**Data formats**

- `Fast svmlight / libsvm file loader `_ Fast and memory-efficient
  svmlight / libsvm file loader for Python.

- `sklearn_pandas `_ bridge for scikit-learn pipelines and pandas data
  frames with dedicated transformers.

- `sklearn_xarray `_ provides compatibility of scikit-learn estimators with
  xarray data structures.

**Auto-ML**

- `auto-sklearn `_ An automated machine learning toolkit and a drop-in
  replacement for a scikit-learn estimator.

- `TPOT `_ An automated machine learning toolkit that optimizes a series of
  scikit-learn operators to design a machine learning pipeline, including
  data and feature preprocessors as well as the estimators. Works as a
  drop-in replacement for a scikit-learn estimator.

**Experimentation frameworks**

- `REP `_ Environment for conducting data-driven research in a consistent
  and reproducible way.

- `Scikit-Learn Laboratory `_ A command-line wrapper around scikit-learn
  that makes it easy to run machine learning experiments with multiple
  learners and large feature sets.

**Model inspection and visualisation**

- `dtreeviz `_ A Python library for decision tree visualization and model
  interpretation.

- `eli5 `_ A library for debugging/inspecting machine learning models and
  explaining their predictions.

- `mlxtend `_ Includes model visualization utilities.

- `yellowbrick `_ A suite of custom matplotlib visualizers for scikit-learn
  estimators to support visual feature analysis, model selection,
  evaluation, and diagnostics.

**Model selection**

- `scikit-optimize `_ A library to minimize (very) expensive and noisy
  black-box functions. It implements several methods for sequential
  model-based optimization, and includes a replacement for ``GridSearchCV``
  or ``RandomizedSearchCV`` to do cross-validated parameter search using any
  of these strategies.

- `sklearn-deap `_ Use evolutionary algorithms instead of grid search in
  scikit-learn.

**Model export for production**

- `onnxmltools `_ Serializes many scikit-learn pipelines to `ONNX `_ for
  interchange and prediction.

- `sklearn2pmml `_ Serialization of a wide variety of scikit-learn
  estimators and transformers into PMML with the help of the
  `JPMML-SkLearn `_ library.

- `sklearn-porter `_ Transpile trained scikit-learn models to C, Java,
  JavaScript and others.

- `treelite `_ Compiles tree-based ensemble models into C code for
  minimizing prediction latency.

Other estimators and tasks
--------------------------

Not everything belongs in, or is mature enough for, the central scikit-learn
project. The following are projects providing interfaces similar to
scikit-learn for additional learning algorithms, infrastructures and tasks.

**Structured learning**

- `tslearn `_ A machine learning library for time series that offers tools
  for pre-processing and feature extraction as well as dedicated models for
  clustering, classification and regression.

- `sktime `_ A scikit-learn compatible toolbox for machine learning with
  time series including time series classification/regression and
  (supervised/panel) forecasting.

- `HMMLearn `_ Implementation of hidden Markov models that was previously
  part of scikit-learn.

- `PyStruct `_ General conditional random fields and structured prediction.

- `pomegranate `_ Probabilistic modelling for Python, with an emphasis on
  hidden Markov models.

- `sklearn-crfsuite `_ Linear-chain conditional random fields
  (`CRFsuite `_ wrapper with sklearn-like API).

**Deep neural networks etc.**

- `nolearn `_ A number of wrappers and abstractions around existing neural
  network libraries.

- `keras `_ Deep Learning library capable of running on top of either
  TensorFlow or Theano.

- `lasagne `_ A lightweight library to build and train neural networks in
  Theano.

- `skorch `_ A scikit-learn compatible neural network library that wraps
  PyTorch.

**Broad scope**

- `mlxtend `_ Includes a number of additional estimators as well as model
  visualization utilities.

**Other regression and classification**

- `xgboost `_ Optimised gradient boosted decision tree library.

- `ML-Ensemble `_ Generalized ensemble learning (stacking, blending,
  subsemble, deep ensembles, etc.).

- `lightning `_ Fast state-of-the-art linear model solvers (SDCA, AdaGrad,
  SVRG, SAG, etc.).

- `py-earth `_ Multivariate adaptive regression splines.

- `Kernel Regression `_ Implementation of Nadaraya-Watson kernel regression
  with automatic bandwidth selection.

- `gplearn `_ Genetic Programming for symbolic regression tasks.

- `scikit-multilearn `_ Multi-label classification with focus on label
  space manipulation.

- `seglearn `_ Time series and sequence learning using sliding window
  segmentation.

- `libOPF `_ Optimal path forest classifier.

- `fastFM `_ Fast factorization machine implementation compatible with
  scikit-learn.

**Decomposition and clustering**

- `lda `_ Fast implementation of latent Dirichlet allocation in Cython
  which uses `Gibbs sampling `_ to sample from the true posterior
  distribution. (scikit-learn's
  :class:`sklearn.decomposition.LatentDirichletAllocation` implementation
  uses `variational inference `_ to sample from a tractable approximation
  of a topic model's posterior distribution.)

- `kmodes `_ k-modes clustering algorithm for categorical data, and several
  of its variations.

- `hdbscan `_ HDBSCAN and Robust Single Linkage clustering algorithms for
  robust variable density clustering.

- `spherecluster `_ Spherical K-means and mixture of von Mises Fisher
  clustering routines for data on the unit hypersphere.

**Pre-processing**

- `categorical-encoding `_ A library of sklearn compatible categorical
  variable encoders.

- `imbalanced-learn `_ Various methods to under- and over-sample datasets.

Statistical learning with Python
--------------------------------

Other packages useful for data analysis and machine learning.

- `Pandas `_ Tools for working with heterogeneous and columnar data,
  relational queries, time series and basic statistics.

- `statsmodels `_ Estimating and analysing statistical models. More focused
  on statistical tests and less on prediction than scikit-learn.

- `PyMC `_ Bayesian statistical models and fitting algorithms.

- `Sacred `_ Tool to help you configure, organize, log and reproduce
  experiments.

- `Seaborn `_ Visualization library based on matplotlib. It provides a
  high-level interface for drawing attractive statistical graphics.

Recommendation Engine packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `implicit `_ Library for implicit feedback datasets.

- `lightfm `_ A Python/Cython implementation of a hybrid recommender
  system.

- `OpenRec `_ TensorFlow-based neural-network-inspired recommendation
  algorithms.

- `Spotlight `_ PyTorch-based implementation of deep recommender models.

- `Surprise Lib `_ Library for explicit feedback datasets.

Domain specific packages
~~~~~~~~~~~~~~~~~~~~~~~~

- `scikit-image `_ Image processing and computer vision in Python.

- `Natural language toolkit (nltk) `_ Natural language processing and some
  machine learning.

- `gensim `_ A library for topic modelling, document indexing and
  similarity retrieval.

- `NiLearn `_ Machine learning for neuro-imaging.

- `AstroML `_ Machine learning for astronomy.

- `MSMBuilder `_ Machine learning for protein conformational dynamics time series.