.. _related_projects:

=====================================
Related Projects
=====================================

Below is a list of sister projects, extensions and domain-specific packages.

Interoperability and framework enhancements
-------------------------------------------

These tools adapt scikit-learn for use with other technologies or otherwise
enhance the functionality of scikit-learn's estimators.

- `sklearn_pandas `_ Bridge between scikit-learn pipelines and pandas data
  frames, with dedicated transformers.

- `Scikit-Learn Laboratory `_ A command-line wrapper around scikit-learn
  that makes it easy to run machine learning experiments with multiple
  learners and large feature sets.

- `auto-sklearn `_ An automated machine learning toolkit and a drop-in
  replacement for a scikit-learn estimator.

- `sklearn-pmml `_ Serialization of (some) scikit-learn estimators into PMML.

Other estimators and tasks
--------------------------

Not everything belongs in, or is mature enough for, the central scikit-learn
project. The following projects provide interfaces similar to scikit-learn
for additional learning algorithms, infrastructures and tasks.

- `pylearn2 `_ A deep learning and neural network library built on Theano
  with a scikit-learn-like interface.

- `sklearn_theano `_ scikit-learn compatible estimators, transformers, and
  datasets which use Theano internally.

- `lightning `_ Fast state-of-the-art linear model solvers (SDCA, AdaGrad,
  SVRG, SAG, etc.).

- `Seqlearn `_ Sequence classification using HMMs or structured perceptron.

- `HMMLearn `_ Implementation of hidden Markov models that was previously
  part of scikit-learn.

- `PyStruct `_ General conditional random fields and structured prediction.

- `py-earth `_ Multivariate adaptive regression splines.

- `sklearn-compiledtrees `_ Generate a C++ implementation of the predict
  function for decision trees (and ensembles) trained by sklearn. Useful for
  latency-sensitive production environments.
- `lda `_ Fast implementation of Latent Dirichlet Allocation in Cython.

- `Sparse Filtering `_ Unsupervised feature learning based on sparse
  filtering.

- `Kernel Regression `_ Implementation of Nadaraya-Watson kernel regression
  with automatic bandwidth selection.

- `gplearn `_ Genetic programming for symbolic regression tasks.

- `nolearn `_ A number of wrappers and abstractions around existing neural
  network libraries.

- `sparkit-learn `_ Scikit-learn functionality and API on PySpark.

- `keras `_ Theano-based deep learning library.

- `mlxtend `_ Includes a number of additional estimators as well as model
  visualization utilities.

Statistical learning with Python
--------------------------------

Other packages useful for data analysis and machine learning.

- `Pandas `_ Tools for working with heterogeneous and columnar data,
  relational queries, time series and basic statistics.

- `theano `_ A CPU/GPU array processing framework geared towards deep
  learning research.

- `Statsmodels `_ Estimating and analysing statistical models. More focused
  on statistical tests and less on prediction than scikit-learn.

- `PyMC `_ Bayesian statistical models and fitting algorithms.

- `REP `_ Environment for conducting data-driven research in a consistent
  and reproducible way.

- `Sacred `_ Tool to help you configure, organize, log and reproduce
  experiments.

- `gensim `_ A library for topic modelling, document indexing and similarity
  retrieval.

- `Seaborn `_ Visualization library based on matplotlib. It provides a
  high-level interface for drawing attractive statistical graphics.

- `Deep Learning `_ A curated list of deep learning software libraries.

Domain-specific packages
~~~~~~~~~~~~~~~~~~~~~~~~

- `scikit-image `_ Image processing and computer vision in Python.

- `Natural language toolkit (nltk) `_ Natural language processing and some
  machine learning.

- `NiLearn `_ Machine learning for neuro-imaging.

- `AstroML `_ Machine learning for astronomy.
- `MSMBuilder `_ Machine learning for protein conformational dynamics time
  series.

Snippets and tidbits
--------------------

The `wiki `_ has more!