Installing the development version of scikit-learn¶
This section introduces how to install the main branch of scikit-learn. This can be done by either installing a nightly build or building from source.
Installing nightly builds¶
The continuous integration servers of the scikit-learn project build, test and upload wheel packages for the most recent Python version on a nightly basis.
Installing a nightly build is the quickest way to:
try a new feature that will be shipped in the next release (that is, a feature from a pull-request that was recently merged to the main branch);
check whether a bug you encountered has been fixed since the last release.
pip install --pre --extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple scikit-learn
Building from source¶
Building from source is required to work on a contribution (bug fix, new feature, code or documentation improvement).
git clone https://github.com/scikit-learn/scikit-learn.git  # add --depth 1 if your connection is slow
cd scikit-learn
If you plan on submitting a pull-request, you should clone from your fork instead.
pip install numpy scipy cython
pip install --verbose --no-build-isolation --editable .
Check that the installed scikit-learn has a version number ending with
python -c "import sklearn; sklearn.show_versions()"
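The same check can be scripted. The sketch below assumes that development builds carry a PEP 440 development marker (`.devN`) in `sklearn.__version__`. The helper name `is_dev_version` is hypothetical and is not part of scikit-learn.

```python
import re


def is_dev_version(version: str) -> bool:
    """Return True if the version string contains a PEP 440 ``.devN`` marker."""
    return re.search(r"\.dev\d*", version) is not None


try:
    import sklearn
except ImportError:
    # Nothing to check if scikit-learn is not importable in this environment.
    print("scikit-learn is not installed in this environment")
else:
    print(sklearn.__version__, "development build:", is_dev_version(sklearn.__version__))
```

If the last line reports `False` after an editable install, the interpreter is probably importing a released copy of scikit-learn from another location.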
You will have to run the
pip install --no-build-isolation --editable .
command every time the source code of a Cython file is updated (ending in .pyx or
.pxd). Use the
--no-build-isolation flag to
avoid compiling the whole project each time, only the files you have modified.
Scikit-learn requires the following dependencies both at build time and at runtime:
Python (>= 3.7),
NumPy (>= 1.14.6),
SciPy (>= 1.1.0),
Joblib (>= 0.11),
threadpoolctl (>= 2.0.0).
For running on PyPy, PyPy3-v5.10+, NumPy 1.14.0+, and SciPy 1.1.0+ are required. For PyPy, only installation instructions with pip apply.
Building Scikit-learn also requires:
Cython >= 0.28.5
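The minimum versions listed above can be compared against the current environment before starting a build. This is a minimal sketch that relies only on the standard library (`importlib.metadata`, available from Python 3.8). The names `MINIMUM_VERSIONS`, `release_tuple` and `check_minimums` are hypothetical, and the version parsing is deliberately simplistic: it only looks at the leading numeric components.

```python
import re
from importlib import metadata  # standard library since Python 3.8

# Minimum versions copied from the lists above.
MINIMUM_VERSIONS = {
    "numpy": "1.14.6",
    "scipy": "1.1.0",
    "joblib": "0.11",
    "threadpoolctl": "2.0.0",
    "Cython": "0.28.5",
}


def release_tuple(version, width=3):
    """Turn '1.14.6' into (1, 14, 6), ignoring suffixes such as 'rc1' or 'dev0'."""
    numbers = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad short versions so that '0.11' compares like '0.11.0'.
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def check_minimums(minimums=MINIMUM_VERSIONS, lookup=metadata.version):
    """Map each package name to a short status string."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            status[name] = "not installed"
            continue
        ok = release_tuple(installed) >= release_tuple(minimum)
        status[name] = f"{installed} ({'ok' if ok else 'older than ' + minimum})"
    return status


if __name__ == "__main__":
    for name, state in check_minimums().items():
        print(f"{name}: {state}")
```

For strict PEP 440 comparisons, a dedicated version parser would be more robust than this tuple-based approach.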
If OpenMP is not supported by the compiler, the build will be done with
OpenMP functionalities disabled. This is not recommended since it will force
some estimators to run in sequential mode instead of leveraging thread-based
parallelism. Setting the
SKLEARN_FAIL_NO_OPENMP environment variable
(before cythonization) will force the build to fail if OpenMP is not supported.
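When the build is driven from a script, the variable can be set on the child process only. The sketch below assumes that any non-empty value (here `"1"`) enables the check. The helper names `editable_build_command` and `strict_openmp_env` are hypothetical.

```python
import os
import sys


def editable_build_command():
    """The editable install command from this guide, run through the current interpreter."""
    return [
        sys.executable, "-m", "pip", "install",
        "--verbose", "--no-build-isolation", "--editable", ".",
    ]


def strict_openmp_env(base_env=None):
    """Copy an environment mapping and request a hard failure when OpenMP is missing."""
    env = dict(os.environ if base_env is None else base_env)
    env["SKLEARN_FAIL_NO_OPENMP"] = "1"  # assumed value; any non-empty string is expected to work
    return env


# To run the build from the top-level folder of the source checkout:
#     import subprocess
#     subprocess.run(editable_build_command(), env=strict_openmp_env(), check=True)
```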
Since version 0.21, scikit-learn automatically detects and uses the linear algebra library used by SciPy at runtime. Scikit-learn therefore has no build dependency on BLAS/LAPACK implementations such as OpenBLAS, ATLAS, BLIS or MKL.
Running tests requires:
pytest >= 5.0.1
Some tests also require pandas.
Building a specific version from a tag¶
If you want to build a stable version, you can
git checkout <VERSION>
to get the code for that particular version, or download a zip archive of
the version from GitHub.
If you run the development version, it is cumbersome to reinstall the package
each time you update the sources. Therefore it is recommended that you install
it with the
pip install --no-build-isolation --editable . command, which
allows you to edit the code in-place. This builds the extension in place and
creates a link to the development directory (see the pip docs).
This is fundamentally similar to using the command
python setup.py develop
(see the setuptools docs).
It is however preferred to use pip.
On Unix-like systems, you can equivalently type
make in from the top-level
folder. Have a look at the
Makefile for additional utilities.
Here are instructions to install a working C/C++ compiler with OpenMP support to build scikit-learn Cython extensions for each supported platform.
First, install Build Tools for Visual Studio 2019.
You DO NOT need to install Visual Studio 2019. You only need the “Build Tools for Visual Studio 2019”, under “All downloads” -> “Tools for Visual Studio 2019”.
Secondly, find out if you are running 64-bit or 32-bit Python. The building
command depends on the architecture of the Python interpreter. You can check
the architecture by running the following in
python -c "import struct; print(struct.calcsize('P') * 8)"
For 64-bit Python, configure the build environment by running the following
commands in cmd or an Anaconda Prompt (if you use Anaconda):
$ SET DISTUTILS_USE_SDK=1
$ "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
Replace x64 with x86 to build for 32-bit Python.
Please be aware that the path above might be different from user to user. The aim is to point to the “vcvarsall.bat” file that will set the necessary environment variables in the current command prompt.
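The architecture check and the choice of the vcvarsall.bat argument can be combined in a few lines. In this sketch the function names `pointer_bits` and `vcvarsall_arch` are hypothetical; the mapping itself (x64 for 64-bit Python, x86 for 32-bit Python) is the one described above.

```python
import struct


def pointer_bits():
    """Size of a C pointer in bits for the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def vcvarsall_arch(bits):
    """Map the interpreter architecture to the vcvarsall.bat argument."""
    if bits == 64:
        return "x64"
    if bits == 32:
        return "x86"
    raise ValueError(f"unexpected pointer size: {bits} bits")


print(f"{pointer_bits()}-bit Python, vcvarsall.bat argument: {vcvarsall_arch(pointer_bits())}")
```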
Finally, build scikit-learn from this command prompt:
pip install --verbose --no-build-isolation --editable .
The default C compiler on macOS, Apple clang (confusingly aliased as
/usr/bin/gcc), does not directly support OpenMP. We present two alternatives
to enable OpenMP support:
install the compilers meta-package from conda-forge with conda;
install libomp with Homebrew to extend the default Apple clang compiler.
For Apple Silicon M1 hardware, only the conda-forge method below is known to
work at the time of writing (January 2021). You can install the
distribution of conda using the miniforge installer
macOS compilers from conda-forge¶
If you use the conda package manager (version >= 4.7), you can install the
compilers meta-package from the conda-forge channel, which provides
OpenMP-enabled C/C++ compilers based on the LLVM toolchain.
First install the macOS command line tools:
It is recommended to use a dedicated conda environment to build scikit-learn from source:
conda create -n sklearn-dev -c conda-forge python numpy scipy cython \
    joblib threadpoolctl pytest compilers llvm-openmp
conda activate sklearn-dev
make clean
pip install --verbose --no-build-isolation --editable .
If you get any conflicting dependency error message, try commenting out
any custom conda configuration in the
$HOME/.condarc file. In particular, the
channel_priority: strict directive is known to cause
problems for this setup.
You can check that the custom compilers are properly installed from conda forge using the following command:
which should include
The compilers meta-package will automatically set custom environment variables:
echo $CC
echo $CXX
echo $CFLAGS
echo $CXXFLAGS
echo $LDFLAGS
They point to files and folders from your
sklearn-dev conda environment
(in particular in the bin/, include/ and lib/ subfolders). For instance
-L/path/to/conda/envs/sklearn-dev/lib should appear in
In the log, you should see the compiled extension being built with the clang
and clang++ compilers installed by conda with the
-fopenmp command line flag.
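The environment variables set by the compilers meta-package can also be inspected from Python. The sketch below assumes that the active environment is exposed through the `CONDA_PREFIX` variable, which conda sets on activation. The names `BUILD_VARS` and `vars_containing` are hypothetical.

```python
import os
import shlex

# The variables that the compilers meta-package is described as setting.
BUILD_VARS = ("CC", "CXX", "CFLAGS", "CXXFLAGS", "LDFLAGS")


def vars_containing(token, environ):
    """Names of the build variables whose shell-split value contains ``token``."""
    return [
        name for name in BUILD_VARS
        if token in shlex.split(environ.get(name, ""))
    ]


prefix = os.environ.get("CONDA_PREFIX")
if prefix:
    # Look for the library search path of the active environment.
    print(vars_containing(f"-L{prefix}/lib", os.environ))
else:
    print("no active conda environment detected")
```

An empty list in an activated environment suggests that the compilers package did not set up its flags, for instance because the environment was activated before the package was installed.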
macOS compilers from Homebrew¶
Another solution is to enable OpenMP support for the clang compiler shipped by default on macOS.
First install the macOS command line tools:
Install the Homebrew package manager for macOS.
Install the LLVM OpenMP library:
brew install libomp
Set the following environment variables:
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
export CPPFLAGS="$CPPFLAGS -Xpreprocessor -fopenmp"
export CFLAGS="$CFLAGS -I/usr/local/opt/libomp/include"
export CXXFLAGS="$CXXFLAGS -I/usr/local/opt/libomp/include"
export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp"
Finally, build scikit-learn in verbose mode (to check for the presence of the
-fopenmp flag in the compiler commands):
make clean
pip install --verbose --no-build-isolation --editable .
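The export lines above hard-code /usr/local/opt/libomp as the libomp location. If Homebrew installed the library under a different prefix (`brew --prefix libomp` prints it), the same variables can be generated from that prefix. This is a sketch only, and the function name `libomp_build_env` is hypothetical.

```python
import os


def libomp_build_env(prefix="/usr/local/opt/libomp", environ=None):
    """Build the variables from the export lines above for a given libomp prefix."""
    environ = os.environ if environ is None else environ

    def extend(name, extra):
        # Keep any value that is already set, as "$NAME extra" does in the shell.
        return f"{environ.get(name, '')} {extra}".strip()

    return {
        "CC": "/usr/bin/clang",
        "CXX": "/usr/bin/clang++",
        "CPPFLAGS": extend("CPPFLAGS", "-Xpreprocessor -fopenmp"),
        "CFLAGS": extend("CFLAGS", f"-I{prefix}/include"),
        "CXXFLAGS": extend("CXXFLAGS", f"-I{prefix}/include"),
        "LDFLAGS": extend("LDFLAGS", f"-Wl,-rpath,{prefix}/lib -L{prefix}/lib -lomp"),
    }


# Print shell export lines that can be pasted into a terminal.
for name, value in libomp_build_env().items():
    print(f'export {name}="{value}"')
```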
Linux compilers from the system¶
Installing scikit-learn from source without using conda requires you to have installed the Python development headers and a working C/C++ compiler with OpenMP support (typically the GCC toolchain).
Install build dependencies for Debian-based operating systems, e.g. Ubuntu:
sudo apt-get install build-essential python3-dev python3-pip
then proceed as usual:
pip3 install cython
pip3 install --verbose --editable .
Cython and the pre-compiled wheels for the runtime dependencies (numpy, scipy
and joblib) should automatically be installed in
$HOME/.local/lib/pythonX.Y/site-packages. Alternatively you can run the
above commands from a virtualenv or a conda environment to get full
isolation from the Python packages installed via the system packager. When
using an isolated environment,
pip3 should be replaced by
pip in the above commands.
When precompiled wheels of the runtime dependencies are not available for your architecture (e.g. ARM), you can install the system versions:
sudo apt-get install cython3 python3-numpy python3-scipy
On Red Hat and clones (e.g. CentOS), install the dependencies using:
sudo yum -y install gcc gcc-c++ python3-devel numpy scipy
Linux compilers from conda-forge¶
Alternatively, install a recent version of the GNU C Compiler toolchain (GCC) in the user folder using conda:
conda create -n sklearn-dev -c conda-forge python numpy scipy cython \
    joblib threadpoolctl pytest compilers
conda activate sklearn-dev
pip install --verbose --no-build-isolation --editable .
The clang compiler included in FreeBSD 12.0 and 11.2 base systems does not
include OpenMP support. You need to install the
openmp library from packages
sudo pkg install openmp
This will install header files in
/usr/local/include and libs in
/usr/local/lib. Since these directories are not searched by default, you
can set the environment variables to these locations:
export CFLAGS="$CFLAGS -I/usr/local/include"
export CXXFLAGS="$CXXFLAGS -I/usr/local/include"
export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lomp"
Finally, build the package using the standard command:
pip install --verbose --no-build-isolation --editable .
For the upcoming FreeBSD 12.1 and 11.3 versions, OpenMP will be included in the base system and these steps will not be necessary.
pip install --verbose --editable .
will build scikit-learn using your default C/C++ compiler. If you want to build
scikit-learn with another compiler handled by
distutils or by
numpy.distutils, use the following command:
python setup.py build_ext --compiler=<compiler> -i build_clib --compiler=<compiler>
To see the list of available compilers run:
python setup.py build_ext --help-compiler
If your compiler is not listed here, you can specify it via the CC and
LDSHARED environment variables (does not work on Windows):
CC=<compiler> LDSHARED="<compiler> -shared" python setup.py build_ext -i
Building with Intel C Compiler (ICC) using oneAPI on Linux¶
Intel provides access to all of its oneAPI toolkits and packages through a public APT repository. First you need to get and install the public key of this repository:
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
Then, add the oneAPI repository to your APT repositories:
sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
sudo apt-get update
Install ICC, packaged under the name intel-oneapi-compiler-dpcpp-cpp-and-cpp-classic:
sudo apt-get install intel-oneapi-compiler-dpcpp-cpp-and-cpp-classic
Before using ICC, you need to set up environment variables:
Finally, you can build scikit-learn. For example on Linux x86_64:
python setup.py build_ext --compiler=intelem -i build_clib --compiler=intelem
It is possible to build scikit-learn compiled extensions in parallel by setting
an environment variable as follows before calling the
pip install or
python setup.py build_ext commands:
export SKLEARN_BUILD_PARALLEL=3
pip install --verbose --no-build-isolation --editable .
On a machine with 2 CPU cores, it can be beneficial to use a parallelism level of 3 to overlap IO bound tasks (reading and writing files on disk) with CPU bound tasks (actually compiling).
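The parallelism level can be derived from the number of cores instead of being hard-coded. The sketch below generalizes the example above (2 cores, level 3) to "number of cores plus one". That generalization is an assumption made for illustration and is not a documented scikit-learn rule, and the function name `suggested_build_parallelism` is hypothetical.

```python
import os


def suggested_build_parallelism(cpu_count=None):
    """Return cores + 1, matching the 2-core example above (assumed heuristic)."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return cores + 1


# Print an export line that can be pasted into a terminal before building.
print(f"export SKLEARN_BUILD_PARALLEL={suggested_build_parallelism()}")
```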