Question
Python dependency hell: A compromise between virtualenv and global dependencies?
I've tested various ways to manage my project dependencies in Python so far:
- Installing everything globally with pip (saves space, but sooner or later gets you in trouble)
- pip & venv or virtualenv (a bit of a pain to manage, but ok for many cases)
- pipenv & Pipfile (a little bit easier than venv/virtualenv, but slow and with some vendor lock-in; the virtual envs are hidden somewhere other than the actual project folder)
- conda as package and environment manager (great as long as the packages are all available in conda, mixing pip & conda is a bit hacky)
- Poetry - I haven't tried this one
- ...
My problem with all of these (except 1.) is that my hard drive space fills up pretty fast: I am not a developer, I use Python for my daily work. Therefore, I have hundreds of small projects that all do their thing. Unfortunately, for 80% of projects I need the "big" packages: `numpy`, `pandas`, `scipy`, `matplotlib` - you name it. A typical small project is about 1000 to 2000 lines of code, but has 800 MB of package dependencies in venv/virtualenv/pipenv. In total, I have about 100+ GB of my HDD filled with Python virtual dependencies.
Moreover, installing all of these in each virtual environment takes time. I am working on Windows, and many packages cannot easily be installed with pip on Windows: `Shapely`, `Fiona`, `GDAL` - I need the precompiled wheels from Christoph Gohlke. This is easy, but it breaks most workflows (e.g. `pip install -r requirements.txt` or `pipenv install` from a Pipfile). I feel like I spend 40% of my time installing/updating package dependencies and only 60% writing code. Further, none of these package managers really helps with publishing & testing code, so I need other tools, e.g. `setuptools`, `tox`, `semantic-release`, `twine`...
I've talked to colleagues, but they all face the same problem and no one seems to have a real solution. I was wondering whether there is an approach to have some packages - e.g. the ones you use in most projects - installed globally. For example, `numpy`, `pandas`, `scipy` and `matplotlib` would be installed with pip in `C:\Python36\Lib\site-packages` or with `conda` in `C:\ProgramData\Miniconda3\Lib\site-packages` - these are well-developed packages that don't often break things. And if they did, I would want to fix that in my projects soon anyway.
Other things would go in local virtualenv folders - I am tempted to move my current workflow from `pipenv` to `conda`.
Does such an approach make sense at all? There has been a lot of development in Python packaging lately; perhaps something has emerged that I haven't seen yet.
Is there any best-practice guidance on how to set up files in such a mixed global-local environment, e.g. how to maintain `setup.py`, `requirements.txt` or `pyproject.toml` for sharing development projects through GitLab, GitHub, etc.? What are the pitfalls/caveats?
There's also this great blog post from Chris Warrick that explains it pretty much fully.
[Update 2020]
After half a year, I can say that working with Conda (Miniconda) has solved most of my problems:
- it runs on every system: WSL, Windows, native Linux, etc.; `conda env create -f myenv.yml` is the same on every platform
- most packages are already available on conda-forge, and it is easy to get your own packages accepted on conda-forge
- for those packages not on conda, I can install `pip` in the conda environment and add packages from PyPI with pip. Hint: `conda update --all -n myenv -c conda-forge` will only update packages from conda, not those installed with `pip`. Pip-installed dependencies must be updated manually with `pip install pack_name --upgrade`. Note that installing packages with pip in conda is an emergency solution that should typically be avoided
- I can create strict or open `environment.yml` files, specifying the conda channel priority, the packages from conda and the packages from pip (see the example after this list)
- I can create conda environments from those ymls in a single statement, e.g. to set up a dev environment in GitLab Continuous Integration using the Miniconda3 Docker image (a CI sketch also follows below) - this makes test runs very simple and straightforward
- package versions in the ymls can be defined strict or open, depending on the situation. E.g. you can pin the env to Python 3.6 but have it retrieve any security updates within that version range (e.g. 3.6.9)
- I found that conda solves almost all problems with C-compiled dependencies on Windows; conda envs on Windows do allow freezing Python code into an executable (tested!) that can be distributed to Windows end users who cannot use package managers for some reason.
- regarding the issue with "big dependencies": I ended up creating many specific (i.e. small) and a few unspecific (i.e. big) conda environments. For example, I have a quite big `jupyter_env`, where Jupyter Lab and most of my scientific packages are installed (numpy, geos, pandas, scipy, etc.) - I activate it whenever I need access to these tools, and I can keep them up to date in a single place. For development of specific packages, I have extra environments that are only used for the package dependencies (e.g. `package1_env`). I have about 10 environments overall, which is manageable. Some general-purpose tools are installed in the base conda environment, e.g. `pylint`. Be warned: to make pylint/pycodestyle/autopep8 etc. work in (e.g.) VS Code, they must be installed in the same env that contains the code's dependencies - otherwise, you'll get unresolved-import warnings
- I installed Miniconda with the Chocolatey package manager for Windows. I keep it up to date with `conda update -n base conda`, and my envs with `conda update --all -n myenv -c conda-forge` once a week; works like a charm!
- New Update: there's a `--stack` flag available (as of 2019-02-07) that allows stacking conda environments, e.g. `conda activate my_big_env` followed by `conda activate --stack dev_tools_env` makes some general-purpose packages available in many envs. However, use it with caution - I found that code linters, such as pylint, must be in the same env as the dependencies of the code that is linted
- New Update 2: I started using `conda` from the Windows Subsystem for Linux (WSL); this again improved my workflow significantly: packages are installed faster, I can work with VS Code Insiders on Windows connected directly to WSL, and there are far fewer bugs with Python packages in the Linux environment.
- Another Update: on a side note, the Miniconda Docker image allows converting local conda env workflows flawlessly into containerized infrastructure (CI & CD). I have tested this for a while now and am pretty happy with it - the Dockerfile is cleaner than with the Python Docker image because conda manages more of the dependency work than pip does. I use this more and more nowadays, for example when working with Jupyter Lab, which is started from within a container.
- yes, I stumbled into compatibility problems between certain packages in a conda env, but very rarely. There are two approaches: if it is an important env that must work stably, enable `conda config --env --set channel_priority strict` - this will only install versions that are compatible. With very few and rare package combinations, this may result in unsolvable dependency conflicts (i.e. the env cannot be created). In that case, I usually create smaller envs for experimental development, with fewer packages and `channel_priority` set to `flexible` (the default). Sometimes, package subsets exist that are easier to solve, such as `geoviews-core` (instead of `geoviews`) or `matplotlib-base` (instead of `matplotlib`). It's also a good approach to try lower Python versions for experimental envs that are unsolvable with `strict`, e.g. `conda create -n jupyter_exp_env python=3.6 -c conda-forge`. A last-resort hack is installing packages with pip, which bypasses conda's package resolver (but may result in unstable environments and other issues - you've been warned!). Make sure to explicitly install `pip` in your env first.
- One overall drawback is that conda gets kind of slow when using the large conda-forge channel. They're working on it, but at the same time the conda-forge index is growing really fast.
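As referenced in the list above, here is a minimal `environment.yml` sketch mixing strict and open pins plus a pip section; the environment name and the specific pins are placeholders, not recommendations:

```yaml
# Hypothetical environment.yml - names and pins are placeholders.
name: myenv
channels:
  - conda-forge        # preferred channel, listed first for priority
  - defaults
dependencies:
  - python=3.6         # open within the 3.6 series, so e.g. 3.6.9 is picked up
  - numpy              # fully open pin
  - pandas=1.0.3       # strict pin
  - pip                # install pip explicitly before using the pip: section
  - pip:
    - some-pypi-only-package   # placeholder for a package not on conda-forge
```

Create the environment with `conda env create -f environment.yml` (or `-f myenv.yml`, as above).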
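And a rough sketch of how the Miniconda3 Docker image can be used in GitLab CI; the job name, env name and test command are assumptions for illustration:

```yaml
# Hypothetical .gitlab-ci.yml job using the Miniconda3 image.
test:
  image: continuumio/miniconda3:latest
  script:
    - conda env create -f myenv.yml        # build the env exactly as defined in the yml
    - conda run -n myenv python -m pytest  # run the test suite inside that env
```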
[Update 2021]
Since this post still gets many views, here is a subjective 2021 update:
- if you are in data science, (mini)conda is still worth a look
- otherwise, Poetry and `pyproject.toml` seem to be the commonly agreed-upon denominator
[Update 2023]
I am slowly moving away from conda. `pip` + `venv` seem to be the more viable option, and it frequently works better and faster (e.g. for pytorch, transformers). Forget Windows: pip only works well in WSL/Linux. For package maintainers, `setuptools>64` now allows `pyproject.toml`-only based packages - finally a unified packaging experience! Get rid of your `setup.py`'s.
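For illustration, a minimal sketch of such a `pyproject.toml`-only setuptools package; the project name, version and dependencies are placeholders:

```toml
# Hypothetical pyproject.toml - metadata values are placeholders.
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"
version = "0.1.0"
requires-python = ">=3.8"
dependencies = [
    "numpy",
    "pandas",
]
```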