4 Python Packages, Virtual Environments and Dependency Management

As a developer, data scientist or data engineer, you will often work on projects that depend on external libraries. For example, consider a project that uses the pandas library for data manipulation and the scikit-learn library for machine learning. Both libraries have their own dependencies, which may in turn have dependencies of their own, and so on.

Furthermore, different projects commonly require different versions of the same library. For example, a new project may need a feature introduced in the latest pandas version, while an older project may rely on a feature from an earlier pandas version that has since been deprecated. It wouldn’t be ideal to have to uninstall and reinstall pandas every time you switch between projects.

Wouldn’t it be great if you could have a tailor-made environment for each project, containing exactly the packages that the project needs? This is where the concept of virtual environments comes in. Virtual environments allow you to create isolated development spaces, enabling different versions of the same library to coexist on the same machine without conflict.
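To make the idea of an isolated space concrete, here is a minimal sketch that creates a virtual environment with Python's standard-library `venv` module. It is only an illustration of the concept and not the workflow used in this chapter, which relies on Conda and Poetry. The function name `create_isolated_env` and the directory name `project-a-env` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return the path to its config file."""
    # with_pip=False keeps the sketch fast and avoids needing ensurepip.
    venv.create(env_dir, with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg file at its root.
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    # A temporary directory is used so that the example cleans up after itself.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_isolated_env(Path(tmp) / "project-a-env")  # hypothetical name
        print(cfg.read_text())
```

Each environment created this way has its own directory for installed packages, so two projects can keep different versions of the same library side by side.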

Imagine now that some new colleagues join your project. They will need to install the same libraries, at the same versions, that you are using. This can be a daunting task, especially if the project has many dependencies. This is where dependency management tools come in. The idea is to make it easy to track the specific versions of all the libraries that your project depends on. This way, you can easily share the exact environment needed to run your project with your colleagues.
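As a rough illustration of what tracking specific versions means, the following sketch lists every package installed in the current environment as a `name==version` line, using only the standard-library `importlib.metadata` module. The helper name `pinned_requirements` is hypothetical, and dependency management tools do considerably more than this (for example, resolving compatible versions), so treat it as a sketch of the idea only.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions whose metadata is missing or broken
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

The output depends entirely on what is installed in the environment where the code runs, which is exactly why sharing such a list helps colleagues reproduce it.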

In this chapter, we will learn how to manage the requirements of your Python projects. We will start by learning how to manage different Python versions using Conda, a popular environment and package manager. Then we will learn how to structure your project and manage virtual environments using Poetry, a modern dependency management tool for Python.