Question

Using hyphen/dash in python repository name and package name

I am trying to make my git repository pip-installable. In preparation for that I am restructuring the repo to follow the right conventions. My understanding from looking at other repositories is that I should put all my source code in a package that has the same name as the repository name. E.g. if my repository is called myrepo, then the source code would all go into a package also called myrepo.

My repository has a hyphen in it for readability: e.g. my-repo. So if I wanted to make a package for it with the same name, it would have a hyphen in it as well. In this tutorial it says "don't use hyphens" for python package names. However I've seen well-established packages such as scikit-learn that have hyphens in their name. One thing that I have noticed though is that in the scikit-learn repo, the package name is not the same as the repo name and is instead called sklearn.

I think my discussion above boils down to the following questions:

  1. When packaging a repo, what is the relationship between the repository's name and the package's name? Is there anything to beware of when having names that don't match?
  2. Is it okay to have hyphens in package names? What about in repository names?
  3. If the package name for scikit-learn is sklearn, then how come when I install it I do pip install scikit-learn instead of pip install sklearn?
1 Jan 1970

Solution


To answer your first point, let me rephrase my answer to a different question.

The biggest source of misunderstanding is that the word "package" is heavily overloaded. There are 4 different names in the game: the name of the repository, the name of the directory used for development (the one that contains setup.py), the name of the directory containing __init__.py and other importable modules, and the name of the distribution at PyPI. Quite often these 4 are the same or similar, but that's not required.

The names of the repository and the development directory can be anything; they play no role in packaging. Of course it's convenient to name them sensibly, but that's only a convenience.

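The name of the directory containing the Python files determines the name of the package to be imported. Once a package has been given an import name, that name usually sticks and cannot easily be changed, because every import statement in users' code depends on it.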

The name of the distribution determines the project's page at PyPI and the names of the distribution files (source distributions, eggs, wheels). It's the name one passes in the setup(name='distribution') call.
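Because the distribution name is what pip and PyPI work with, it helps to know that PyPI compares distribution names after normalizing them as described in PEP 503: the name is lowercased and every run of -, _ or . becomes a single -. A minimal sketch of that rule, using the regular expression given in PEP 503 (normalize_dist_name is just an illustrative helper name):

```python
import re

def normalize_dist_name(name: str) -> str:
    """Normalize a distribution name the way PEP 503 describes."""
    # Collapse each run of '-', '_' and '.' into one '-' and lowercase the result.
    return re.sub(r"[-_.]+", "-", name).lower()

for name in ["scikit-learn", "scikit_learn", "Scikit.Learn", "Cheetah3"]:
    print(f"{name!r} -> {normalize_dist_name(name)!r}")
```

So scikit-learn and scikit_learn refer to the same PyPI project. None of this applies to import names, which are case-sensitive Python identifiers.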

Let me show a detailed real example. I've been maintaining a templating library called CheetahTemplate. I develop it in a development directory called cheetah3/. The distribution at PyPI is called Cheetah3; this is the name I put into setup(name='Cheetah3'). The top-level package is Cheetah, hence one does import Cheetah.Template or from Cheetah import Template; that means I have a directory cheetah3/Cheetah/.

The answer to 2 is: you can have dashes in repository names and PyPI distribution names, but not in package names (directories with __init__.py files) or module names (.py files), because you cannot write import xy-zzy in Python: it would be parsed as a subtraction and raise a SyntaxError.
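The SyntaxError is easy to confirm, and a candidate name can be checked before it is used for a package directory. A small sketch; is_valid_import_name is a hypothetical helper, not part of pip or setuptools:

```python
import keyword

def is_valid_import_name(name: str) -> bool:
    """True if `name` (possibly dotted) could appear in an import statement."""
    # Each dotted part must be a Python identifier and must not be a keyword.
    return all(
        part.isidentifier() and not keyword.iskeyword(part)
        for part in name.split(".")
    )

# The hyphen is parsed as the minus operator, so this statement does not compile.
try:
    compile("import xy-zzy", "<example>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

for name in ["my_repo", "my-repo", "sklearn", "Cheetah.Template", "class"]:
    print(f"{name!r}: {is_valid_import_name(name)}")
```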

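Point 3: the site, the repository, and the distribution are all named scikit-learn, but the importable package (the top-level directory with __init__.py) is sklearn. pip works with distribution names, which is why you run pip install scikit-learn and then write import sklearn.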

PEP 8 has nothing to do with the question, as it doesn't talk about distributions, only about importable packages and modules.

2019-02-08

Solution


To summarize various quotes I've found:

https://jgbarah.github.io/presentations/pip-packages/#nomenclature:

  • Project: the directory containing the source code to be packaged, including the modules, the scripts, etc.
  • Module: a file that can be imported by a Python script or other module.
  • Package: a collection of modules, usually as a hierarchy of directories
  • Distribution: a distributable version of a project, usually including a collection of modules (usually, structured as a package), scripts, etc. In many contexts, “distributions” are named “packages”, but I try not to use the term in that sense here to avoid confusion with “package” as a collection of modules (see above).
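In terms of the objects Python creates at import time, the module/package distinction above shows up as a single attribute: a package is a module object that has a __path__ (the directories searched for its submodules), while a plain module does not. A quick check using standard-library examples (in CPython, json is a directory with __init__.py, json.decoder is a single .py file, and math is a built-in module); is_package is an illustrative helper:

```python
import json
import json.decoder
import math

def is_package(module) -> bool:
    """A package is a module object with a __path__ attribute."""
    return hasattr(module, "__path__")

for mod in (json, json.decoder, math):
    kind = "package" if is_package(mod) else "module"
    print(f"{mod.__name__}: {kind}")
```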

https://dagster.io/blog/python-project-best-practices:

The top folder, called "my-project," is like the main folder for the entire project. The use of the dash and the underscore help differentiate between the two levels of the project. The underscore, in particular, helps distinguish the project reference from a variable or function name that might use the dash symbol. Because a dash is also a minus sign, the "inner level", which defines a python package, must use the underscore.

https://jgbarah.github.io/presentations/pip-packages/:

If you want to use names with more than one word, use _ (underscore) to separate the words in the package name, and - (hyphen) to separate the words in the distribution name. Other combinations may work, but this one seems to be customary, and minimizes surprises for users.

In the rest of this document, I will use pkg_name for the package (collection of modules) name and dist-name for the distribution name (distributable package).
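Following that convention, a package name can be derived mechanically from a multi-word distribution name. dist_to_pkg_name below is a hypothetical helper that illustrates the convention; it is not something pip or setuptools provides:

```python
import keyword
import re

def dist_to_pkg_name(dist_name: str) -> str:
    """Map a distribution name like 'my-repo' to a package name like 'my_repo'."""
    # Replace each run of '-', '_' or '.' with one underscore and lowercase it.
    pkg_name = re.sub(r"[-_.]+", "_", dist_name).lower()
    if not pkg_name.isidentifier() or keyword.iskeyword(pkg_name):
        raise ValueError(f"{dist_name!r} does not map to a valid package name")
    return pkg_name

print(dist_to_pkg_name("my-repo"))    # my_repo
print(dist_to_pkg_name("dist-name"))  # dist_name
```

A mechanical mapping like this cannot produce names such as sklearn from scikit-learn; that abbreviation was a deliberate choice by the project, which is exactly why the two names are independent.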

https://moduscreate.com/blog/github-semantic-naming/:

In response to the question: “Which separator between words in a repository name do you prefer? Note this is the repository name, not the display name.” we discovered the following breakdown.

[Figure: GitHub Semantics Breakdown]

The overall results here show that hyphens are by far the most popular separator.

2024-06-29