Question

Do Python coders have a bias towards list over tuple?

Basic Facts

Lists are mutable (supporting insertion, appending, etc.); tuples are not.
Tuples are more memory-efficient and generally a bit faster to iterate over.

So it would seem their use cases are clear. Functionally speaking, lists offer a superset of operations, while tuples are more performant at what they do.
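As a rough illustration of the facts above, here is a minimal sketch, assuming CPython (exact byte counts depend on the interpreter version and platform, so none are shown). The variable names are made up for the example.

```python
import sys

n = 1_000
as_list = list(range(n))
as_tuple = tuple(as_list)

# sys.getsizeof reports the size of the container itself,
# not the size of the int objects it refers to.
list_bytes = sys.getsizeof(as_list)
tuple_bytes = sys.getsizeof(as_tuple)
print(f"list:  {list_bytes} bytes")
print(f"tuple: {tuple_bytes} bytes")

# Read-only operations look identical on both types.
same_item = as_list[10] == as_tuple[10]
same_sum = sum(as_list) == sum(as_tuple)

# Only the list supports in-place mutation.
as_list.append(n)
tuple_has_append = hasattr(as_tuple, "append")
```

On CPython the tuple is expected to come out smaller, because a list carries extra bookkeeping for resizing.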

Observation

Most arrays that my team creates in the course of a program would, in fact, be perfectly fine as immutable. We iterate over them, apply map, reduce, and filter to them, maybe insert them into a database, etc., all without inserting into, popping from, or appending to the array-like structure.

Question

Yet a list seems to be not only the default (and only) choice among my developers, but even seems to be favoured by many library APIs for passing data around (such as polars and tensorflow, which I use heavily).

And it's not as if using tuples requires some special skill, knowledge, or understanding of another data structure; the syntax needed to subscript or iterate is really the same.

What am I missing in the reasoning here?


1 Jan 1970

Solution

It's not a matter of bias. By convention, lists are used for homogeneous data, and tuples are used for heterogeneous data, unless a requirement for mutability or hashability forces the opposite.

You can see this convention stated in places like the documentation for built-in types, where lists are described as

mutable sequences, typically used to store collections of homogeneous items

and tuples are described as

immutable sequences, typically used to store collections of heterogeneous data

So for example, if seq[2] represents "the third thing" and seq[3] represents "the fourth thing", you use a list, while if seq[2] represents "income" and seq[3] represents "birthday", you use a tuple.
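A minimal sketch of that convention, with made-up names and values chosen purely for illustration:

```python
# Heterogeneous data: each position has its own meaning, so a tuple fits.
person = ("Alice", 52_000, "1990-04-17")  # (name, income, birthday)
name, income, birthday = person

# Homogeneous data: every element plays the same role, so a list fits.
incomes = [52_000, 48_500, 61_250]
incomes.append(39_900)
average_income = sum(incomes) / len(incomes)

# Hashability can force the choice: a tuple of hashable items
# can be a dict key, while a list cannot.
visits = {("Alice", "1990-04-17"): 3}
try:
    visits[["Alice", "1990-04-17"]] = 1
except TypeError:
    list_key_rejected = True
else:
    list_key_rejected = False
```

When the positions of a tuple carry this much meaning, collections.namedtuple or a dataclass can make the field names explicit.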

2024-07-08

user2357112

Solution

I've been using Python since the start, and the intended use cases for lists versus tuples have always been a bit fuzzy. Year after year, though, I tend to use tuples more.

While I have no compelling way to argue this case, I always thought a lot of it came down to dislike of the syntax in the first simple cases a programmer tried. Parentheses are "overused" in Python's syntax.

x = ()

looks more like a syntax error at first ("which function did they intend to call?"), while

y = 42,

still looks like a syntax error to my eyes ;-)

The corresponding cases for lists are self-evident at first sight:

x = []
y = [42]

"Readability counts", and first impressions are hard to shake off.
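To make the syntax point concrete, here is a small sketch of the cases that tend to surprise newcomers (the variable names are just for illustration):

```python
empty = ()           # an empty tuple, not a call
single = 42,         # the trailing comma makes this a 1-tuple
also_single = (42,)  # same thing, with optional parentheses
not_a_tuple = (42)   # parentheses alone only group: this is the int 42

for label, value in [
    ("()", empty),
    ("42,", single),
    ("(42,)", also_single),
    ("(42)", not_a_tuple),
]:
    print(f"{label:6} -> {type(value).__name__}")
```

It is the comma, not the parentheses, that creates a tuple; the empty tuple is the one exception.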

EDIT: BTW, there's another underappreciated reason to use tuples for "very large" sequences, when possible: when CPython's cyclic gc runs and determines that a tuple, and all its components, are immutable "all the way down", the entire tuple is exempted from being scanned in future runs of cyclic gc (it has been proved that it can never become part of a cycle). The same isn't true of lists. Even if everything a list contains is immutable "all the way down", there's nothing to stop the programmer from next doing, e.g., L[0] = L, making the list part of a cycle.

Exempting large sequences from being scanned by cyclic gc can save lots of cycles in long-running programs. For that reason, e.g., I routinely create tuples with millions of ints rather than use lists.

Example:

>>> import gc
>>> t = tuple(range(100))
>>> gc.is_tracked(t)
True
>>> gc.collect()
0
>>> gc.is_tracked(t) # gc determined `t` can never be in a cycle
False
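For contrast, here is a hedged sketch of the list side of that argument. It relies on CPython's gc.is_tracked and assumes CPython behaviour; whether and when a tuple is untracked is an implementation detail that can differ between versions, so no output is shown for it.

```python
import gc

big_list = list(range(100))
big_tuple = tuple(big_list)

gc.collect()

# A list stays tracked even though it only holds ints.
list_tracked = gc.is_tracked(big_list)
print("list tracked after collect: ", list_tracked)
print("tuple tracked after collect:", gc.is_tracked(big_tuple))

# Nothing stops a list from later becoming part of a reference cycle...
big_list[0] = big_list
list_is_cyclic = big_list[0] is big_list

# ...while a tuple rejects item assignment outright.
try:
    big_tuple[0] = big_tuple
except TypeError:
    tuple_rejected_assignment = True
else:
    tuple_rejected_assignment = False
```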

2024-07-08

Tim Peters