The sources are easier to navigate if you're familiar with some history of Python. Specifically, the type str was historically called unicode in CPython, and it is still called unicode in the C sources.
So, for string methods: the Python-level str.rfind will eventually get to the C function unicode_rfind_impl. In the main branch you can find the auto-generated declarations at Objects/clinic/unicodeobject.c.h#L1074-L1116 and the impl at Objects/unicodeobject.c#L12721-L12731.
Note: These declarations generated by Argument Clinic are a relatively recent development from gh-117431: Adapt str.find and friends to use the METH_FASTCALL calling convention (Apr 2024), and they are not used in any officially released version yet. For the current 3.12.4 you should look in unicodeobject.c directly.
You'll note that unicode_rfind_impl calls any_find_slice, passing in the direction -1. This direction <= 0 is used to select a specific implementation depending on the width of the underlying unicode:

- asciilib_rfind_slice (when both strings are ASCII)
- ucs1lib_rfind_slice
- ucs2lib_rfind_slice
- ucs4lib_rfind_slice
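You can observe these width tiers from Python itself: CPython's flexible string representation (PEP 393) stores a string in 1, 2, or 4 bytes per code point depending on its widest character. A small sketch using sys.getsizeof (the per-character deltas below are CPython implementation details, not language guarantees):

```python
import sys

def bytes_per_char(ch, n=1000):
    # Estimate the storage width of a string's code points by comparing
    # two strings of the same character; the fixed object overhead
    # (which is the same for strings of the same "kind") cancels out.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

print(bytes_per_char('x'))           # ASCII            -> 1 (ucs1)
print(bytes_per_char('\u0100'))      # BMP, > 0xFF      -> 2 (ucs2)
print(bytes_per_char('\U00010000'))  # beyond the BMP   -> 4 (ucs4)
```

So a single astral character anywhere in a string is enough to push the whole string into the ucs4 representation, which is why the stringlib routines are compiled once per width.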
These calls end up in stringlib routines (stringlib/find.h:rfind_slice -> stringlib/find.h:rfind -> stringlib/fastsearch.h:FASTSEARCH). Then, for the special case of 1-char substrings, we continue to stringlib/fastsearch.h:rfind_char and eventually end up here, where CPython uses either memrchr or reverse iteration, depending on the glibc version. For longer substrings we go to stringlib/fastsearch.h:default_rfind, implemented here, which looks like some sort of Boyer-Moore algorithm with a Bloom filter. An old effbot page describes an earlier version of this code as a "simplication (sic) of Boyer-Moore, incorporating ideas from Horspool and Sunday ... (with) ... a simple Bloom filter", but the implementation details may have shifted somewhat since then (2006).
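To give a feel for the idea (this is an illustrative pure-Python sketch, not a line-for-line translation of the C), here is a right-to-left Horspool-style search with a single-hash bitmask "Bloom filter": when the character just left of the current window cannot occur in the needle at all, every window containing that character is skipped in one jump.

```python
def rfind_simplified(s, p):
    """Return the highest index where p occurs in s, or -1.

    Right-to-left scan with a 64-bit bitmask acting as a one-hash
    Bloom filter over the needle's characters (loosely in the spirit
    of stringlib's fastsearch, with the details simplified).
    """
    n, m = len(s), len(p)
    if m == 0:
        return n          # matches str.rfind('') semantics
    if m > n:
        return -1
    # Bloom mask: set a bit for each character in the needle.
    mask = 0
    for ch in p:
        mask |= 1 << (ord(ch) & 63)
    i = n - m             # leftmost index of the current window
    while i >= 0:
        if s[i] == p[0] and s[i:i + m] == p:
            return i
        # If the char just left of the window is definitely not in the
        # needle (bit unset => true negative), no window containing it
        # can match, so jump past it entirely.
        if i > 0 and not (mask >> (ord(s[i - 1]) & 63)) & 1:
            i -= m + 1
        else:
            i -= 1
    return -1
```

Because the mask is a single word, membership tests are one shift and one AND; false positives (two characters hashing to the same bit) only cost a smaller skip, never a wrong answer.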
Finally, you can use timeit interactively (the runs below use IPython's timeit magic, which wraps the stdlib timeit module) to empirically verify that str.rfind does not meaningfully slow down when dealing with longer strings. Taken by itself, this does not guarantee there is a reverse iteration, but it's certainly evidence that the implementation isn't just a naive iteration from the start.
>>> s1 = 'x' * 1000 + 'y'
>>> s2 = 'x' * 1_000_000 + 'y'
>>> timeit s1.rfind('y')
68.7 ns ± 0.183 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> timeit s2.rfind('y')
68.9 ns ± 0.00835 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Compare with putting the 'y' at the start, where we go from nanos to micros:
>>> s1 = 'y' + ('x' * 1000)
>>> s2 = 'y' + ('x' * 1_000_000)
>>> timeit s1.rfind('y')
73.6 ns ± 0.00866 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> timeit s2.rfind('y')
22.3 μs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
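If you're not in IPython, the same experiment can be reproduced with the stdlib timeit module directly. A minimal version (absolute timings will of course vary by machine):

```python
import timeit

s1 = 'x' * 1000 + 'y'
s2 = 'x' * 1_000_000 + 'y'

for label, s in [('1K chars', s1), ('1M chars', s2)]:
    # Take the best of several repeats to reduce scheduling noise.
    t = min(timeit.repeat(lambda: s.rfind('y'), number=100_000, repeat=5))
    print(f"{label}: {t / 100_000 * 1e9:.1f} ns per call")
```

With the 'y' at the end, both sizes should report times in the same ballpark; move the 'y' to the front and the million-character case slows by orders of magnitude, consistent with a scan from the right.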