Question

searching performance in list is better than in set

Searching an element in a set takes O(1) time complexity while searching in a list takes O(n) time complexity. The n is the number of elements in a set or list.

In my test on the real searching time cost, the n is set to 1,000,000 which I believe is a big number. The test results are 48.3 msec in the set vs 28 msec in the list, which shows searching in a list is faster than searching in a set and contradicts the time complexity analysis. I tried increasing n and comparing the searching speed, and it turns out that the finding holds true until I get a memory error when n is very big.

(base) PS C:\Users> python -m timeit -n 100 "300 in set(range(1000000))"

100 loops, best of 5: 48.3 msec per loop

(base) PS C:\Users> python -m timeit -n 100 "300 in list(range(1000000))"

100 loops, best of 5: 28 msec per loop

time cost results

Could you help me understand why the performance of searching sequence in a list is better than in a set in Python?

I cannot understand.

 4  89  4
1 Jan 1970

Solution

 8

My original comment, which I was asked to incorporate into this answer (good idea!): Lookup time accounts for very little of what you're timing. You're mostly timing how long it takes just to do the set(range(1000000)) and list(range(1000000)) parts. Building the set/list to begin with are far more expensive than the lookup. Use the -s argument to take the setup cost out of what you're timing.

Elaborating on my comment, and since other answers didn't show the use of -s, here it is:

$ python -m timeit -n 100 -s "n = 1; c = set(range(1000000))" "n in c"
100 loops, best of 5: 46 nsec per loop

$ python -m timeit -n 100 -s "n = 1; c = list(range(1000000))" "n in c"
100 loops, best of 5: 45 nsec per loop

$ python -m timeit -n 100 -s "n = 10000; c = set(range(1000000))" "n in c"
100 loops, best of 5: 60 nsec per loop

$ python -m timeit -n 100 -s "n = 10000; c = list(range(1000000))" "n in c"
100 loops, best of 5: 71.4 usec per loop

The code given to -s is not timed. It lets you set up objects for the timed code to use. n in c is the only code timed now.

2024-07-05
Tim Peters

Solution

 3

I think set creation is taking most of the tim as it involves hashing each element, as mentioned by @tim-peters,.

Here, I have created set and list before performing a search operation and the search time seems to be a lot faster for sets than that of list.

import random
import timeit

def setup(n):
    data = list(range(n))
    random.shuffle(data)
    return set(data), data

def test_set(s, value):
    return value in s

def test_list(l, value):
    return value in l

n = 1000000
s, l = setup(n)

# Test with a value that's guaranteed to be in both
value_in = random.choice(l)

# Test with a value that's guaranteed not to be in either
value_out = n + 1

print("Searching for a value that exists:")
print("Set:", timeit.timeit(lambda: test_set(s, value_in), number=1000))
print("List:", timeit.timeit(lambda: test_list(l, value_in), number=1000))

print("\nSearching for a value that doesn't exist:")
print("Set:", timeit.timeit(lambda: test_set(s, value_out), number=1000))
print("List:", timeit.timeit(lambda: test_list(l, value_out), number=1000))

Result:

Searching for a value that exists:
Set: 0.00023545200008356915
List: 28.11379308300002

Searching for a value that doesn't exist:
Set: 0.0002436169999100457
List: 52.67750670600003
2024-07-05
Aananda Giri

Solution

 2

Python sets use hashmaps, which makes them really fast at lookups. But since hashmaps require you to hash every element, they take a long time to construct. Lists don't do any hashing, which makes construction much faster and lookups much slower.

Searching is pretty fast in either case anyways, so for just one lookup, execution time will be dictated mostly by construction. The set lookup will only outpace the list lookup if you're doing a whole lot of searches with the same set; otherwise, the construction difference will simply be too great.

You can verify that set/list creation is the bottleneck - as well as that set construction is slower than list construction - by removing the lookup from your timeit calls and just creating the set/list:

python -m timeit -n 100 "set(range(1000000))"

python -m timeit -n 100 "list(range(1000000))"

For me, the first call took about twice as long as the second, and there was very little difference between these calls and the ones that included 300 in.

2024-07-05
Andrew Yim