I thought it would be useful to actually time some of the solutions presented here on a larger sample input. For this input and on my machine, I find Cardstdani's approach to be the fastest, followed by the numpy
isin()
approach.
Setup 1
import random
list_1 = [random.randint(1, 10_000) for i in range(100_000)]
list_2 = [random.randint(1, 10_000) for i in range(100_000)]
Setup 2
list_1 = [random.randint(1, 10_000) for i in range(100_000)]
list_2 = [random.randint(10_001, 20_000) for i in range(100_000)]
Timings - ordered from fastest to slowest (setup 1).
Cardstdani - approach 1
I recommend converting Cardstdani's approach into a list comprehension (see this question for why list comprehensions are faster)
s = set(list_2)
booleans = [i in s for i in list_1]
# setup 1
6.01 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
4.19 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
No list comprehension
s = set(list_2)
booleans = []
for i in list_1:
booleans.append(i in s)
# setup 1
7.28 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
5.87 ms ± 8.19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Cardstdani - approach 2 (with an assist from Timus)
common = set(list_1) & set(list_2)
booleans = [item in common for item in list_1]
# setup 1
8.3 ms ± 34.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
6.01 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Using the set intersection
method
common = set(list_1).intersection(list_2)
booleans = [item in common for item in list_1]
# setup 1
10.1 ms ± 29.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
4.82 ms ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
numpy approach (crissal)
a1 = np.array(list_1)
a2 = np.array(list_2)
a = np.isin(a1, a2)
# setup 1
18.6 ms ± 74.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
18.2 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2 (assuming list_1, list_2 already numpy arrays)
10.3 ms ± 73.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
list comprehension
l = [i in list_2 for i in list_1]
# setup 1
4.85 s ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
48.6 s ± 823 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Sharim - approach 1
booleans = list(map(lambda e: e in list_2, list_1))
# setup 1
4.88 s ± 24.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
48 s ± 389 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using the __contains__
method
booleans = list(map(list_2.__contains__, list_1))
# setup 1
4.87 s ± 5.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
48.2 s ± 486 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Sharim - approach 2
set_2 = set(list_2)
booleans = list(map(lambda e: set_2 != set_2 - {e}, list_1))
# setup 1
5.46 s ± 56.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
11.1 s ± 75.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Varying the length of the input
Employing the following setup
import random
list_1 = [random.randint(1, n) for i in range(n)]
list_2 = [random.randint(1, n) for i in range(n)]
and varying n
in [2 ** k for k in range(18)]
:
Employing the following setup
import random
list_1 = [random.randint(1, n ** 2) for i in range(n)]
list_2 = [random.randint(1, n ** 2) for i in range(n)]
and varying n
in [2 ** k for k in range(18)]
, we obtain similar results:
Employing the following setup
list_1 = list(range(n))
list_2 = list(range(n, 2 * n))
and varying n
in [2 ** k for k in range(18)]
:
Employing the following setup
import random
list_1 = [random.randint(1, n) for i in range(10 * n)]
list_2 = [random.randint(1, n) for i in range(10 * n)]
and varying n
in [2 ** k for k in range(18)]
: