Question

In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique *while preserving order*?

For example:

>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]

Assume list elements are hashable.

Clarification: the result should keep the first occurrence of each duplicated element. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].


Solution

def unique(items):
    found = set()   # values seen so far (O(1) average membership tests)
    keep = []       # result, in first-seen order

    for item in items:
        if item not in found:
            found.add(item)
            keep.append(item)

    return keep

print(unique([1, 1, 2, 'a', 'a', 3]))
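
For the clarified case from the question, this keeps the first occurrence of each element:

>>> unique([1, 2, 3, 2, 3, 1])
[1, 2, 3]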
2008-09-18

Solution

Using:

lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]

And using the timeit module:

$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)'
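
The uniquetest module itself isn't shown here; a minimal sketch of how it might be laid out (my assumption, not reproduced from the thread) is:

# uniquetest.py -- hypothetical layout for the benchmark module
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13]  # the real benchmark used the full 75-element list above

def etchasketch(items):
    # Illustrative body only; the actual module held one function per
    # posted answer, each named after its poster.
    found = set()
    keep = []
    for item in items:
        if item not in found:
            found.add(item)
            keep.append(item)
    return keep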

Running the equivalent command for each of the other functions (which I named after their posters), I got the following results on my first-generation Intel MacBook Pro:

Allen:                  14.6 µs per loop [1]
Terhorst:               26.6 µs per loop
Tarle:                  44.7 µs per loop
ctcherry:               44.8 µs per loop
Etchasketch 1 (short):  64.6 µs per loop
Schinckel:              65.0 µs per loop
Etchasketch 2:          71.6 µs per loop
Little:                 89.4 µs per loop
Tyler:                 179.0 µs per loop

[1] Note that Allen's version modifies the list in place; I believe this skews its time, since timeit runs the statement 100000 times and 99999 of those runs operate on a list whose duplicates have already been removed.
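
One way to avoid that skew (my suggestion, not part of the original benchmark) is to hand the in-place function a fresh copy of the list on every run, at the cost of also timing the copy. Assuming the function is exposed as uniquetest.allen (a hypothetical name):

$ python -m timeit -s 'import uniquetest' 'uniquetest.allen(list(uniquetest.lst))'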


Summary: a straightforward implementation with a set wins over the confusing one-liners :-)

2008-09-18