Question

Smallest Integer Problem Code Performance

I am starting Python and I came across this practice test where I have to find the smallest positive integer that does not occur in a list.

Examples:

  1. arr = [8, 2, 1, 4, 3, 5] # output: 6
  2. arr = [-3, 1, 1, -7, 5, 3] # output: 2
  3. arr = [-11, -1, 3] # output: 1

Here's my solution:

def solution(A):
    val = 1
    while True:
        try:
            idx = A.index(val)
            val += 1
        except ValueError:
            break
    return val

According to my results, I only got 1 out of the 4 performance tests. Could someone help me understand why?

 4  191  4
1 Jan 1970

Solution

 3

You need to analyze time complexity to understand the bottle neck. Suppose K is the smallest integer you are looking for, and N is the length of your list. Then your code has complexity O(K*N), because for each iteration over K you are searching the list which takes O(N).

Now let's think about the best conceivable runtime (BCR). To find the integer you are inevitable (with a random list) need to iterate through the whole list, which is O(N). All the other parts might be dropped probably depending on the implementation. So our BCR = O(N). Most probably you need to target this time complexity to have all performance tests to pass.

Now to the solution. I don't think it's optimal, but it will definitely improve your version. You can convert list to the hashset. The search operation in hashset is O(1), so your solution will become O(K+N). The downside is that your space complexity will increase from O(1) to O(N)

def solution(A):
    B = set(A)
    val = 1
    while True:
        if val in B:
            val += 1
        else:
            break
    return val
2024-07-16
Kliment Merzlyakov

Solution

 3

The problem is that A.index(val) sequentially searches the list until it finds val (or not), implicitly creating another while loop within your while loop, giving O(n²) running time.

An alternative approach is to convert your list of integers to a set before further processing.

def solution(A):
    S = set(A)
    val = 1
    while True:
        if val not in S:
            return val
        val += 1

The set class is implementing using a hash table, which allows membership tests (in or not in) to be performed very quickly, without having to loop through all of the values in the set.

2024-07-16
dan04

Solution

 2

The question is "why is this solution bad."

The answer is A.index(val) has to iterate through the list in order to find the position of val. It has to do this for each and every int, until you get an answer. This of course makes this an O(n^2) worst case solution. The correct answer will be O(n).

This is actually a pretty common problem when working with lists. If you are iterating a loop, and within that loop you use a list member that iterates over the list, then you have a double loop, even though it doesn't look like it.

2024-07-16
Kenny Ostrom

Solution

 2

@KlimentMerzlyakov's solution improves the average time complexity from O(n ^ 2) to O(n) at the cost of space having to store all the numbers in a set.

You can improve the space overhead while keeping the time complexity at O(n) by storing each positive integer as a set bit in a binary number instead. Then invert the number so the lowest set bit would be at the the smallest positive integer. You can get the lowest set bit with the algorithm from this answer.

As @Ry- pointed out in the comment, however, since integers are immutable in Python and have to be copied for each operation, storing the bits as an integer would result in a time complexity of O(n ^ 2). Instead, you can store the bits in a bytearray to perform operations in-place, and then iteratively find the index of the lowest byte with an unset bit to calculate the position of the lowest unset bit, so that each additional number would only cost 1 bit of space overhead while the overall time complexity remains at O(n):

def solution(A):
    size = len(A)
    b = bytearray((size >> 3) + 1)
    for i in A:
        if 0 < i <= size:
            i -= 1
            b[i >> 3] |= 1 << (i & 7)
    for i, byte in enumerate(b):
        if byte != 255:
            break
    byte = ~byte
    return (i << 3) + (byte & -byte).bit_length()

so that:

for arr in (
    list(range(1000)) + list(range(1001, 2000)),
    [8, 2, 1, 4, 3, 5],
    [-3, 1, 1, -7, 5, 3],
    [-11, -1, 3]
):
    print(solution(arr))

outputs:

1000
6
2
1

Demo here

2024-07-17
blhsing

Solution

 0

While not the fastest solution, sorting is an option that will run in O(1) memory, O(n log n) time, assuming you can do it in-place. This may make it more practically viable than using a set for very large inputs.

def min_pos_int(arr):
    arr.sort()
    if arr[0] > 1: return 1
    for i in range(len(arr) - 1):
        if arr[i] >= 1 and arr[i + 1] > arr[i] + 1:
            return arr[i] + 1
        elif arr[i] < 1 and arr[i + 1] > 1:
            return 1
    else:
        return max(1, arr[-1] + 1)

If you implement the sort yourself, you can optimize quite a bit further. For example, in a qsort, you can avoid recursing into the negative partitions, and will know the beginning of the positive side immediately.

2024-07-17
Mad Physicist