Question

Finding the subarray with the least median given a size K and an array of length N

For the past month I have been struggling with this problem, which was given to us in our course and which I have been upsolving. The task is to find the window of size K with the least median in an array of integers. K is always odd, so we need not worry about even-length windows.

For example, for [1,3,3,2,1] and K = 3 the windows are [1,3,3], [3,3,2] and [3,2,1]. Sorted, they are [1,3,3], [2,3,3] and [1,2,3], with medians 3, 3 and 2, so the answer is 2.
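A minimal brute-force sketch of the definition (sort every window, take its middle element), useful for checking faster versions on small inputs. The name min_median_brute is hypothetical. It runs in O((n - k + 1) · k log k) time.

```python
from typing import Sequence

def min_median_brute(s: Sequence[int], k: int) -> int:
    """Smallest median over all windows of odd size k, by sorting each window."""
    mid = k // 2  # k is odd, so the median sits at index k // 2 of a sorted window
    return min(sorted(s[i:i + k])[mid] for i in range(len(s) - k + 1))

print(min_median_brute([1, 3, 3, 2, 1], 3))  # 2
```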

I have implemented a solution that slides the window and sorts each one. It is correct but, as expected, the slowest, so it only received partial points. Another solution, attached here, uses the bisect module, but I think deleting from the list increases the time complexity of my program: both del and insort shift list elements, so each step costs O(K). Is there a solution to this that runs in O(n log k) time?

from bisect import insort, bisect_left
from typing import Sequence

def min_median(s: Sequence[int], m: int) -> int:
    n = len(s)
    window = sorted(s[:m])  # first window, kept sorted
    mid = m // 2            # index of the median (m is odd)
    result = window[mid]

    for i in range(m, n):
        insort(window, s[i])                       # add the incoming element
        del window[bisect_left(window, s[i - m])]  # drop the outgoing element
        result = min(result, window[mid])

    return result

Solution

To solve this in O(n log k), you need to find each window's median in O(log k) time, since there are n - k + 1 windows to consider. This requires a data structure that supports insertion, deletion and finding the median in O(log k) time each as you slide the window through the array.

One way to do this is to use an AVL tree. For instance, you could use a modified AVL tree that also keeps track of the size of the subtree at each node, not just its height, so that the median (the element of rank k / 2, rounded down) can be found by walking down from the root.
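A minimal sketch of this idea, assuming a hand-written AVL tree (Python's standard library has no balanced search tree). All names here (Node, insert, delete, kth, min_median) are hypothetical. Each node stores its subtree size, so kth finds the element of a given rank in O(log k), and every window step is one insertion, one deletion and one rank query.

```python
from typing import Optional, Sequence

class Node:  # AVL node that also records the size of its subtree
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1
        self.size = 1

def _height(t: Optional[Node]) -> int:
    return t.height if t else 0

def _size(t: Optional[Node]) -> int:
    return t.size if t else 0

def _update(t: Node) -> None:
    t.height = 1 + max(_height(t.left), _height(t.right))
    t.size = 1 + _size(t.left) + _size(t.right)

def _rotate_right(t: Node) -> Node:
    left = t.left
    t.left, left.right = left.right, t
    _update(t)
    _update(left)
    return left

def _rotate_left(t: Node) -> Node:
    right = t.right
    t.right, right.left = right.left, t
    _update(t)
    _update(right)
    return right

def _rebalance(t: Node) -> Node:
    _update(t)
    balance = _height(t.left) - _height(t.right)
    if balance > 1:  # left-heavy
        if _height(t.left.left) < _height(t.left.right):
            t.left = _rotate_left(t.left)  # left-right case
        return _rotate_right(t)
    if balance < -1:  # right-heavy
        if _height(t.right.right) < _height(t.right.left):
            t.right = _rotate_right(t.right)  # right-left case
        return _rotate_left(t)
    return t

def insert(t: Optional[Node], key: int) -> Node:
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    else:  # equal keys go right, so duplicates are kept
        t.right = insert(t.right, key)
    return _rebalance(t)

def _pop_min(t: Node):  # detach the leftmost node: (new subtree, detached node)
    if t.left is None:
        return t.right, t
    t.left, smallest = _pop_min(t.left)
    return _rebalance(t), smallest

def delete(t: Optional[Node], key: int) -> Optional[Node]:  # key must be present
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    else:
        if t.left is None or t.right is None:
            return t.left or t.right
        t.right, succ = _pop_min(t.right)  # replace t by its in-order successor
        succ.left, succ.right = t.left, t.right
        return _rebalance(succ)
    return _rebalance(t)

def kth(t: Optional[Node], i: int) -> int:  # i-th smallest key, 0-based
    while t is not None:
        left_size = _size(t.left)
        if i < left_size:
            t = t.left
        elif i == left_size:
            return t.key
        else:
            i -= left_size + 1
            t = t.right
    raise IndexError("rank out of range")

def min_median(s: Sequence[int], k: int) -> int:
    root = None
    for x in s[:k]:
        root = insert(root, x)
    best = kth(root, k // 2)  # k is odd, so rank k // 2 is the median
    for i in range(k, len(s)):
        root = insert(root, s[i])            # O(log k)
        root = delete(root, s[i - k])        # O(log k)
        best = min(best, kth(root, k // 2))  # O(log k)
    return best
```

For example, min_median([1, 3, 3, 2, 1], 3) returns 2. Duplicates are allowed: equal keys go right on insertion, and delete removes whichever equal node it reaches first, which is all the sliding window needs.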

Alternatively, you could use two AVL trees: one for the lower half of the elements and one for the upper half. The idea is similar to the two-heaps approach (How to implement a Median-heap), but uses AVL trees for balanced, ordered insertion and deletion. The benefit of AVL trees over heaps is that deleting an arbitrary element takes O(log k) in an AVL tree but O(k) in a heap (unless you use an array to keep track of each element's position in the heap), which gives a better overall time complexity.
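To see why a plain heap is slow here: Python's heapq module, for example, has no delete-by-value operation, so removing the element that leaves the window needs something like this hypothetical heap_remove helper. Finding the element is a linear scan, so the step is O(k) however the heap order is restored afterwards; keeping a map from each element to its heap position is what removes that scan.

```python
import heapq
from typing import List

def heap_remove(heap: List[int], x: int) -> None:
    """Delete one occurrence of x from a heapq min heap in O(k) time."""
    i = heap.index(x)    # the heap does not know where x is: linear scan, O(k)
    heap[i] = heap[-1]   # overwrite it with the last element...
    heap.pop()           # ...and shrink the list
    heapq.heapify(heap)  # restore the heap property, also O(k)

window = [3, 1, 3, 2, 1]
heapq.heapify(window)
heap_remove(window, 3)
print(sorted(window))  # [1, 1, 2, 3]
```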

2024-07-19
Naman Agrawal

Solution

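I think Naman Agrawal already pointed to this solution. Use a max hash heap and a min hash heap, hashed by each element's index in the array: alongside the heap, keep a map from array index to position in the heap, so that any element can be found and removed in O(log k). The code for such a heap is only slightly longer than for a regular heap, and the same code can be reused for both heaps.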
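A minimal sketch of such a hash heap, assuming each entry is a (value, array_index) pair and that array indices are unique within a heap (true for a sliding window). HashHeap and its sign parameter (+1 for a min heap, -1 for a max heap, so one class serves both roles) are illustrative choices, not code from the answer.

```python
from typing import Dict, List, Tuple

class HashHeap:
    """Heap of (value, array index) pairs plus a map: array index -> heap slot."""

    def __init__(self, sign: int = 1) -> None:
        self.sign = sign                        # +1: min heap, -1: max heap
        self.items: List[Tuple[int, int]] = []  # (value, array index)
        self.pos: Dict[int, int] = {}           # array index -> slot in items

    def __len__(self) -> int:
        return len(self.items)

    def _less(self, a: int, b: int) -> bool:
        return self.sign * self.items[a][0] < self.sign * self.items[b][0]

    def _swap(self, a: int, b: int) -> None:
        it = self.items
        it[a], it[b] = it[b], it[a]
        self.pos[it[a][1]], self.pos[it[b][1]] = a, b  # keep the map in sync

    def _sift(self, i: int) -> None:
        """Restore heap order around slot i: sift up, then sift down."""
        while i > 0 and self._less(i, (i - 1) // 2):
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2
        while True:
            best = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(self.items) and self._less(c, best):
                    best = c
            if best == i:
                return
            self._swap(i, best)
            i = best

    def push(self, value: int, index: int) -> None:
        self.items.append((value, index))
        self.pos[index] = len(self.items) - 1
        self._sift(len(self.items) - 1)

    def top(self) -> int:
        return self.items[0][0]

    def remove(self, index: int) -> Tuple[int, int]:
        """Delete the entry for array index `index` in O(log k), no search needed."""
        i = self.pos.pop(index)
        removed = self.items[i]
        last = self.items.pop()
        if i < len(self.items):  # fill the hole with the former last entry
            self.items[i] = last
            self.pos[last[1]] = i
            self._sift(i)
        return removed

    def pop(self) -> Tuple[int, int]:
        return self.remove(self.items[0][1])

h = HashHeap(+1)  # min heap
for idx, v in enumerate([5, 1, 4, 2, 3]):
    h.push(v, idx)
h.remove(2)  # drop the entry for array index 2 (value 4)
print([h.pop()[0] for _ in range(len(h))])  # [1, 2, 3, 5]
```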

To create the initial state, insert the first k elements into the min heap, then pop k / 2 of them (rounded down, since k is odd) and insert them into the max heap. The middle element is now at the root of the min heap.

To update the window:

1. Remove the first element of the previous window from whichever heap holds it.
2. If it was removed from the heap that had fewer elements, pop the root of the other heap and insert it into the first heap. Either way, the two heaps now have an equal number of elements.
3. Insert the new element into the min heap if it is greater than or equal to the min-heap root, or into the max heap if it is less than or equal to the max-heap root; if it lies strictly between the two roots, either heap works. The middle element is now at the root of the heap with more elements.
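Putting the steps together, here is a sketch of the whole sliding-window loop; the hypothetical HashHeap class (a heap plus a map from array index to heap slot, sign +1 for min and -1 for max) is repeated so the block runs on its own. lo is the max heap (lower half) and hi the min heap (upper half). A new value goes into lo when it is at most lo's root and into hi otherwise, which sends a value lying strictly between the roots to the min heap. Each window costs O(log k), for O(n log k) overall.

```python
from typing import Dict, List, Sequence, Tuple

class HashHeap:
    """Heap of (value, array index) pairs plus a map: array index -> heap slot."""

    def __init__(self, sign: int = 1) -> None:
        self.sign = sign                        # +1: min heap, -1: max heap
        self.items: List[Tuple[int, int]] = []
        self.pos: Dict[int, int] = {}

    def __len__(self) -> int:
        return len(self.items)

    def _less(self, a: int, b: int) -> bool:
        return self.sign * self.items[a][0] < self.sign * self.items[b][0]

    def _swap(self, a: int, b: int) -> None:
        it = self.items
        it[a], it[b] = it[b], it[a]
        self.pos[it[a][1]], self.pos[it[b][1]] = a, b

    def _sift(self, i: int) -> None:  # sift up, then sift down
        while i > 0 and self._less(i, (i - 1) // 2):
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2
        while True:
            best = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(self.items) and self._less(c, best):
                    best = c
            if best == i:
                return
            self._swap(i, best)
            i = best

    def push(self, value: int, index: int) -> None:
        self.items.append((value, index))
        self.pos[index] = len(self.items) - 1
        self._sift(len(self.items) - 1)

    def top(self) -> int:
        return self.items[0][0]

    def remove(self, index: int) -> Tuple[int, int]:
        i = self.pos.pop(index)
        removed = self.items[i]
        last = self.items.pop()
        if i < len(self.items):
            self.items[i] = last
            self.pos[last[1]] = i
            self._sift(i)
        return removed

    def pop(self) -> Tuple[int, int]:
        return self.remove(self.items[0][1])

def min_median(s: Sequence[int], k: int) -> int:
    lo, hi = HashHeap(-1), HashHeap(+1)  # max heap (lower half), min heap (upper half)
    for i in range(k):                   # initial state: all k elements into the min heap,
        hi.push(s[i], i)
    for _ in range(k // 2):              # then move k // 2 of them to the max heap
        lo.push(*hi.pop())
    best = hi.top()
    for j in range(k, len(s)):
        out = j - k                      # array index leaving the window
        src, other = (lo, hi) if out in lo.pos else (hi, lo)
        src_was_smaller = len(src) < len(other)
        src.remove(out)                  # step 1
        if src_was_smaller:              # step 2: refill the heap that was smaller
            src.push(*other.pop())
        if len(lo) and s[j] <= lo.top():  # step 3: heaps are equal in size here
            lo.push(s[j], j)
        else:
            hi.push(s[j], j)
        best = min(best, lo.top() if len(lo) > len(hi) else hi.top())
    return best

print(min_median([1, 3, 3, 2, 1], 3))  # 2
```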

2024-07-21
גלעד ברקן