Question

sorting algorithm in O(n) according a specific condition

Given an array A with n−1 numbers where n = 2^k for some integer k. One of the values appears exactly n/2 another appears exactly n/4, and so on. More formally, for all 1 ≤ k ≤ log n exists a value that appears exactly n/2^k times in the array. Describe an algorithm that sorts A with a run time complexity of Θ(n), and analyze its running time. example: input: 5,1,2,1,2,2,2 output: 1,1,2,2,2,2,5

some important pointers:

the time comlexity should be Worst case
using a dictionary is not allowed
the solution should use a stack or other basic data structures

I tried using counting sort but it didn't work because the array doesnt contain values starting from 0 and up. I also thought about using a hash map but the wanted time complexity would be expected and not worst case. thanks for the help in advance

3 134 3

1 Jan 1970

Solution

If 𝑛 is 2^𝑘, you could use this trick:

Get the majority element from the array in linear time (Boyer–Moore majority vote algorithm)
Remove that majority element from the array
As long as the array is not empty, repeat the above

This way we get to know the frequency of each value, can sort and reproduce the values in sorted order. Sorting is sublinear in terms of 𝑛, because we only have log𝑛 values to sort.

The first phase is also linear, because the array size halves at each step, so we do the work of O(𝑛 + 𝑛/2 + 𝑛/4 + ...) = O(2𝑛) = O(𝑛).

Implementation

Here is an implementation of that idea in a JavaScript snippet:

function getMajorityValue(arr) { 
    let counter = 0;
    let major = null;
    for (const value of arr) {
        if (counter == 0) major = value;
        counter += value == major ? 1 : -1;
    }
    return major;
}

function sort(arr) {
    const byFreq = []; // Collect distinct values by their frequencies
    while (arr.length > 0) {
        // Get majority element in current array
        let value = getMajorityValue(arr); // O(length of arr)
        // ...and then remove that majority element
        arr = arr.filter(x => x != value); // O(length of arr)
        // Keep track of it
        byFreq.push(value);
    }
    // Put the least frequent first, most frequent last
    byFreq.reverse();
    // Now byFreq[0] occurs 1 time, byFreq[1] occurs 2 times, byFreq[2] occurs 4 times,...

    // Get the sorted indices, keeping byFreq untouched
    const indices = [...byFreq.keys()].sort((i, j) => byFreq[i] - byFreq[j]); // O(logn log(logn))

    // Populate result array
    const result = [];
    for (const i of indices) {
        const multiples = 1 << i; // 2**i
        result.push(...Array(multiples).fill(byFreq[i]));
    }
    return result;
}

const arr = [5,1,2,1,2,2,2];
const result = sort(arr);
console.log(result);

2024-07-05

trincot

Solution

According to your setup, there are log(n) different numbers.

In O(n), map each element to its count (parse the array).
In O(log(n)*log(log(n))), sort the keys (which are the elements).
In O(n), populate the output array in sorted order, with repetition based on the counts.

2024-07-05

Dave

Solution

Just do a merge-sort that knows how to deal with duplicates.

Why does this work? Well, at the second pass, at least half of the most common value will be absorbed into a duplicate. At every pass remaining, half the remainder will work. This means that the average copy of the most common value only lasts for 3 passes.

The second most common value starts the cascade at 3, and the average disappears at 4.

And so on.

If you calculate this out, this means that the average element can expect to be needed for 3 passes, half needs another, a quarter needs another, and so on. This turns into 4 passes on average. Which is O(n).

Note that we are not particularly using the power of 2 setup. We're just absorbing duplicates. Which means that we'll also perform well in a wide range of situations where relatively few values tend to appear a relatively large number of times. A common situation where that will apply is whenever Zipf's law describes the frequency of the most common elements.

The biggest downside is that handling duplicates requires a lot of extra tracking. But here is a simple version in Python that should do reasonably well.

from collections import deque

def mergesort (things):
    values = deque(things)
    counts = deque((1 for _ in things))
    group_sizes = deque((1 for _ in things))

    while 1 < len(group_sizes):
        i = 0
        group_i = [values.popleft() for _ in range(group_sizes.popleft())]
        counts_i = [counts.popleft() for _ in group_i]
        j = 0
        group_j = [values.popleft() for _ in range(group_sizes.popleft())]
        counts_j = [counts.popleft() for _ in group_j]

        size = 0
        while i < len(group_i) and j < len(group_j):
            if group_i[i] < group_j[j]:
                values.append(group_i[i])
                counts.append(counts_i[i])
                i += 1
            elif group_j[j] < group_i[i]:
                values.append(group_j[j])
                counts.append(counts_j[j])
                j += 1
            else:
                values.append(group_i[i])
                counts.append(counts_i[i] + counts_j[j])
                i += 1
                j += 1
            size += 1

        while i < len(group_i):
            values.append(group_i[i])
            counts.append(counts_i[i])
            i += 1
            size += 1

        while j < len(group_j):
            values.append(group_j[j])
            counts.append(counts_j[j])
            j += 1
            size += 1

        group_sizes.append(size)

    answer = []
    for i in range(len(values)):
        for j in range(counts[i]):
            answer.append(values[i])
    return answer

2024-07-05

btilly

Solution

I suspect there is a better approach that makes use of the "n/2, n/4, ...etc" property, but wouldn't it be possible to use counting sort after modifying the input to be entirely non-negative?

For example: iterate over the input, finding the smallest number x that is less than zero, then iterate a second time while constructing a modified list that adds |x| to all values. Then, you can use counting sort on this modified list consisting of only non-negative numbers. Since the problem states that the input will only be numbers and not other types of sortable things (like letters), this should be feasible.

This will, of course, affect the runtime by adding at least two iterations over n-1 items (three if you separately construct the final result by iterating over the counting sort output and subtracting |x|, instead of doing so within the counting sort), but it should remain within the bounds of O(n + k), where k is the largest value in the initial input plus |x|. You do sacrifice at least k space, which may be fairly large, but the problem as stated does not impose a space limit.

2024-07-05

RotundChinchilla