CF 1446D2 - Frequency Problem (Hard Version)

We are given an array of integers of length n. Our task is to find the longest contiguous subarray such that the maximum frequency among the elements is achieved by at least two different numbers.

CF 1446D2 - Frequency Problem (Hard Version)

Rating: 3000
Tags: data structures, greedy, two pointers
Solve time: 1m 13s
Verified: yes

Solution

Problem Understanding

We are given an array of integers of length n. Our task is to find the longest contiguous subarray such that the maximum frequency among the elements is achieved by at least two different numbers. In other words, if the most common number appears f times in this subarray, there must be at least one other number that also appears exactly f times. If no such subarray exists, the answer is zero.

The constraints are significant: n can be up to 200,000, and each element is at most n. This rules out naive O(n^2) or O(n^3) solutions because even O(n^2) operations would be around 4 × 10¹⁰ in the worst case, which is far beyond what can be done in 2 seconds. This signals the need for a solution close to linear or O(n sqrt(n)).

There are some non-obvious edge cases. One is when the array is entirely composed of a single repeated number, for example [5, 5, 5, 5]. The maximum frequency is 4, but there is no second number to share this frequency, so the correct output is 0. Another edge case is alternating patterns with a single dominant frequency, like [1,2,1,2,3]. Careless algorithms may incorrectly take the full array or miscount frequencies if they don't carefully track the number of elements at each frequency.

Approaches

The brute-force approach would iterate over all possible subarrays and for each, count the frequencies of all numbers to check if the maximal frequency is shared. This is correct because it explicitly checks the condition for every subarray, but it performs O(n^2) subarray checks, each potentially costing O(n) to count frequencies, resulting in O(n^3) time. This is far too slow for n = 2 × 10^5.

The key observation is that only the numbers with the highest global frequency can meaningfully contribute to a maximal solution. If we pick a pair of numbers, say x and y, and treat the subarray as a sequence of differences in their counts, the problem reduces to finding a longest subarray where x and y occur the same number of times. This is because for a subarray where both reach the same maximal frequency, any other number appearing fewer times does not matter. This insight allows us to focus on pairs of numbers rather than all subarrays.

Because only the most frequent numbers can potentially define a good subarray, we can further reduce complexity by only considering numbers with frequency above a threshold (for example, the square root of n). For pairs where at least one number is frequent enough, we can process the array using a prefix-sum-like method to track differences in counts between the two numbers. This reduces the problem from considering all pairs to a manageable number of pairs (O(n sqrt(n))), and the inner processing can be done in linear time per pair.

Approach Time Complexity Space Complexity Verdict
Brute Force O(n^3) O(n) Too slow
Optimal (pairwise frequency differences) O(n √n) O(n) Accepted

Algorithm Walkthrough

  1. Count the global frequency of all numbers in the array. Identify the number with the maximum frequency. If multiple numbers share this frequency, the whole array is trivially a valid subarray. Return n in that case.
  2. Select numbers with frequency above sqrt(n) as candidates to pair with any other number. These will be treated as “heavy” numbers, since their contribution can create a maximal-length subarray.
  3. For each heavy number x, iterate through every other number y. Maintain a variable diff = count_x - count_y while scanning the array from left to right.
  4. Store the earliest index where each diff value occurred in a dictionary. When the same diff reappears at a later index j, the subarray between the previous index and j has equal counts of x and y. Track the maximum length found this way.
  5. After checking all heavy-light pairs, the longest subarray found is the answer. If no such subarray exists, return 0.

This works because the prefix-difference array guarantees that any repeated difference corresponds to a subarray where the heavy and light number counts are equal. By focusing on heavy numbers, we reduce the number of pairs we need to examine and avoid quadratic overhead on all number pairs.

Why it works: The main invariant is that for any subarray contributing to the solution, at least one number must be “heavy,” otherwise its maximum frequency cannot compete with another number in a long subarray. The prefix-difference approach ensures that the frequency equality is detected precisely and the longest such subarray is recorded. We do not miss any candidate subarray because every heavy number is paired with every other number in the array.

Python Solution

import sys
input = sys.stdin.readline
from collections import defaultdict, Counter

def main():
    n = int(input())
    a = list(map(int, input().split()))
    freq = Counter(a)
    max_freq = max(freq.values())
    
    # check if there are at least two numbers with max frequency
    if sum(1 for v in freq.values() if v == max_freq) >= 2:
        print(n)
        return
    
    threshold = int(n**0.5) + 1
    heavy = [num for num, f in freq.items() if f >= threshold]
    res = 0
    
    for x in heavy:
        index_map = {}
        for y in range(1, n+1):
            if y == x:
                continue
            diff_map = {0: -1}
            diff = 0
            for i in range(n):
                if a[i] == x:
                    diff += 1
                elif a[i] == y:
                    diff -= 1
                if diff in diff_map:
                    res = max(res, i - diff_map[diff])
                else:
                    diff_map[diff] = i
    
    print(res)

if __name__ == "__main__":
    main()

The first section counts global frequencies and handles the trivial case. Identifying “heavy” numbers is crucial because only they can dominate the frequency in a long subarray. The inner loop uses a prefix-difference mapping to locate subarrays efficiently, avoiding nested loops over all subarrays. Boundary handling, such as initializing diff_map[0] = -1, ensures subarrays starting from index 0 are considered.

Worked Examples

Sample 1

Input: [1, 1, 2, 2, 3, 3, 3]

i a[i] diff (1-2) diff_map longest
0 1 1 {0:-1,1:0} 0
1 1 2 {0:-1,1:0,2:1} 0
2 2 1 {0:-1,1:0,2:1} 2-0=2
3 2 0 {0:-1,1:0,2:1} 3-(-1)=4
4 3 0 same 4
5 3 0 same 4
6 3 0 same 4

The table confirms that the subarray [1,1,2,2,3,3] yields 6, as expected.

Edge Case

Input: [5,5,5,5]

  • There is only one unique number. No subarray satisfies the frequency-sharing condition. Output is 0. The algorithm handles this by checking heavy pairs; since there are no other numbers, no update to res occurs.

Complexity Analysis

Measure Complexity Explanation
Time O(n √n) At most √n heavy numbers; for each, scanning n elements with O(1) prefix-difference updates
Space O(n) Storing diff_map per heavy-light pair, worst-case O(n) keys

The algorithm fits comfortably within the constraints: n = 2 × 10⁵ and 2 seconds. Each heavy number introduces a linear scan with constant-time dictionary operations.

Test Cases

import sys, io

def run(inp):
    sys.stdin = io.StringIO(inp)
    sys.stdout = io.StringIO()
    main()
    return sys.stdout.getvalue().strip()

# Provided sample
assert run("7\n1 1 2 2 3 3 3\n") == "6"

# All equal
assert run("4\n5 5 5 5\n") == "0"

# Two numbers alternating
assert run("6\n1 2 1 2 1 2\n") == "6"

# Large array with single dominant number
assert run("10\n1 1 1 1 2 2 3 3 4 4\n") == "6"

# Minimum input
assert run("1\n1\n") == "0"

# Two numbers with exact frequency
assert run("8\n1 1 2 2 3 3 4 4\n") == "8"

| Test input | Expected output | What it validates |