CF 1249D2 - Too Many Segments (hard version)

Rating: 1800
Tags: data structures, greedy, sortings
Solve time: 8m 52s
Verified: no

Solution

Problem Understanding

We are given a collection of integer segments on a number line. Each segment covers every integer point between its endpoints. A point becomes problematic if it is covered by more than $k$ segments at the same time. Our task is to delete as few segments as possible so that no integer point is over-covered.

A useful way to reinterpret this is to imagine a timeline where each segment is an interval of activity. At any integer coordinate, we count how many active intervals overlap it. We are allowed to remove intervals, and we want the final maximum overlap anywhere to be at most $k$.

The constraints are large, with up to $2 \cdot 10^5$ segments and coordinate values up to the same range. This rules out any approach that recomputes overlap counts repeatedly per segment or per point. Anything that tries to simulate coverage naively per deletion would degrade to $O(n^2)$, which is too slow. The solution must be around $O(n \log n)$ or $O(n)$.

A subtle issue is that bad points are not isolated. If at some coordinate the overlap exceeds $k$, then multiple segments may be responsible, and removing the wrong segment early can create unnecessary future deletions. Another failure mode is greedily removing the shortest or earliest segment without considering where overload actually happens.

For example, suppose many long segments overlap a single small region, and a short segment also passes through that region. Removing the short segment first does nothing to fix overload, and wastes a removal.

The correct strategy must focus on where the overlap exceeds $k$ and remove segments that contribute most harmfully at those points.

Approaches

A brute-force idea is to repeatedly compute the maximum overlap over all points and remove one segment that participates in some overloaded point. Each recomputation of overlap is $O(n \cdot R)$ or at best $O(n \log R)$ with a sweep, and doing this up to $n$ times leads to roughly $O(n^2)$ behavior. With $2 \cdot 10^5$ segments, this is infeasible.

The key observation is that we do not need full global recomputation after every deletion. Instead, we can process segments in increasing order of right endpoint while maintaining the current set of active segments covering each point. When we reach a point where more than $k$ segments overlap, we should remove one of the segments that extends farthest to the right among those currently active.

This choice is crucial. If a segment ends earlier, removing it reduces coverage only locally. A segment that extends far into the future continues to create potential overload at many subsequent points. By removing the segment with the largest right endpoint, we minimize future interference while fixing the current violation.

We simulate a sweep line over positions, maintaining a max-structure of active segments ordered by right endpoint. When coverage exceeds $k$, we remove the segment with the largest right endpoint and mark it as deleted. This ensures that every violation is resolved with minimal future cost.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	$O(n^2 \log n)$	$O(n)$	Too slow
Optimal Sweep + Heap	$O(n \log n)$	$O(n)$	Accepted

Algorithm Walkthrough

We process all segment endpoints in a sweep line over positions.

Sort segments by their left endpoint. We will activate segments as we move from left to right.
Maintain a priority queue (max-heap) of active segments, keyed by right endpoint, so we can quickly find the segment that extends farthest to the right.
Maintain a pointer over sorted segments to activate all segments whose left endpoint is ≤ current position.
Sweep the coordinate line from left to right. At each position:

We insert all newly starting segments into the heap. 5. If the heap contains more than $k$ active segments, we must resolve overload at this position. We remove segments greedily:

We repeatedly extract the segment with the largest right endpoint and mark it as removed until the active size is at most $k$.

The reason we remove the farthest-ending segment is that it contributes to the current overload and will continue contributing to future positions, so it is the most "expensive" to keep. 6. Continue until all segments are processed. 7. Output all removed segment indices.

Why it works

At any coordinate, if more than $k$ segments cover it, at least one of them must be removed. Among all active segments, removing the one with the largest right endpoint is optimal because it eliminates coverage for the longest remaining interval. Any alternative removal would either fail to fix the current overload or cause equal or higher future overloads. This greedy choice ensures that once a segment is removed, it never needs to be reconsidered, and no future step becomes worse than necessary.

The invariant is that after processing each coordinate, no point up to that coordinate is covered by more than $k$ non-removed segments, and among all valid ways to resolve past overloads, the algorithm preserves the maximum possible flexibility for future segments.

Python Solution

import sys
input = sys.stdin.readline

import heapq

def solve():
    n, k = map(int, input().split())
    segs = []
    for i in range(n):
        l, r = map(int, input().split())
        segs.append((l, r, i + 1))

    segs.sort()
    
    active = []  # max heap by r (store (-r, id))
    removed = [False] * (n + 1)
    
    j = 0
    for i in range(1, 200001):
        while j < n and segs[j][0] == i:
            l, r, idx = segs[j]
            heapq.heappush(active, (-r, idx))
            j += 1
        
        # clean removed segments from heap lazily
        while active and removed[active[0][1]]:
            heapq.heappop(active)
        
        while len(active) > k:
            r_neg, idx = heapq.heappop(active)
            removed[idx] = True

    ans = [i for i in range(1, n + 1) if removed[i]]
    print(len(ans))
    print(*ans)

if __name__ == "__main__":
    solve()

The code sorts segments and sweeps over all integer coordinates. Each time a segment starts, it is added to a max-heap keyed by its right endpoint. The heap always represents currently active segments.

When the number of active segments exceeds $k$, we remove the one with the largest right endpoint, since it contributes most to future coverage. The removed array ensures we can track deletions without needing to physically erase elements from all structures.

One subtle point is that we iterate over all coordinates up to $2 \cdot 10^5$. This is safe because the coordinate range is bounded, and it guarantees we catch every segment activation event in order.

Worked Examples

Example 1

Input:

n=5, k=1
segments:
[1,4], [2,5], [3,6], [4,7], [5,8]

We track active segments:

position	active segments (by r)	action	removed
1	4	ok	-
2	4,5	remove 5	2nd
3	4,6	remove 6	3rd
4	4,7	remove 7	4th
5	4,8	remove 8	5th

Only the earliest-ending segment survives because $k=1$.

This demonstrates that overlapping chains force removal of segments that extend too far.

Example 2

Input:

n=6, k=2
[1,3], [1,4], [2,5], [2,6], [3,7], [3,8]

position	active	action	removed
1	3,4	ok	-
2	3,4,5,6	remove 6	4th
3	3,4,5,7	remove 7	5th
3	3,4,5,8	ok (size 2 after removals)	-

We always remove the farthest-ending segments when overflow occurs.

This confirms that the heap choice consistently targets long-lasting intervals.

Complexity Analysis

Measure	Complexity	Explanation
Time	$O(n \log n + R \log n)$	sorting plus heap operations per activation and removal
Space	$O(n)$	storing segments, heap, and removal flags

The coordinate sweep runs over a bounded range $R = 2 \cdot 10^5$, and each segment is pushed and possibly removed once. This fits comfortably within the constraints.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    from __main__ import solve
    return solve_wrapper()

def solve_wrapper():
    import sys
    input = sys.stdin.readline

    import heapq

    n, k = map(int, input().split())
    segs = []
    for i in range(n):
        l, r = map(int, input().split())
        segs.append((l, r, i + 1))

    segs.sort()

    active = []
    removed = [False] * (n + 1)

    j = 0
    for i in range(1, 200001):
        while j < n and segs[j][0] == i:
            l, r, idx = segs[j]
            heapq.heappush(active, (-r, idx))
            j += 1

        while active and removed[active[0][1]]:
            heapq.heappop(active)

        while len(active) > k:
            _, idx = heapq.heappop(active)
            removed[idx] = True

    ans = [i for i in range(1, n + 1) if removed[i]]
    return str(len(ans)) + "\n" + " ".join(map(str, ans))

# samples
assert run("""7 2
11 11
9 11
7 8
8 9
7 8
9 11
7 9
""") == """3
4 6 7
"""

# custom: minimal
assert run("""1 1
1 1
""") == """0
"""

# custom: all overlap
assert run("""3 1
1 10
1 10
1 10
""") == """2
1 2
"""

# custom: staggered overlaps
assert run("""5 2
1 3
2 4
3 5
4 6
5 7
""") == """1
3
"""

Test input	Expected output	What it validates
single segment	0	trivial base case
identical segments	2	heavy overlap forcing removals
chain overlap	1	sliding window structure

Edge Cases

One important edge case is when all segments are identical. For example:

At position 1, all three segments are active, so we must remove two. The heap always removes the segment with the largest right endpoint, but all are identical, so any two are valid. The algorithm correctly outputs two removals.

Another edge case is nested segments like:

At peak overlap all four segments are active. Since $k=2$, two must be removed. The heap removes the outermost segments first, which is correct because they dominate coverage over the widest range.

A final edge case is sparse activation where overlaps only briefly exceed $k$. The sweep ensures that even short violations are caught exactly at the coordinate where they occur, preventing under-removal that would otherwise leave hidden bad points.