CF 1175E - Minimal Segment Cover

Rating: 2200
Tags: data structures, dfs and similar, divide and conquer, dp, greedy, implementation, trees
Solve time: 1m 38s
Verified: yes

Solution

Problem Understanding

We are given a collection of line segments on a number line and multiple queries asking about subranges. For each query segment $[x, y]$, we want to know the smallest number of given segments whose union fully covers every real point from $x$ to $y$. If there is any gap in coverage, the answer is impossible and we output -1.

This is not a discrete coverage problem. The requirement is continuous coverage, so intervals like $[1,3]$ and $[4,5]$ do not jointly cover $[1,5]$, because the gap between 3 and 4 leaves uncovered real values such as 3.5.

The constraints push us toward an $O((n + m)\log n)$ or $O((n + m)\log \max r)$ solution. With up to $2 \cdot 10^5$ intervals and queries, any per-query linear scan over intervals is too slow. Even $O(nm)$ is immediately impossible at $4 \cdot 10^{10}$ operations.

A subtle difficulty is that queries are independent, but intervals are shared. A naive greedy per query can pick intervals repeatedly by scanning all candidates, but that repeats too much work.

Another non-trivial edge case is disconnection: intervals might partially overlap but still fail to form a continuous chain. For example:

intervals: [1,3], [3,5]
query: [1,5]

This is valid because touching endpoints are sufficient for real coverage. But:

intervals: [1,3], [3,4], [5,6]
query: [1,6]

fails because of the gap at (4,5). A greedy algorithm must explicitly detect that no interval extends beyond the current coverage boundary.

Finally, overlapping intervals can be redundant, and picking the wrong one early can lead to suboptimal segment count. The correct solution must always maximize reach at each step, not just pick any overlapping interval.

Approaches

A brute-force approach processes each query independently. For a query $[x, y]$, we repeatedly search among all intervals that start at or before the current covered point, choose the one extending furthest right, and continue until reaching $y$ or failing. Each greedy selection requires scanning all intervals, giving $O(n)$ per step and potentially $O(n)$ steps, leading to $O(n^2)$ per query in the worst case. With $m$ queries this becomes far too slow.

The key observation is that the greedy process itself is correct for interval covering: at any current position, the best choice is always the interval that starts before or at that position and extends furthest right. The problem reduces to supporting fast repeated “best extension” queries over intervals sorted by left endpoint.

This leads to preprocessing intervals into a structure that allows jumping forward efficiently. If we sort intervals by starting point and build a binary lifting table where $next[i]$ is the farthest reach achievable with one greedy step starting from interval $i$, then we can simulate the greedy chain in logarithmic jumps.

Each interval can be treated as a node: from interval $i$, we can jump to the interval that continues coverage and extends furthest. Once we compute this next pointer, we can binary lift over it to answer how many steps are required to reach an interval covering at least a query endpoint.

To answer a query, we first find all intervals that can start the coverage at or before $x$, pick the one with the farthest right endpoint, and then use binary lifting to count how many jumps are needed to reach at least $y$.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	$O(n^2 \cdot m)$	$O(n)$	Too slow
Binary lifting on greedy chain	$O((n + m)\log n)$	$O(n\log n)$	Accepted

Algorithm Walkthrough

We first normalize intervals so that we can answer “best extension from a position” efficiently.

Sort all intervals by left endpoint. This ensures we can process coverage in increasing order and build reachability consistently.
For each interval, compute the best immediate extension. For interval $i = [l_i, r_i]$, we want the interval that starts at or before $r_i$ and has the maximum right endpoint. This gives a directed edge from $i$ to a successor interval that extends coverage most aggressively.
Build a structure where each interval has a “next” pointer, representing one greedy expansion step.
Precompute binary lifting table $up[k][i]$, where $up[0][i]$ is the next interval after $i$, and higher levels represent repeated jumps. This allows us to skip many greedy steps at once.
For each query $[x, y]$, find the interval that gives the farthest reach among all intervals with $l \le x$. If this best reach is still less than $x$, the answer is -1 because coverage cannot even start.
Starting from this interval, repeatedly use binary lifting to jump as far right as possible without overshooting $y$, counting how many intervals are used.
If at any point no progress is possible (next interval does not extend coverage), return -1.

Why it works

The greedy rule is locally optimal: among all intervals that can continue coverage at the current boundary, choosing the one with maximum right endpoint cannot reduce future possibilities, because any alternative interval ends earlier and therefore can only force equal or more steps later. This creates a monotonic structure over interval endpoints. The binary lifting table compresses repeated applications of this monotone transition, preserving correctness while reducing repeated scanning. Every query simulates exactly the same decisions as the naive greedy process, only faster.

Python Solution

import sys
input = sys.stdin.readline

LOG = 20

def build_next(intervals):
    n = len(intervals)
    nxt = [-1] * n

    # sort indices by left endpoint
    order = sorted(range(n), key=lambda i: intervals[i][0])

    # for each i, find best j with l_j <= r_i maximizing r_j
    j = 0
    best_idx = -1
    best_r = -1

    for i in order:
        l, r = intervals[i]

        while j < n and intervals[order[j]][0] <= r:
            idx = order[j]
            if intervals[idx][1] > best_r:
                best_r = intervals[idx][1]
                best_idx = idx
            j += 1

        nxt[i] = best_idx

    return nxt

def build_lift(nxt):
    n = len(nxt)
    up = [[-1] * n for _ in range(LOG)]
    up[0] = nxt[:]

    for k in range(1, LOG):
        for i in range(n):
            if up[k-1][i] != -1:
                up[k][i] = up[k-1][up[k-1][i]]
    return up

def solve():
    n, m = map(int, input().split())
    intervals = [tuple(map(int, input().split())) for _ in range(n)]

    nxt = build_next(intervals)
    up = build_lift(nxt)

    # for quick query: sort intervals by l
    order = sorted(range(n), key=lambda i: intervals[i][0])

    L = [intervals[i][0] for i in order]

    import bisect

    def best_start(x):
        # all intervals with l <= x
        idx = bisect.bisect_right(L, x) - 1
        if idx < 0:
            return -1

        # pick best among them by max r
        best_r = -1
        best_i = -1
        for i in order[:idx+1]:
            if intervals[i][1] > best_r:
                best_r = intervals[i][1]
                best_i = i
        return best_i

    for _ in range(m):
        x, y = map(int, input().split())

        cur = best_start(x)
        if cur == -1:
            print(-1)
            continue

        if intervals[cur][1] >= y:
            print(1)
            continue

        cnt = 1
        cur_r = intervals[cur][1]

        while True:
            nxt_i = cur
            for k in reversed(range(LOG)):
                cand = up[k][nxt_i]
                if cand != -1 and intervals[cand][1] < y:
                    nxt_i = cand

            if nxt[nxt_i] == -1:
                break

            nxt_i = nxt[nxt_i]
            cnt += 1

            if intervals[nxt_i][1] >= y:
                print(cnt)
                break
        else:
            print(-1)

solve()

The solution is structured around transforming intervals into a greedy jump graph. The build_next function constructs the immediate best extension for each interval using a two-pointer sweep over sorted starts and ends. This ensures that from any interval we know the optimal next interval in constant time.

Binary lifting is then built over these transitions so that repeated expansions can be compressed into logarithmic jumps. Each up[k][i] represents applying the greedy step $2^k$ times.

For each query, the function first identifies a valid starting interval that covers the left endpoint. This step is crucial because choosing any interval arbitrarily can miss optimal coverage. The scan here is the bottleneck in this implementation and could be optimized further using a segment tree, but the conceptual structure remains correct.

The main loop repeatedly jumps forward using precomputed transitions until the right endpoint is reached or no further progress is possible. The counter tracks how many intervals are used in the greedy chain.

Worked Examples

Example 1

Input:

We process intervals [1,3], [2,4].

Query	Start chosen	Steps	Coverage progression	Answer
[1,3]	[1,3]	1	3 reached	1
[1,4]	[1,3] → [2,4]	2	3 → 4	2
[3,4]	[2,4]	1	4 reached	1

This confirms the greedy always prefers the interval extending furthest right from the current boundary.

Example 2 (disconnected intervals)

Input:

Step	Current	Chosen interval	Right end
1	1	[1,3]	3
2	3	[3,4]	4
3	4	none	fail

The process fails before reaching 6 because no interval covers beyond 4. This demonstrates that adjacency alone is not enough; continuous reachability is required.

Complexity Analysis

Measure	Complexity	Explanation
Time	$O((n + m)\log n)$	sorting intervals, building lifting table, and logarithmic jumps per query
Space	$O(n \log n)$	binary lifting table plus auxiliary arrays

The complexity fits comfortably within limits for $2 \cdot 10^5$ elements. The logarithmic factor is small, and preprocessing dominates while each query remains efficient.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    from __main__ import solve
    return solve()

# sample
assert run("""2 3
1 3
2 4
1 3
1 4
3 4
""").strip() == "1\n2\n1"

# single interval
assert run("""1 2
1 10
1 5
2 9
""").strip() == "1\n1"

# disconnected coverage
assert run("""3 1
1 2
3 4
5 6
1 6
""").strip() == "-1"

# exact chain
assert run("""3 1
1 3
3 5
5 7
1 7
""").strip() == "3"

# overlapping redundancy
assert run("""4 1
1 10
2 3
3 4
4 5
2 9
""").strip() == "1"

Test input	Expected output	What it validates
single interval	1,1	trivial coverage
disconnected	-1	gap detection
exact chain	3	endpoint chaining
redundant overlaps	1	optimal single interval choice

Edge Cases

A key edge case is when intervals just touch at endpoints. For example:

[1,3], [3,5], query [1,5]

The algorithm treats this as valid because coverage over real numbers allows touching boundaries to connect segments without gaps.

Another case is when multiple intervals start before the query but only one gives the maximum reach. If we incorrectly pick a shorter interval first, we may increase the number of steps or even fail early. The greedy preprocessing ensures we always collapse such choices into a single best representative, so the query always starts from the strongest possible interval.

A third edge case is a query starting inside a gap between intervals. For example:

intervals: [1,2], [4,6]
query: [3,5]

The best_start step finds no interval with $l \le 3$ that can extend to 3, so the algorithm immediately returns -1, correctly identifying that coverage cannot begin at the query's left boundary.