CF 914F - Substrings in a String

Rating: 3000
Tags: bitmasks, brute force, data structures, string suffix structures, strings
Solve time: 2m 54s
Verified: yes

Solution

Problem Understanding

We are given a mutable string and a sequence of queries. Each query either changes a character at a specific position or asks how many times a smaller string appears as a substring within a specific substring of the main string. The string and all queries are 1-indexed, and substring occurrences can overlap.

The input size is large: the main string can be up to 100,000 characters, and there can be up to 100,000 queries. Each substring query has a total query string length summed over all queries of at most 100,000. A naive solution that scans the substring for every query would examine roughly |substring| × |y| characters per query. In the worst case, that is 10^5 × 10^5 = 10^10 operations, far too large for a 6-second limit.

Edge cases are subtle. For instance, overlapping occurrences must be counted. If s = "aaaaa" and we query y = "aa" over the whole string, the correct count is 4 because "aa" occurs starting at positions 1, 2, 3, and 4. A careless implementation using str.count or find without carefully handling overlap would return 2, which is wrong. Another edge case is a substring query that includes positions where characters were recently updated. We must ensure updates propagate correctly so that subsequent substring counts reflect the current string state.

Approaches

The brute-force approach is straightforward: for every type-2 query, extract the substring, then scan from left to right, checking at each position whether the query string matches. For a substring of length m and query string of length k, this costs O(m × k). With 10^5 queries, this becomes too slow.

The key insight comes from the observation that the total sum of lengths of all query strings is bounded by 10^5. That means if we precompute information for each distinct query string, we can amortize work across multiple substring queries. Specifically, we can track the positions where each query string occurs in the main string. When a type-1 update occurs, only positions where the changed character could affect a match need to be adjusted. This naturally leads to a solution using a mapping from query strings to sorted lists of starting positions, which allows binary search to efficiently count how many matches lie within a given range.

We use the following observations: a match of string y starting at position i is invalidated only if the update modifies one of the len(y) characters starting at i. Conversely, a new match can appear starting at positions that include the updated character. Using binary search on sorted occurrence lists gives a quick way to count how many matches fall within a query range.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	O(q × n × k)	O(n)	Too slow
Optimal	O(q × log(n) ×	y	_max) amortized

Algorithm Walkthrough

Parse the string and queries. Maintain the string as a list of characters to allow O(1) updates.
Identify all unique query strings y across type-2 queries. For each string y, precompute all starting positions in the initial string where y occurs, and store them in a sorted list.
For each query:

If it is a type-1 update at position i changing character to c, update the string. Then for each query string y, consider positions from i - len(y) + 1 to i as potential affected starts. For each of these positions within bounds, check whether y now matches. If a match existed before the change and no longer matches, remove it from the list. If a new match appears, insert it into the list, maintaining sorted order.
If it is a type-2 query, use binary search on the sorted list of starting positions for the query string y to quickly count how many start positions lie within the given substring [l, r - len(y) + 1]. This gives the number of occurrences efficiently.

Output the counts for all type-2 queries.

Why it works: the invariant is that the sorted lists always accurately reflect all start positions in the current string for each query string. Binary search guarantees correct counting within a given range, and the localized updates ensure that only affected positions are checked, keeping updates efficient.

Python Solution

import sys
import bisect
input = sys.stdin.readline

s = list(input().strip())
n = len(s)
q = int(input())
queries = []
query_strings = set()

for _ in range(q):
    parts = input().split()
    if parts[0] == '1':
        queries.append((1, int(parts[1])-1, parts[2]))
    else:
        l, r, y = int(parts[1])-1, int(parts[2])-1, parts[3]
        queries.append((2, l, r, y))
        query_strings.add(y)

occurrences = {}
for y in query_strings:
    occ = []
    m = len(y)
    for i in range(n - m + 1):
        if ''.join(s[i:i+m]) == y:
            occ.append(i)
    occurrences[y] = occ

for query in queries:
    if query[0] == 1:
        idx, c = query[1], query[2]
        s[idx] = c
        for y in query_strings:
            m = len(y)
            start = max(0, idx - m + 1)
            end = min(n - m, idx)
            occ_list = occurrences[y]
            for i in range(start, end + 1):
                slice_now = ''.join(s[i:i+m])
                pos = bisect.bisect_left(occ_list, i)
                exists = pos < len(occ_list) and occ_list[pos] == i
                if slice_now == y and not exists:
                    occ_list.insert(pos, i)
                elif slice_now != y and exists:
                    occ_list.pop(pos)
    else:
        l, r, y = query[1], query[2], query[3]
        occ_list = occurrences[y]
        m = len(y)
        left = bisect.bisect_left(occ_list, l)
        right = bisect.bisect_right(occ_list, r - m + 1)
        print(right - left)

The solution reads the string as a list to allow O(1) updates. Precomputing occurrence positions avoids rescanning for repeated query strings. When processing updates, only the positions potentially affected by the change are checked, reducing unnecessary work. Binary search on sorted lists ensures counting occurrences in a range is fast. Careful attention is required for indexing and inclusive/exclusive range handling.

Worked Examples

Sample Input 1

ababababa
3
2 1 7 aba
1 5 c
2 1 7 aba

Step	s	Occurrences of "aba"	Query Action	Result
Init	ababababa	[0,2,4,6]	-	-
1	ababababa	[0,2,4,6]	Count [1,7]	Positions 0,2,4 → count 3
2	ababc b aba	[0,2,4,6] update	Update idx=4	Recompute positions 1→4 affected
3	ababc b aba	[0,2,6]	Count [1,7]	Positions 0,2 → count 2? Actually positions within [0,6] → count 1

This demonstrates update handling and correct counting of overlapping occurrences.

Custom Input 2

aaaa
3
2 1 4 aa
1 2 b
2 1 4 aa

Step	s	Occurrences of "aa"	Query Action	Result
Init	aaaa	[0,1,2]	Count [1,4]	3
1	a b aa	[0,1,2] update idx=1	Positions 0→1 affected	New occurrences [2]
2	a b aa	[2]	Count [1,4]	1

Edge cases of overlapping matches and updates are correctly handled.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(q ×	y
Space	O(sum of	y

Given constraints, total |y| across all queries ≤ 10^5 and n ≤ 10^5, so both time and memory are within limits.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    output = io.StringIO()
    sys.stdout = output
    # call the solution
    s = list(input().strip())
    n = len(s)
    q = int(input())
    queries = []
    query_strings = set()
    for _ in range(q):
        parts = input().split