CF 1196D2 - RGB Substring (hard version)
We are given a string made only of the letters R, G, and B. From this string we want to pick a contiguous block of fixed length k and modify characters so that this block matches some segment of an infinite repeating pattern "RGBRGBRGB...".
CF 1196D2 - RGB Substring (hard version)
Rating: 1600
Tags: data structures, dp, implementation, two pointers
Solve time: 3m 41s
Verified: yes
Solution
Problem Understanding
We are given a string made only of the letters R, G, and B. From this string we want to pick a contiguous block of fixed length k and modify characters so that this block matches some segment of an infinite repeating pattern "RGBRGBRGB...".
The key freedom is that we are allowed to choose any length-k window inside the string, and we are also allowed to choose any alignment of the periodic pattern as the target. Each alignment corresponds to one of three cyclic shifts of "RGB". Once the target pattern is fixed, we count how many positions in the chosen window already match it, and everything else must be changed.
So the task reduces to finding a window of length k and a phase of the pattern that minimizes mismatches.
The constraints are large: the total length across all queries is up to 200000. That immediately rules out recomputing mismatches independently for every window in a naive way. Any solution that tries all substrings and recomputes comparisons from scratch would behave like O(nk), which would be far too slow when both are large.
A subtle edge case appears when k equals n. In this case there is only one window, but all three phase alignments still need to be considered. Another corner case is when the string is already perfectly aligned with the pattern but starts at a different phase than the one we assume, so an algorithm that fixes the phase incorrectly could miss the optimal solution.
Approaches
A direct approach is to slide a window of length k across the string. For each window, we try all three possible alignments of the repeating pattern. For a fixed window and fixed phase, we compare each character to the expected one and count mismatches.
This works correctly because the answer is exactly the minimum mismatch over all choices. However, each window costs O(k), and there are O(n) windows, giving O(nk) per query. In the worst case k is proportional to n, so this becomes quadratic per query, which is not feasible under the global constraint.
The key observation is that the target pattern is periodic with period 3. This means that for any starting position, the expected character depends only on index modulo 3 plus a phase shift. Instead of recomputing comparisons for each window, we can precompute mismatch arrays for each of the three phases across the entire string. Then each window query becomes a range sum query on these arrays.
Concretely, for each phase p in {0, 1, 2}, we build an array where position i stores whether s[i] differs from the expected character at that position under phase p. Prefix sums over these arrays allow constant-time computation of mismatches in any window. We then slide the window once and take the minimum over all positions and phases.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(nk) per query | O(1) extra | Too slow |
| Prefix mismatch per phase | O(n) per query | O(n) | Accepted |
Algorithm Walkthrough
- For each of the three possible starting phases of the RGB cycle, construct a mismatch array over the string. At index i, this array stores 1 if s[i] does not match the expected character under that phase, otherwise 0. The reason for separating phases is that the pattern alignment is the only ambiguity in the target string.
- Convert each mismatch array into a prefix sum array so that we can compute the number of mismatches in any interval [l, r] in O(1). This avoids recomputing character comparisons for every window.
- Slide a window of length k over all possible starting positions in the string. For each starting index, compute the mismatch count in that window for each of the three phases using the prefix sums.
- Keep a running minimum over all windows and all phases. This value represents the smallest number of changes required to make some substring match a valid RGB-aligned pattern segment.
- Output the minimum after processing all windows.
Why it works: each possible valid solution is uniquely defined by a choice of window and a choice of phase. The algorithm explicitly evaluates every such pair, but compresses each evaluation into O(1) using prefix sums. Since mismatch counting is exact for each configuration, the global minimum over all configurations is guaranteed to be found.
Python Solution
import sys
input = sys.stdin.readline
def solve():
q = int(input())
base = "RGB"
for _ in range(q):
n, k = map(int, input().split())
s = input().strip()
# prefix sums for 3 phases
pref = [[0] * (n + 1) for _ in range(3)]
for p in range(3):
for i, ch in enumerate(s):
expected = base[(i + p) % 3]
mismatch = 1 if ch != expected else 0
pref[p][i + 1] = pref[p][i] + mismatch
ans = k # worst case: all mismatched
for l in range(0, n - k + 1):
r = l + k
for p in range(3):
cost = pref[p][r] - pref[p][l]
if cost < ans:
ans = cost
print(ans)
if __name__ == "__main__":
solve()
The implementation builds three prefix arrays, each encoding mismatches against one cyclic shift of "RGB". The important detail is the indexing: expected characters are computed using (i + p) % 3 so that each phase corresponds to a different alignment of the pattern.
Window costs are then extracted in O(1) using prefix differences. The double loop over windows and phases is acceptable because it only performs constant-time operations per state.
Worked Examples
Example 1
Input:
5 2
BGGGG
We compute mismatches for each phase and then slide windows of length 2.
| l | window | phase 0 mismatches | phase 1 mismatches | phase 2 mismatches | best |
|---|---|---|---|---|---|
| 0 | BG | 2 | 1 | 1 | 1 |
| 1 | GG | 2 | 1 | 2 | 1 |
| 2 | GG | 2 | 1 | 2 | 1 |
| 3 | GG | 2 | 1 | 2 | 1 |
The minimum over all is 1.
This shows that even though no window perfectly matches a full alignment, the best achievable correction cost is captured by trying all phases.
Example 2
Input:
5 3
RBRGR
| l | window | phase 0 | phase 1 | phase 2 | best |
|---|---|---|---|---|---|
| 0 | RBR | 0 | 3 | 3 | 0 |
| 1 | BRG | 0 | 3 | 3 | 0 |
| 2 | RGR | 1 | 2 | 2 | 1 |
The minimum is 0, achieved when the window already matches a valid RGB segment under an appropriate phase.
This confirms that the algorithm correctly captures perfect matches when alignment is favorable.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) per query | Each phase builds prefix sums in linear time, and each window is evaluated in O(1) |
| Space | O(n) per query | Three prefix arrays of size n |
The total sum of n across queries is bounded by 200000, so this linear approach comfortably fits within time limits. The memory usage is also linear in the current string size, which is safe under constraints.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
output = []
def input():
return sys.stdin.readline()
q = int(sys.stdin.readline())
base = "RGB"
for _ in range(q):
n, k = map(int, sys.stdin.readline().split())
s = sys.stdin.readline().strip()
pref = [[0] * (n + 1) for _ in range(3)]
for p in range(3):
for i, ch in enumerate(s):
expected = base[(i + p) % 3]
pref[p][i + 1] = pref[p][i] + (ch != expected)
ans = k
for l in range(n - k + 1):
for p in range(3):
ans = min(ans, pref[p][l + k] - pref[p][l])
output.append(str(ans))
return "\n".join(output)
# provided samples
assert run("3\n5 2\nBGGGG\n5 3\nRBRGR\n5 5\nBBBRR\n") == "1\n0\n3"
# all same char
assert run("1\n4 2\nBBBB\n") == "1"
# already perfect RGB
assert run("1\n3 3\nRGB\n") == "0"
# reverse pattern
assert run("1\n3 3\nGBR\n") == "0"
# k = n case
assert run("1\n5 5\nRGRGR\n") == "1"
| Test input | Expected output | What it validates |
|---|---|---|
| BBBB, k=2 | 1 | uniform mismatch handling |
| RGB, k=3 | 0 | exact alignment |
| GBR, k=3 | 0 | shifted alignment correctness |
| RGRGR, k=5 | 1 | full window boundary case |
Edge Cases
A first edge case is when the entire string already matches a shifted RGB pattern but not the default phase. For example, input "GBR" with k = 3. The algorithm correctly checks all three phases, so one of them yields zero mismatches. A solution that assumes a fixed starting phase would incorrectly report 3 changes.
Another case is when all characters are identical, such as "BBBBB". No alignment produces many matches, but sliding windows still matter. The prefix arrays ensure each window correctly accumulates mismatches, and the minimum window cost is still correctly identified.
Finally, when k equals 1, every position is its own window. The algorithm reduces to checking which single character is closest to one of R, G, or B in the correct phase position. The prefix approach naturally handles this without special logic, since each window is just a single prefix difference range.