CF 1196D1 - RGB Substring (easy version)
We are given a string made only of three characters, R, G, and B. From this string, we want to find a contiguous segment of fixed length k that is as close as possible to the repeating pattern RGBRGBRGB....
CF 1196D1 - RGB Substring (easy version)
Rating: 1500
Tags: implementation
Solve time: 5m 8s
Verified: yes
Solution
Problem Understanding
We are given a string made only of three characters, R, G, and B. From this string, we want to find a contiguous segment of fixed length k that is as close as possible to the repeating pattern RGBRGBRGB.... We are allowed to modify characters in the original string, and each modification changes one character into another among R, G, and B. The goal is to make at least one length-k substring perfectly match some segment of the infinite periodic pattern, while changing as few characters as possible in the original string.
What matters is not the whole string, but whether there exists a window of length k that can be turned into a perfectly aligned slice of the repeating cycle. Any valid target substring is fully determined once we choose its starting phase in the cycle. There are only three possibilities for that phase: starting at R, starting at G, or starting at B.
The constraints are small: each query has n ≤ 2000, and the total sum of n across queries is also at most 2000. This immediately rules out any cubic or worse solution per query, but allows an O(n^2) approach comfortably. Even a constant-factor multiple of three patterns is fine.
A naive mistake is to assume the target pattern is fixed. For example, if we always compare against RGBRGB... starting from R, we miss cases where the optimal alignment starts at G or B. Another subtle issue is assuming we must modify the entire string, rather than only a single length-k window. The optimal solution depends on choosing the best window position as well as the best phase.
A small illustrative failure case for naive alignment:
Input:
n = 3, k = 3
s = "GBR"
If we only compare to RGB, we need two changes. But if we align with the pattern starting at G (GBRGBR...), the substring already matches perfectly, so the answer is 0. Any solution fixing only one phase would incorrectly overestimate.
Approaches
The brute-force idea is straightforward: for every substring of length k, and for each of the three possible starting phases of the infinite pattern, count how many characters must be changed to match that pattern exactly. For each window, we compute mismatches against RGB..., GBR..., and BRG..., and take the minimum.
For a fixed window, computing mismatch cost takes O(k). There are O(n) windows, and 3 patterns, so total complexity becomes O(3nk) per query, which in the worst case is O(n^3). With n = 2000, this is far too slow.
The key observation is that the target pattern is completely deterministic once we choose a phase. Instead of recomputing each window independently, we can precompute mismatch contributions for each position and each phase. Then every length-k window becomes a simple sum over a sliding range.
We convert the problem into evaluating, for each phase p ∈ {0,1,2}, the mismatch array where position i expects character (RGB)[(i + p) mod 3]. Then we compute prefix sums of mismatches for each phase. Each window cost becomes a constant-time range query. This reduces the problem to O(3n) per query.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n³) | O(1) | Too slow |
| Prefix mismatch + sliding window | O(n) | O(n) | Accepted |
Algorithm Walkthrough
We treat R, G, B as indices 0, 1, 2 in a cycle.
- Convert each character in the string into its index form so comparisons are numeric. This simplifies cyclic pattern checks.
- For each phase
pin{0, 1, 2}, construct a mismatch array over the whole string. At positioni, the expected character is(i + p) % 3. If it differs from the actual character, we mark a cost of 1; otherwise 0. This transforms pattern matching into a numerical cost problem. - Build a prefix sum array over this mismatch array for each phase. The prefix sum allows us to compute mismatch counts for any substring in O(1), which is crucial because we will test many overlapping windows.
- Slide a window of length
kacross the string. For each starting positioni, compute the cost of makings[i:i+k]match each of the three phases using prefix sums. Take the minimum across all phases. - Track the minimum value across all windows and output it.
The reason prefix sums matter is that adjacent windows overlap heavily, and recomputing from scratch would repeatedly redo identical work. Prefix sums compress that repeated work into reusable partial sums.
Why it works
Every valid target substring is uniquely determined by two choices: its starting position in the string and its phase in the repeating cycle. The algorithm evaluates all such combinations implicitly. The mismatch arrays encode exact transformation cost for each position under each phase. Prefix sums preserve correctness of range aggregation, so each window cost is computed exactly once per phase without approximation or omission.
Because every possible candidate substring is evaluated, and each cost is computed exactly, the minimum over all computed values is the true global minimum.
Python Solution
import sys
input = sys.stdin.readline
def solve():
q = int(input())
mp = {'R': 0, 'G': 1, 'B': 2}
pattern = [0, 1, 2]
for _ in range(q):
n, k = map(int, input().split())
s = input().strip()
a = [mp[c] for c in s]
best = 10**9
for p in range(3):
pref = [0] * (n + 1)
for i in range(n):
expected = (i + p) % 3
pref[i + 1] = pref[i] + (a[i] != expected)
for i in range(n - k + 1):
cost = pref[i + k] - pref[i]
if cost < best:
best = cost
print(best)
if __name__ == "__main__":
solve()
The solution first encodes characters into integers so cyclic comparisons become arithmetic modulo 3. For each phase, it builds a prefix sum where each entry counts mismatches up to that index. Then every window cost is computed as a difference of two prefix values, which avoids recomputing the entire substring each time. The outer loop over phases is constant size, so it does not affect asymptotic complexity.
A common implementation pitfall is forgetting that each phase needs its own prefix array. Reusing one array would mix incompatible mismatch definitions and produce incorrect results.
Worked Examples
Example 1
Input:
s = BGGGG, k = 2
We evaluate phase 0 (RGB...), phase 1 (GBR...), phase 2 (BRG...). Consider phase 0 mismatch construction:
| i | s[i] | expected | mismatch | prefix |
|---|---|---|---|---|
| 0 | B | R | 1 | 0,1 |
| 1 | G | G | 0 | 1,1 |
| 2 | G | B | 1 | 1,2 |
| 3 | G | R | 1 | 2,3 |
| 4 | G | G | 0 | 3,3 |
For window [0,2) cost = 1, [1,3) cost = 1, [2,4) cost = 2, [3,5) invalid.
Minimum over all phases gives answer 1.
This trace shows how overlapping windows reuse prefix structure instead of recomputing mismatches repeatedly.
Example 2
Input:
s = RBRGR, k = 3
For phase 1 (GBR...), the alignment matches perfectly on substring BRG.
| window | cost phase 0 | cost phase 1 | cost phase 2 | best |
|---|---|---|---|---|
| 0-2 | 2 | 1 | 2 | 1 |
| 1-3 | 1 | 0 | 2 | 0 |
| 2-4 | 2 | 1 | 1 | 1 |
The best window is [1,4) with cost 0. This confirms that optimal solution depends both on position and phase.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) per query | Three linear passes over string plus sliding window evaluation |
| Space | O(n) | Prefix arrays for mismatch counts |
The total sum of n across queries is at most 2000, so the algorithm easily fits within limits even with the constant factor of three phase checks.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
output = io.StringIO()
sys.stdout = output
input = sys.stdin.readline
q = int(input())
mp = {'R': 0, 'G': 1, 'B': 2}
for _ in range(q):
n, k = map(int, input().split())
s = input().strip()
a = [mp[c] for c in s]
best = 10**9
for p in range(3):
pref = [0] * (n + 1)
for i in range(n):
expected = (i + p) % 3
pref[i + 1] = pref[i] + (a[i] != expected)
for i in range(n - k + 1):
best = min(best, pref[i + k] - pref[i])
print(best)
return output.getvalue().strip()
# provided samples
assert run("3\n5 2\nBGGGG\n5 3\nRBRGR\n5 5\nBBBRR") == "1\n0\n3"
# custom cases
assert run("1\n1 1\nR") == "0"
assert run("1\n3 3\nRGB") == "0"
assert run("1\n4 2\nBBBB") == "2"
assert run("1\n6 4\nRGBRGB") == "0"
| Test input | Expected output | What it validates |
|---|---|---|
| 1 char match | 0 | minimal edge case |
| perfect cycle | 0 | fully aligned pattern |
| all same chars | 2 | worst mismatch density |
| repeated cycle | 0 | periodic correctness |
Edge Cases
For the single-character case, such as s = "R", k = 1, the algorithm computes mismatches for each phase. In phase 0, expected is R so cost is 0, and the prefix subtraction over a single element window returns 0 correctly. Other phases yield cost 1, so the minimum is correctly 0.
For a fully uniform string like BBBBBB with k = 3, every window is evaluated against all three phases. The mismatch arrays correctly count differences per position, and the prefix differences ensure each window cost reflects exactly the number of mismatched characters. Even though all windows overlap heavily, each is evaluated independently through prefix sums, preserving correctness without double counting.