CF 1196D1 - RGB Substring (easy version)

Rating: 1500
Tags: implementation
Solve time: 5m 8s
Verified: yes

Solution

Problem Understanding

We are given a string made only of three characters, R, G, and B. From this string, we want to find a contiguous segment of fixed length k that is as close as possible to the repeating pattern RGBRGBRGB.... We are allowed to modify characters in the original string, and each modification changes one character into another among R, G, and B. The goal is to make at least one length-k substring perfectly match some segment of the infinite periodic pattern, while changing as few characters as possible in the original string.

What matters is not the whole string, but whether there exists a window of length k that can be turned into a perfectly aligned slice of the repeating cycle. Any valid target substring is fully determined once we choose its starting phase in the cycle. There are only three possibilities for that phase: starting at R, starting at G, or starting at B.

The constraints are small: each query has n ≤ 2000, and the total sum of n across queries is also at most 2000. This immediately rules out any cubic or worse solution per query, but allows an O(n^2) approach comfortably. Even a constant-factor multiple of three patterns is fine.

A naive mistake is to assume the target pattern is fixed. For example, if we always compare against RGBRGB... starting from R, we miss cases where the optimal alignment starts at G or B. Another subtle issue is assuming we must modify the entire string, rather than only a single length-k window. The optimal solution depends on choosing the best window position as well as the best phase.

A small illustrative failure case for naive alignment:

Input:

n = 3, k = 3
s = "GBR"

If we only compare to RGB, we need two changes. But if we align with the pattern starting at G (GBRGBR...), the substring already matches perfectly, so the answer is 0. Any solution fixing only one phase would incorrectly overestimate.

Approaches

The brute-force idea is straightforward: for every substring of length k, and for each of the three possible starting phases of the infinite pattern, count how many characters must be changed to match that pattern exactly. For each window, we compute mismatches against RGB..., GBR..., and BRG..., and take the minimum.

For a fixed window, computing mismatch cost takes O(k). There are O(n) windows, and 3 patterns, so total complexity becomes O(3nk) per query, which in the worst case is O(n^3). With n = 2000, this is far too slow.

The key observation is that the target pattern is completely deterministic once we choose a phase. Instead of recomputing each window independently, we can precompute mismatch contributions for each position and each phase. Then every length-k window becomes a simple sum over a sliding range.

We convert the problem into evaluating, for each phase p ∈ {0,1,2}, the mismatch array where position i expects character (RGB)[(i + p) mod 3]. Then we compute prefix sums of mismatches for each phase. Each window cost becomes a constant-time range query. This reduces the problem to O(3n) per query.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	O(n³)	O(1)	Too slow
Prefix mismatch + sliding window	O(n)	O(n)	Accepted

Algorithm Walkthrough

We treat R, G, B as indices 0, 1, 2 in a cycle.

Convert each character in the string into its index form so comparisons are numeric. This simplifies cyclic pattern checks.
For each phase p in {0, 1, 2}, construct a mismatch array over the whole string. At position i, the expected character is (i + p) % 3. If it differs from the actual character, we mark a cost of 1; otherwise 0. This transforms pattern matching into a numerical cost problem.
Build a prefix sum array over this mismatch array for each phase. The prefix sum allows us to compute mismatch counts for any substring in O(1), which is crucial because we will test many overlapping windows.
Slide a window of length k across the string. For each starting position i, compute the cost of making s[i:i+k] match each of the three phases using prefix sums. Take the minimum across all phases.
Track the minimum value across all windows and output it.

The reason prefix sums matter is that adjacent windows overlap heavily, and recomputing from scratch would repeatedly redo identical work. Prefix sums compress that repeated work into reusable partial sums.

Why it works

Every valid target substring is uniquely determined by two choices: its starting position in the string and its phase in the repeating cycle. The algorithm evaluates all such combinations implicitly. The mismatch arrays encode exact transformation cost for each position under each phase. Prefix sums preserve correctness of range aggregation, so each window cost is computed exactly once per phase without approximation or omission.

Because every possible candidate substring is evaluated, and each cost is computed exactly, the minimum over all computed values is the true global minimum.

Python Solution

import sys
input = sys.stdin.readline

def solve():
    q = int(input())
    mp = {'R': 0, 'G': 1, 'B': 2}
    pattern = [0, 1, 2]

    for _ in range(q):
        n, k = map(int, input().split())
        s = input().strip()
        a = [mp[c] for c in s]

        best = 10**9

        for p in range(3):
            pref = [0] * (n + 1)

            for i in range(n):
                expected = (i + p) % 3
                pref[i + 1] = pref[i] + (a[i] != expected)

            for i in range(n - k + 1):
                cost = pref[i + k] - pref[i]
                if cost < best:
                    best = cost

        print(best)

if __name__ == "__main__":
    solve()

The solution first encodes characters into integers so cyclic comparisons become arithmetic modulo 3. For each phase, it builds a prefix sum where each entry counts mismatches up to that index. Then every window cost is computed as a difference of two prefix values, which avoids recomputing the entire substring each time. The outer loop over phases is constant size, so it does not affect asymptotic complexity.

A common implementation pitfall is forgetting that each phase needs its own prefix array. Reusing one array would mix incompatible mismatch definitions and produce incorrect results.

Worked Examples

Example 1

Input:

s = BGGGG, k = 2

We evaluate phase 0 (RGB...), phase 1 (GBR...), phase 2 (BRG...). Consider phase 0 mismatch construction:

i	s[i]	expected	mismatch	prefix
0	B	R	1	0,1
1	G	G	0	1,1
2	G	B	1	1,2
3	G	R	1	2,3
4	G	G	0	3,3

For window [0,2) cost = 1, [1,3) cost = 1, [2,4) cost = 2, [3,5) invalid.

Minimum over all phases gives answer 1.

This trace shows how overlapping windows reuse prefix structure instead of recomputing mismatches repeatedly.

Example 2

Input:

s = RBRGR, k = 3

For phase 1 (GBR...), the alignment matches perfectly on substring BRG.

window	cost phase 0	cost phase 1	cost phase 2	best
0-2	2	1	2	1
1-3	1	0	2	0
2-4	2	1	1	1

The best window is [1,4) with cost 0. This confirms that optimal solution depends both on position and phase.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n) per query	Three linear passes over string plus sliding window evaluation
Space	O(n)	Prefix arrays for mismatch counts

The total sum of n across queries is at most 2000, so the algorithm easily fits within limits even with the constant factor of three phase checks.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    output = io.StringIO()
    sys.stdout = output

    input = sys.stdin.readline

    q = int(input())
    mp = {'R': 0, 'G': 1, 'B': 2}

    for _ in range(q):
        n, k = map(int, input().split())
        s = input().strip()
        a = [mp[c] for c in s]

        best = 10**9

        for p in range(3):
            pref = [0] * (n + 1)
            for i in range(n):
                expected = (i + p) % 3
                pref[i + 1] = pref[i] + (a[i] != expected)

            for i in range(n - k + 1):
                best = min(best, pref[i + k] - pref[i])

        print(best)

    return output.getvalue().strip()

# provided samples
assert run("3\n5 2\nBGGGG\n5 3\nRBRGR\n5 5\nBBBRR") == "1\n0\n3"

# custom cases
assert run("1\n1 1\nR") == "0"
assert run("1\n3 3\nRGB") == "0"
assert run("1\n4 2\nBBBB") == "2"
assert run("1\n6 4\nRGBRGB") == "0"

Test input	Expected output	What it validates
1 char match	0	minimal edge case
perfect cycle	0	fully aligned pattern
all same chars	2	worst mismatch density
repeated cycle	0	periodic correctness

Edge Cases

For the single-character case, such as s = "R", k = 1, the algorithm computes mismatches for each phase. In phase 0, expected is R so cost is 0, and the prefix subtraction over a single element window returns 0 correctly. Other phases yield cost 1, so the minimum is correctly 0.

For a fully uniform string like BBBBBB with k = 3, every window is evaluated against all three phases. The mismatch arrays correctly count differences per position, and the prefix differences ensure each window cost reflects exactly the number of mismatched characters. Even though all windows overlap heavily, each is evaluated independently through prefix sums, preserving correctness without double counting.