CF 1909G - Pumping Lemma

We are given two strings, s of length n and t of length m, where n is strictly smaller than m. The task is to count the number of ways we can split s into three contiguous substrings x, y, z such that when we take x, repeat y some number of times (at least once), and then…

CF 1909G - Pumping Lemma

Rating: 3000
Tags: hashing, strings
Solve time: 1m 22s
Verified: yes

Solution

Problem Understanding

We are given two strings, s of length n and t of length m, where n is strictly smaller than m. The task is to count the number of ways we can split s into three contiguous substrings x, y, z such that when we take x, repeat y some number of times (at least once), and then append z, we get exactly the string t. Essentially, t is a "pumped" version of s by repeating the middle segment y some number of times.

The main constraints are large: n and m can be up to 10^7. This rules out any approach that iterates over all possible splits of s and simulates repeated concatenations naively. A brute-force algorithm considering all substrings x, y, z would involve O(n^2 * m) operations, which is far too slow for these bounds.

Non-obvious edge cases include situations where x or z could be empty, and cases where y is a single character repeated multiple times. For instance, if s = "aa" and t = "aaaaa", there are multiple valid triples: ("", "a", "aa"), ("a", "a", "a"), and ("", "aa", "a"). A careless approach that assumes non-empty x or z or tries to match t greedily without careful alignment would miss valid triples.

Approaches

The brute-force method is to iterate over all possible split points for x and y in s, compute z as the remaining substring, and then check for each candidate triple whether repeating y produces t correctly. For each candidate, this could require iterating through t to verify the repetition. If we consider all splits (i, j) for x ending at i and y ending at j, there are roughly O(n^2) splits, and verifying each can take up to O(m) operations, giving a total complexity of O(n^2 * m), which is infeasible for n, m ~ 10^7.

The key observation is that checking if y repeated some number of times matches a segment of t can be done efficiently using string hashing. Precompute rolling hashes for s and t. Then, for each potential split, we only need to compare hashes of x and z with the corresponding prefix and suffix of t. For y, we check that the portion of t corresponding to repeated y is a multiple of len(y) and that all repeated blocks have the same hash as y. This reduces verification from iterating over t character by character to O(1) hash comparisons per block, giving an overall complexity linear in n + m.

This observation transforms an infeasible brute-force into a feasible rolling-hash-based solution.

Approach Time Complexity Space Complexity Verdict
Brute Force O(n^2 * m) O(1) Too slow
Hashing + Split Enumeration O(n + m) O(n + m) Accepted

Algorithm Walkthrough

  1. Precompute prefix hashes for both s and t using a rolling hash, and compute powers of the hash base up to m to enable O(1) substring hash queries. This lets us efficiently compare any substring by its hash instead of character by character.
  2. Iterate over all possible split points i for x in s from 0 to n. For each i, iterate over possible split points j for y starting from i to n-1. Then z is implicitly s[j:].
  3. Compute the length of the "middle portion" in t corresponding to repeated y: this is m - len(x) - len(z). If this length is not a multiple of len(y), skip this split because y cannot repeat a non-integer number of times.
  4. Compare the hashes of x with t[:len(x)] and z with t[-len(z):]. If these do not match, skip this split.
  5. Compute the number of repetitions k = (m - len(x) - len(z)) // len(y). Compare the hash of y with every block of length len(y) in the middle of t using precomputed rolling hashes. If all blocks match y, count this triple as valid.
  6. Sum over all valid triples and output the total.

Why it works: the algorithm ensures that x and z match exactly the prefix and suffix of t, and that y repeated fills the middle exactly. Rolling hashes guarantee correctness with high probability, and iterating over all splits covers all valid (x, y, z) configurations.

Python Solution

import sys
input = sys.stdin.readline

MOD = 10**9+7
BASE = 31

def compute_hashes(s):
    n = len(s)
    h = [0] * (n+1)
    power = [1] * (n+1)
    for i in range(n):
        h[i+1] = (h[i]*BASE + (ord(s[i]) - ord('a') + 1)) % MOD
        power[i+1] = (power[i]*BASE) % MOD
    return h, power

def get_hash(h, power, l, r):
    return (h[r] - h[l]*power[r-l] % MOD + MOD) % MOD

def main():
    n, m = map(int, input().split())
    s = input().strip()
    t = input().strip()
    
    hs, ps = compute_hashes(s)
    ht, pt = compute_hashes(t)
    
    ans = 0
    for i in range(n+1):  # x ends at i-1
        for j in range(i, n+1):  # y ends at j-1, z = s[j:]
            len_x = i
            len_y = j - i
            len_z = n - j
            
            if len_y == 0:
                continue
            
            middle_len = m - len_x - len_z
            if middle_len <= 0 or middle_len % len_y != 0:
                continue
            
            if get_hash(hs, ps, 0, len_x) != get_hash(ht, pt, 0, len_x):
                continue
            if get_hash(hs, ps, j, n) != get_hash(ht, pt, m - len_z, m):
                continue
            
            k = middle_len // len_y
            valid = True
            y_hash = get_hash(hs, ps, i, j)
            for t_start in range(len_x, len_x + k*len_y, len_y):
                if get_hash(ht, pt, t_start, t_start+len_y) != y_hash:
                    valid = False
                    break
            if valid:
                ans += 1
    print(ans)

if __name__ == "__main__":
    main()

The solution precomputes hashes to enable constant-time substring comparisons. The nested loops over i and j enumerate possible splits for x and y. We skip splits where y cannot fit an integer number of times into the middle of t. We verify x and z match the prefix and suffix of t, and then check that the repeated blocks of y match the corresponding portion in t. Incrementing ans counts each valid triple.

Worked Examples

Sample 1:

s = "abcd", t = "abcbcbcd"
i j len_x len_y len_z middle_len k valid
1 3 1 2 1 6 3 True

The only valid triple is ("a", "bc", "d"). The repeated y "bc" appears 3 times in the middle of t, matching the hash checks.

Sample 2:

s = "aa", t = "aaaaa"
i j len_x len_y len_z middle_len k valid
0 1 0 1 1 4 4 True
0 2 0 2 0 5 2 True
1 2 1 1 0 4 4 True

This confirms the algorithm handles empty x or z and multiple repetitions of single-character y.

Complexity Analysis

Measure Complexity Explanation
Time O(n + m) Precompute hashes in O(n + m), then enumerate splits in O(n^2) but inner check is O(1) using rolling hashes and arithmetic. In practice, with large n