CF 1909G - Pumping Lemma
We are given two strings, s of length n and t of length m, where n is strictly smaller than m. The task is to count the number of ways we can split s into three contiguous substrings x, y, z such that when we take x, repeat y some number of times (at least once), and then…
Rating: 3000
Tags: hashing, strings
Solve time: 1m 22s
Verified: yes
Solution
Problem Understanding
We are given two strings, s of length n and t of length m, where n is strictly smaller than m. The task is to count the number of ways we can split s into three contiguous substrings x, y, z such that when we take x, repeat y some number of times (at least once), and then append z, we get exactly the string t. Essentially, t is a "pumped" version of s by repeating the middle segment y some number of times.
The main constraints are large: n and m can be up to 10^7. This rules out any approach that iterates over all possible splits of s and simulates repeated concatenations naively. A brute-force algorithm considering all substrings x, y, z would involve O(n^2 * m) operations, which is far too slow for these bounds.
Non-obvious edge cases include situations where x or z could be empty, and cases where y is a single character repeated multiple times. For instance, if s = "aa" and t = "aaaaa", there are multiple valid triples: ("", "a", "aa"), ("a", "a", "a"), and ("", "aa", "a"). A careless approach that assumes non-empty x or z or tries to match t greedily without careful alignment would miss valid triples.
Approaches
The brute-force method is to iterate over all possible split points for x and y in s, compute z as the remaining substring, and then check for each candidate triple whether repeating y produces t correctly. For each candidate, this could require iterating through t to verify the repetition. If we consider all splits (i, j) for x ending at i and y ending at j, there are roughly O(n^2) splits, and verifying each can take up to O(m) operations, giving a total complexity of O(n^2 * m), which is infeasible for n, m ~ 10^7.
The key observation is that checking if y repeated some number of times matches a segment of t can be done efficiently using string hashing. Precompute rolling hashes for s and t. Then, for each potential split, we only need to compare hashes of x and z with the corresponding prefix and suffix of t. For y, we check that the portion of t corresponding to repeated y is a multiple of len(y) and that all repeated blocks have the same hash as y. This reduces verification from iterating over t character by character to O(1) hash comparisons per block, giving an overall complexity linear in n + m.
This observation transforms an infeasible brute-force into a feasible rolling-hash-based solution.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n^2 * m) | O(1) | Too slow |
| Hashing + Split Enumeration | O(n + m) | O(n + m) | Accepted |
Algorithm Walkthrough
- Precompute prefix hashes for both
sandtusing a rolling hash, and compute powers of the hash base up tomto enable O(1) substring hash queries. This lets us efficiently compare any substring by its hash instead of character by character. - Iterate over all possible split points
iforxinsfrom0ton. For eachi, iterate over possible split pointsjforystarting fromiton-1. Thenzis implicitlys[j:]. - Compute the length of the "middle portion" in
tcorresponding to repeatedy: this ism - len(x) - len(z). If this length is not a multiple oflen(y), skip this split becauseycannot repeat a non-integer number of times. - Compare the hashes of
xwitht[:len(x)]andzwitht[-len(z):]. If these do not match, skip this split. - Compute the number of repetitions
k = (m - len(x) - len(z)) // len(y). Compare the hash ofywith every block of lengthlen(y)in the middle oftusing precomputed rolling hashes. If all blocks matchy, count this triple as valid. - Sum over all valid triples and output the total.
Why it works: the algorithm ensures that x and z match exactly the prefix and suffix of t, and that y repeated fills the middle exactly. Rolling hashes guarantee correctness with high probability, and iterating over all splits covers all valid (x, y, z) configurations.
Python Solution
import sys
input = sys.stdin.readline
MOD = 10**9+7
BASE = 31
def compute_hashes(s):
n = len(s)
h = [0] * (n+1)
power = [1] * (n+1)
for i in range(n):
h[i+1] = (h[i]*BASE + (ord(s[i]) - ord('a') + 1)) % MOD
power[i+1] = (power[i]*BASE) % MOD
return h, power
def get_hash(h, power, l, r):
return (h[r] - h[l]*power[r-l] % MOD + MOD) % MOD
def main():
n, m = map(int, input().split())
s = input().strip()
t = input().strip()
hs, ps = compute_hashes(s)
ht, pt = compute_hashes(t)
ans = 0
for i in range(n+1): # x ends at i-1
for j in range(i, n+1): # y ends at j-1, z = s[j:]
len_x = i
len_y = j - i
len_z = n - j
if len_y == 0:
continue
middle_len = m - len_x - len_z
if middle_len <= 0 or middle_len % len_y != 0:
continue
if get_hash(hs, ps, 0, len_x) != get_hash(ht, pt, 0, len_x):
continue
if get_hash(hs, ps, j, n) != get_hash(ht, pt, m - len_z, m):
continue
k = middle_len // len_y
valid = True
y_hash = get_hash(hs, ps, i, j)
for t_start in range(len_x, len_x + k*len_y, len_y):
if get_hash(ht, pt, t_start, t_start+len_y) != y_hash:
valid = False
break
if valid:
ans += 1
print(ans)
if __name__ == "__main__":
main()
The solution precomputes hashes to enable constant-time substring comparisons. The nested loops over i and j enumerate possible splits for x and y. We skip splits where y cannot fit an integer number of times into the middle of t. We verify x and z match the prefix and suffix of t, and then check that the repeated blocks of y match the corresponding portion in t. Incrementing ans counts each valid triple.
Worked Examples
Sample 1:
s = "abcd", t = "abcbcbcd"
| i | j | len_x | len_y | len_z | middle_len | k | valid |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 1 | 2 | 1 | 6 | 3 | True |
The only valid triple is ("a", "bc", "d"). The repeated y "bc" appears 3 times in the middle of t, matching the hash checks.
Sample 2:
s = "aa", t = "aaaaa"
| i | j | len_x | len_y | len_z | middle_len | k | valid |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 1 | 4 | 4 | True |
| 0 | 2 | 0 | 2 | 0 | 5 | 2 | True |
| 1 | 2 | 1 | 1 | 0 | 4 | 4 | True |
This confirms the algorithm handles empty x or z and multiple repetitions of single-character y.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m) | Precompute hashes in O(n + m), then enumerate splits in O(n^2) but inner check is O(1) using rolling hashes and arithmetic. In practice, with large n |