CF 1334G - Substring Search
We are asked to find all substrings of a string t that "match" another string s according to a flexible definition of equality.
Rating: 2900
Tags: bitmasks, brute force, fft
Solve time: 2m 26s
Verified: yes
Solution
Problem Understanding
We are asked to find all substrings of a string t that "match" another string s according to a flexible definition of equality. Normally, a substring matches s if each character is exactly equal, but here we have a permutation p of the numbers 1 through 26 that allows a substitution: character c in s can match a character d in t if either they are equal, or if the index of d in the alphabet is exactly p[idx(c)].
The input gives us p as 26 integers, a string s of length n, and a string t of length m, where 2 ≤ n ≤ m ≤ 2⋅10^5. We need to output a binary string of length m - n + 1, marking each starting position in t with 1 if the substring there matches s under the permutation mapping, and 0 otherwise.
The constraints imply that a naive O(n * m) approach will be too slow because n * m could reach 4⋅10^10 operations. We must exploit either bitwise representations, convolutions, or other techniques that reduce the per-character comparison across all shifts of s against t.
A non-obvious edge case occurs when p maps a character to itself. For example, if p[1] = 1 and s = "a", then any "a" in t matches both directly and through the permutation. A naive check that ignores equality before the permutation may overcount. Another subtlety is repeated characters in s or t. If s = "aa" and p[1] = 2, and t = "ab", then the first character matches via permutation, but the second does not, so the substring should be rejected. Missing these distinctions leads to off-by-one errors in the output string.
Approaches
The brute-force approach is straightforward: for each substring of t of length n, iterate through every character, checking if it equals the corresponding character in s or matches via p. This is correct because it explicitly enforces the matching rules, but it requires O(n * (m - n + 1)) operations. With m ≈ 2⋅10^5 and n ≈ 2⋅10^5, this becomes roughly 4⋅10^10 comparisons, far exceeding the time limit.
The key observation is that matching with the permutation can be encoded numerically. We can represent each character as a 26-bit mask indicating which letters it can match (itself and p_inv[c]). By creating arrays for s and t where each position holds a 26-bit vector, we can reduce the matching check to a convolution over these bit vectors. For each letter, we build indicator arrays of 0s and 1s, reverse s's array, and perform an FFT-based convolution with t's array. After convolution, the positions where the sum equals n are exact matches under our flexible definition. This reduces the time complexity to O(26 * m log m), which is feasible for the input constraints.
The story is thus: the brute-force works for small strings but fails for large m because repeated per-character checks are too expensive. By encoding matching possibilities as bit vectors and using convolutions, we leverage the structure of the problem-specifically, independent per-letter checks-to efficiently compute matches for all shifts simultaneously.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n * m) | O(1) | Too slow |
| Bitmask + FFT Convolution | O(26 * m log m) | O(26 * m) | Accepted |
Algorithm Walkthrough
- Convert the permutation
pinto a mapping of each character to its allowed match via permutation. For characterc, definematch[c]as a set of characters{c, d}wheredis the character corresponding top[idx(c)]. This ensures that we capture both direct equality and permitted substitutions. - Initialize 26 indicator arrays for the string
sand 26 fort. For each characterchfrom'a'to'z', the indicator array forshas1at positions wheres[i]can matchchand0elsewhere. Similarly, fort, the array has1wheret[i] == ch. Reverses's arrays to prepare for convolution. - Perform FFT-based convolution for each character's indicator arrays. Convolution computes, for every shift, the number of positions where
scan matchtunder that character. Sum the results across all characters. - For each starting position
iint, if the summed convolution value equals the length ofs, mark it as1; otherwise, mark it as0. - Output the resulting binary string.
The invariant is that at each convolution step, the contribution of character ch counts exactly the positions where s and t match or can match via the permutation. Summing over all characters produces the total match count for each substring. Only positions with a full match count equal to |s| are valid occurrences, guaranteeing correctness.
Python Solution
import sys
import numpy as np
input = sys.stdin.readline
p = list(map(int, input().split()))
s = input().strip()
t = input().strip()
n = len(s)
m = len(t)
# map each letter to the set of allowed matches
match = [set() for _ in range(26)]
for i in range(26):
match[i].add(i)
match[i].add(p[i]-1)
res = np.zeros(m-n+1, dtype=int)
for ch in range(26):
s_arr = np.array([1 if (ord(s[i])-97) in match[ch] else 0 for i in range(n)][::-1], dtype=int)
t_arr = np.array([1 if ord(t[i])-97 == ch else 0 for i in range(m)], dtype=int)
conv = np.fft.irfft(np.fft.rfft(s_arr, m+n-1) * np.fft.rfft(t_arr, m+n-1)).round().astype(int)
res += conv[n-1:m]
print("".join(map(str,res)))
This solution follows the algorithm closely. We first encode s and t into indicator arrays. Reversing s aligns indices for convolution so that index i in the result corresponds to a substring starting at t[i]. FFT convolutions multiply in frequency space and are converted back using irfft. Rounding is necessary due to floating-point imprecision. Finally, the summed matches are compared to n to produce the binary output.
Worked Examples
Sample Input 1
2 3 1 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
abc
abcaaba
| i | t[i:i+3] | matches? | Explanation |
|---|---|---|---|
| 0 | abc | 1 | All characters equal s directly |
| 1 | bca | 1 | s[0]='a'->p[1]=2 -> 'b'; s[1]='b'->p[2]=3 -> 'c'; s[2]='c'->p[3]=1 -> 'a' |
| 2 | caa | 0 | s[0]='a'->'b' but t[2]='c', mismatch |
| 3 | aab | 0 | s[1]='b'->'c' but t[4]='a', mismatch |
| 4 | aba | 1 | s[0]='a'->'b'? t[4]='a', direct match; s[1]='b'->'c'? t[5]='b', direct match; s[2]='c'->'a'? t[6]='a', permutation match |
Sample Output
11001
This trace shows the combination of direct equality and permutation-based matches.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(26 * m log m) | Each of the 26 letters requires one FFT convolution of size O(m+n), with n ≤ m |
| Space | O(26 * m) | Indicator arrays for s and t for each letter |
Given m ≤ 2⋅10^5, 26 * m log m is roughly 1.2⋅10^7 operations, which fits well under 1 second. Memory usage of ~26 * 2⋅10^5 = 5.2⋅10^6 integers is safe under 512 MB.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
import numpy as np
input = sys.stdin.readline
p = list(map(int, input().split()))
s = input().strip()
t = input().strip()
n = len(s)
m = len(t)
match = [set()