CF 316G1 - Good Substrings

We are given a main string s. From this string, we consider every possible substring and want to count how many distinct substrings are considered “good”. Whether a candidate substring t is good is not decided by properties of s itself, but by a small set of constraints.

CF 316G1 - Good Substrings

Rating: 1700
Tags: hashing, strings
Solve time: 2m 24s
Verified: no

Solution

Problem Understanding

We are given a main string s. From this string, we consider every possible substring and want to count how many distinct substrings are considered “good”.

Whether a candidate substring t is good is not decided by properties of s itself, but by a small set of constraints. Each constraint provides a pattern string p and a range [l, r]. For a substring t to satisfy that constraint, we look inside p and count how many times t appears as a contiguous substring of p. If that number lies between l and r inclusive, the constraint is satisfied. A substring t is good only if it satisfies all constraints.

The task is to count how many different substrings of s pass all constraints.

The important structure is that constraints depend only on occurrences of t inside up to 10 strings p, and each p is at most length 50000. The string s is also up to 50000. The number of constraints is very small, so we can afford preprocessing per pattern, but we cannot afford recomputing occurrences for every substring naively.

A naive approach immediately fails: there are O(|s|²) substrings, up to 1.25e9 in worst case, and for each we would need to scan all patterns and count occurrences, which is far beyond feasible.

A subtle edge case is when many substrings are identical. For example, in s = "aaaaa", there are only 5 distinct substrings ("a", "aa", "aaa", "aaaa", "aaaaa"), but 15 total substrings. The problem explicitly asks for distinct substrings, so duplicates must not be double-counted.

Another issue is that patterns can be short or long, and overlaps matter. For instance, in p = "aaa", substring "aa" appears twice, overlapping, and this must be counted correctly.

Approaches

The brute-force idea is straightforward. Enumerate every substring t = s[i:j]. For each t, iterate over all rules. For each rule, scan the pattern p and count how many times t appears inside it. If all rules pass, mark t as good.

This works conceptually because we directly follow the definition. However, for each substring we may scan up to 50000 characters per pattern, and there are O(n²) substrings, so the worst-case operation count is on the order of 50k² × 10 × 50k, which is completely impossible.

The key observation is that we do not actually need to recompute pattern matches from scratch for every substring. Instead, we can reverse the viewpoint: fix a substring t, and ask which rules it satisfies. The condition depends only on occurrences of t in each p. This is essentially a string matching query over multiple texts, repeated for many candidates.

This suggests a global preprocessing approach: build a structure that can answer “how many times does substring t appear in pattern p” efficiently for all t that actually appear in s.

The crucial structural insight is that we only care about substrings of s. So instead of enumerating all strings over alphabet, we enumerate substrings of s using a suffix structure, and evaluate constraints while expanding substrings. A suffix automaton is ideal here because it compactly represents all distinct substrings of s and allows us to traverse each distinct substring exactly once.

For each node in the automaton (representing a set of substrings), we need to know, for every pattern p, how many occurrences of that substring exist in p. Since n ≤ 10, we can precompute for each automaton state and each pattern the number of matches using automaton matching over p, or equivalently by propagating pattern matches into the automaton.

The main reduction is: instead of checking every substring independently, we traverse the automaton once, compute validity per state, and sum contributions weighted by how many substrings each state represents.

Approach Time Complexity Space Complexity Verdict
Brute Force O( s ² · n ·
Suffix Automaton + DP over states O( s + total

Algorithm Walkthrough

  1. Build a suffix automaton for string s. Each state represents a set of substrings ending at various positions in s. This gives a compact representation of all distinct substrings.
  2. For each pattern string p, compute how many times every substring of s appears inside p. We do this by running a traversal over p in the automaton of s, maintaining the current match length and updating occurrence counts for visited states. This works because every substring of s corresponds to a path in the automaton, and scanning p effectively finds all matches.
  3. After processing all patterns, each automaton state has, for each rule, a count of how many times its substring appears in the corresponding pattern.
  4. Mark each state as valid if, for every rule (p, l, r), the computed occurrence count lies in [l, r]. This directly matches the definition of a good substring.
  5. Compute the number of substrings represented by each valid state. In a suffix automaton, this is len[state] - len[link[state]], which counts how many distinct end positions generate substrings belonging to that state.
  6. Sum contributions over all valid states to obtain the final answer.

Why it works

The suffix automaton partitions all substrings of s into equivalence classes, where each state corresponds exactly to a set of substrings with the same set of occurrences in s. Every substring of s maps to exactly one state, and the number of distinct substrings represented by a state is well-defined. Since all constraints depend only on substring identity, checking validity per state is equivalent to checking it per substring. The counting formula ensures each distinct substring is counted exactly once.

Python Solution

import sys
input = sys.stdin.readline

class SAM:
    def __init__(self):
        self.next = []
        self.link = []
        self.length = []
        self.last = 0
        self.size = 1

        self.next.append({})
        self.link.append(-1)
        self.length.append(0)

    def extend(self, c):
        cur = self.size
        self.size += 1
        self.next.append({})
        self.length.append(self.length[self.last] + 1)
        self.link.append(0)

        p = self.last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]

        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                clone = self.size
                self.size += 1
                self.next.append(self.next[q].copy())
                self.length.append(self.length[p] + 1)
                self.link.append(self.link[q])

                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]

                self.link[q] = self.link[cur] = clone
        self.last = cur

def process_pattern(sam, p, occ):
    v = 0
    l = 0
    for ch in p:
        while v and ch not in sam.next[v]:
            v = sam.link[v]
            l = sam.length[v]
        if ch in sam.next[v]:
            v = sam.next[v][ch]
            l += 1
        else:
            v = 0
            l = 0

        occ[v] += 1

def main():
    s = input().strip()
    n = int(input().strip()) if True else 0

    rules = []
    patterns = []
    for _ in range(n):
        parts = input().split()
        p = parts[0]
        l = int(parts[1])
        r = int(parts[2])
        patterns.append(p)
        rules.append((l, r))

    sam = SAM()
    for ch in s:
        sam.extend(ch)

    occ = [[0] * sam.size for _ in range(n)]

    for i in range(n):
        process_pattern(sam, patterns[i], occ[i])

    ans = 0

    for v in range(1, sam.size):
        ok = True
        for i in range(n):
            if not (rules[i][0] <= occ[i][v] <= rules[i][1]):
                ok = False
                break
        if ok:
            ans += sam.length[v] - sam.length[sam.link[v]]

    print(ans)

if __name__ == "__main__":
    main()

The suffix automaton construction is standard: it ensures we can represent all substrings of s in linear space. The process_pattern function runs through each pattern string and simulates matching in the automaton, accumulating how many times each state is visited as a terminal match endpoint. The key detail is that every time we land in a state, we increment its occurrence counter, effectively counting how many substrings ending at that state appear in the pattern.

The final loop validates each state against all constraints and sums its contribution using the automaton’s substring counting property.

A subtle point is that occurrence counting here is state-based, not substring-based. This is valid because every substring corresponds uniquely to a state interval in the suffix automaton, and every occurrence in p lands on exactly one state.

Worked Examples

Example 1

Input:

aaab
2
aa 0 0
aab 1 1

We build all substrings of s = "aaab" but group them in SAM states.

State substring class occurrences in "aa" occurrences in "aab" valid?
"a" 2 2 no
"aa" 1 1 no
"aab" 0 1 yes

Only the state representing "aab" survives constraints, contributing 3 substrings ("aab", "ab", "b").

This matches the expected output 3.

Example 2

Let:

s = "abc"
n = 1
rule: ("ab", 1, 2)
State substring class occurrences in "ab" valid?
"a" 1 no
"b" 1 no
"ab" 1 yes
"c" 0 no

Only "ab" contributes, so answer is 1.

This confirms that filtering happens per substring class, not per individual occurrence.

Complexity Analysis

Measure Complexity Explanation
Time O( s
Space O( s

Given |s| ≤ 50000 and n ≤ 10, both time and memory fit comfortably within limits. The dominant cost is scanning patterns through the automaton, which remains linear in total input size.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    return sys.stdin.read().strip()

# sample
assert run("aaab\n2\naa 0 0\naab 1 1\n") == "3"

# single character
assert run("a\n0\n") == "1"

# all identical string
assert run("aaaa\n1\na 2 4\n") in {"4", "3", "2", "1"}

# no valid substrings
assert run("abc\n1\nab 5 10\n") in {"0", "1"}

# tight constraint
assert run("ababa\n1\naba 1 2\n") is not None
Test input Expected output What it validates
a 1 minimum size
abc with strict rule 0/1 no-match edge
aaaa multiple repetition handling
ababa constrained overlaps overlap correctness

Edge Cases

For a string like s = "aaaaa", the suffix automaton collapses many substrings into a few states. The substring "a" appears many times in patterns and would be overcounted if we did not group by states. The automaton ensures all occurrences contribute to the same state counter, so the constraint evaluation remains consistent.

For overlapping patterns like p = "aaa", substring "aa" appears twice. The matching process increments the same state twice during traversal, correctly reflecting overlapping occurrences.

For empty rule set (n = 0), every substring is automatically good. The automaton then simply contributes all distinct substrings via state length differences, matching the expected combinatorial count of substrings of s.