CF 1426F - Number of Subsequences

We are given a string composed of the letters "a", "b", "c", and the special character "?". Each "?" can be replaced independently by any of the letters "a", "b", or "c".

CF 1426F - Number of Subsequences

Rating: 2000
Tags: combinatorics, dp, strings
Solve time: 1m 10s
Verified: yes

Solution

Problem Understanding

We are given a string composed of the letters "a", "b", "c", and the special character "?". Each "?" can be replaced independently by any of the letters "a", "b", or "c". After performing all possible replacements, we need to count the total number of subsequences "abc" across all resulting strings. A subsequence does not require contiguous letters but must maintain relative order.

The input size allows strings up to 200,000 characters, which means any algorithm that explicitly generates all $3^k$ combinations of "?" will be infeasible. Even a single "?" can triple the number of strings, so brute force enumeration is completely out of the question. The key challenge is to account for all possible replacements without iterating through them individually, which requires careful combinatorial reasoning or dynamic programming.

Non-obvious edge cases include strings that contain only question marks, strings without any "a", "b", or "c", and strings where letters appear in positions that prevent forming "abc". For example, the string "???" can generate 27 strings, and the correct output is 1×3 + 3×1 + … depending on the distribution, which illustrates that naive counting can easily miss multiplicative contributions from question marks.

Approaches

The naive approach is to generate every possible string obtained by replacing "?" with "a", "b", or "c", then count "abc" subsequences in each string. This is correct but impractical, because for k question marks there are $3^k$ strings. For $k = 20$, this exceeds $3^{20} \approx 3.5 \times 10^9$, far beyond the feasible limit for a 1-second program.

The insight that leads to an efficient solution is to maintain running counts of the number of ways subsequences ending in "a", "ab", and "abc" can be formed as we process the string from left to right. We treat "?" as a wildcard that can be any of the three letters. Each "?" multiplies existing subsequences by three because each can generate three options, but it also contributes to new subsequences. By tracking counts of partial subsequences and scaling by powers of three to account for previous wildcards, we can compute the answer in linear time.

Approach Time Complexity Space Complexity Verdict
Brute Force $O(3^k \cdot n)$ $O(n)$ Too slow
Dynamic Programming (counting subsequences) $O(n)$ $O(1)$ Accepted

Algorithm Walkthrough

  1. Initialize three counters: a_count, ab_count, abc_count, all set to 0. These represent the number of subsequences of type "a", "ab", and "abc" formed so far. Initialize power_of_three to 1, which represents the number of ways the previous question marks can be replaced.
  2. Iterate over each character ch in the string.
  3. If ch is "a", increment a_count by power_of_three. This reflects that any existing combinations of previous wildcards can be extended by this "a".
  4. If ch is "b", increment ab_count by a_count. Each previous "a" can now pair with this "b" to form a partial "ab" subsequence.
  5. If ch is "c", increment abc_count by ab_count. Each previous "ab" can now pair with this "c" to complete an "abc" subsequence.
  6. If ch is "?", it can be treated as any of "a", "b", or "c". Before updating counts, store the current values of a_count, ab_count, and abc_count in temporary variables. Update each count according to the contributions of choosing "a", "b", or "c":
  • New abc_count = 3 × old abc_count + old ab_count (contribution if "?" is "c")
  • New ab_count = 3 × old ab_count + old a_count (contribution if "?" is "b")
  • New a_count = 3 × old a_count + power_of_three (contribution if "?" is "a")
  • Multiply each old count by 3 to account for the three options of the current "?" and add the new subsequence contributions.
  1. After handling a "?", multiply power_of_three by 3 to reflect the additional wildcard.
  2. Return abc_count modulo $10^9 + 7$ after processing the entire string.

Why it works: At every step, the counts reflect all possible subsequences that can be formed with all previous characters and wildcard expansions. By updating counts multiplicatively for each "?" and additively for contributions to new subsequences, we ensure no combination is missed or double-counted. The algorithm maintains an invariant: a_count = number of ways to pick "a"s so far, ab_count = number of ways to pick "ab"s, and abc_count = number of ways to pick "abc"s, considering all substitutions of "?" seen to that point.

Python Solution

import sys
input = sys.stdin.readline

MOD = 10**9 + 7

n = int(input())
s = input().strip()

a_count = 0
ab_count = 0
abc_count = 0
power_of_three = 1  # 3^number_of_question_marks_so_far

for ch in s:
    if ch == 'a':
        a_count = (a_count + power_of_three) % MOD
    elif ch == 'b':
        ab_count = (ab_count + a_count) % MOD
    elif ch == 'c':
        abc_count = (abc_count + ab_count) % MOD
    elif ch == '?':
        new_abc = (3 * abc_count + ab_count) % MOD
        new_ab = (3 * ab_count + a_count) % MOD
        new_a = (3 * a_count + power_of_three) % MOD
        a_count, ab_count, abc_count = new_a, new_ab, new_abc
        power_of_three = (power_of_three * 3) % MOD

print(abc_count)

The solution initializes counters and iterates through the string once, updating counts according to the rules for letters and wildcards. The subtlety lies in correctly multiplying by three for previous sequences when a "?" is encountered, and then adding contributions for new subsequences formed if the "?" becomes "a", "b", or "c".

Worked Examples

Sample Input 1: "ac?b?c"

Index Char a_count ab_count abc_count power_of_three
0 a 1 0 0 1
1 c 1 0 0 1
2 ? 4 1 0 3
3 b 4 5 0 3
4 ? 15 19 5 9
5 c 15 19 24 9

This trace shows how each "?" multiplies previous counts and contributes to new subsequences, matching the expected total of 24.

Custom Input 2: "???"

Index Char a_count ab_count abc_count power_of_three
0 ? 1 0 0 3
1 ? 4 1 0 9
2 ? 13 5 1 27

The answer is 1, as expected, confirming that the algorithm correctly handles multiple consecutive question marks.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed once, with constant time updates of counters.
Space O(1) Only a fixed number of integer counters are maintained.

Given $n \le 2 \cdot 10^5$, the solution easily runs within 1 second.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    n = int(input())
    s = input().strip()

    MOD = 10**9 + 7
    a_count = ab_count = abc_count = 0
    power_of_three = 1

    for ch in s:
        if ch == 'a':
            a_count = (a_count + power_of_three) % MOD
        elif ch == 'b':
            ab_count = (ab_count + a_count) % MOD
        elif ch == 'c':
            abc_count = (abc_count + ab_count) % MOD
        elif ch == '?':
            new_abc = (3 * abc_count + ab_count) % MOD
            new_ab = (3 * ab_count + a_count) % MOD
            new_a = (3 * a_count + power_of_three) % MOD
            a_count, ab_count, abc_count = new_a, new_ab, new_abc
            power_of_three = (power_of_three *