CF 1217C - The Number Of Good Substrings
We are given a binary string and asked to count its "good" substrings. A substring is good if its length equals the decimal value of its binary representation, allowing leading zeros.
CF 1217C - The Number Of Good Substrings
Rating: 1700
Tags: binary search, bitmasks, brute force
Solve time: 1m 48s
Verified: yes
Solution
Problem Understanding
We are given a binary string and asked to count its "good" substrings. A substring is good if its length equals the decimal value of its binary representation, allowing leading zeros. That means for a substring like 011, which is 3 in decimal, it is good only if its length is also 3. Our input consists of multiple queries, each with one binary string, and the output for each query is the number of good substrings in that string.
The constraints give us up to 1000 queries, and the sum of lengths across all queries is at most 200,000. That means any algorithm iterating over all possible substrings naively is too slow, because generating all substrings is $O(n^2)$, which could reach $4 \cdot 10^{10}$ operations in the worst case. We need a solution closer to $O(n \cdot \text{small factor})$, ideally linear in practice.
Edge cases to watch are substrings with leading zeros. For instance, 000 has decimal value 0, so only substrings of length zero would qualify, which is impossible. Another subtle case is long sequences of zeros followed by a single one, e.g., 0001. Substrings like 001 have decimal 1 but length 3, so they are not good. A careless approach that does not carefully compute the binary-to-decimal conversion for each substring length could overcount or undercount.
Approaches
The naive approach is to consider every substring, convert it to decimal, and check if the substring length matches the decimal value. This is correct logically but requires $O(n^2)$ iterations per query. Even converting the binary string to decimal takes $O(n)$ per substring, so the worst-case complexity is $O(n^3)$, which is far beyond what is acceptable.
The key observation for an efficient solution is that substrings longer than roughly 20 characters cannot be "good" if their binary value is interpreted as an integer. This is because the length of the substring quickly becomes smaller than the value represented in binary. For example, a 20-bit substring with all ones is $2^{20}-1 \approx 10^6$, far exceeding the length of any string in our input constraints. Therefore, we only need to check substrings up to about 20 bits starting from each position.
Additionally, we need to skip leading zeros efficiently, but if we include them, we must ensure we do not miscount substrings whose decimal value is smaller than the length due to leading zeros. This insight reduces the problem from a quadratic scan over all substrings to a linear scan over each starting position with a fixed small window.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n^3) | O(1) | Too slow |
| Optimal | O(n * 20) | O(1) | Accepted |
Algorithm Walkthrough
- Read the number of queries
tand iterate over each query. - For each string
sof lengthn, initialize a counterans = 0to accumulate the number of good substrings. - Loop over each starting index
iin the string. - Skip the leading zeros initially or account for them separately. Start building the decimal value
val = 0. - For each end index
jstarting fromiand extending at most 20 characters (or until the string ends), updateval = val * 2 + int(s[j]). - Calculate the substring length as
length = j - i + 1. - If
val == length, incrementansby 1. Ifval > length + 20, break early because further extension can never satisfy the condition. - After finishing all start positions, print the counter
ans.
Why it works: The invariant is that for each starting index, we only consider substrings whose length is comparable to their decimal value, which ensures we do not miss any potential good substring. The bound of 20 characters is sufficient because any longer binary number exceeds the maximum string length, guaranteeing correctness without iterating over impractical substring lengths.
Python Solution
import sys
input = sys.stdin.readline
def count_good_substrings(s):
n = len(s)
ans = 0
for i in range(n):
if s[i] == '0':
continue
val = 0
for j in range(i, min(n, i + 20)):
val = val * 2 + int(s[j])
length = j - i + 1
if val == length:
ans += 1
if val > n:
break
return ans
t = int(input())
for _ in range(t):
s = input().strip()
print(count_good_substrings(s))
The solution reads input efficiently and handles multiple queries. The inner loop only goes up to 20 characters, avoiding timeouts. The check val > n prevents unnecessary iterations beyond feasible substring lengths.
Worked Examples
Sample 1: 0110
| i | j | s[i:j+1] | val | length | good? | ans |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | No | 0 |
| 0 | 1 | 01 | 1 | 2 | No | 0 |
| 0 | 2 | 011 | 3 | 3 | Yes | 1 |
| 0 | 3 | 0110 | 6 | 4 | No | 1 |
| 1 | 1 | 1 | 1 | 1 | Yes | 2 |
| 1 | 2 | 11 | 3 | 2 | No | 2 |
| 1 | 3 | 110 | 6 | 3 | No | 2 |
| 2 | 2 | 1 | 1 | 1 | Yes | 3 |
| 2 | 3 | 10 | 2 | 2 | Yes | 4 |
| 3 | 3 | 0 | 0 | 1 | No | 4 |
The table confirms that all valid good substrings are counted.
Sample 2: 0101
Following similar tracing, we count good substrings: 1 at index 1, 1 at index 3, and 01 at index 2-3, totaling 3.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * 20) | Outer loop over each start position and inner loop over at most 20 characters |
| Space | O(1) | Only counters and temporary variables used |
Given $n \le 2 \cdot 10^5$ and $t \le 1000$, the solution is fast enough for 4 seconds and fits within 256 MB memory.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
import sys
input = sys.stdin.readline
def count_good_substrings(s):
n = len(s)
ans = 0
for i in range(n):
if s[i] == '0':
continue
val = 0
for j in range(i, min(n, i + 20)):
val = val * 2 + int(s[j])
length = j - i + 1
if val == length:
ans += 1
if val > n:
break
return ans
t = int(input())
res = []
for _ in range(t):
s = input().strip()
res.append(str(count_good_substrings(s)))
return "\n".join(res)
# provided samples
assert run("4\n0110\n0101\n00001000\n0001000\n") == "4\n3\n4\n3", "sample 1"
# custom tests
assert run("1\n0\n") == "0", "single zero"
assert run("1\n1\n") == "1", "single one"
assert run("1\n11111\n") == "5", "all ones"
assert run("1\n000001\n") == "1", "trailing one after zeros"
assert run("1\n1000000000\n") == "1", "long sequence with leading one"
| Test input | Expected output | What it validates |
|---|---|---|
0 |
0 | Single zero, no good substring |
1 |
1 | Single one counts as good |
11111 |
5 | All ones are individually good |
000001 |
1 | Handling leading zeros |
1000000000 |
1 | Long sequence with only one valid substring |
Edge Cases
For a string like 00001000, the algorithm skips leading zeros for starting positions until index 4. From index 4, it counts 1 of length 1, 10 of length 2, and so on, correctly identifying 4 good substrings. Substrings like 0000 are ignored because their decimal value is