CF 128B - String
We are given a string of lowercase letters and an integer k. The task is to generate all possible substrings of the string, sort them lexicographically, and return the k-th substring in that order.
Rating: 2100
Tags: brute force, constructive algorithms, hashing, implementation, string suffix structures, strings
Solve time: 1m 21s
Verified: yes
Solution
Problem Understanding
We are given a string of lowercase letters and an integer k. The task is to generate all possible substrings of the string, sort them lexicographically, and return the k-th substring in that order. Substrings can repeat, so multiple identical substrings must be considered separately. If there are fewer than k substrings, we return "No such line.".
The constraints are that the string length n can reach up to 100,000, and k can be up to 100,000. Enumerating all substrings naively is infeasible, because a string of length n has roughly n(n+1)/2 substrings, which is up to 5 billion for n = 100,000. Sorting such a large collection is impossible within the time limit. This rules out any O(n²) or O(n² log n) approach.
Edge cases include strings with repeated characters, strings of length 1, and k exceeding the total number of substrings. For example, for "aa" and k = 3, the substrings in lexicographic order are "a", "a", "aa". A naive approach might accidentally remove duplicates, producing the wrong result, or could miscount indices.
Approaches
A brute-force solution is simple to describe. Generate every substring, store them in a list, sort the list lexicographically, and return the element at index k-1. This approach works for small strings because generating and sorting substrings is correct in principle. However, its time complexity is O(n² log n), as there are O(n²) substrings and sorting them takes O(n² log n²). With n = 10⁵, this is far too slow.
The key insight for an efficient solution is to avoid generating all substrings explicitly. Lexicographically sorted substrings can be represented as prefixes of the suffixes of the string. If we generate all suffixes and sort them, the lexicographic order of substrings corresponds to walking along these sorted suffixes and taking prefixes incrementally. By limiting the prefix length to k, we only consider the minimum needed characters, which avoids O(n²) work.
We can implement this efficiently using a priority queue or simply by sorting all suffixes, then counting prefixes in order. We stop once we reach the k-th substring. This reduces the memory overhead and avoids full substring enumeration.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n² log n²) | O(n²) | Too slow |
| Suffix + Prefix Enumeration | O(n log n + k²) | O(n + k²) | Accepted |
Algorithm Walkthrough
- Generate all suffixes of the string by taking
s[i:]for eachifrom 0 to n-1. Each suffix starts at a different position and contains all substrings starting from that position. - Sort the list of suffixes lexicographically. This ensures that walking through the suffixes in order will give substrings in lexicographic order as long as we consider their prefixes.
- Initialize a counter to track how many substrings we have enumerated. For each suffix in sorted order, enumerate all prefixes up to length k. Limit the prefix length to avoid generating more characters than needed, because the k-th substring can never be longer than k.
- For each prefix, increment the counter. If the counter equals k, print this prefix and terminate. If we finish enumerating all prefixes from all suffixes without reaching k, print
"No such line.". - This method guarantees correctness because the sorted suffixes contain all possible substrings in order, and counting prefixes in this way correctly handles repeated substrings.
Why it works: By sorting suffixes, we create a global lexicographic order. Every substring of the original string is a prefix of some suffix. Enumerating prefixes in increasing length ensures that duplicates are counted correctly and substrings appear in proper order. Limiting prefix length to k bounds unnecessary work and prevents time and memory blow-up.
Python Solution
import sys
input = sys.stdin.readline
def main():
s = input().strip()
k = int(input())
n = len(s)
# Generate all suffixes
suffixes = [s[i:] for i in range(n)]
suffixes.sort()
count = 0
for suff in suffixes:
# Limit the prefix length to k to avoid extra work
for l in range(1, min(len(suff), k) + 1):
count += 1
if count == k:
print(suff[:l])
return
print("No such line.")
if __name__ == "__main__":
main()
The first part reads the string and integer input efficiently. Generating suffixes is O(n) in space. Sorting suffixes is O(n log n). When counting prefixes, we stop as soon as the k-th substring is found. We use min(len(suff), k) because any substring longer than k cannot be the k-th substring if counting from the start.
Worked Examples
Sample 1:
Input: "aa", k = 2
| suffixes | sorted | prefixes counted | counter | output |
|---|---|---|---|---|
"aa", "a" |
"a", "aa" |
"a" |
1 | - |
"aa" |
2 | "a" |
This trace shows that the first suffix "a" generates "a". The second suffix "aa" generates "a" (duplicate counted), which is the second substring, so the output is "a".
Sample 2:
Input: "bc", k = 3
| suffixes | sorted | prefixes counted | counter | output |
|---|---|---|---|---|
"bc", "c" |
"bc", "c" |
"b" |
1 | - |
"bc" |
2 | - | ||
"c" |
"c" |
3 | "c" |
Here the algorithm correctly identifies "c" as the 3rd substring in lexicographic order.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n + k²) | Sorting n suffixes is O(n log n), enumerating prefixes up to length k across sorted suffixes is O(k²) |
| Space | O(n + k²) | O(n) for suffixes, O(k²) for storing small prefixes temporarily |
Given the constraints n ≤ 10⁵, k ≤ 10⁵, O(n log n + k²) is acceptable. Sorting 100,000 suffixes is fast enough, and limiting prefix enumeration ensures we never exceed memory.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
sys.stdout = io.StringIO()
main()
return sys.stdout.getvalue().strip()
# Provided samples
assert run("aa\n2\n") == "a", "sample 1"
assert run("bc\n3\n") == "c", "sample 2"
# Custom cases
assert run("a\n1\n") == "a", "single character string"
assert run("a\n2\n") == "No such line.", "k larger than total substrings"
assert run("aaa\n4\n") == "aa", "repeated letters with duplicates"
assert run("abcde\n10\n") == "cde", "moderate-length string with unique letters"
assert run("xyz\n6\n") == "No such line.", "total substrings less than k"
| Test input | Expected output | What it validates |
|---|---|---|
"a", 1 |
"a" |
single-character string |
"a", 2 |
"No such line." |
k exceeds total substrings |
"aaa", 4 |
"aa" |
repeated letters, duplicate counting |
"abcde", 10 |
"cde" |
normal case, unique letters |
"xyz", 6 |
"No such line." |
k larger than total substrings |
Edge Cases
For strings with repeated letters like "aaa" and k = 4, the sorted suffixes are "a", "aa", "aaa". Prefix enumeration counts "a", "aa", "aaa", "a" (from second suffix), "aa" (from second suffix), etc. The algorithm counts duplicates in order, so the 4th substring is "aa". For k larger than the total number of substrings, the algorithm completes the loops without returning, printing "No such line." exactly once. Single-character strings are handled because the suffix list has one element and the prefix enumeration works even for length 1.