CF 254C - Anagram

Rating: 1800
Tags: greedy, strings
Solve time: 14m 52s
Verified: yes

Solution

Problem Understanding

We have two uppercase strings of equal length. We may replace characters in the first string, and after all replacements the final string only needs to be an anagram of the second string. The order does not matter for the multiset of characters, but among all valid answers using the minimum number of replacements, we must output the lexicographically smallest resulting string.

The key detail is that we are not trying to transform s into t position by position. We only care that the final character frequencies match the frequencies of t.

Suppose t contains three As and two Bs. Then the final string must also contain exactly three As and two Bs, regardless of where they appear.

The length can reach 10^5, which immediately rules out anything quadratic. We cannot repeatedly rebuild strings, sort large prefixes many times, or try all replacement choices greedily without efficient bookkeeping. Since the alphabet contains only 26 uppercase letters, frequency counting becomes very attractive. An O(n * 26) solution is completely fine, while O(n^2) would already be around 10^10 operations in the worst case.

The lexicographically smallest requirement creates several subtle cases.

Consider:

s = BAA
t = AAC

The minimum number of replacements is 1. We must replace B with C, giving CAA. A careless strategy that greedily puts the smallest possible letter at every position might try to change the first B into A, but then we would have too many As and still need another replacement later.

Another tricky case is when a character already appears in excess but we should delay replacing some occurrences.

Example:

s = CBA
t = ABB

We need frequencies {A:1, B:2}. The correct answer is:

ABB

If we greedily replace the first oversized character immediately without checking future positions, we may keep the leading C and modify later letters instead, producing something larger lexicographically.

One more subtle situation happens when a character is currently excessive, but keeping an early occurrence helps lexicographic order less than saving replacements for later positions.

Example:

s = BBAA
t = AACC

We need two replacements. The best answer is:

AACC

A naive "replace only when forced" implementation can accidentally keep the first B, then later discover it must introduce an extra A anyway, producing a lexicographically larger answer.

The core difficulty is balancing two goals simultaneously:

First, use the minimum possible number of replacements.

Second, among all such solutions, make the earliest positions as small as possible.

Approaches

The brute-force way to think about the problem is to generate all anagrams of t, then check how many positions differ from s. The number of differing positions equals the number of replacements needed. Among all anagrams with minimum distance, we pick the lexicographically smallest one.

This is correct because every valid final string must be an anagram of t. The problem is that t can have up to 10^5 characters. Even for length 20, the number of distinct permutations is already enormous. Full enumeration is hopeless.

A more practical brute-force variant would greedily try every possible replacement at every position while maintaining frequency validity. For each index, we could test all 26 letters and see whether the remaining suffix can still satisfy the target frequencies. That still becomes expensive if implemented naively, especially with repeated frequency recomputation.

The important observation is that the minimum number of replacements is determined entirely by frequency overlap.

If s already contains five As and t needs three As, then at least two As must disappear. If s contains one C and t needs four Cs, then we must create three new Cs somewhere.

So the problem becomes:

We know exactly which letters are excessive and which letters are missing. How do we redistribute them to get the lexicographically smallest string?

This leads to a greedy left-to-right strategy.

At each position:

If the current character is still needed, we may want to keep it.

But if replacing it with a smaller missing character improves lexicographic order, we should do so immediately.

If the current character is excessive and there exists a smaller missing character, changing now is always beneficial because earlier positions dominate lexicographic order.

If only larger missing characters remain, we prefer delaying replacement when possible, because keeping a smaller current character helps the prefix stay minimal.

The small alphabet size makes bookkeeping easy. We maintain how many copies of each letter are still required and how many copies remain later in the string.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	Exponential	Exponential	Too slow
Optimal	O(n * 26)	O(26)	Accepted

Algorithm Walkthrough

Count the frequency of every character in t.

These counts describe exactly what the final string must contain. 2. Count the frequency of every character in s.

Comparing the two frequency arrays tells us which letters are excessive and which are missing. 3. Traverse the string from left to right.

We decide the final value of each position greedily because lexicographic order depends first on earlier positions. 4. For the current character c, first reduce its remaining count in the suffix.

This tells us how many copies of c still exist later if we choose not to modify this occurrence. 5. Check whether c is excessive.

If the current total count of c is already no larger than required, we must keep this character. Replacing it would only create an unnecessary extra operation. 6. If c is excessive, search for the smallest missing character x.

We try letters from 'A' upward because we want the lexicographically smallest result. 7. If x < c, replace immediately.

A smaller letter at an earlier position always improves the answer. 8. If x > c, replace only when we cannot safely keep c.

Specifically, if the remaining future copies of c are fewer than the required amount, then this occurrence must stay. Otherwise we can postpone replacement to a later occurrence. 9. Whenever a replacement happens, update both frequency arrays.

The excessive letter loses one occurrence, and the missing letter gains one occurrence. 10. Continue until all positions are processed.

Why it works

The algorithm maintains a simple invariant:

Before processing each position, the current frequency array represents the characters still present in the partially modified string.

Whenever we replace a character, we always move one unit from an excessive letter to a missing letter, so the minimum replacement count is preserved.

The lexicographic optimality comes from the left-to-right greedy choice. If a smaller missing letter can be placed at the current position, delaying that improvement can never help because lexicographic comparison depends on the earliest differing position. Conversely, when only larger replacement letters exist, we keep the current smaller character whenever possible and postpone the unavoidable increase to a later position.

Because every decision is locally optimal and never harms future feasibility, the final string is the lexicographically smallest among all minimum-replacement solutions.

Python Solution

import sys
input = sys.stdin.readline

def solve():
    s = list(input().strip())
    t = input().strip()

    need = [0] * 26
    cur = [0] * 26

    for ch in t:
        need[ord(ch) - 65] += 1

    for ch in s:
        cur[ord(ch) - 65] += 1

    changes = 0

    for i in range(len(s)):
        c = ord(s[i]) - 65

        # this character is not excessive
        if cur[c] <= need[c]:
            continue

        for x in range(26):
            if cur[x] < need[x]:
                # smaller character, always beneficial
                if x < c:
                    cur[c] -= 1
                    cur[x] += 1
                    s[i] = chr(x + 65)
                    changes += 1
                    break

                # larger character
                # replace only if enough copies remain later
                elif x > c:
                    if cur[c] - 1 >= need[c]:
                        cur[c] -= 1
                        cur[x] += 1
                        s[i] = chr(x + 65)
                        changes += 1
                    break

    print(changes)
    print("".join(s))

solve()

The two frequency arrays are the entire backbone of the solution.

need stores how many copies each character must appear in the final string. cur stores how many copies currently exist in our evolving answer.

The traversal is greedy and permanent. Once we move past a position, we never revisit it. That is safe because lexicographic order prioritizes earlier positions completely over later ones.

The condition:

if cur[c] <= need[c]:

is extremely important. It means this character is no longer excessive. Replacing it would force us to recreate the same character elsewhere, increasing the operation count unnecessarily.

The second subtle part is:

if cur[c] - 1 >= need[c]:

This checks whether we can safely discard the current occurrence and still have enough copies of this character later. Missing this condition causes many wrong answers because we may remove too many copies of a letter too early.

The loop over all 26 letters is cheap because the alphabet size is constant. Even with 10^5 characters, the total work stays around 2.6 * 10^6 operations.

Worked Examples

Example 1

Input:

s = ABA
t = CBA

Target frequencies are:

A:1 B:1 C:1

Current frequencies are:

A:2 B:1

Position	Current Char	Excessive?	Smallest Missing	Action	Result String
0	A	Yes	C	Keep, because C > A and one A is needed	ABA
1	B	No	C	Keep	ABA
2	A	Yes	C	Replace with C	ABC

Final answer:

1
ABC

This trace shows why we sometimes delay replacement. Replacing the first A with C would produce CBA, which is lexicographically larger than ABC.

Example 2

Input:

s = DDBAAC
t = ADBADC

Target frequencies:

A:2 B:1 C:1 D:2

Current frequencies:

A:2 B:1 C:1 D:2

The strings already have identical frequency multisets.

Position	Current Char	Excessive?	Action	Result
0	D	No	Keep	DDBAAC
1	D	No	Keep	DDBAAC
2	B	No	Keep	DDBAAC
3	A	No	Keep	DDBAAC
4	A	No	Keep	DDBAAC
5	C	No	Keep	DDBAAC

Final answer:

0
DDBAAC

This demonstrates that the algorithm never performs unnecessary replacements. Matching frequencies already imply zero operations.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n * 26)	Each position scans at most 26 letters
Space	O(26)	Only fixed-size frequency arrays are stored

Since the alphabet is fixed, the factor of 26 behaves like a constant. With n = 10^5, the algorithm easily fits within the time limit. Memory usage is negligible.

Test Cases

# helper: run solution on input string, return output string
import sys, io

def solve():
    input = sys.stdin.readline

    s = list(input().strip())
    t = input().strip()

    need = [0] * 26
    cur = [0] * 26

    for ch in t:
        need[ord(ch) - 65] += 1

    for ch in s:
        cur[ord(ch) - 65] += 1

    changes = 0

    for i in range(len(s)):
        c = ord(s[i]) - 65

        if cur[c] <= need[c]:
            continue

        for x in range(26):
            if cur[x] < need[x]:
                if x < c:
                    cur[c] -= 1
                    cur[x] += 1
                    s[i] = chr(x + 65)
                    changes += 1
                    break

                elif x > c:
                    if cur[c] - 1 >= need[c]:
                        cur[c] -= 1
                        cur[x] += 1
                        s[i] = chr(x + 65)
                        changes += 1
                    break

    print(changes)
    print("".join(s))

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    out = io.StringIO()
    backup = sys.stdout
    sys.stdout = out

    solve()

    sys.stdout = backup
    return out.getvalue()

# provided sample
assert run("ABA\nCBA\n") == "1\nABC\n", "sample 1"

# minimum size
assert run("A\nA\n") == "0\nA\n", "single equal character"

# complete replacement
assert run("AAA\nBBB\n") == "3\nBBB\n", "all characters replaced"

# already an anagram
assert run("ABCD\nDCBA\n") == "0\nABCD\n", "no changes needed"

# lexicographic subtlety
assert run("BAA\nAAC\n") == "1\nCAA\n", "must delay replacement correctly"

# repeated excessive letters
assert run("ZZZZ\nAAAA\n") == "4\nAAAA\n", "large repeated replacements"

Test input	Expected output	What it validates
`A / A`	`0 A`	Minimum-size input
`AAA / BBB`	`3 BBB`	Every position must change
`ABCD / DCBA`	`0 ABCD`	Existing anagram should remain unchanged
`BAA / AAC`	`1 CAA`	Delayed replacement for lexicographic optimality
`ZZZZ / AAAA`	`4 AAAA`	Repeated excess handling

Edge Cases

Consider:

s = BAA
t = AAC

The target requires one C, but replacing the first B immediately gives:

CAA

The algorithm processes the first position:

B is excessive, and the smallest missing letter is C.

Since C > B, the algorithm checks whether enough Bs remain later. They do not, because this is the only B. So replacement is forced.

The final answer becomes:

CAA

which is optimal.

Now consider:

s = ABA
t = CBA

At the first A, the smallest missing letter is C.

Since C > A, replacing immediately would worsen the prefix. The algorithm keeps this A because another A exists later and can be replaced instead.

At the final position, replacement becomes necessary:

ABC

This demonstrates the core greedy principle: postpone harmful replacements whenever possible.

Finally, consider a case where earlier improvement is always correct:

s = CBA
t = ABB

Target frequencies need an extra B and remove C.

At position 0, C is excessive and the smallest missing letter is B.

Because B < C, replacing immediately strictly improves lexicographic order. The algorithm performs the replacement at once and produces:

BBA

Any solution starting with C would be lexicographically larger.