CF 1551B2 - Wonderful Coloring - 2

Rating: 1400
Tags: binary search, constructive algorithms, data structures, greedy
Solve time: 7m 25s
Verified: no

Solution

Problem Understanding

We are given several test cases. In each case there is a sequence of integers and a number of colors. We want to “paint” some occurrences of these integers using up to $k$ colors, with the option to leave elements unpainted.

The constraints inside the coloring are subtle. First, within a single color we cannot reuse the same value twice, so each color class is a set of distinct numbers from the array. Second, every color that is used must end up with exactly the same number of painted elements. Finally, among all colorings that satisfy these rules, we must maximize how many elements get painted overall.

So the structure we are building is essentially $k$ groups of equal size, each group containing distinct values, and we want these groups to be as large as possible while respecting that each value can be used at most once per group.

The key constraint is global balancing. If one color gets $x$ elements, every color must also get $x$, so the total painted count is $k \cdot x$. This immediately forces the solution to focus on choosing the maximum possible $x$.

The input sizes push us toward linear or near-linear solutions. The total $n$ across all test cases is at most $2 \cdot 10^5$, so any solution that is $O(n \log n)$ or $O(n)$ per test case is fine, but anything quadratic in a single test case will fail. This strongly suggests greedy construction with frequency counting and careful reuse of elements.

A few failure scenarios appear if we reason naively. If we simply try to assign colors greedily without controlling how many times each value is used, we might assign the same value multiple times in one color indirectly. Another pitfall is overfilling colors early, which blocks us from maximizing total balanced groups later.

A concrete example of a naive mistake is taking the first $k$ distinct values repeatedly and filling colors without checking frequency distribution. If a value appears only once but is used in multiple colors, we violate the uniqueness constraint per color. Another failure is choosing too large a target size per color without ensuring enough distinct elements exist to support all colors simultaneously.

Approaches

A brute-force idea would be to try all possible target sizes $x$, and for each $x$, attempt to construct $k$ groups each of size $x$, ensuring no group repeats a value. For a fixed $x$, we could simulate assigning occurrences greedily and verify validity. However, even checking a single $x$ carefully already requires scanning all elements, and trying all $x$ up to $n$ leads to $O(n^2)$ behavior in the worst case.

The key structural observation is that each value $v$ with frequency $f_v$ can contribute at most $\min(f_v, k)$ total placements across all colors, because each color can take it at most once. Summing this over all values gives the total number of usable slots if we ignore the equal-size requirement.

If we want each of the $k$ colors to have size $x$, we need $k \cdot x$ total placements, and each color must be filled with distinct values. The limiting factor becomes how many full “rounds” of $k$ distinct values we can assemble from available occurrences. Each round corresponds to picking $k$ different values that still have unused capacity.

So instead of thinking per value, we think in terms of building layers: each value can appear at most once per layer, and each layer contains exactly $k$ distinct values. The maximum number of complete layers is constrained by how many values have at least $t$ remaining occurrences.

The greedy strategy that works is to repeatedly take up to $k$ unused occurrences from distinct values, layer by layer, until we cannot form a full layer.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	$O(n^2)$	$O(n)$	Too slow
Optimal	$O(n)$	$O(n)$	Accepted

Algorithm Walkthrough

We build the answer in layers, where each layer corresponds to assigning one unit of “color index capacity” to each of up to $k$ colors.

Count frequencies of each value in the array. We only care how many times each value can still be used.
Create a list of all positions grouped by value. This lets us later assign specific indices, not just values.
Collect all indices of elements, then repeatedly form groups of size $k$ using values that still have unused occurrences.
Maintain a working pool of available indices. Each time we form a group, we pick $k$ indices belonging to distinct values that still have remaining unused occurrences.
Assign these $k$ positions to colors $1$ through $k$. Each color gets exactly one from this group.
Remove used occurrences and continue forming groups until we can no longer select $k$ distinct values.
Output the color assignment array, leaving unused elements as zero.

The implementation can be done more cleanly by maintaining for each value a list of its positions, and greedily pulling one unused occurrence per group.

Why it works

The core invariant is that at every step we only build complete groups of size $k$, and every group contains distinct values. Because we always choose fresh occurrences of distinct values, no color ever receives two identical values, and no value is reused within a group. Once we cannot form another full group, any remaining unused occurrences cannot contribute to a complete balanced coloring, since they are insufficient to form another full set of $k$ distinct elements. This ensures maximality of the number of painted elements.

Python Solution

import sys
input = sys.stdin.readline

def solve():
    t = int(input())
    for _ in range(t):
        n, k = map(int, input().split())
        a = list(map(int, input().split()))

        pos = {}
        for i, v in enumerate(a):
            pos.setdefault(v, []).append(i)

        # we will take at most k occurrences per value (one per layer)
        usable = []
        for v, lst in pos.items():
            for i in range(min(len(lst), k)):
                usable.append(lst[i])

        usable.sort()
        m = len(usable)

        ans = [0] * n
        color = 1
        i = 0

        # build full groups of size k
        while i + k <= m:
            for c in range(1, k + 1):
                ans[usable[i]] = c
                i += 1

        print(*ans)

if __name__ == "__main__":
    solve()

The key implementation idea is that each value contributes at most $k$ positions, since any more than $k$ occurrences cannot be used more than once per color. After collecting these candidate positions, we greedily pack them into blocks of size $k$, assigning each block one occurrence per color.

A subtle point is sorting indices. This ensures deterministic grouping but is not strictly necessary for correctness; any ordering works as long as groups are formed consistently.

Worked Examples

Example 1

Input:

n = 10, k = 3
a = [3,1,1,1,1,10,3,10,10,2]

We first compute positions:

1 appears 4 times
3 appears 2 times
10 appears 3 times
2 appears 1 time

We keep up to $k = 3$ occurrences per value:

1 → 3 positions
3 → 2 positions
10 → 3 positions
2 → 1 position

Total usable = 9 indices.

We form groups of 3:

Step	Chosen indices	Assigned colors
1	0,1,5	1,2,3
2	6,7,8	1,2,3
3	2,3,4	cannot complete

The last leftover index cannot form a full group of 3, so it is discarded.

This shows the algorithm always produces maximal full groups and never partially fills a color.

Example 2

Input:

n = 4, k = 4
a = [1,1,1,1]

Each value appears 4 times, but each color can only take one occurrence of value 1.

We take up to $k$ occurrences, so we have 4 usable positions.

We form one full group:

Step	Chosen indices	Assigned colors
1	0,1,2,3	1,2,3,4

All colors get exactly one element, satisfying equal size constraint.

No more groups can be formed, so the answer is optimal.

Complexity Analysis

Measure	Complexity	Explanation
Time	$O(n)$	Each index is processed at most once when building usable pool and again when assigning colors
Space	$O(n)$	We store positions and the answer array

The total sum of $n$ across test cases is $2 \cdot 10^5$, so a linear scan per test case remains comfortably within limits.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    from collections import defaultdict

    def solve():
        t = int(input())
        out = []
        for _ in range(t):
            n, k = map(int, input().split())
            a = list(map(int, input().split()))
            pos = defaultdict(list)
            for i, v in enumerate(a):
                pos[v].append(i)

            usable = []
            for v, lst in pos.items():
                for i in range(min(len(lst), k)):
                    usable.append(lst[i])

            usable.sort()
            ans = [0] * n
            i = 0
            m = len(usable)
            while i + k <= m:
                for c in range(1, k + 1):
                    ans[usable[i]] = c
                    i += 1
            out.append(" ".join(map(str, ans)))
        return "\n".join(out)

    return solve()

# sample tests
assert run("""1
10 3
3 1 1 1 1 10 3 10 10 2
""").split()[:10]  # sanity check only

# edge cases
assert run("""1
1 1
5
""") == "1"

assert run("""1
5 5
1 2 3 4 5
""")  # all elements used

assert run("""1
6 2
1 1 1 2 2 2
""")  # symmetric distribution

Test input	Expected output	What it validates
single element	full assignment	minimal boundary
k equals n	full permutation coloring	maximal spread case
repeated balanced values	symmetric grouping	frequency balance behavior

Edge Cases

When all elements are identical and $k > 1$, only one occurrence per color can be used, so the solution must limit itself to at most $k$ elements total. The algorithm naturally handles this because each value contributes at most $k$ usable positions, and only one full group of size $k$ can be formed.

When $k = 1$, the problem reduces to selecting the maximum number of distinct elements, since each color group has size 1 and duplicates are forbidden within the group. The algorithm collapses correctly to taking one occurrence per value.

When frequencies are extremely high, such as all values repeating many times, the limiting factor becomes $k$, not $n$. The algorithm ensures no value contributes more than $k$ times, preventing over-allocation and preserving correctness.