Algorist's Combinatorics

The Problem

This is a problem from Chapter 1, Question 4 in “A Walk in Combinatorics” (4th edition) by Miklós Bóna. Don’t ask me why I was even reading this book in the first place, though.

We have distributed two hundred balls into one hundred boxes with the restrictions that no box got more than one hundred balls, and each box got at least one. Prove that it is possible to find some boxes that together contain exactly one hundred balls.

The Textbook Solution

Chapter 1 of the book is titled “Seven Is More Than Six. The Pigeon-Hole Principle.” Hence, the curated solution at the end of the chapter uses the pigeonhole principle (duh). The big idea goes like this:

WLOG, assume that the first and second bins have different number of balls, \(x_1 \neq x_2\). If this isn’t possible, then every bin has the same number of balls, so there is a trivial solution (any 50/50 partition works). Now, consider partial sums \(S_k\), the number of balls in the first \(k\) bins. If there exists two such sums \(S_i, S_j\) that have a difference of exactly 100, we are done, since we can just take whatever’s between the two indices \((i, ..., j-1)\). If there doesn’t exist such two indices, then we can conclude that we have 100 distinct (mod 100) values for 100 values of \(S_k\) (if there were, that is the previous case). Of course this is true since \(S_k\) increases monotonically, so no duplicate values.

At this point, we are ready to use the pigeonhole principle. Consider the second bin, \(x_2\). There must be another index such that \(S_k - x_2 \equiv 0 \pmod {100}\) (because there are 100 options), and in fact the real difference must be exactly 100. So we can construct the partition, \([k] - \{2\}\). And since we assumed \(x_1 \neq x_2\), \(k = 1\) is impossible. QED.

This is a slick proof, which actually proves something heuristically stronger: the construction gives an almost-contiguous set of indices!

My Solution

But somehow, even after being indoctrinated with the pigeonhole principle for the entirety of the book’s chapter, I took a completely different approach. I’d like to think because I recently read a paper about approximating integrals of monotonous functions1. But more so because I really do have more training in algorithm design and analysis than formal mathematics. Without further ado, here’s my construction.

First, sort the bins in nonincreasing order of the number of balls (all hail radix sort!). Let’s overload the notation used by the previous section, but now with the sorted list. Again, greedily take the first \(k\) bins, until we can’t pack anymore. If \(S_k = 100\), we are done. If not, it means \(S_{k+1} > 100, S_k < 100\), and \(x_{k+1} = S_{k+1} - S_k\). Note that we are deficient of \(<x_{k+1}\) balls (being exactly \(x_{k+1}\) balls away means that \(S_{k+1} = 100\)). If only there were such a way that we can guarantee that we can find another subset that fills a deficit of fewer than \(x_{k+1}\) balls…

Here’s my claim: we can always take balls from the back of our list to fill in the missing balls. In fact, there will be at least \((x_{k+1}-1)\) bins at the back with exactly 1 ball. So regardless of what \(x_{k+1}\) actually is, we can construct a partition.

What?!

Yep, it’s true. Here’s a trivial but important observation.

Obs: \(x_{k+1} \geq 2\), because by integrality, \(S_{k+1} \geq 101, S_k \leq 99\), and thus \(x_{k+1} = S_{k+1} - S_k \geq 2\).

Now, imagine we start with every bin with 2 balls uniformly. To transform this into the sorted list we actually have, we must transfer some of the balls in the back to the front. What happens is the following:

1) At least one ball is transferred from the end to the first bin. If not, every bin has 2 balls, and this is again the trivial case.

2) \((x_{k+1} - 2)\) balls are transferred from the back of the list to \({k+1}\). Then there are exactly \(x_{k+1}\) balls there.

Combining the two together, we get that there are at least \((x_{k+1} - 1)\) balls transferred from the back of the list to somewhere in the front. And remember that every bin has at least one ball. So indeed, it must be the case that the last \((x_{k+1} - 1)\) bins all have exactly 1 ball in them! Moreover, we can rest assured that the forward and backward packings never cross each other due to the observation above. In conclusion, we combine the first \(k\) bins with \(100-S_k\) one-ball bins from the back, for exactly 100 balls. QED.

It’s a bit more work and clever thinking than the pigeonhole proof, but what’s illuminating is that this gives an alternative greedy algorithm. The construction also has a different property: the partition we get will be filled up with the biggest chunks, followed by the smallest chunks to fill in the gap.

As an actual algorithm, the two proofs give the same linear asymptotics, assuming we use radix sort for the implementation. I think both proofs are quite cool, since they are essentially orthogonal to each other but produce the same result nonetheless. Personally, I always enjoy when my slightly nontraditional background in TCS allows me to come up with an entirely new perspective on problems.

Try it

Here’s the algorithm in action. Each bar is a bin (sorted left-to-right, largest first). Blue bars are the greedy prefix, the orange bar is the critical bin \(x_{k+1}\) that was too large to include, and green bars are the unit bins from the back that fill the deficit. Most of what you see will be very anticlimactic, so there’s also a button for multiple trials. Enjoy! (Credit to Claude Code, absolutely fantastic for prototyping and embedded scripts.)

|
Greedy prefix Skipped bin Back-fill Unused
  1. For those who really want to find out what that means: link 




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Proving the Monty Hall Problem, Intuitively
  • The REUNS Workshop Experience
  • An Interesting Randomized Algorithm
  • Advent of Code 2024 Week 2 Reflection
  • Advent of Code 2024 Week 1 Reflection