Question: You’re in your town’s head to head marble racing championship, the traditional way to determine who is the town’s next mayor. The race is split into two heats and your time in either heat is a random, normally distributed variable. If you have the fastest time in the first run, what is the probability $P_\text{win it all}$ that you end up winning the event, as determined by the sum of your times on heat run?

Extra credit: what if there are $29$ other candidates in the race?

Extra extra credit What about $N$?

(FiveThirtyEight)

Solution (originally published 2021-01-24)

If the first skier wins the first round by $\Delta T_1,$ they will prevail so long as they don’t lose the second round by more than $\Delta T_1.$

The gap between the first round times $\left(\Delta T_1 = t^\text{A}_1 - t^\text{B}_1\right)$ which has some symmetric distribution $P(\Delta T_1).$ Since we know that person $\text{A}$ won the first round, we condition on the left side of $P$ and the expected value of $\Delta T_1$ is whatever the $25^\text{th}$ percentile of the distribution is.

If person $\text{A}$ is to win overall, $\Delta T_1$ has to be less than or equal to $-\Delta T_2.$ The gap in the second round is a random variable from the same distribution, so the chance that $\Delta T_1 < -\Delta T_2$ (and, therefore, that player $\text{A}$ wins) is $P_2 = 1 - 0.25 = 0.75.$

Pushing on

There are several ways to manifest the result above through calculation but none of the them yielded to generalization. An approximate attempt yielded good results for low $N$ but broke down as $N$ grew, notably resisting the stable plateauing that persists near $30\%$ for a wide range of $N.$

A simulation suggests a roughly linear decrease on log-log axes, and yields $P_{30} \approx 0.314409.$ Its overall behavior is decently approximated by $P \approx x^{-1/3}$ over a wide range:

Fig: plot of $\log P(\text{first round winner wins})$ vs $\log N.$

def round(N):
    data = [ np.random.normal() for _ in range(N) ]
    first_win = data.index(min(data))
    for i in range(N):
        data[i] += np.random.normal()
    overall_win = data.index(min(data))
    if first_win == overall_win:
        return 1
    else:
        return 0
  
domain = range(2, 50, 2)
datapoints = [np.mean([round(N) for _ in range(100000)])  for N in domain]

My attempts at an approximate solution for $N=30$ got the right shape, but took too long to “turn up”. Updates forthcoming if I make progress.

2025-11-18 Update

The formal solution I was trying to approximate (to no avail) almost five years ago was the following:

If the first and second heat times for racer $j$ are $f_j$ and $s_j$, and the winner goes first, then

  • each $f_j$ can range from $f_1$ up to $\infty$ (all other racers are slower than the winner in heat one)
  • each $s_j$ can range from $f_1 + s_1 - f_j$ up to $\infty$ (because anything faster would cause racer $j$ to win the race)
  • $f_1$ and $s_1$ can range from $-\infty$ to $\infty$ (because the winner sets the constraints for the other racers)

The probability that racer $j$ meets these conditions is

\[J(f_1,s_j) = \int\limits_{f_1}^\infty \text{d}f_j\, \mathcal{N}(f_j) \int\limits_{f_1 + s_1 - f_j}^\infty \text{d}s_j\, \mathcal{N}(s_j).\]

The constraint on the losers is set by the times of the winner, Racer $1$, so they’re independent of each other and the probability that all $(N-1)$ satisfy the constraint is just

\[J(f_1,s_1)^{N-1}.\]

The probability that the racer who wins the first heat wins the whole race is then just

\[N \int\limits_{-\infty}^\infty \text{d}f_1\mathcal{N}(f_1) \int\limits_{-\infty}^\infty\text{d}s_1 \mathcal{N}(s_1)J(f_1,s_1)^{N-1}.\]

We can evaluate this numerically for e.g. $N=30$ and we get $0.31471089\ldots$, which closely matches simulation.

However, it is not easy to get a formula for general $N$ because the integral $J$ depends strongly on the values of $f_1$ and $s_1$. Depending on how they’re set it could be very likely, a tossup, or very unlikely for a given Racer $j$ to lose to Racer $1$. Without definite values, we can’t evaluate the integral or systematically approximate it.

The crucial insight is that we actually don’t have to…

Instead, we can calculate the expected value of the probability that the winner of the first heat wins the whole race. The strategy will be to find the probability that the winner of the first heat wins in terms of their first heat time, and then average this using the probability distribution of the winning first heat time.

\[P(\text{heat 1 winner wins}) = \int\limits_{-\infty}^\infty \text{d}x\, P(\text{first heat winner wins}|x)P(\text{first heat time is }x).\]

We know two useful things about the winner

  • they won the first heat
  • their first heat time plus their second heat time is less than the expected minimum total time of the $(N-1)$ other racers.

To win the first heat, they need the probability of beating their first heat time $f_1$ to be less than $1/N.$

The probability that a normal variable is less than a target $v$ is

\[P(z < v) = \int\limits_{-\infty}^v \text{d}z\, \mathcal{N}(z),\]

which is approximated by

\[\mathcal{N}(v)/v.\]

The minimum should occur at the value $\gamma$ where the probability to fall short of $\gamma$ is less than $\frac1N.$ So, we can find the likely value of $f_1,$ $\gamma,$ by solving

\[\begin{align} \frac{\mathcal{N}(\gamma)}{\gamma} &= \frac1N \\ \frac{e^{-\gamma^2/2}}{\sqrt{2\pi}\gamma} &= \frac1N \\ \gamma^2/2 + \log \sqrt{2\pi}\gamma &= \log N. \end{align}\]

We could solve this by iterated approximation, but to first order we can keep the dominant quadratic $\gamma$ term and find $\gamma \approx \sqrt{2\log N}.$ It isn’t the most wonderful approximation, but improves as $N$ grows and is consistent with the other approximations we’ll make. The plot shows the distribution of minima for samples of $N$ unit normal variables in blue and the approximation for the mean in gold for $N$ ranging from $10$ to $10^6.$

$f_1$ will be peaked near $\gamma$, but vary around it. We can rewrite $f_1 = -\gamma + x$ where $x$ is assumed small compared to $\gamma.$ The total time taken is $f_1 + s_1.$ Because $f_1$ and $s_1$ are both normal variables with variance $1$, if we divide the sum by $\sqrt{2}$ it will be a normal variable as too: $(f_1+s_1)/\sqrt{2}.$

Racer $1$ wins if $f_1+s_1$ is less than the smallest time of the $(N-1)$ other racers. The total time for other racers is also a normal variable $t_j = (f_j+s_j)/\sqrt{2}.$ The expected minimum for this variable is found the same way we found it for $f_1,$ except with $(N-1)$ in place of $N.$ In practice, $(N-1)\approx N$ for large $N$ so the expected minimum for the other racers is $-\gamma$ too.

So, Racer $1$ will win if $(f_1+s_1)/\sqrt{2} < -\gamma$ or

\[\frac{-\gamma + x}{\sqrt{2}} + \frac{s_1}{\sqrt{2}} < -\gamma,\]

which is equivalent to

\[s_1 < -\overbrace{\left(\sqrt{2}-1\right)}^{\nu}\gamma + x.\]

So, for a given value of $x,$ the probability Racer $1$ wins is $\mathcal{N}(\nu \gamma + x)/(\nu\gamma + x).$ Working this out, we get

\[\begin{align} P(\text{Racer 1 wins}|x) &= \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma+x} e^{-(\nu\gamma+x)^2/2} \\ &= \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma+x} e^{-(\nu^2\gamma^2 + 2\nu \gamma x + x^2)/2} \\ &\approx \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma} e^{-(\nu^2\gamma^2 + 2\nu \gamma x)/2} \\ &= \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma} \left(e^{-\gamma^2/2}\right)^{\nu^2} e^{-\nu \gamma x} \\ &= \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma} \left(\frac{\sqrt{2\pi}\gamma}{N}\right)^{\nu^2} e^{-\nu x} \end{align}\]

We dropped the $x^2$ term in the exponent on the assumption that the probability distribution of $x$ will crush the integrand before the $x^2$ term has a big impact. This will not be true for small values of $N$ so we should expect our approximation to overestimate, and converge asymptotically from above.

Now, the probability that a single racer’s first heat time is over $-\gamma + x$ is one minus the probability that they’re under: $1 - \frac{\mathcal{N}(-\gamma + x)}{-\gamma + x}.$ And the probability that none of the $(N-1)$ people’s first heat time is faster is $\left[1 - \frac{\mathcal{N}(-\gamma + x)}{(-\gamma + x)}\right]^{N-1}.$

\[\begin{align} P(\text{first heat time < }-\gamma + x) &= \frac{1}{\sqrt{2\pi}}\frac{\mathcal{N}(-\gamma + x)}{(\gamma + x)} \\ &= \frac{1}{\sqrt{2\pi}}\frac{e^{-(-\gamma + x)^2/2}}{(\gamma + x)} \\ &= \frac{1}{\sqrt{2\pi}}\frac{e^{-(\gamma^2 -2x\gamma + x^2)/2}}{(\gamma + x)} \\ &\approx \frac{1}{\sqrt{2\pi}}\frac{e^{-(\gamma^2 -2x\gamma)/2}}{\gamma} \\ &= \frac{\mathcal{N}(\gamma)}{\gamma}e^{\gamma x} \\ &= e^{\gamma x}/N \end{align}\]

The probability that a racer’s first heat time is greater than $(-\gamma + x)$ is then $(1-e^{\gamma x}/N)$ and the probability that the minimum first heat time is greater than $(-\gamma+x)$ is $\approx(1-e^{\gamma x}/N)^N = e^{-e^{\gamma x}}.$ So, the cumulative probability that the minimum is less than $(-\gamma+x)$ is $1 - e^{-e^{\gamma x}}$ and the probability distribution that the minimum first heat time is $x$ exactly is then just

\[P(\text{first heat min} = x) = \frac{\text{d}}{\text{d}x} 1 - e^{-e^{\gamma x}} = \gamma e^{\gamma x} e^{-e^{\gamma x}}.\]

Putting it all together, the probability that the winner of the first heat wins the entire race is

\[\begin{align} P(\text{heat 1 winner wins}) &= \int\limits_{-\infty}^\infty \text{d}x\, P(\text{first heat winner wins}|x)P(\text{first heat time is }x) \\ &\approx\frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma} \left(\frac{\sqrt{2\pi}\gamma}{N}\right)^{\nu^2} \int\limits_{-\infty}^{\infty} \text{d}x\, e^{-\nu \gamma x} \gamma e^{\gamma x} e^{-e^{\gamma x}}. \end{align}\]

The integral has no actual $\gamma$ dependence which we can see by setting $z = e^{\gamma x}$ so that $\text{d}z = \gamma e^{\gamma x} \text{d}x = \gamma z \text{d}x$ so that the probability becomes

\[\begin{align} P(\text{heat 1 winner wins}) &\approx \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma} \left(\frac{\sqrt{2\pi}\gamma}{N}\right)^{\nu^2} \int\limits_{-\infty}^{\infty} \text{d}z\, \frac{1}{\gamma z} \gamma z e^{-z} z^{-\nu} \\ &\approx \frac{1}{\sqrt{2\pi}}\frac{1}{\nu\gamma} \left(\frac{\sqrt{2\pi}\gamma}{N}\right)^{\nu^2} \int\limits_{-\infty}^{\infty} \text{d}z\, e^{-z} z^{-\nu}. \end{align}\]

The integral will get us an overall numerical factor, but already we can see how the problem will scale with $N.$ Pulling out the $N$ dependent terms from the prefactor, we have

\[\begin{align} P(\text{heat 1 winner wins}) &\propto \gamma^{\nu^2-1}N^{-\nu^2} \\ & \propto (\log N)^{1-\sqrt{2}} N^{-(3-2\sqrt{2})}. \end{align}\]

This integral is just the gamma function $\Gamma(1-\nu)$ making the final result

\[\begin{align} P(\text{heat 1 winner wins}) &\approx \frac{1}{\nu}(4\pi)^\frac{\nu^1-1}{2}\Gamma(1-\nu)\left(\log N\right)^{\frac{\nu^2-1}{2}}N^{-\nu^2} \\ &\approx (1+\sqrt{2})(4\pi)^{1-\sqrt{2}}\Gamma[2-\sqrt{2}]\left(\log N\right)^{1-\sqrt{2}} N^{-(3-2\sqrt{2})}. \end{align}\]

As the number of racers increases the advantage of the first heat winner grows, far exceeding $1/N.$

To get more accurate predictions for small $N,$ we’d need to perturbatively develop the $e^{-x^2/2\gamma^2}$ that we dropped for the approximation, and use higher iteration estimates for $\gamma.$ The perturbative terms shrink with powers of $1/\log N$, so convergence is slow, but we can already see the asymptotic result (gold curve) and the simulation (blue points) converging.