The central limit theorem (CLT) states that sample means are approximately normally distributed for large sample sizes. This is crucial for statistical applications in any field of research that deals with randomness: for any measure of interest with inherent noise (blood markers, neurophysiological activity, social behavior), we can analyze the sample mean because we know how it is distributed.
So what does the CLT look like? It requires multiple independent and identically distributed (iid) measures \(X_i\) with \(i\in 1..n\). The iid assumption describes a situation in which the observations \(X_i\) are independent draws from the same randomness mechanism (e.g., many participants receiving the same medication; no one gets a higher dosage than another unless that is precisely the random process we are studying). Then, if we were to repeat this experiment (draw \(n\) observations and compute the mean) over and over again, the computed means would be approximately normally distributed.
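Before the proof, here is a minimal simulation sketch of this claim (assuming NumPy is available; the exponential distribution and the sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Repeat the experiment: draw n iid observations (here from a clearly
# non-normal exponential distribution) and record the standardized mean.
n, n_repeats = 1_000, 10_000
samples = rng.exponential(scale=1.0, size=(n_repeats, n))

# Exponential(1) has mean 1 and variance 1, so this standardizes the means.
z = (samples.mean(axis=1) - 1.0) * np.sqrt(n)

print(f"mean ~ 0:  {z.mean():.3f}")
print(f"var  ~ 1:  {z.var():.3f}")
print(f"P(Z <= 1.96) ~ 0.975: {(z <= 1.96).mean():.3f}")
```

Even though single exponential draws are strongly skewed, the standardized means already behave like draws from \(\mathcal{N}(0,1)\).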
Theorem 1 (Central Limit Theorem).
Let \(X_1, X_2, \dots, X_n\) be iid as \(X\). Assume that all moments are finite and, in particular, assume without loss of generality that \(E[X] = 0\) and \(Var[X] = 1\) (otherwise, standardize). Let \(Z_n = \frac{1}{\sqrt{n}}\sum\limits_{i=1}^n X_i\) (we scale the sum by \(\frac{1}{\sqrt{n}}\), i.e., the mean by \(\sqrt{n}\), so that \(Z_n\) has variance 1). Then
\[\begin{align*} Z_n \overset{d}{\longrightarrow} \mathcal{N}(0,1) \quad \text{as } n \rightarrow \infty \text{.} \end{align*}\]
Proof. We will first look at a short overview of the proof. In it, we will use technicalities about characteristic functions, which I explain later. If you are a mathematician, this structure may appear unusual to you because most of the time, people write all the required lemmas before the important theorem. But here, I focus on the core proof first and only add the details later.
\(Z_n\) is a sum of scaled random variables \(\frac{X_i}{\sqrt{n}}\). Because the \(X_i\) are independent, this sum has a characteristic function that is the product of the individual characteristic functions (see Lemma 2):
\[\begin{align*} \psi_{Z_n}(t) &= \prod\limits_{i=1}^n \psi_{\frac{X_i}{\sqrt{n}}}(t) \\ &= \left[E\left[e^{itX/\sqrt{n}}\right]\right]^n \\ &= \left[E\left[ \frac{(itX/\sqrt{n})^0}{0!} + \frac{(itX/\sqrt{n})^1}{1!} + \frac{(itX/\sqrt{n})^2}{2!} + \frac{(itX/\sqrt{n})^3}{3!} + \dots \right]\right]^n \\ &= \left[ 1 + \frac{itE[X]}{\sqrt{n}} - \frac{t^2E[X^2]}{2n} + O\left(\frac{1}{n^{3/2}}\right) \right]^n \\ &= \left[ 1 - \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right) \right]^n \\ \end{align*}\]
We consider the limit of this characteristic function as \(n \rightarrow \infty\) .
\[\begin{align*} \lim_{n\rightarrow \infty} \psi_{Z_n}(t) &= \lim_{n\rightarrow \infty} \left[ 1 - \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right) \right]^n \\ &= \lim_{n\rightarrow \infty} \left[ \left(1 + \frac{(-t^2/2)}{n}\right)^n + \sum_{k=1}^n \binom{n}{k} \left(1-\frac{t^2}{2n}\right)^{n-k} O\left(\frac{1}{n^{3/2}}\right)^k \right] \\ &= e^{-\frac{t^2}{2}} + \lim_{n\rightarrow \infty} \left[ \sum_{k=1}^n \binom{n}{k} \left(1-\frac{t^2}{2n}\right)^{n-k} O\left(\frac{1}{n^{3/2}}\right)^k \right] \end{align*}\]
Here the second line is the binomial expansion of the bracket. The first part produces \(e^{-t^2/2}\) (the well-known limit \((1+x/n)^n \rightarrow e^x\)) and the second part, collecting the remaining terms of the expansion, vanishes as \(n\) approaches infinity because
\[\begin{align*} \left\lvert \lim_{n\rightarrow \infty} \left[ \sum_{k=1}^n \binom{n}{k} \left(1-\frac{t^2}{2n}\right)^{n-k} O\left(\frac{1}{n^{3/2}}\right)^k \right] \right\rvert & \leq \left\lvert \lim_{n\rightarrow \infty} \left[ \sum_{k=1}^n n^k \cdot 1 \cdot O\left(\frac{1}{n^{3k/2}}\right) \right] \right\rvert \\ & = \left\lvert \lim_{n\rightarrow \infty} \left[ O\left(\frac{1}{n^{1/2}}\right) \right] \right\rvert \\ & = 0 \text{,} \\ \end{align*}\]
using \(\binom{n}{k} \leq n^k\) and \(0 \leq 1 - \frac{t^2}{2n} \leq 1\) for \(n\) large enough.
Altogether,
\[\begin{align*} \lim_{n\rightarrow \infty} \psi_{Z_n}(t) = e^{-\frac{t^2}{2}} \text{,} \end{align*}\]
which is just the characteristic function of the standard normal distribution (Lemma 3). Since characteristic functions uniquely determine the probability distribution (Lemma 4), \(Z_n\) approaches the standard normal distribution as \(n\rightarrow \infty\).
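As a numerical sanity check of this limit (a sketch assuming NumPy; the uniform distribution and the sample sizes are arbitrary illustration choices), we can estimate \(\psi_{Z_n}(t) = E[e^{itZ_n}]\) by Monte Carlo for a large \(n\) and compare it against \(e^{-t^2/2}\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Z_n = (1/sqrt(n)) * sum of X_i, with X_i uniform on [-sqrt(3), sqrt(3)]
# so that E[X] = 0 and Var[X] = 1.
n, n_repeats = 500, 20_000
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_repeats, n))
z_n = x.sum(axis=1) / np.sqrt(n)

for t in (0.5, 1.0, 2.0):
    psi_hat = np.mean(np.exp(1j * t * z_n))     # Monte Carlo estimate of E[e^{itZ_n}]
    print(t, abs(psi_hat - np.exp(-t**2 / 2)))  # ~ 0 up to sampling noise
```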
Lemma 2 (Sum of random variables, product of characteristic functions).
The sum of two independent random variables \(Z = X + Y\) has as its characteristic function the product of the two variables' characteristic functions, \(\psi_Z(t) = \psi_X(t)\psi_Y(t)\).
Proof. \[\begin{align*} \psi_Z(t) &= E\left[e^{itZ}\right] \\ &= E\left[e^{it(X+Y)}\right] \\ &= E\left[e^{itX}e^{itY}\right] \\ &= E\left[e^{itX}\right]E\left[e^{itY}\right] \\ &= \psi_X(t)\psi_Y(t) \end{align*}\]
The factorization of the expectation in the fourth line is exactly where the independence of \(X\) and \(Y\) enters.
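This identity is easy to check by simulation (a sketch assuming NumPy; the distributions and the value of \(t\) are arbitrary): estimate both sides by Monte Carlo for independent draws.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent variables with arbitrary distributions.
x = rng.exponential(size=500_000) - 1.0
y = rng.uniform(-1.0, 1.0, size=500_000)

t = 1.3
psi = lambda v: np.mean(np.exp(1j * t * v))  # Monte Carlo characteristic function
print(abs(psi(x + y) - psi(x) * psi(y)))     # ~ 0 up to sampling noise
```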
Lemma 3 (Characteristic function of the standard normal distribution).
A random variable \(X\) with the standard normal probability distribution has the characteristic function \(\psi_X(t) = e^{-t^2/2}\).
Proof. \[\begin{align*} \psi_X(t) &= E\left[e^{itX}\right] \\ &= \int\limits_{-\infty}^{\infty} e^{itx} f(x) dx \\ &= \int\limits_{-\infty}^{\infty} e^{itx} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2} dx \\ &= \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 - 2itx)} dx \\ &= \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 - 2itx + (it)^2 - (it)^2)} dx \\ &= \frac{1}{\sqrt{2\pi}} \int\limits_{-\infty}^{\infty} e^{-\frac{1}{2}(x - it)^2} e^{-\frac{t^2}{2}} dx \\ &= e^{-\frac{t^2}{2}} \int\limits_{-\infty}^{\infty} f(x-it) dx \\ \end{align*}\]
The latter integral is equal to 1 because the integrand is the normal density evaluated along a line shifted by the imaginary offset \(it\), and this offset does not change the value of the integral: the integrand is an entire function that decays rapidly, so the contour can be shifted back to the real line by Cauchy's theorem. Informally, we substitute \(y = x-it\), \(\frac{dy}{dx} = 1\), so that \(\int_{-\infty}^{\infty} f(x-it) dx = \int_{-\infty}^{\infty} f(y) dy = 1\). Then we get the characteristic function as desired.
\[\begin{align*} \psi_X(t) = e^{-\frac{t^2}{2}} \end{align*}\]
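For a quick numerical cross-check (a sketch assuming NumPy; the grid width and resolution are arbitrary), we can integrate \(e^{itx} f(x)\) on a wide grid and compare against \(e^{-t^2/2}\):

```python
import numpy as np

# The standard normal density decays so fast that [-10, 10] is
# effectively the whole real line for this integral.
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for t in (0.0, 1.0, 2.5):
    psi = np.sum(np.exp(1j * t * x) * f) * dx  # Riemann sum for E[e^{itX}]
    print(t, abs(psi - np.exp(-t**2 / 2)))     # ~ 0
```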
Lemma 4 (Probability distributions are uniquely determined by their characteristic function).
Let \(\psi_X(t)\) be the characteristic function of \(X\). Then the probability mass of \(X\) in the interval \([a, b]\) is recovered by
\[\begin{align*} \lim_{T\rightarrow\infty} \frac{1}{2\pi} \int\limits_{-T}^{T} \frac{e^{-ita}-e^{-itb}}{it} \psi_X(t) dt = P(a < X < b) + \frac{P(X=a) + P(X=b)}{2} \text{.} \end{align*}\]
Proof. Each exponential term in the integrand leads to an expression of the form \(e^{itc}/(2it)\) (with \(c = x-a\) or \(c = x-b\) once we plug in the definition of \(\psi_X\) below). Integrated over \([-T, T]\), such a term can be rewritten as
\[\begin{align*} \int\limits_{-T}^T \frac{e^{itc}}{2it} dt &= \int_0^T \frac{\cos(tc) + i \cdot \sin(tc)}{2it} dt + \int_{-T}^0 \frac{\cos(tc) + i \cdot \sin(tc)}{2it} dt \\ &= \int_0^T \frac{\cos(tc) + i \cdot \sin(tc)}{2it} dt + \int_{0}^T \frac{\cos(tc) - i \cdot \sin(tc)}{-2it} dt \\ &= \int_0^T \frac{\cos(tc)}{2it} - \frac{\cos(tc)}{2it} + \frac{i\sin(tc)}{2it} + \frac{i\sin(tc)}{2it} dt \\ &= \int_0^T \frac{\sin(tc)}{t} dt \text{,} \end{align*}\]
where the second step substitutes \(t \mapsto -t\) in the second integral and uses that \(\cos\) is even and \(\sin\) is odd.
Coming back to the equation we want to prove, we replace the characteristic function by its definition and obtain:
\[\begin{align*} \lim_{T\rightarrow\infty} \frac{1}{2\pi} \int\limits_{-T}^{T} \frac{e^{-ita}-e^{-itb}}{it} \psi_X(t) dt &= \lim_{T\rightarrow\infty} \frac{1}{2\pi} \int\limits_{-T}^{T} \frac{e^{-ita}-e^{-itb}}{it} \int\limits_{-\infty}^{\infty} e^{itx}f(x) dx dt \\ &= \int\limits_{-\infty}^{\infty} \lim_{T\rightarrow\infty} \frac{1}{2\pi} \int\limits_{-T}^{T} \frac{e^{it(x-a)}-e^{it(x-b)}}{it} f(x) dt dx \\ &= \int\limits_{-\infty}^{\infty} \frac{1}{\pi} \lim_{T\rightarrow\infty} \int\limits_{0}^{T} \frac{\sin(t(x-a))}{t} - \frac{\sin(t(x-b))}{t} dt f(x) dx \\ \end{align*}\]
Interchanging the two integrals is justified by Fubini's theorem, since the \(t\)-integrand is bounded: \(\left\lvert (e^{it(x-a)}-e^{it(x-b)})/(it) \right\rvert \leq b - a\).
We will use the limit of the inner integral (Lemma 5):
\[\begin{align*} \lim_{T\rightarrow\infty} \int\limits_{-T}^{T} \frac{e^{itc}}{2it} dt = \lim_{T\rightarrow\infty} \int_0^T \frac{\sin(tc)}{t} dt = { \begin{cases} -\frac{\pi}{2}, & \text{for } c < 0 \\ +\frac{\pi}{2}, & \text{for } c > 0 \\ 0, & \text{for } c = 0 \end{cases} } \end{align*}\]
The inner expression is thus \(1\) for \(a<x<b\) (the two terms contribute \(\pi/2\) each, and the \(\pi\) cancels with the \(1/\pi\)), \(\frac{1}{2}\) at \(x = a\) and \(x = b\), and \(0\) otherwise. Integrating \(f\) against this function retrieves exactly the probability mass in the desired interval; therefore, characteristic functions and probability distributions are in a one-to-one relationship.
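To see the inversion formula in action, here is a small numerical sketch (assuming SciPy; the standard normal, the interval, and the cutoff \(T\) are arbitrary choices), recovering \(P(a < X < b)\) from \(\psi_X(t) = e^{-t^2/2}\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b, T = -1.0, 0.5, 50.0

def integrand(t):
    # (e^{-ita} - e^{-itb}) / (it) * psi_X(t) for the standard normal;
    # only the real part survives integration over the symmetric interval.
    return ((np.exp(-1j * t * a) - np.exp(-1j * t * b)) / (1j * t)
            * np.exp(-t**2 / 2)).real

val, _ = quad(integrand, -T, T, points=[0.0])  # split at the removable singularity
print(val / (2 * np.pi))          # ~ P(a < X < b)
print(norm.cdf(b) - norm.cdf(a))  # reference value
```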
Lemma 5 (Limit of the integral of \(\sin(tc)/t\)).
\[\begin{align*} \lim_{T\rightarrow\infty} \int_0^T \frac{\sin(tc)}{t} dt = { \begin{cases} -\frac{\pi}{2}, & \text{for } c < 0 \\ +\frac{\pi}{2}, & \text{for } c > 0 \\ 0, & \text{for } c = 0 \end{cases} } \end{align*}\]
Proof. For this proof, we need the Laplace transform \(\mathcal{L}\{f\}(s) = \int_{0}^\infty e^{-st} f(t) dt\). The Laplace transform has a special property for functions of the form \(f(t)/t\) (with the idea of setting \(f(t) = \sin(t)\) later).
\[\begin{align*} \mathcal{L}\left\{ \frac{f(t)}{t} \right\}(s) &= \int\limits_{0}^{\infty} \frac{f(t)}{t} e^{-st} dt \\ &= \int\limits_{0}^{\infty} \frac{f(t)}{t} \left( e^{- \lim_{r \rightarrow s} rt} - e^{- \lim_{r \rightarrow \infty} rt} \right) dt \\ &= \int\limits_{0}^{\infty} \frac{f(t)}{t} \left( e^{-rt} \vert_{r=\infty}^{r=s} \right) dt \\ &= \int\limits_{0}^{\infty} f(t) \left( \int\limits_{\infty}^{s} - e^{-rt} dr \right) dt \\ &= \int\limits_{0}^{\infty} f(t) \left( \int\limits_{s}^{\infty} e^{-rt} dr \right) dt \\ &= \int\limits_{s}^{\infty} \int\limits_{0}^{\infty} f(t) e^{-rt} dt dr \\ &= \int\limits_{s}^{\infty} \mathcal{L}\left\{f(t)\right\}(r) dr \end{align*}\]
In the step from the third to the fourth line, \(e^{-rt} \vert_{r=\infty}^{r=s} = \int_{\infty}^{s} \frac{\partial}{\partial r} e^{-rt} dr = \int_{\infty}^{s} -t e^{-rt} dr\), and the factor \(t\) cancels the \(1/t\) in front; the second-to-last step swaps the order of integration (Fubini).
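We can numerically confirm this property for \(f(t) = \sin(t)\) (a sketch assuming SciPy; it uses the closed forms \(\mathcal{L}\{\sin\}(r) = 1/(1+r^2)\) and \(\int_s^\infty \frac{dr}{1+r^2} = \frac{\pi}{2} - \tan^{-1}(s)\) derived in Lemmas 6 and 7 below, and the value of \(s\) is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

s = 0.7  # any s > 0
# Left side: L{sin(t)/t}(s). np.sinc(t/pi) = sin(t)/t handles t = 0 gracefully.
lhs, _ = quad(lambda t: np.sinc(t / np.pi) * np.exp(-s * t), 0, np.inf)
# Right side: integral of L{sin}(r) = 1/(1+r^2) from s to infinity.
rhs = np.pi / 2 - np.arctan(s)
print(abs(lhs - rhs))  # ~ 0
```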
With this, we can prove the lemma. We first need to get rid of the constant \(c\) in the \(\sin(tc)\) function with a change of variables: we substitute \(u=tc\), \(t = u/c\), and \(du = c\cdot dt\). Since we consider the limit \(T\rightarrow\infty\), the new upper bound \(Tc\) still tends to infinity if \(c\) is positive, which we assume for now (\(c>0\)).
\[\begin{align*} \lim_{T\rightarrow\infty} \int_0^T \frac{\sin(tc)}{t} dt &= \lim_{T\rightarrow\infty} \int_0^{Tc} \frac{\sin(u)}{u/c} \frac{du}{c} \\ &= \int_0^{\infty} \frac{\sin(u)}{u} du \\ \end{align*}\]
For negative constants, \(c<0\), the upper bound \(Tc\) tends to \(-\infty\); since \(\sin(u)/u\) is an even function, the proof can be written analogously with an additional minus sign in the end. For \(c=0\), the integrand is zero and so is the integral.
Now we multiply the integrand by the term \(e^{-su}\) and take the limit \(s\rightarrow 0\). In this limit the extra factor tends to 1 and leaves the integral unchanged (interchanging limit and integral can be justified by Abel's theorem for Laplace transforms); for \(s > 0\), however, it lets us read the integral as a Laplace transform, for which we make use of the special form of \(\sin(t)/t\) as introduced initially.
\[\begin{align*} \lim_{T\rightarrow\infty} \int_0^T \frac{\sin(tc)}{t} dt &= \int_0^{\infty} \frac{\sin(u)}{u} du \\ &= \lim_{s\rightarrow 0} \int_0^{\infty} \frac{\sin(u)}{u} e^{-su} du \\ &= \lim_{s\rightarrow 0} \mathcal{L}\left\{ \frac{\sin(u)}{u} \right\}(s) \\ &= \lim_{s\rightarrow 0} \int_s^{\infty} \mathcal{L}\left\{ \sin(u) \right\}(r) dr \\ \end{align*}\]
We solve this Laplace transform (Lemma 6) and then the integral (Lemma 7) to arrive at the desired statement.
\[\begin{align*} \lim_{T\rightarrow\infty} \int_0^T \frac{\sin(tc)}{t} dt &= \lim_{s\rightarrow 0} \int_s^{\infty} \mathcal{L}\left\{ \sin(u) \right\}(r) dr \\ &= \lim_{s\rightarrow 0} \int_s^{\infty} \frac{1}{1+r^2} dr \\ &= \tan^{-1}(r)\vert_{r=0}^{r=\infty} \\ &= \frac{\pi}{2} - 0 \\ &= \frac{\pi}{2} \\ \end{align*}\]
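The sine integral \(\mathrm{Si}(T) = \int_0^T \frac{\sin t}{t} dt\) happens to be available in SciPy, so the limit is easy to inspect numerically (a sketch; the values of \(T\) are arbitrary):

```python
import numpy as np
from scipy.special import sici

# sici returns (Si(T), Ci(T)); Si(T) should approach pi/2 as T grows.
for T in (10.0, 100.0, 10_000.0):
    si, _ = sici(T)
    print(f"Si({T:>8}) = {si:.6f}   pi/2 = {np.pi / 2:.6f}")
```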
Lemma 6 (Laplace transform of \(\sin(t)\)).
\[\begin{align*} \mathcal{L}\{\sin(t)\}(s) = \frac{1}{1+s^2} \end{align*}\]
Proof. First, we need the Laplace transform of \(e^{at}\). \[\begin{align*} \mathcal{L}\{e^{at}\}(s) &= \int\limits_{0}^{\infty} e^{-st} e^{at} dt \\ &= \int\limits_{0}^{\infty} e^{(a-s)t} dt \\ &= \frac{1}{a-s} e^{(a-s)t} \vert_{t=0}^{t=\infty} \\ &= \lim_{t\rightarrow\infty} \frac{1}{a-s} e^{(a-s)t} - \frac{1}{a-s} e^{(a-s)\cdot0} \\ &\overbrace{=}^{s>\operatorname{Re}(a)} 0 - \frac{1}{a-s} \\ &= \frac{1}{s-a} \\ \end{align*}\] The condition \(s > \operatorname{Re}(a)\) makes the exponential decay; below we use \(a = \pm i\), so any \(s > 0\) works.
Now we can solve the Laplace transform of \(\sin(t)\) using the equality \((e^{it} - e^{-it})/(2i) = \left(\cos(t) + i\sin(t) - \cos(-t) - i\sin(-t)\right)/(2i) = 2i\sin(t)/(2i) = \sin(t)\).
\[\begin{align*} \mathcal{L}\{\sin(t)\}(s) &= \mathcal{L}\bigg\{ \frac{e^{it} - e^{-it}}{2i} \bigg\}(s) \\ &= \frac{1}{2i} \mathcal{L}\{ e^{it} - e^{-it} \}(s) \\ &= \frac{1}{2i} \left( \frac{1}{s-i} - \frac{1}{s-(-i)} \right) \\ &= \frac{1}{2i} \left( \frac{(s+i)}{(s+i)(s-i)} - \frac{s-i}{(s+i)(s-i)} \right) \\ &= \frac{1}{2i} \left( \frac{2i}{s^2 - i^2} \right) \\ &= \frac{1}{s^2 + 1} \\ \end{align*}\]
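A one-line numerical check of this transform (a sketch assuming SciPy; the values of \(s\) are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

for s in (0.5, 1.0, 3.0):
    val, _ = quad(lambda t: np.sin(t) * np.exp(-s * t), 0, np.inf)
    print(s, abs(val - 1 / (1 + s**2)))  # ~ 0
```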
Lemma 7 (Derivative of the inverse tangent).
\[\begin{align*} \int \frac{1}{1+s^2} ds = \tan^{-1}(s) \end{align*}\]
Proof. The derivative of the tangent is
\[\begin{align*} \frac{d}{ds} \tan(s) &= \frac{d}{ds} \frac{\sin(s)}{\cos(s)} \\ &= \frac{\sin'(s)\cos(s) - \sin(s)\cos'(s)}{\cos^2(s)} \\ &= \frac{\cos^2(s) + \sin^2(s)}{\cos^2(s)} \\ &= 1 + \tan^2(s) \text{,} \end{align*}\]
and the derivative of the inverse therefore is
\[\begin{align*} \frac{d}{ds}\tan^{-1}(s) &= \frac{1}{\tan'(\tan^{-1}(s))} \\ &= \frac{1}{1 + \tan^2(\tan^{-1}(s))} \\ &= \frac{1}{1 + s^2} \text{.} \end{align*}\]
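This last step can also be confirmed symbolically (a sketch assuming SymPy is available):

```python
import sympy as sp

s = sp.symbols('s', real=True)
print(sp.diff(sp.atan(s), s))           # 1/(s**2 + 1)
print(sp.integrate(1 / (1 + s**2), s))  # atan(s)
```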
I couldn’t find a self-contained proof of the central limit theorem, so I made one myself in the hope that it helps someone. I took parts from various sources, mostly from https://sas.uwaterloo.ca/~dlmcleis/s901/chapt6.pdf, but also from https://www.youtube.com/c/papaflammy/.