This tutorial defines a statistic, gives two examples, and explains the relationship between a statistic and its sampling distribution.
The Definition of a Statistic
A statistic is a function of observable random variables, T=t(X1, X2, …, Xn), which does not depend on any unknown parameters.
Here, lowercase t is the function applied to X1, X2, …, Xn to define the statistic, which is denoted by capital T.
The purpose of a statistic T is to make inferences about the distribution of the set of random variables. Thus, if the variables are not observable, or if the function t(X1, X2, …, Xn) depends on unknown parameters, T cannot be computed from the data and would not be useful in making such inferences.
Example 1 of a statistic
Note that the set of observable random variables need not be a random sample. For instance, suppose 13 planes were in service, and the first 10 air conditioner failure times were as follows.
23,50,50,55,74,90,97,102,130,194
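For this case, \( T = \sum_{i=1}^{10} y_i+3y_{10} = 1447\) is a statistic, where \( y_i \) is the i-th ordered failure time. The ten observed times sum to 865, and the largest time, \( y_{10} = 194 \), is counted once for each of the 13 − 10 = 3 planes that had not yet failed, giving 865 + 3(194) = 1447. X1, X2, …, Xn can certainly be a random sample, but they need not be.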
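The arithmetic behind the value 1447 can be checked with a short Python sketch. It assumes the extra term counts the largest observed failure time once for each plane that had not yet failed (13 − 10 = 3); the helper name `censored_total` is hypothetical and chosen only for illustration.

```python
# First 10 observed air conditioner failure times, as listed above.
failure_times = [23, 50, 50, 55, 74, 90, 97, 102, 130, 194]
n_planes = 13  # planes in service


def censored_total(times, n_total):
    """Sum of the observed times, plus the largest observed time
    counted once for each unit that has not yet failed.
    (Hypothetical helper; the name is not from the tutorial.)"""
    ordered = sorted(times)
    n_unfailed = n_total - len(ordered)
    return sum(ordered) + n_unfailed * ordered[-1]


T = censored_total(failure_times, n_planes)
print(T)  # 865 + 3 * 194 = 1447
```

The function uses only the observed values and the known number of planes, so it involves no unknown parameters, which is what makes T a statistic.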
Example 2 of a statistic
Let X1, X2, …, Xn represent a random sample from a population. The sample mean \( \bar{X} \) is a statistic with the function t(X1, X2, …, Xn) = (X1 + X2 + … + Xn)/n. The sample mean \( \bar{X} \) is often written as follows.
\( \bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} \)
Note that the formula above uses uppercase \( X \), because \( \bar{X} \) is a random variable. In contrast, when a random sample is observed, the value of \( \bar{X} \) computed from the data is denoted by lowercase \( \bar{x} \).
The sample mean is useful because it estimates the population mean, and its variance is determined by the population variance and the sample size. In particular, if X1, X2, …, Xn represent a random sample from a population with density f(x), with \( E(X) =\mu \) and \(Var(X) =\sigma^2 \), then we get:
\( E(\bar{X}) = \mu\)
\( Var(\bar{X}) = \frac{\sigma^2}{n} \)
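These two results can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population with mean 2, the sample size of 25, and the 20,000 replications are arbitrary choices, not values from the tutorial.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

mu = 2.0          # assumed population mean (exponential population)
sigma2 = mu ** 2  # an exponential distribution has variance equal to mean squared
n = 25            # assumed sample size
reps = 20000      # assumed number of simulated samples


def sample_mean(size):
    """Draw one random sample of the given size and return its mean."""
    draws = [random.expovariate(1.0 / mu) for _ in range(size)]
    return sum(draws) / size


means = [sample_mean(n) for _ in range(reps)]

mean_of_means = statistics.fmean(means)    # should be close to mu
var_of_means = statistics.variance(means)  # should be close to sigma2 / n

print("average of sample means:", mean_of_means, "theory:", mu)
print("variance of sample means:", var_of_means, "theory:", sigma2 / n)
```

The population here is deliberately not normal, since the two formulas hold for any population with finite mean and variance.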
A statistic and sampling distribution
A statistic is also a random variable. The distribution of a statistic is referred to as a derived distribution or sampling distribution, in contrast to the population distribution.
Many important statistics can be expressed as a linear combination of independent normal random variables, and such a combination is itself normally distributed. For instance, if X1, X2, …, Xn denote a random sample from \( N(\mu, \sigma^2)\), then the sampling distribution of \( \bar{X} \) is as follows.
\( \bar{X} \sim N(\mu, \sigma^2/n)\)
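As a rough check of this sampling distribution, the sketch below compares a simulated probability with the one implied by \( N(\mu, \sigma^2/n) \). The values μ = 10, σ = 2, n = 16, and the number of replications are assumptions made only for the illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0  # assumed population parameters
n = 16                 # assumed sample size
reps = 20000           # assumed number of simulated samples

# NormalDist takes a mean and a standard deviation, so N(mu, sigma^2/n)
# is specified with standard deviation sigma / sqrt(n).
theory = NormalDist(mu, sigma / math.sqrt(n))

# Simulate the sample mean of n normal draws, many times over.
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

# Compare P(sample mean <= cutoff) from the simulation and from the theory.
cutoff = mu + sigma / math.sqrt(n)  # one standard deviation of the sample mean above mu
empirical = sum(m <= cutoff for m in means) / reps
theoretical = theory.cdf(cutoff)

print("empirical:", empirical)
print("theoretical:", theoretical)
```

If the claimed distribution is right, the two printed probabilities should agree closely, and the same comparison can be repeated at other cutoffs.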