ci — Confidence intervals for means, proportions, and variances

Description Quick start Menu Syntax

Options Remarks and examples Stored results Methods and formulas

Acknowledgment References Also see

Description

ci computes conﬁdence intervals for population means, proportions, variances, and standard

deviations.

cii is the immediate form of ci; see [U] 19 Immediate commands for a general discussion of

immediate commands.

Quick start

Conﬁdence intervals for means of normally distributed variables v1, v2, and v3

ci means v1-v3

Conﬁdence interval for mean of Poisson-distributed variable v4

ci means v4, poisson

Conﬁdence interval for rate of v4 with total exposure recorded in v5

ci means v4, poisson exposure(v5)

Conﬁdence interval for proportion of binary variable v6

ci proportions v6

Conﬁdence intervals for variances of v1, v2, and v3

ci variances v1-v3

Same as above, but Bonett conﬁdence intervals are produced

ci variances v1-v3, bonett

90% Bonett conﬁdence intervals for standard deviations of v1, v2, and v3

ci variances v1-v3, sd bonett level(90)

Conﬁdence interval for a mean based on a sample with 85 observations, a sample mean of 10, and a

standard deviation of 3

cii means 85 10 3

90% conﬁdence interval for rate from a sample with 4,379 deaths over 11,394 person-years

cii means 11394 4379, poisson level(90)

Agresti–Coull conﬁdence interval for proportion based on a sample with 2,377 observations and 136

successes

cii proportions 2377 136, agresti

Bonett conﬁdence interval for variance based on a sample with 20 observations, sample variance of 9,

and estimated kurtosis of 1.8

cii variances 20 9 1.8, bonett

Same as above, but with conﬁdence interval for standard deviation

cii variances 20 3 1.8, sd bonett

Menu

ci

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Confidence intervals

cii for a normal mean

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Normal mean CI calculator

cii for a Poisson mean

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Poisson mean CI calculator

cii for a proportion

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Proportion CI calculator

cii for a variance

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Variance CI calculator

cii for a standard deviation

Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Standard deviation CI calculator

Syntax

Confidence intervals for means, normal distribution

    ci means [varlist] [if] [in] [weight] [, options]

    cii means #obs #mean #sd [, level(#)]

Confidence intervals for means, Poisson distribution

    ci means [varlist] [if] [in] [weight], poisson [exposure(varname) options]

    cii means #exposure #events, poisson [level(#)]

Confidence intervals for proportions

    ci proportions [varlist] [if] [in] [weight] [, prop_options options]

    cii proportions #obs #succ [, prop_options level(#)]

Confidence intervals for variances

    ci variances [varlist] [if] [in] [weight] [, bonett options]

    cii variances #obs #variance [, level(#)]

    cii variances #obs #variance #kurtosis, bonett [level(#)]

Confidence intervals for standard deviations

    ci variances [varlist] [if] [in] [weight], sd [bonett options]

    cii variances #obs #sd, sd [level(#)]

    cii variances #obs #sd #kurtosis, sd bonett [level(#)]

The subcommands means, proportions, and variances may be abbreviated to mean, prop, and var.

#obs must be a positive integer. #exposure, #sd, and #variance must be a positive number. #succ and #events must be a nonnegative integer or between 0 and 1. If the number is between 0 and 1, Stata interprets it as the fraction of successes or events and converts it to an integer number representing the number of successes or events. The computation then proceeds as if two integers had been specified. If option bonett is specified, you must additionally specify #kurtosis with cii variances.
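
As a minimal sketch of the fraction rule above, the two calls below should be treated identically, because 0.1 × 20 = 2 successes. The numbers are borrowed from example 6 and are otherwise arbitrary.

```stata
* 2 successes out of 20 trials, given as a count
cii proportions 20 2

* the same sample, given as a fraction of successes (0.1 * 20 = 2)
cii proportions 20 0.1
```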

prop_options     Description
--------------------------------------------------------------------------
exact            calculate exact confidence intervals; the default
wald             calculate Wald confidence intervals
wilson           calculate Wilson confidence intervals
agresti          calculate Agresti–Coull confidence intervals
jeffreys         calculate Jeffreys confidence intervals

options          Description
--------------------------------------------------------------------------
level(#)         set confidence level; default is level(95)
separator(#)     draw separator line after every # variables; default is separator(5)
total            add output for all groups combined (for use with by only)

by, collect, and statsby are allowed with ci, and collect is allowed with cii; see [U] 11.1.10 Prefix commands.

aweights are allowed with ci means for normal data, and fweights are allowed with all ci subcommands; see [U] 11.1.6 weight.

Options

Options are presented under the following headings:

Options for ci and cii means

Options for ci and cii proportions

Options for ci and cii variances

Options for ci and cii means



 

Main



poisson speciﬁes that the variables (or numbers for cii) are Poisson-distributed counts; exact Poisson

conﬁdence intervals will be calculated. By default, conﬁdence intervals for means are calculated

based on a normal distribution.

exposure(varname) is used only with poisson. You do not need to specify poisson if you specify

exposure(); poisson is assumed. varname contains the total exposure (typically a time or an

area) during which the number of events recorded in varlist was observed.

level(#) speciﬁes the conﬁdence level, as a percentage, for conﬁdence intervals. The default is

level(95) or as set by set level; see [R] level.

separator(#) speciﬁes how often separation lines should be inserted into the output. The default is

separator(5), meaning that a line is drawn after every ﬁve variables. separator(10) would

draw the line after every 10 variables. separator(0) suppresses the separation line.

total is used with the by preﬁx. It requests that in addition to output for each by-group, output be

added for all groups combined.

Options for ci and cii proportions



 

Main



exact, wald, wilson, agresti, and jeffreys specify how binomial conﬁdence intervals are to be

calculated.

exact is the default and speciﬁes exact (also known in the literature as Clopper–Pearson [1934])

binomial conﬁdence intervals.

wald speciﬁes calculation of Wald conﬁdence intervals.

wilson speciﬁes calculation of Wilson conﬁdence intervals.

agresti speciﬁes calculation of Agresti–Coull conﬁdence intervals.

jeffreys speciﬁes calculation of Jeffreys conﬁdence intervals.

See Brown, Cai, and DasGupta (2001) for a discussion and comparison of the different binomial

conﬁdence intervals.

level(#) speciﬁes the conﬁdence level, as a percentage, for conﬁdence intervals. The default is

level(95) or as set by set level; see [R] level.

separator(#) speciﬁes how often separation lines should be inserted into the output. The default is

separator(5), meaning that a line is drawn after every ﬁve variables. separator(10) would

draw the line after every 10 variables. separator(0) suppresses the separation line.

total is used with the by preﬁx. It requests that in addition to output for each by-group, output be

added for all groups combined.

Options for ci and cii variances



 

Main



sd speciﬁes that conﬁdence intervals for standard deviations be calculated. The default is to compute

conﬁdence intervals for variances.

bonett speciﬁes that Bonett conﬁdence intervals be calculated. The default is to compute normal-based

conﬁdence intervals, which assume normality for the data.

level(#) speciﬁes the conﬁdence level, as a percentage, for conﬁdence intervals. The default is

level(95) or as set by set level; see [R] level.

separator(#) speciﬁes how often separation lines should be inserted into the output. The default is

separator(5), meaning that a line is drawn after every ﬁve variables. separator(10) would

draw the line after every 10 variables. separator(0) suppresses the separation line.

total is used with the by preﬁx. It requests that in addition to output for each by-group, output be

added for all groups combined.

Remarks and examples

Remarks are presented under the following headings:

Confidence intervals for means
    Normal-based confidence intervals
    Poisson confidence intervals
Confidence intervals for proportions
Confidence intervals for variances
Immediate form

Conﬁdence intervals for means

ci means computes a conﬁdence interval for the population mean for each of the variables in

varlist.

Normal-based conﬁdence intervals

Example 1: Normal-based conﬁdence intervals

Without the poisson option, ci means produces normal-based conﬁdence intervals that are correct

if the variable is normally distributed and asymptotically correct for all other distributions satisfying

the conditions of the central limit theorem.

. use https://www.stata-press.com/data/r18/auto

(1978 automobile data)

. ci means mpg price

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         74     21.2973     .6725511        19.9569    22.63769
       price |         74    6165.257     342.8719       5481.914      6848.6

The standard error of the mean of mpg is 0.67, and the 95% conﬁdence interval is [ 19.96, 22.64 ].

We can obtain wider conﬁdence intervals, 99%, by typing

. ci means mpg price, level(99)

    Variable |        Obs        Mean    Std. err.       [99% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         74     21.2973     .6725511       19.51849    23.07611
       price |         74    6165.257     342.8719       5258.405    7072.108

Example 2: The by preﬁx

The by preﬁx breaks out the conﬁdence intervals according to by-group; total adds an overall

summary. For instance,

. by foreign: ci means mpg, total

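-> foreign = Domestic

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         52    19.82692      .657777       18.50638    21.14747

-> foreign = Foreign

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         22    24.77273      1.40951       21.84149    27.70396

-> Total

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         74     21.2973     .6725511        19.9569    22.63769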
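
Because statsby is allowed with ci, the by-group bounds can also be collected into a dataset rather than only displayed. The following is a minimal sketch; the new variable names lb and ub are illustrative choices, not names produced by ci.

```stata
use https://www.stata-press.com/data/r18/auto, clear

* collect the stored bounds r(lb) and r(ub) for each level of foreign;
* clear replaces the data in memory with one observation per group
statsby lb=r(lb) ub=r(ub), by(foreign) clear: ci means mpg

list foreign lb ub
```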

Example 3: Controlling the format

You can control the formatting of the numbers in the output by specifying a display format for

the variable; see [U] 12.5 Formats: Controlling how data are displayed. For instance,

. format mpg %9.2f

. ci means mpg

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         74       21.30         0.67          19.96       22.64

Poisson conﬁdence intervals

If you specify the poisson option, ci means assumes count data and computes exact Poisson

conﬁdence intervals.

Example 4: Poisson conﬁdence intervals

We have data on the number of bacterial colonies on a Petri dish. The dish has been divided into

36 small squares, and the number of colonies in each square has been counted. Each observation in

our dataset represents a square on the dish. The variable count records the number of colonies in

each square counted, which varies from 0 to 5.

. use https://www.stata-press.com/data/r18/petri, clear

. ci means count, poisson

                                                            Poisson exact
    Variable |   Exposure        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
       count |         36    2.333333     .2545875       1.861158    2.888825

ci reports that the average number of colonies per square is 2.33. If the expected number of colonies

per square were as low as 1.86, the probability of observing 2.33 or more colonies per square would

be 2.5%. If the expected number were as large as 2.89, the probability of observing 2.33 or fewer

colonies per square would be 2.5%.

Example 5: Option exposure()

The number of “observations”— how ﬁnely the Petri dish is divided—makes no difference. The

Poisson distribution is a function only of the count. In example 4, we observed a total of 2.33×36 = 84

colonies and a conﬁdence interval of [ 1.86 ×36, 2.89 ×36 ] = [ 67, 104 ]. We would obtain the same

[ 67, 104 ] conﬁdence interval if our dish were divided into, say, 49 squares rather than 36.

For the counts, it is not even important that all the squares be of the same size. For rates, however,

such differences do matter but in an easy-to-calculate way. Rates are obtained from counts by dividing

by exposure, which is typically a number multiplied by either time or an area. For our Petri dishes,

we divide by an area to obtain a rate, but if our example were cast in terms of being infected by a

disease, we might divide by person-years to obtain the rate. Rates are convenient because they are

easier to compare: we might have 2.3 colonies per square inch or 0.0005 infections per person-year.

So let’s assume that we wish to obtain the number of colonies per square inch and, moreover, that

not all the “squares” on our dish are of equal size. We have a variable called area that records the

area of each square:

. ci means count, exposure(area)

                                                            Poisson exact
    Variable |   Exposure        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
       count |          3          28     3.055051        22.3339    34.66591

The rates are now in more familiar terms. In our sample, there are 28 colonies per square inch, and

the 95% conﬁdence interval is [ 22.3, 34.7 ]. When we did not specify exposure(), ci means with

option poisson assumed that each observation contributed 1 to exposure.
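
The point that only the total count and the total exposure matter can be checked with the immediate form. As a sketch, using the 84 colonies from example 4 together with the exposures reported in the two tables (36 squares, and a total area of 3), the calls below should reproduce those intervals up to rounding:

```stata
* 84 colonies over 36 squares (exposure first, then events)
cii means 36 84, poisson

* the same 84 colonies over a total area of 3
cii means 3 84, poisson
```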

Technical note

If there were no colonies on our dish, ci means with option poisson would calculate a one-sided

conﬁdence interval:

. use https://www.stata-press.com/data/r18/petrinone

. ci means count, poisson

                                                            Poisson exact
    Variable |   Exposure        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
       count |         36           0            0              0   .1024689*

(*) one-sided, 97.5% confidence interval
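
When no events are observed, the upper limit has a closed form, because Pr(K ≤ 0 | λ) = exp(−λ). As a sketch under that observation, solving exp(−λ) = 0.025 and dividing by the exposure of 36 should give a value of about 0.1025, in line with the upper limit reported above:

```stata
* upper limit solves exp(-lambda) = 0.025; divide by exposure to get a rate
display -ln(0.025)/36
```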

Conﬁdence intervals for proportions

The ci proportions command assumes binary (0/1) data and computes binomial conﬁdence

intervals.

Example 6: Exact binomial (Clopper–Pearson) conﬁdence interval

We have data on employees, including a variable marking whether the employee was promoted

last year.

. use https://www.stata-press.com/data/r18/promo

. ci proportions promoted

                                                            Binomial exact
    Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
    promoted |         20          .1      .067082       .0123485    .3169827

The exact binomial, also known as the Clopper–Pearson (1934) interval, is computed by default.

Nominally, the interpretation of a 95% conﬁdence interval is that under repeated samples or

experiments, 95% of the resultant intervals would contain the unknown parameter in question.

However, for binomial data, the actual coverage probability, regardless of method, usually differs

from that interpretation. This result occurs because of the discreteness of the binomial distribution,

which produces only a ﬁnite set of outcomes, meaning that coverage probabilities are subject to

discrete jumps and that the exact nominal level cannot always be achieved. Therefore, the term “exact

conﬁdence interval” refers to its being derived from the binomial distribution, the distribution exactly

generating the data, rather than resulting in exactly the nominal coverage.

For the Clopper–Pearson interval, the actual coverage probability is guaranteed to be greater

than or equal to the nominal conﬁdence level, here 95%. Because of the way it is calculated—see

Methods and formulas—it may also be interpreted as follows: If the true probability of being promoted

were 0.012, the chances of observing a result as extreme or more extreme than the result observed

(20 × 0.1 = 2 or more promotions) would be 2.5%. If the true probability of being promoted were

0.317, the chances of observing a result as extreme or more extreme than the result observed (two

or fewer promotions) would be 2.5%.

Example 7: Other conﬁdence intervals

The Clopper–Pearson interval is desirable because it guarantees nominal coverage; however, by

dropping this restriction, you may obtain accurate intervals that are not as conservative. In this vein,

you might opt for the Wilson (1927) interval,

. ci proportions promoted, wilson

                                                                Wilson
    Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
    promoted |         20          .1      .067082       .0278665    .3010336

the Agresti–Coull (1998) interval,

. ci proportions promoted, agresti

                                                            Agresti--Coull
    Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
    promoted |         20          .1      .067082       .0156562    .3132439

or the Bayesian-derived Jeffreys interval (Brown, Cai, and DasGupta 2001),

. ci proportions promoted, jeffreys

                                                               Jeffreys
    Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
    promoted |         20          .1      .067082       .0213725    .2838533

Picking the best interval is a matter of balancing accuracy (coverage) against precision (average

interval length) and depends on sample size and success probability. Brown, Cai, and DasGupta (2001)

recommend the Wilson or Jeffreys interval for small sample sizes (≤40) yet favor the Agresti–Coull

interval for its simplicity, decent performance for sample sizes less than or equal to 40, and performance

comparable to Wilson or Jeffreys for sample sizes greater than 40. They also deem the Clopper–

Pearson interval to be “wastefully conservative and [. . .] not a good choice for practical use”, unless

of course one requires, at a minimum, the nominal coverage level.

Finally, the binomial Wald conﬁdence interval is obtained by specifying the wald option. The

Wald interval is the one taught in most introductory statistics courses and, for the above, is simply,

for level $1 - \alpha$, Proportion $\pm\, z_{\alpha/2}$ (Std. err.), where $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard

normal. Because its overall poor performance makes it impractical, the Wald interval is available

mainly for pedagogical purposes. The binomial Wald interval is also similar to the interval produced

by treating binary data as normal data and using ci means, with two exceptions. First, the calculation

of the standard error in ci proportions uses denominator n rather than n − 1, used for normal

data in ci means. Second, conﬁdence intervals for normal data are based on the t distribution rather

than the standard normal. Of course, both discrepancies vanish as sample size increases.
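
The two differences can be seen by computing both intervals by hand. The sketch below assumes the promo dataset from example 6 is available at the URL used there; the local macro names are illustrative.

```stata
use https://www.stata-press.com/data/r18/promo, clear
quietly summarize promoted
local n = r(N)
local p = r(mean)
local v = r(Var)

* Wald: standard error with denominator n, standard normal quantile
local se_wald = sqrt(`p'*(1 - `p')/`n')
local z = invnormal(0.975)
display "Wald:     " (`p' - `z'*`se_wald') "  " (`p' + `z'*`se_wald')

* normal-data interval: r(Var) uses denominator n - 1, t quantile
local se_norm = sqrt(`v'/`n')
local t = invttail(`n' - 1, 0.025)
display "ci means: " (`p' - `t'*`se_norm') "  " (`p' + `t'*`se_norm')

* built-in results for comparison
ci proportions promoted, wald
ci means promoted
```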

Technical note

Let’s repeat example 6, but this time with data in which there are no promotions over the observed

period:

. use https://www.stata-press.com/data/r18/promonone

. ci proportions promoted

                                                            Binomial exact
    Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
    promoted |         20           0            0              0   .1684335*

(*) one-sided, 97.5% confidence interval

The conﬁdence interval is [ 0, 0.168 ], and this is the conﬁdence interval that most books publish. It

is not, however, a true 95% conﬁdence interval because the lower tail has vanished. As Stata notes,

it is a one-sided, 97.5% conﬁdence interval. If you wanted to put 5% in the right tail, you could type

ci proportions promoted, level(90).
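
With no successes, the upper limit has a closed form, because Pr(K ≤ 0 | p) = (1 − p)^n. The sketch below uses only the numbers from this note: it solves (1 − p)^20 = 0.025 for the default interval and (1 − p)^20 = 0.05 for level(90). The immediate command is shown for comparison.

```stata
* upper limit of the default interval: 2.5% in the right tail
display 1 - 0.025^(1/20)

* upper limit with 5% in the right tail
display 1 - 0.05^(1/20)

* the same sample (20 observations, 0 successes) in immediate form
cii proportions 20 0, level(90)
```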

Technical note

ci proportions ignores any variables that do not take on the values 0 and 1 exclusively. For

instance, with our automobile dataset,

. use https://www.stata-press.com/data/r18/auto

(1978 automobile data)

. ci proportions mpg foreign

                                                            Binomial exact
    Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
     foreign |         74    .2972973     .0531331        .196584    .4148353

Note: The results are produced only for binary (0/1) variables.

We also requested the conﬁdence interval for mpg, but Stata ignored us. It does that so you can type

ci proportions and obtain correct conﬁdence intervals for all the variables that are 0/1 in your

data.

Conﬁdence intervals for variances

The ci variances command computes conﬁdence intervals for the variances or, if the sd option is

speciﬁed, for the standard deviations. The default is a normal-based conﬁdence interval that assumes

the data are normal and uses a $\chi^2$ distribution to construct the confidence intervals. If normality is

suspect, you may consider using the bonett option to compute Bonett (2006) conﬁdence intervals,

which are more robust to nonnormality.

Example 8: Normal-based conﬁdence intervals

So far, we have restricted our attention to confidence intervals for means and proportions. When people think of statistical inference, they usually have in mind inferences concerning population

means. However, the population parameter of interest will vary from one situation to another. In many

scenarios, the population variance is as important as the population mean. For example, in a quality

control study, a machine that ﬁlls 16-ounce canned peas is investigated at regular time intervals. A

random sample of n = 8 containers is selected every hour. Ideally, the amount of peas in a can should

vary only slightly about the 16-ounce value. If the variance was large, then a large proportion of

cans would be either underﬁlled, thus cheating the customer, or overﬁlled, thus resulting in economic

loss to the manufacturing company. Suppose that the weights of 16-ounce cans ﬁlled by the machine

are normally distributed. The acceptable variability in the weights is expected to be 0.09 with the

respective standard deviation of 0.3 ounces. To monitor the machine’s performance, we can compute

conﬁdence intervals for the variance of the weights of cans:

. use https://www.stata-press.com/data/r18/peas_normdist

(Weights of canned peas, normal distribution)

. ci variances weight

    Variable |        Obs    Variance       [95% conf. interval]
-------------+--------------------------------------------------
      weight |          8    .3888409       .1699823    1.610708

The command reports the sample estimate of the variance of 0.39 with the 95% conﬁdence interval

of [ 0.17, 1.61 ].

Instead of the variance, we may be interested in conﬁdence intervals for the standard deviation.

We can specify the sd option to compute such conﬁdence intervals.

. ci variances weight, sd

    Variable |        Obs   Std. dev.       [95% conf. interval]
-------------+--------------------------------------------------
      weight |          8    .6235711       .4122891    1.269137

The 95% conﬁdence interval for the standard deviation of the weights is [0.41, 1.27]. Because the

desired value for the standard deviation, 0.3 ounces, falls outside the interval, the machine may require

some tuning.

Conﬁdence intervals in example 8 are based on the assumption that the random sample is selected

from a population having a normal distribution. Nonnormality of the population distribution, in the

form of skewness or heavy tails, can have a drastic impact on the asymptotic coverage probability of

the normal-based conﬁdence intervals. This is the case even for distributions that are similar to normal.

Scheffé (1959, 336) showed that the normal-based interval has an asymptotic coverage probability

of about 0.76, 0.63, 0.60, and 0.51 for the logistic, t with seven degrees of freedom, Laplace, and t

with ﬁve degrees of freedom distributions, respectively. Miller (1997, 264) describes this situation as

“catastrophic” because these distributions are symmetric and not easily distinguishable from a normal

distribution unless the sample size is large. Hence, it is judicious to evaluate the normality of the

data prior to constructing the normal-based conﬁdence intervals for variances or standard deviations.

Bonett (2006) proposed a conﬁdence interval that performs well in small samples under moderate

departures from normality. His interval performs only slightly worse than the exact normal-based

conﬁdence interval when sampling from a normal distribution. A larger sample size provides Bonett

conﬁdence intervals with greater protection against nonnormality.

Example 9: Bonett conﬁdence interval for normal data

We will repeat example 8 and construct a Bonett conﬁdence interval for the standard deviation by

specifying the bonett option. The results are similar, and both examples lead to the same inferential

conclusion.

. ci variances weight, sd bonett

                                                   Bonett
    Variable |        Obs   Std. dev.       [95% conf. interval]
-------------+--------------------------------------------------
      weight |          8    .6235711       .3997041    1.288498

The Bonett conﬁdence interval is wider than the normal-based conﬁdence interval in example 8.

For normal data, Bonett (2006) suggested that if a Bonett confidence interval is used for a sample of

size n + 3, then its average width will be about the same as the average width of the normal-based

conﬁdence interval from a sample size of n. Sampling three more observations may be a small

price to pay because Bonett conﬁdence intervals perform substantially better than the normal-based

conﬁdence intervals for nonnormal data.

Example 10: Bonett conﬁdence interval for nonnormal data

The following data have been generated from a t distribution with ﬁve degrees of freedom to

illustrate the effect of wrongfully using the normal-based conﬁdence interval when the data-generating

process is not normal.

. use https://www.stata-press.com/data/r18/peas_tdist

(Weights of canned peas, t distribution)

. ci variances weight, sd

    Variable |        Obs   Std. dev.       [95% conf. interval]
-------------+--------------------------------------------------
      weight |          8    2.226558       1.472143    4.531652

The standard deviation of a t distribution with five degrees of freedom is $\sqrt{5/3} \approx 1.29$ and falls

outside the conﬁdence interval limits. If we suspect that data may not be normal, the Bonett conﬁdence

interval is typically a better choice:

. ci variances weight, sd bonett

                                                   Bonett
    Variable |        Obs   Std. dev.       [95% conf. interval]
-------------+--------------------------------------------------
      weight |          8    2.226558       1.137505    5.772519

The value 1.29 is within the limits of the Bonett confidence interval [ 1.14, 5.77 ].

Immediate form

So far, we computed conﬁdence intervals for various parameters using data in memory. We can

also compute conﬁdence intervals using only data summaries, without any data in memory. Each of

the considered ci commands has an immediate cii version that computes the respective conﬁdence

intervals using data summaries.

Example 11: Conﬁdence interval for a normal mean

We are reading a soon-to-be-published paper by a colleague. In it is a table showing the number

of observations, mean, and standard deviation of the 1980 median family income for the Northeast

and West. We correctly think that the paper would be much improved if it included the conﬁdence

intervals. The paper claims that for 166 cities in the Northeast, the average of median family income is $19,509 with a standard deviation of $4,379, and that for 256 cities in the West, the average is $22,557 with a standard deviation of $5,003.

For the Northeast:

. cii means 166 19509 4379

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
             |        166       19509     339.8763       18837.93    20180.07

For the West:

. cii means 256 22557 5003

    Variable |        Obs        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
             |        256       22557     312.6875       21941.22    23172.78

Example 12: Conﬁdence interval for a Poisson mean

The number of reported trafﬁc accidents in Santa Monica over a 24-hour period is 27. We need

know nothing else to compute a conﬁdence interval for the mean number of accidents for a day:

. cii means 1 27, poisson

                                                            Poisson exact
    Variable |   Exposure        Mean    Std. err.       [95% conf. interval]
-------------+---------------------------------------------------------------
             |          1          27     5.196152       17.79317    39.28358

Example 13: Conﬁdence interval for a proportion

We ﬂip a coin 10 times, and it comes up heads only once. We are shocked and decide to obtain

a 99% conﬁdence interval for this coin:

. cii proportions 10 1, level(99)

                                                            Binomial exact
    Variable |        Obs  Proportion    Std. err.       [99% conf. interval]
-------------+---------------------------------------------------------------
             |         10          .1     .0948683       .0005011    .5442871

Example 14: Conﬁdence interval for a variance

A company ﬁlls 32-ounce tomato juice jars with a quantity of juice having a normal distribution

with a claimed variance not exceeding 0.2. A random sample of 15 jars is collected to evaluate this

claim. The sample variance is 0.5:

. cii variances 15 0.5

    Variable |        Obs    Variance       [95% conf. interval]
-------------+--------------------------------------------------
             |         15          .5       .2680047    1.243621

Because the advertised value of 0.2 does not fall inside the conﬁdence interval, the company is

allowing too much variation in the amount of tomato juice per jar.
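
The interval above can be checked by hand. The sketch below assumes the usual textbook form of the normal-based interval, $[(n-1)s^2/\chi^2_{1-\alpha/2,\,n-1},\ (n-1)s^2/\chi^2_{\alpha/2,\,n-1}]$, which is an assumption here because this section does not state the formula. The local macro names are illustrative.

```stata
local n  = 15
local s2 = 0.5
local a  = 1 - 95/100

* invchi2(df, p) returns the chi-squared quantile with lower-tail probability p
local lb = (`n' - 1)*`s2'/invchi2(`n' - 1, 1 - `a'/2)
local ub = (`n' - 1)*`s2'/invchi2(`n' - 1, `a'/2)
display "by hand: " `lb' "  " `ub'

* immediate command for comparison
cii variances 15 0.5
```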

Example 15: Conﬁdence interval for a standard deviation

Suppose the director of statistical development at a statistical software company is a big soccer

fan and requires all developers to play on the company team in the city’s local soccer league. Ten

developers are randomly selected to participate in the game. To ensure an advantage over other

teams, the director requires each of the 10 developers to cover 6 miles on average each game. Being

merciful, she will tolerate a standard deviation of 0.3 miles across different players, arguing that this

will keep the team’s performance consistent. The distance covered by each player is measured using

a pedometer. At the end of the game, the sample standard deviation of the distances covered by the

10 players was 0.56 miles:

. cii variances 10 0.56, sd

    Variable |        Obs   Std. dev.       [95% conf. interval]
-------------+--------------------------------------------------
             |         10         .56       .3851877    1.022342

Because the conﬁdence interval does not include the designated value for the standard deviation, 0.3

miles, it is clear the team is not meeting standards, and an unpleasant meeting is planned.

Example 16: Conﬁdence interval for a standard deviation of nonnormal data

Continuing with example 15, a clever statistician points out that distances covered by company

players in a soccer match do not follow the normal distribution because some players, mostly

econometricians, walk on the ﬁeld, while others, mostly statisticians, do all the running. Therefore,

the normal-based conﬁdence interval (which assumes normality) is not valid. Instead, we should use

the Bonett conﬁdence interval, which additionally requires an estimate of kurtosis; see Methods and

formulas. If kurtosis is estimated to be 5, we would obtain the following:

. cii variances 10 0.56 5, sd bonett

                                                   Bonett
    Variable |        Obs   Std. dev.       [95% conf. interval]
-------------+--------------------------------------------------
             |         10         .56       .2689449     1.45029

The Bonett conﬁdence interval now contains the speciﬁed value for the standard deviation, 0.3 miles.

The director of statistics concludes that overall team performance is acceptable. An uncomfortable

meeting is still planned but for a smaller group.

Stored results

ci means and cii means store the following in r():

Scalars

r(N) number of observations or, if poisson is speciﬁed, exposure

r(mean) mean

r(se) estimate of standard error

r(lb) lower bound of conﬁdence interval

r(ub) upper bound of conﬁdence interval

r(level) conﬁdence level of conﬁdence interval

Macros

r(citype) normal or poisson; type of conﬁdence interval

r(exposure) name of exposure variable with poisson

ci proportions and cii proportions store the following in r():

Scalars

r(N) number of observations

r(proportion) proportion

r(se) estimate of standard error

r(lb) lower bound of conﬁdence interval

r(ub) upper bound of conﬁdence interval

r(level) conﬁdence level of conﬁdence interval

Macros

r(citype) exact, wald, wilson, agresti, or jeffreys; type of conﬁdence interval

ci variances and cii variances store the following in r():

Scalars

r(N) number of observations

r(Var) variance

r(sd) standard deviation, if sd is speciﬁed

r(kurtosis) kurtosis, only if bonett is speciﬁed

r(lb) lower bound of conﬁdence interval

r(ub) upper bound of conﬁdence interval

r(level) conﬁdence level of conﬁdence interval

Macros

r(citype) normal or bonett; type of confidence interval
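
As a brief sketch of how these stored results might be used, the commands below list everything left in r() after ci means and then compute the interval width from r(lb) and r(ub). The auto dataset is the one used in example 1.

```stata
use https://www.stata-press.com/data/r18/auto, clear
ci means mpg

* list everything ci left behind in r()
return list

* use the stored bounds directly, for example to get the interval width
display "width = " (r(ub) - r(lb))
```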

Methods and formulas

Methods and formulas are presented under the following headings:

Normal mean

Poisson mean

Binomial proportion

Variance and standard deviation

Normal mean

Define $n$, $\bar{x}$, and $s^2$ as, respectively, the number of observations, (weighted) average, and (unbiased) estimated variance of the variable in question; see [R] summarize.

The standard error of the mean, $s_\mu$, is defined as $\sqrt{s^2/n}$.

Let $\alpha$ be $1 - l/100$, where $l$ is the confidence level specified by the user in the level() option. Define $t_{\alpha/2}$ as the two-sided $t$ statistic corresponding to a significance level of $\alpha$ with $n - 1$ degrees of freedom; $t_{\alpha/2}$ is obtained from Stata as invttail(n-1,0.5*α). The lower and upper confidence bounds are, respectively, $\bar{x} - s_\mu t_{\alpha/2}$ and $\bar{x} + s_\mu t_{\alpha/2}$.
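The bounds above can be reproduced by hand with invttail(). This is a sketch on illustrative inputs (n = 85, mean 10, standard deviation 3); the scalar names are hypothetical and are assumed not to clash with variables in memory.

```stata
* Illustrative inputs
scalar nobs  = 85
scalar xbar  = 10
scalar sdev  = 3
scalar alpha = 1 - 95/100

* Standard error of the mean and two-sided t critical value
scalar semean = sdev/sqrt(nobs)
scalar tcrit  = invttail(nobs - 1, 0.5*alpha)

* Lower and upper confidence bounds
scalar lb = xbar - semean*tcrit
scalar ub = xbar + semean*tcrit
display "by hand: [" %8.4f lb ", " %8.4f ub "]"

* The immediate command should report the same interval
cii means 85 10 3, level(95)
display "cii:     [" %8.4f r(lb) ", " %8.4f r(ub) "]"
```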

Poisson mean

Given the total cases, $k$, the estimate of the expected count $\lambda$ is $k$, and its standard error is $\sqrt{k}$. ci means with option poisson calculates the exact confidence interval $[\lambda_1, \lambda_2]$ such that

$$\Pr(K \ge k \mid \lambda = \lambda_1) = \alpha/2$$

and

$$\Pr(K \le k \mid \lambda = \lambda_2) = \alpha/2$$

where $K$ is Poisson with mean $\lambda$. The solution is obtained by Newton's method. If $k = 0$, the calculation of $\lambda_1$ is skipped. All values are then reported as rates, which are the above numbers divided by the total exposure.
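The two defining equations can also be solved with Stata's inverse Poisson functions. Note that this swaps in invpoissontail() and invpoisson() for the Newton iteration described above, so it is a check on the definition rather than a copy of what ci does internally. The inputs (4,379 events over 11,394 person-years, 90% level) and the scalar names are illustrative.

```stata
* Illustrative inputs
scalar events   = 4379
scalar exposure = 11394
scalar alpha    = 1 - 90/100

* lam1 solves Pr(K >= events | lambda = lam1) = alpha/2
scalar lam1 = invpoissontail(events, alpha/2)

* lam2 solves Pr(K <= events | lambda = lam2) = alpha/2
scalar lam2 = invpoisson(events, alpha/2)

* Report as rates by dividing by total exposure
display "rate     = " %8.6f events/exposure
display "90% CI   = [" %8.6f lam1/exposure ", " %8.6f lam2/exposure "]"

* For comparison
cii means 11394 4379, poisson level(90)
```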


Binomial proportion

Given $k$ successes of $n$ trials, the estimated probability of a success is $\hat{p} = k/n$ with standard error $\sqrt{\hat{p}(1 - \hat{p})/n}$. ci calculates the exact (Clopper–Pearson) confidence interval $[p_1, p_2]$ such that

$$\Pr(K \ge k \mid p = p_1) = \alpha/2$$

and

$$\Pr(K \le k \mid p = p_2) = \alpha/2$$

where $K$ is distributed as binomial($n$, $p$). The endpoints may be obtained directly by using Stata's invbinomial() function. If $k = 0$ or $k = n$, the calculation of the appropriate tail is skipped.
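A sketch of the exact endpoints, computed with invbinomialtail() for the lower bound and invbinomial() for the upper bound, on illustrative inputs (136 successes in 2,377 trials). The scalar names are hypothetical, and the case k = 0 or k = n is not handled here.

```stata
* Illustrative inputs
scalar ntrial = 2377
scalar nsucc  = 136
scalar alpha  = 0.05

* p1 solves Pr(K >= nsucc | p = p1) = alpha/2
scalar p1 = invbinomialtail(ntrial, nsucc, alpha/2)

* p2 solves Pr(K <= nsucc | p = p2) = alpha/2
scalar p2 = invbinomial(ntrial, nsucc, alpha/2)

display "exact 95% CI: [" %8.6f p1 ", " %8.6f p2 "]"

* For comparison (exact is the default interval type)
cii proportions 2377 136, level(95)
```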

The Wald interval is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$, where $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal. The interval is obtained by inverting the acceptance region of the large-sample Wald test of $H_0\colon p = p_0$ versus the two-sided alternative. That is, the confidence interval is the set of all $p_0$ such that

$$\left| \frac{\hat{p} - p_0}{\sqrt{n^{-1}\hat{p}(1 - \hat{p})}} \right| \le z_{\alpha/2}$$
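The Wald bounds are simple enough to compute directly. This sketch uses illustrative inputs (136 successes in 2,377 trials) and hypothetical scalar names.

```stata
* Illustrative inputs
scalar ntrial = 2377
scalar nsucc  = 136
scalar alpha  = 0.05

scalar phat  = nsucc/ntrial
scalar zcrit = invnormal(1 - alpha/2)

* Estimated standard error of phat
scalar sephat = sqrt(phat*(1 - phat)/ntrial)

scalar lb = phat - zcrit*sephat
scalar ub = phat + zcrit*sephat
display "Wald 95% CI: [" %8.6f lb ", " %8.6f ub "]"

* For comparison
cii proportions 2377 136, wald level(95)
```

At either endpoint the Wald statistic equals the critical value exactly, which is what inverting the acceptance region means here.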

The Wilson interval is a variation on the Wald interval, using the null standard error $\sqrt{n^{-1}p_0(1 - p_0)}$ in place of the estimated standard error $\sqrt{n^{-1}\hat{p}(1 - \hat{p})}$ in the above expression. Inverting this acceptance region is more complicated yet results in the closed form

$$\frac{k + z_{\alpha/2}^2/2}{n + z_{\alpha/2}^2} \pm \frac{z_{\alpha/2}\, n^{1/2}}{n + z_{\alpha/2}^2} \left\{ \hat{p}(1 - \hat{p}) + \frac{z_{\alpha/2}^2}{4n} \right\}^{1/2}$$
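The closed form can be evaluated term by term. This sketch uses illustrative inputs (136 successes in 2,377 trials) and hypothetical scalar names; the final two display lines check that each endpoint, used as the null value, gives a score statistic equal to the critical value.

```stata
* Illustrative inputs
scalar ntrial = 2377
scalar nsucc  = 136
scalar alpha  = 0.05

scalar phat  = nsucc/ntrial
scalar zcrit = invnormal(1 - alpha/2)

* Center and half-width of the closed-form Wilson interval
scalar center = (nsucc + zcrit^2/2)/(ntrial + zcrit^2)
scalar half   = zcrit*sqrt(ntrial)/(ntrial + zcrit^2) * ///
                sqrt(phat*(1 - phat) + zcrit^2/(4*ntrial))

scalar lb = center - half
scalar ub = center + half
display "Wilson 95% CI: [" %8.6f lb ", " %8.6f ub "]"

* Score statistic at each endpoint should equal zcrit
display abs(phat - lb)/sqrt(lb*(1 - lb)/ntrial) "  vs  " zcrit
display abs(phat - ub)/sqrt(ub*(1 - ub)/ntrial) "  vs  " zcrit
```

The `///` line continuation assumes the block is run from a do-file.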

The Agresti–Coull interval is basically a Wald interval that borrows its center from the Wilson interval. Defining $\tilde{k} = k + z_{\alpha/2}^2/2$, $\tilde{n} = n + z_{\alpha/2}^2$, and (hence) $\tilde{p} = \tilde{k}/\tilde{n}$, the Agresti–Coull interval is

$$\tilde{p} \pm z_{\alpha/2}\sqrt{\tilde{p}(1 - \tilde{p})/\tilde{n}}$$

When $\alpha = 0.05$, $z_{\alpha/2}$ is near enough to 2 that $\tilde{p}$ can be thought of as a typical estimate of proportion where two successes and two failures have been added to the sample (Agresti and Coull 1998). This typical estimate of proportion makes the Agresti–Coull interval an easy-to-present alternative for introductory statistics students.
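A sketch of the Agresti–Coull computation on illustrative inputs (136 successes in 2,377 trials), with hypothetical scalar names. It also shows how close the adjusted counts are to "add two successes and two failures" at the 95% level.

```stata
* Illustrative inputs
scalar ntrial = 2377
scalar nsucc  = 136
scalar alpha  = 0.05

scalar zcrit = invnormal(1 - alpha/2)

* Adjusted successes, trials, and proportion
scalar ktilde = nsucc + zcrit^2/2
scalar ntilde = ntrial + zcrit^2
scalar ptilde = ktilde/ntilde

scalar lb = ptilde - zcrit*sqrt(ptilde*(1 - ptilde)/ntilde)
scalar ub = ptilde + zcrit*sqrt(ptilde*(1 - ptilde)/ntilde)
display "Agresti-Coull 95% CI: [" %8.6f lb ", " %8.6f ub "]"

* Added successes and added trials (near 2 and 4 when alpha = 0.05)
display "added successes = " %5.3f zcrit^2/2 ", added trials = " %5.3f zcrit^2

* For comparison
cii proportions 2377 136, agresti level(95)
```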

The Jeffreys interval is a Bayesian credible interval and is based on the Jeffreys prior, which

is the Beta(1/2, 1/2) distribution. Assigning this prior to p results in a posterior distribution for

p that is Beta with parameters k + 1/2 and n − k + 1/2. The Jeffreys interval is then taken to

be the 1 − α central posterior probability interval, namely, the α/2 and 1 − α/2 quantiles of the

Beta(k + 1/2, n − k + 1/2) distribution. These quantiles may be obtained directly by using Stata’s

invibeta() function. See [BAYES] bayesstats summary for more details about credible intervals.
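The Jeffreys endpoints follow directly from invibeta(). This sketch uses illustrative inputs (136 successes in 2,377 trials) and hypothetical scalar names; it does not treat k = 0 or k = n specially.

```stata
* Illustrative inputs
scalar ntrial = 2377
scalar nsucc  = 136
scalar alpha  = 0.05

* Posterior is Beta(k + 1/2, n - k + 1/2) under the Jeffreys prior
scalar apost = nsucc + 0.5
scalar bpost = ntrial - nsucc + 0.5

* alpha/2 and 1 - alpha/2 posterior quantiles
scalar lb = invibeta(apost, bpost, alpha/2)
scalar ub = invibeta(apost, bpost, 1 - alpha/2)
display "Jeffreys 95% interval: [" %8.6f lb ", " %8.6f ub "]"

* For comparison
cii proportions 2377 136, jeffreys level(95)
```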

Variance and standard deviation

Let $X_1, \ldots, X_n$ be a random sample and assume that $X_i \sim N(\mu, \sigma^2)$. Because $(n - 1)s^2/\sigma^2 \sim \chi^2_{n-1}$, we have $\Pr\{\chi^2_{n-1,\alpha/2} \le (n - 1)s^2/\sigma^2 \le \chi^2_{n-1,1-\alpha/2}\} = 1 - \alpha$, where $\chi^2_{n-1,\alpha/2}$ and $\chi^2_{n-1,1-\alpha/2}$ are the $\alpha/2$ and $1 - \alpha/2$ quantiles of the $\chi^2_{n-1}$ distribution. Thus, the normal-based confidence interval for the population variance $\sigma^2$ with $100(1 - \alpha)\%$ confidence level is given by

$$\mathrm{CI}_{\mathrm{normal}} = \left[ \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha/2}},\ \frac{(n - 1)s^2}{\chi^2_{n-1,\alpha/2}} \right]$$

$\chi^2_{n-1,1-\alpha/2}$ and $\chi^2_{n-1,\alpha/2}$ are obtained from Stata as invchi2tail(n-1,0.5*α) and invchi2(n-1,0.5*α), respectively.
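A sketch of the normal-based interval on illustrative inputs (20 observations, sample variance 9), with hypothetical scalar names. Square roots of the bounds give the corresponding interval for the standard deviation.

```stata
* Illustrative inputs
scalar nobs  = 20
scalar svar  = 9
scalar alpha = 0.05

* Upper (1 - alpha/2) and lower (alpha/2) chi-squared quantiles
scalar chi_hi = invchi2tail(nobs - 1, 0.5*alpha)
scalar chi_lo = invchi2(nobs - 1, 0.5*alpha)

* Dividing by the larger quantile gives the lower bound
scalar lb = (nobs - 1)*svar/chi_hi
scalar ub = (nobs - 1)*svar/chi_lo

display "variance 95% CI: [" %8.4f lb ", " %8.4f ub "]"
display "sd       95% CI: [" %8.4f sqrt(lb) ", " %8.4f sqrt(ub) "]"

* For comparison
cii variances 20 9, level(95)
```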

The normal-based confidence interval is very sensitive to minor departures from the normality assumption, and its performance does not improve with increasing sample size. For scenarios in which the population distribution is not normal, the actual coverage probability of the normal-based confidence interval can be drastically lower than the nominal confidence level 1 − α.

Bonett (2006) proposed an alternative to the normal-based confidence interval that is nearly exact under normality and has coverage probability close to $1 - \alpha$ under moderate nonnormality. It also has $1 - \alpha$ asymptotic coverage probability for nonnormal distributions with finite fourth moment. Instead of assuming that $X_i \sim N(\mu, \sigma^2)$, Bonett's approach requires continuous i.i.d. random variables with finite fourth moment. The variance of $s^2$ may be expressed as $\sigma^4\{\gamma_4 - (n - 3)/(n - 1)\}/n$ (see Casella and Berger [2002, ex. 5.8, 257]), where $\gamma_4 = \mu_4/\sigma^4$ is the kurtosis and $\mu_4 = E(X_i - \mu)^4$ is the population fourth central moment. The variance-stabilizing transformation $\ln(s^2)$ and the delta method can be used to construct an asymptotic $100(1 - \alpha)\%$ confidence interval for $\sigma^2$:

$$\left[ \exp\{\ln(s^2) - z_{\alpha/2}\,\mathrm{se}\},\ \exp\{\ln(s^2) + z_{\alpha/2}\,\mathrm{se}\} \right]$$

where $\mathrm{se} = [\{\hat{\gamma}_4 - (n - 3)/(n - 1)\}/n]^{1/2} \approx [\mathrm{Var}\{\ln(s^2)\}]^{1/2}$ and $\hat{\gamma}_4$ is an estimate of the kurtosis.

Bonett introduced three adjustments to improve the small-sample properties of the above confidence interval. First, he swapped the inner and outer denominator in the expression for se and changed it to $[\{\hat{\gamma}_4 - (n - 3)/n\}/(n - 1)]^{1/2}$. This was suggested by Shoemaker (2003), who used it to improve the small-sample performance of his variance test. Second, with regard to the estimation of kurtosis, Bonett proposed $\hat{\gamma}_4 = n\sum_{i=1}^{n}(X_i - m)^4 / \{\sum_{i=1}^{n}(X_i - \bar{X})^2\}^2$, where $m$ is a trimmed mean with a trim-proportion equal to $1/\{2(n - 4)^{1/2}\}$. This kurtosis estimator reduces the negative bias in symmetric and skewed heavy-tailed distributions. Last, he empirically derived a small-sample correction factor $c = n/(n - z_{\alpha/2})$ that helps equalize the tail probabilities. These modifications yield

$$\mathrm{CI}_{\mathrm{Bonett}} = \left[ \exp\{\ln(c s^2) - z_{\alpha/2}\,\mathrm{se}\},\ \exp\{\ln(c s^2) + z_{\alpha/2}\,\mathrm{se}\} \right]$$

where $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal and $\mathrm{se} = c\,[\{\hat{\gamma}_4 - (n - 3)/n\}/(n - 1)]^{1/2}$.

Taking the square root of the endpoints of both intervals gives confidence intervals for the standard deviation $\sigma$.
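A sketch of the adjusted Bonett interval, assuming the kurtosis estimate is supplied directly (as it is for cii variances with bonett) rather than computed from data with the trimmed-mean estimator. The inputs (20 observations, sample variance 9, kurtosis 1.8) and scalar names are illustrative, and the formula follows the reconstruction of Bonett (2006) given above, so the printed bounds are best read alongside the cii output rather than taken as authoritative.

```stata
* Illustrative inputs
scalar nobs  = 20
scalar svar  = 9
scalar kurt  = 1.8
scalar alpha = 0.05

scalar zcrit = invnormal(1 - alpha/2)

* Small-sample correction factor
scalar cfac = nobs/(nobs - zcrit)

* Adjusted standard error of ln(s^2)
scalar selog = cfac*sqrt((kurt - (nobs - 3)/nobs)/(nobs - 1))

* Interval for the variance, built on the log scale
scalar lb = exp(ln(cfac*svar) - zcrit*selog)
scalar ub = exp(ln(cfac*svar) + zcrit*selog)

display "variance 95% CI: [" %8.4f lb ", " %8.4f ub "]"
display "sd       95% CI: [" %8.4f sqrt(lb) ", " %8.4f sqrt(ub) "]"

* For comparison
cii variances 20 9 1.8, bonett level(95)
```

Because the interval is symmetric on the log scale, the geometric mean of the two bounds equals c times the sample variance.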

 

Edwin Bidwell (E. B.) Wilson (1879–1964) majored in mathematics at Harvard and studied and

taught at Yale and MIT before returning to Harvard in 1922. He worked in mathematics, physics,

and statistics. His method for binomial intervals can be considered a precursor, for a particular

problem, of Neyman’s concept of conﬁdence intervals.

Jerzy Neyman (1894–1981) was born in Bendery, Russia, now Moldavia. He studied and then

taught at Kharkov University, moving from physics to mathematics. In 1921, Neyman moved

to Poland, where he worked in statistics at Bydgoszcz and then Warsaw. Neyman received

a Rockefeller Fellowship to work with Karl Pearson at University College London. There he

collaborated with Egon Pearson, Karl’s son, on the theory of hypothesis testing. Life in Poland

became progressively more difﬁcult, and Neyman returned to UCL to work there from 1934 to 1938.

At this time, he published on the theory of conﬁdence intervals. He then was offered a post in

California at Berkeley, where he settled. Neyman established an outstanding statistics department

and remained highly active in research, including applications in astronomy, meteorology, and

medicine. He was one of the great statisticians of the 20th century.

 


Acknowledgment

We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, who is coeditor of the Stata Journal and author of Speaking Stata Graphics, for his assistance with the jeffreys and wilson options.

References

Agresti, A., and B. A. Coull. 1998. Approximate is better than “exact” for interval estimation of binomial proportions.

American Statistician 52: 119–126. https://doi.org/10.1080/00031305.1998.10480550.

Bonett, D. G. 2006. Approximate conﬁdence interval for standard deviation of nonnormal distributions. Computational

Statistics and Data Analysis 50: 775–782. https://doi.org/10.1016/j.csda.2004.10.003.

Brown, L. D., T. T. Cai, and A. DasGupta. 2001. Interval estimation for a binomial proportion. Statistical Science

16: 101–133. https://doi.org/10.1214/ss/1009213286.

Campbell, M. J., D. Machin, and S. J. Walters. 2007. Medical Statistics: A Textbook for the Health Sciences. 4th

ed. Chichester, UK: Wiley.

Casella, G., and R. L. Berger. 2002. Statistical Inference. 2nd ed. Paciﬁc Grove, CA: Duxbury.

Clopper, C. J., and E. S. Pearson. 1934. The use of conﬁdence or ﬁducial limits illustrated in the case of the binomial.

Biometrika 26: 404–413. https://doi.org/10.1093/biomet/26.4.404.

Cook, A. 1990. Sir Harold Jeffreys, 2 April 1891–18 March 1989. Biographical Memoirs of Fellows of the Royal

Society 36: 303–333. https://doi.org/10.1098/rsbm.1990.0034.

Earnest, A. 2017. Essentials of a Successful Biostatistical Collaboration. Boca Raton, FL: CRC Press.

Jeffreys, H. 1946. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society

of London, Series A 186: 453–461. https://doi.org/10.1098/rspa.1946.0056.

Lash, T. L., T. J. VanderWeele, S. Haneuse, and K. J. Rothman. 2021. Modern Epidemiology. 4th ed. Philadelphia:

Wolters Kluwer.

Lindley, D. V. 2001. Harold Jeffreys. In Statisticians of the Centuries, ed. C. C. Heyde and E. Seneta, 402–405. New

York: Springer.

Miller, R. G., Jr. 1997. Beyond ANOVA: Basics of Applied Statistics. London: Chapman and Hall.

Reid, C. 1982. Neyman—from Life. New York: Springer.

Scheffé, H. 1959. The Analysis of Variance. New York: Wiley.

Shoemaker, L. H. 2003. Fixing the F test for equal variances. American Statistician 57: 105–114.

https://doi.org/10.1198/0003130031441.

Stigler, S. M. 1997. Wilson, Edwin Bidwell. In Leading Personalities in Statistical Sciences: From the Seventeenth

Century to the Present, ed. N. L. Johnson and S. Kotz, 344–346. New York: Wiley.

Ward, B. W. 2019. kg_nchs: A command for Korn–Graubard confidence intervals and National Center for Health Statistics' Data Presentation Standards for Proportions. Stata Journal 19: 510–522.

Wilson, E. B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American

Statistical Association 22: 209–212. https://doi.org/10.2307/2276774.


Also see

[R] ameans — Arithmetic, geometric, and harmonic means

[R] bitest — Binomial probability test

[R] centile — Report centile and conﬁdence interval

[R] prtest — Tests of proportions

[R] sdtest — Variance-comparison tests

[R] summarize — Summary statistics

[R] ttest — t tests (mean-comparison tests)

[D] pctile — Create variable containing percentiles

Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and

Stata Press are registered trademarks with the World Intellectual Property Organization

of the United Nations. StataNow and NetCourseNow are trademarks of StataCorp

LLC. Other brand and product names are registered trademarks or trademarks of their

respective companies. Copyright © 1985–2023 StataCorp LLC, College Station, TX,

For suggested citations, see the FAQ on citing Stata documentation.