Title
prtest Tests of proportions
Description Quick start Menu Syntax
Options for prtest Options for prtesti Remarks and examples Stored results
Methods and formulas References Also see
Description
prtest performs tests on the equality of proportions using large-sample statistics. The test can be
performed for one sample against a hypothesized population value or for no difference in population
proportions estimated from two samples. Clustered data are supported.
prtesti is the immediate form of prtest; see [U] 19 Immediate commands.
Quick start
One-sample test that the proportion of 1s in v is equal to 0.1
prtest v == 0.1
Same as above, but using the 90% confidence level and adjusting for clustering with clusters defined
by cvar and an intraclass correlation of 0.5
prtest v == 0.1, level(90) cluster(cvar) rho(0.5)
Test that the proportion of 1s in v is equal between two groups defined by catvar
prtest v, by(catvar)
Same as above, and adjust for clustering with clusters defined by cvar and an intraclass correlation
of 0.5 in the two groups
prtest v, by(catvar) cluster(cvar) rho(0.5)
Test equality of proportions between v1 and v2
prtest v1 == v2
Test $p_1 = p_2$ if $\hat{p}_1 = 0.10$, $\hat{p}_2 = 0.17$, $n_1 = 29$, and $n_2 = 36$
prtesti 29 0.10 36 0.17
Menu
prtest
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test
prtesti
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test calculator
Syntax
One-sample test of proportion
prtest varname == #p [if] [in] [, onesampleopts]
Two-sample test of proportions using groups
prtest varname [if] [in], by(groupvar) [twosamplegropts]
Two-sample test of proportions using variables
prtest varname1 == varname2 [if] [in] [, level(#)]
Immediate form of one-sample test of proportion
prtesti #obs1 #p1 #p2 [, level(#) count]
Immediate form of two-sample test of proportions
prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]
onesampleopts         Description
Main
  level(#)            confidence level; default is level(95)
  cluster(varname)    variable defining the clusters
  rho(#)              intraclass correlation

twosamplegropts       Description
Main
  by(groupvar)        variable defining the groups
  level(#)            confidence level; default is level(95)
  cluster(varname)    variable defining the clusters
  rho(#)              common intraclass correlation
  rho1(#)             intraclass correlation for group 1
  rho2(#)             intraclass correlation for group 2
by(groupvar) is required.
by is allowed with prtest, and collect is allowed with prtest and prtesti; see [U] 11.1.10 Prefix commands.
Options for prtest
Main
by(groupvar) specifies a numeric variable that contains the group information for a given observation.
This variable must have only two values. Do not confuse the by() option with the by prefix; both
may be specified.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.8 Specifying the width of confidence intervals.
cluster(varname) specifies the variable that identifies clusters. The cluster() option is required
to adjust the computation for clustering.
rho(#) specifies the intraclass correlation for a one-sample test or the common intraclass correlation
for a two-sample test. The rho() option is required to adjust the computation for clustering for
a one-sample test.
rho1(#) specifies the intraclass correlation of the first group for a two-sample test using groups.
The rho() option or both rho1() and rho2() options are required to adjust the computation for
clustering.
rho2(#) specifies the intraclass correlation of the second group for a two-sample test using groups.
The rho() option or both rho1() and rho2() options are required to adjust the computation for
clustering.
Options for prtesti
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.8 Specifying the width of confidence intervals.
count specifies that integer counts instead of proportions be used in the immediate forms of prtest.
In the first syntax, prtesti expects that #obs1 and #p1 are counts, with #p1 ≤ #obs1, and that #p2
is a proportion. In the second syntax, prtesti expects that all four numbers are integer counts, that
#obs1 ≥ #p1, and that #obs2 ≥ #p2.
Remarks and examples
Remarks are presented under the following headings:
Tests of proportions
Adjust for clustering
Immediate form
Tests of proportions
The prtest output is organized like that of ttest. Each proportion is presented along with a
confidence interval. The appropriate one- or two-sample test is performed, and the two-sided and
both one-sided results are included at the bottom of the output. For a two-sample test, the
calculated difference is also presented with its confidence interval. This command may be used
for both large-sample testing and large-sample interval estimation. For one-sample tests of
proportions with small sample sizes, or to obtain exact p-values, use bitest; see [R] bitest.
Example 1: One-sample test of proportion
In the first form, prtest tests whether the mean of the sample is equal to a known constant. Assume
that we have a sample of 74 automobiles. We wish to test whether the proportion of automobiles that
are foreign is different from 40%.
. use https://www.stata-press.com/data/r18/auto
(1978 automobile data)
. prtest foreign == 0.4
One-sample test of proportion               Number of obs = 74

    Variable |       Mean    Std. err.     [95% conf. interval]
-------------+------------------------------------------------
     foreign |   .2972973    .0531331     .1931583    .4014363

    p = proportion(foreign)                        z = -1.8034
H0: p = 0.4

   Ha: p < 0.4           Ha: p != 0.4           Ha: p > 0.4
Pr(Z < z) = 0.0357  Pr(|Z| > |z|) = 0.0713  Pr(Z > z) = 0.9643
The test indicates that we cannot reject the hypothesis that the proportion of foreign automobiles is
0.40 at the 5% significance level.
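The same test can be reproduced from summary counts with the immediate form, which also makes the stored results easy to inspect. This is a minimal sketch; the count of 22 foreign cars is not stated in the example and is inferred here from the displayed proportion (.2972973 × 74 = 22).

```stata
* Reproduce Example 1 from counts: 22 of 74 cars foreign (inferred count)
prtesti 74 22 0.4, count

* Stored results remain in r() after the command
display "z                 = " %7.4f r(z)
display "two-sided p-value = " %6.4f r(p)
```

The two-sided p-value, 0.0713, exceeds 0.05, which is why the null hypothesis is not rejected at the 5% level.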
Example 2: Two-sample test of proportions
We have two headache remedies that we give to patients. Each remedy’s effect is recorded as 0
for failing to relieve the headache and 1 for relieving the headache. We wish to test the equality of
the proportion of people relieved by the two treatments.
. use https://www.stata-press.com/data/r18/cure
. prtest cure1 == cure2
Two-sample test of proportions                  cure1: Number of obs = 50
                                                cure2: Number of obs = 59

    Variable |       Mean    Std. err.       z    P>|z|     [95% conf. interval]
-------------+------------------------------------------------------------------
       cure1 |        .52    .0706541                      .3815205    .6584795
       cure2 |   .7118644    .0589618                      .5963013    .8274275
-------------+------------------------------------------------------------------
        diff |  -.1918644    .0920245                      -.372229   -.0114998
             |  under H0:    .0931155    -2.06    0.039

        diff = prop(cure1) - prop(cure2)                         z = -2.0605
    H0: diff = 0

    Ha: diff < 0              Ha: diff != 0              Ha: diff > 0
 Pr(Z < z) = 0.0197      Pr(|Z| > |z|) = 0.0394       Pr(Z > z) = 0.9803
We find that the proportions are statistically different from each other at any level greater than 3.9%.
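To see where the "under H0" standard error in this output comes from, the pooled calculation can be sketched by hand. The success counts 26 and 42 are not given in the example; they are inferred from the displayed means (.52 × 50 and .7118644 × 59), and the local macro names are illustrative only.

```stata
* Sample sizes from the output; success counts inferred from the means
local n1 = 50
local n2 = 59
local x1 = 26
local x2 = 42

* Pooled proportion and standard error of the difference under H0
local pp  = (`x1' + `x2')/(`n1' + `n2')
local sd0 = sqrt(`pp'*(1 - `pp')*(1/`n1' + 1/`n2'))
local z   = (`x1'/`n1' - `x2'/`n2')/`sd0'

display "SE under H0 = " %9.7f `sd0'
display "z           = " %7.4f `z'
```

The displayed values should agree with the .0931155 and -2.0605 reported by prtest.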
Adjust for clustering
When observations are not independent and can be grouped into clusters, we need to adjust for
clustering in a proportion test. For example, in a cluster randomized design, groups of individuals are
randomized instead of individuals. To adjust for clustering, we need to specify the cluster identifier
variable in the cluster() option. In the case of a one-sample proportion test, we need to also specify
the intraclass correlation in the rho() option. In the case of a two-sample proportions test, we need
to also specify the common population intraclass correlation in the rho() option or group-specific
population intraclass correlations in the rho1() and rho2() options.
Example 3: One-sample test of proportion, adjusting for clusters
Consider data from Hujoel, Moulton, and Loesche (1990) on the accuracy of an enzymatic diagnostic
test (EDT) of bacterial infections for 29 patients with multiple sites. The EDT was conducted on each
site, a specific area in a patient’s mouth, to determine infection by two strains of bacteria. A separate
reference test was also conducted on each site with an antibody assay against the two strains of
bacteria. The data record whether there was a positive EDT result at each infected site, a true positive
result.
We want to test whether the proportion of infected sites that were correctly diagnosed by the EDT is
different from 0.6. Because we have multiple infections per patient, we cluster by the patient-identifier
subject and use a value of 0.2 from Ahn, Heo, and Zhang (2015, 33) for the intrapatient correlation.
To perform the test, we specify the cluster(subject) and rho(0.2) options:
. use https://www.stata-press.com/data/r18/infection
(Target infections detected by EDT (Hujoel, Moulton, and Loesche 1990))
. prtest infect == 0.6, cluster(subject) rho(0.2)
One-sample test of proportion            Number of obs      =    142
Cluster variable: subject                Number of clusters =     29
                                         Avg. cluster size  =   4.90
                                         CV cluster size    = 0.2419
                                         Intraclass corr.   = 0.2000

    Variable |       Mean    Std. err.     [95% conf. interval]
-------------+------------------------------------------------
   infection |   .6619718    .0537974     .5565308    .7674129

    p = proportion(infection)                       z = 1.1123
H0: p = 0.6

   Ha: p < 0.6           Ha: p != 0.6           Ha: p > 0.6
Pr(Z < z) = 0.8670  Pr(|Z| > |z|) = 0.2660  Pr(Z > z) = 0.1330
We do not find statistical evidence to reject the null hypothesis of $H_0\colon P_{\text{infection}} = 0.6$
versus the two-sided alternative $H_a\colon P_{\text{infection}} \neq 0.6$ at the 5% significance level;
the p-value = 0.2660 > 0.05.
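The effect of clustering on the standard error in this example can be checked by hand from the quantities in the output header. The sketch below uses the average-cluster-size form of the adjustment given in Methods and formulas; the CV is the rounded value shown in the output, so agreement is only approximate, and the local macro names are illustrative.

```stata
* Values taken from the Example 3 output (CV is rounded as displayed)
local n    = 142
local K    = 29
local rho  = 0.2
local cv   = 0.2419
local phat = 0.6619718

* Clustering adjustment: sqrt(1 + rho*(Mbar - 1) + rho*Mbar*CV^2)
local mbar = `n'/`K'
local cadj = sqrt(1 + `rho'*(`mbar' - 1) + `rho'*`mbar'*`cv'^2)

* Unadjusted standard error, then the cluster-adjusted one
local s   = sqrt(`phat'*(1 - `phat')/`n')
local scl = `cadj'*`s'

display "C_adj       = " %6.4f `cadj'
display "adjusted SE = " %9.7f `scl'
```

Under these inputs the adjustment inflates the unadjusted standard error by roughly a third, which should reproduce the .0537974 reported above.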
Example 4: Two-sample test of proportions using groups, adjusting for clusters
Consider a dataset provided by Hayes and Moulton (2009), which contains a random subsample
of the original participants in a cluster randomized trial of a pneumococcal conjugate vaccine in
American Indian populations in the southwestern United States. There are two groups of infants with
18 clusters in each group. The control group received a meningococcal C conjugate vaccine (MnCC),
and the experimental group received the seven-valent pneumococcal conjugate vaccine (PnCRM7). The
two groups are identified by the vaccine variable, and the pneumonia variable records 1 if an infant
had at least one bacterial pneumonia episode and 0 otherwise. These data are originally from O’Brien
et al. (2003).
We want to test the equality of the proportion of cases of pneumonia in the two vaccine groups.
We assume a common known intraclass correlation of 0.02. To perform the test, we type
. use https://www.stata-press.com/data/r18/pneumoniacrt
(Bacterial pneumonia episodes data from CRT (Hayes and Moulton 2009))
. prtest pneumonia, by(vaccine) cluster(cluster) rho(0.02)
Two-sample test of proportions
Cluster variable: cluster

Group: MnCC                            Group: PnCRM7
Number of obs      =    238            Number of obs      =    211
Number of clusters =     18            Number of clusters =     18
Avg. cluster size  =  13.22            Avg. cluster size  =  11.72
CV cluster size    = 0.9605            CV cluster size    = 0.7976
Intraclass corr.   = 0.0200            Intraclass corr.   = 0.0200

       Group |       Mean    Std. err.       z    P>|z|     [95% conf. interval]
-------------+------------------------------------------------------------------
        MnCC |   .2226891    .0329017                      .1582029    .2871753
      PnCRM7 |   .1658768    .0299027                      .1072686     .224485
-------------+------------------------------------------------------------------
        diff |   .0568123      .04446                     -.0303278    .1439524
             |  under H0:    .0447641     1.27    0.204

        diff = prop(MnCC) - prop(PnCRM7)                          z = 1.2691
    H0: diff = 0

    Ha: diff < 0              Ha: diff != 0              Ha: diff > 0
 Pr(Z < z) = 0.8978      Pr(|Z| > |z|) = 0.2044       Pr(Z > z) = 0.1022
We do not find statistical evidence to reject the null hypothesis of $H_0\colon P_{\text{diff}} = 0$ versus
the two-sided alternative $H_a\colon P_{\text{diff}} \neq 0$ at the 5% significance level; the
p-value = 0.2044 > 0.05.
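The cluster-adjusted standard error under the null hypothesis can likewise be approximated from the output header. In this sketch the success counts 53 and 35 are inferred from the displayed group means (.2226891 × 238 and .1658768 × 211), the CVs are the rounded values shown, and the local macro names are illustrative.

```stata
* Values taken from the Example 4 output; counts inferred from the means
local n1  = 238
local n2  = 211
local K   = 18
local rho = 0.02
local cv1 = 0.9605
local cv2 = 0.7976

* Squared clustering adjustment for each group
local c1sq = 1 + `rho'*(`n1'/`K' - 1) + `rho'*(`n1'/`K')*`cv1'^2
local c2sq = 1 + `rho'*(`n2'/`K' - 1) + `rho'*(`n2'/`K')*`cv2'^2

* Pooled proportion and cluster-adjusted standard error under H0
local pp  = (53 + 35)/(`n1' + `n2')
local sd0 = sqrt(`pp'*(1 - `pp')*(`c1sq'/`n1' + `c2sq'/`n2'))
local z   = (53/`n1' - 35/`n2')/`sd0'

display "SE under H0 = " %9.7f `sd0'
display "z           = " %7.4f `z'
```

The results should be close to the .0447641 and 1.2691 in the output; small discrepancies would reflect the rounding of the CVs.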
Immediate form
Example 5: Immediate form of one-sample test of proportion
prtesti is like prtest, except that you specify summary statistics rather than variables as
arguments. For instance, we are reading an article that reports the proportion of registered voters
among 50 randomly selected eligible voters as 0.52. We wish to test whether the proportion is 0.7:
. prtesti 50 0.52 0.70
One-sample test of proportion            x: Number of obs = 50

             |       Mean    Std. err.     [95% conf. interval]
-------------+------------------------------------------------
           x |        .52    .0706541     .3815205    .6584795

    p = proportion(x)                              z = -2.7775
H0: p = 0.7

   Ha: p < 0.7           Ha: p != 0.7           Ha: p > 0.7
Pr(Z < z) = 0.0027  Pr(|Z| > |z|) = 0.0055  Pr(Z > z) = 0.9973
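The two-sided p-value of 0.0055 in this output is below 0.05, so the hypothesis that the proportion is 0.7 would be rejected at the 5% level. As a rough check, the z statistic can be computed directly from the three numbers passed to prtesti; note that the denominator uses the hypothesized proportion, not the sample proportion.

```stata
* z statistic for Example 5, with the standard error evaluated under H0
local z = (0.52 - 0.70)/sqrt(0.70*0.30/50)
local p = 2*(1 - normal(abs(`z')))

display "z                 = " %7.4f `z'
display "two-sided p-value = " %6.4f `p'
```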
Example 6: Immediate form of two-sample test of proportions
To judge teacher effectiveness, we wish to test whether the same proportion of people from
two classes will answer an advanced question correctly. In the first classroom of 30 students, 40%
answered the question correctly, whereas in the second classroom of 45 students, 67% answered the
question correctly.
. prtesti 30 0.4 45 0.67
Two-sample test of proportions                      x: Number of obs = 30
                                                    y: Number of obs = 45

             |       Mean    Std. err.       z    P>|z|     [95% conf. interval]
-------------+------------------------------------------------------------------
           x |         .4    .0894427                      .2246955    .5753045
           y |        .67    .0700952                       .532616     .807384
-------------+------------------------------------------------------------------
        diff |       -.27    .1136368                     -.4927241   -.0472759
             |  under H0:    .1169416    -2.31    0.021

        diff = prop(x) - prop(y)                                 z = -2.3088
    H0: diff = 0

    Ha: diff < 0              Ha: diff != 0              Ha: diff > 0
 Pr(Z < z) = 0.0105      Pr(|Z| > |z|) = 0.0210       Pr(Z > z) = 0.9895
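When the underlying counts are available, the count option avoids rounding in the reported percentages. The sketch below assumes, purely for illustration, that 12 of 30 and 30 of 45 students answered correctly; these counts are not given in the example (67% of 45 is 30.15, so the reported percentage was presumably rounded), and the results therefore differ slightly from the output above.

```stata
* Assumed counts (not stated in the example): 12 of 30 and 30 of 45 correct
prtesti 30 12 45 30, count

* Difference in proportions and test statistic from the stored results
display "difference = " %7.4f r(P_diff)
display "z          = " %7.4f r(z)
```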
Stored results
One-sample prtest and prtesti store the following in r():
Scalars
r(N) sample size
r(P) sample proportion
r(se) standard error of sample proportion
r(lb) lower confidence bound of sample proportion
r(ub) upper confidence bound of sample proportion
r(z) z statistic
r(p_l) lower one-sided p-value
r(p) two-sided p-value
r(p_u) upper one-sided p-value
r(level) confidence level
Cluster-adjusted one-sample prtest also stores the following in r():
Scalars
r(K) number of clusters K
r(M) cluster size M
r(rho) intraclass correlation
r(CV_cluster) coefficient of variation for cluster sizes
Two-sample prtest and two-sample prtesti store the following in r():
Scalars
r(N1) sample size of population one
r(N2) sample size of population two
r(P1) sample proportion for population one
r(P2) sample proportion for population two
r(P_diff) difference of proportions
r(se1) standard error of population-one sample proportion
r(se2) standard error of population-two sample proportion
r(se_diff) standard error of the difference of proportions
r(se_diff0) standard error of the difference of proportions under $H_0$
r(lb1) lower confidence bound of population-one sample proportion
r(ub1) upper confidence bound of population-one sample proportion
r(lb2) lower confidence bound of population-two sample proportion
r(ub2) upper confidence bound of population-two sample proportion
r(lb_diff) lower confidence bound of the difference of proportions
r(ub_diff) upper confidence bound of the difference of proportions
r(z) z statistic
r(p_l) lower one-sided p-value
r(p) two-sided p-value
r(p_u) upper one-sided p-value
r(level) confidence level
Cluster-adjusted two-sample prtest using the by() option also stores the following in r():
Scalars
r(K1) population-one number of clusters $K_1$
r(K2) population-two number of clusters $K_2$
r(M1) population-one cluster size $M_1$
r(M2) population-two cluster size $M_2$
r(rho) common intraclass correlation
r(rho1) population-one intraclass correlation
r(rho2) population-two intraclass correlation
r(CV_cluster1) population-one coefficient of variation for cluster sizes
r(CV_cluster2) population-two coefficient of variation for cluster sizes
Methods and formulas
Methods and formulas are presented under the following headings:
One-sample test
Two-sample test
For all the tests below, the test statistic $z$ has an asymptotic standard normal distribution, and the
p-value is computed as
$$
p = \begin{cases}
1 - \Phi(z) & \text{for an upper one-sided test} \\
\Phi(z) & \text{for a lower one-sided test} \\
2\{1 - \Phi(|z|)\} & \text{for a two-sided test}
\end{cases}
$$
where $\Phi(\cdot)$ is the cdf of a standard normal distribution and $|z|$ is the absolute value of $z$.
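As an illustration of these three cases, the p-values reported in Example 1 can be recovered from its z statistic with Stata's normal() function, which returns the standard normal cdf. The value of z is the rounded one from the output, so the results match to about four decimal places.

```stata
* z statistic reported in Example 1 (rounded)
local z = -1.8034

* Lower one-sided, upper one-sided, and two-sided p-values
local pl = normal(`z')
local pu = 1 - normal(`z')
local p2 = 2*(1 - normal(abs(`z')))

display "lower one-sided: " %6.4f `pl'
display "upper one-sided: " %6.4f `pu'
display "two-sided:       " %6.4f `p2'
```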
See Acock (2023, 158–164) for additional examples of tests of proportions using Stata.
One-sample test
Let $n$ be the number of observations, $\hat{p}$ be the observed proportion, and $\hat{q} = 1 - \hat{p}$.
The one-tailed and two-tailed tests of a population proportion use an asymptotically normally
distributed test statistic calculated as
$$
z = \frac{\hat{p} - p_0}{s_0}
$$
where $p_0$ is the hypothesized proportion, $q_0 = 1 - p_0$, and $s_0 = \sqrt{p_0 q_0 / n}$ is the
standard error of $\hat{p}$ under the null hypothesis of $p = p_0$.
A large-sample $100(1 - \alpha)\%$ confidence interval for a proportion $p$ is
$$
\hat{p} \pm z_{1-\alpha/2}\, s
$$
where $s = \sqrt{\hat{p}\hat{q}/n}$ and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$th quantile of the
standard normal distribution.
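For instance, the 95% confidence interval reported in Example 1 can be rebuilt from this formula. The sketch assumes 22 successes out of 74, a count inferred from the displayed proportion rather than stated in the example, and uses invnormal() for the standard normal quantile.

```stata
* Sample proportion and its standard error (22 of 74 is an inferred count)
local n    = 74
local phat = 22/`n'
local s    = sqrt(`phat'*(1 - `phat')/`n')

* 95% interval: alpha = 0.05, so the quantile is invnormal(0.975)
local zq = invnormal(0.975)
local lb = `phat' - `zq'*`s'
local ub = `phat' + `zq'*`s'

display "95% CI: [" %9.7f `lb' ", " %9.7f `ub' "]"
```

The bounds should agree with the .1931583 and .4014363 shown in the Example 1 output. Note that the interval uses $s$, based on $\hat{p}$, whereas the test statistic uses $s_0$, based on $p_0$.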
With clustered data, suppose that there are $K$ clusters, each of size $M_i$ such that
$n = \sum_{i=1}^{K} M_i$. Let $\rho$ be the intraclass correlation. Following Ahn, Heo, and Zhang (2015),
we assume that the cluster sizes $M_i$ are independent and identically distributed. Let $C_{\text{adj}}$
be the adjustment to the standard error for clustered data,
$$
C_{\text{adj}} = \sqrt{\sum_{i=1}^{K} M_i \{1 + \rho(M_i - 1)\} / n}
$$
such that $s_{0,\text{cl}} = C_{\text{adj}}\, s_0$ and $s_{\text{cl}} = C_{\text{adj}}\, s$.
$C_{\text{adj}}$ can be equivalently written as
$$
C_{\text{adj}} = \sqrt{1 + \rho(\bar{M} - 1) + \rho \bar{M}\, \text{CV}_{\text{cl}}^2}
$$
where $\bar{M} = \sum_{i=1}^{K} M_i / K$ is the average cluster size and $\text{CV}_{\text{cl}}$ is the
coefficient of variation for cluster sizes:
$$
\text{CV}_{\text{cl}} = \frac{\sqrt{\sum_{i=1}^{K} (M_i - \bar{M})^2 / K}}{\bar{M}}
$$
To adjust the test statistic $z$ and the confidence interval for clustering, replace $s_0$ with
$s_{0,\text{cl}}$ and $s$ with $s_{\text{cl}}$ in the corresponding formulas. In the presence of
clustering, the test statistic $z$ is asymptotically normally distributed conditional on the empirical
distribution of the $M_i$s.
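The equivalence of the two expressions for the clustering adjustment can be checked numerically. The following sketch uses four made-up cluster sizes and an intraclass correlation of 0.2, all hypothetical; it also replaces any data in memory. Because summarize reports the variance with divisor K - 1, the code rescales it to the divisor K used in the coefficient of variation above.

```stata
* Toy cluster sizes (hypothetical); -clear- drops the data in memory
clear
input M
3
5
4
8
end
local rho = 0.2

* Ingredients: total n, number of clusters K, mean size, squared CV
quietly summarize M
local n    = r(sum)
local K    = r(N)
local mbar = r(mean)
local cv2  = (r(Var)*(`K' - 1)/`K')/`mbar'^2

* Summation form of the adjustment
generate double term = M*(1 + `rho'*(M - 1))
quietly summarize term
local form1 = sqrt(r(sum)/`n')

* Average-cluster-size form of the adjustment
local form2 = sqrt(1 + `rho'*(`mbar' - 1) + `rho'*`mbar'*`cv2')

display "summation form:    " %9.7f `form1'
display "average-size form: " %9.7f `form2'
```

Both lines should display the same value, the square root of 1.94 for these inputs.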
Two-sample test
Let $n_1$ be the number of observations in population one and $n_2$ be the number of observations in
population two, $\hat{p}_1$ be the observed proportion in population one and $\hat{p}_2$ be the observed
proportion in population two, and $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$. Let $x_1$
and $x_2$ be the total number of successes in the two populations.
A test of the difference of two proportions uses an asymptotically normally distributed test statistic
calculated as
$$
z = \frac{\hat{p}_1 - \hat{p}_2}{s_{d0}}
$$
where $s_{d0} = \sqrt{\hat{p}_p \hat{q}_p (1/n_1 + 1/n_2)}$ is the standard error of
$\hat{p}_1 - \hat{p}_2$ under the null hypothesis of $p_1 = p_2$, with
$\hat{p}_p = (x_1 + x_2)/(n_1 + n_2)$ and $\hat{q}_p = 1 - \hat{p}_p$.
The $100(1 - \alpha)\%$ confidence interval for the difference of two proportions is given by
$$
(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2} \sqrt{s_1^2 + s_2^2}
$$
where $s_1 = \sqrt{\hat{p}_1 \hat{q}_1 / n_1}$ and $s_2 = \sqrt{\hat{p}_2 \hat{q}_2 / n_2}$ are the
standard errors of the two sample proportions and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$th quantile of
the standard normal distribution.
With clustered data, suppose that there are $K_1$ and $K_2$ clusters in population one and population
two with the corresponding average cluster sizes of $\bar{M}_1$ and $\bar{M}_2$. Let $\rho_1$ and
$\rho_2$ be the intraclass correlations and $\text{CV}_{\text{cl},1}$ and $\text{CV}_{\text{cl},2}$ be
the coefficients of variation for cluster sizes for population one and population two. Let
$C_{\text{adj},1}$ and $C_{\text{adj},2}$ be the adjustments to standard errors of the two sample
proportions for clustered data, defined analogously to $C_{\text{adj}}$ in One-sample test for each
population.
Let
$$
s_{d0,\text{cl}} = \sqrt{\hat{p}_p \hat{q}_p \left( C_{\text{adj},1}^2 / n_1 + C_{\text{adj},2}^2 / n_2 \right)}
$$
be the standard error of $\hat{p}_1 - \hat{p}_2$ under the null hypothesis of $p_1 = p_2$ adjusted for
clustered data. Also, let $s_{1,\text{cl}} = C_{\text{adj},1}\, s_1$ and
$s_{2,\text{cl}} = C_{\text{adj},2}\, s_2$ be the standard errors of $\hat{p}_1$ and $\hat{p}_2$
adjusted for clustered data. To adjust the two-sample test statistic and the confidence interval for
clustering, replace $s_{d0}$ with $s_{d0,\text{cl}}$, $s_1$ with $s_{1,\text{cl}}$, and $s_2$ with
$s_{2,\text{cl}}$ in the corresponding formulas.
References
Acock, A. C. 2023. A Gentle Introduction to Stata. Rev. 6th ed. College Station, TX: Stata Press.
Ahn, C., M. Heo, and S. Zhang. 2015. Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical
Research. Boca Raton, FL: CRC Press.
Hayes, R. J., and L. H. Moulton. 2009. Cluster Randomised Trials. Boca Raton, FL: CRC Press.
Hayes, R. J., and L. H. Moulton. 2017. Cluster Randomised Trials. 2nd ed. Boca Raton, FL: CRC Press.
Hujoel, P. P., L. H. Moulton, and W. J. Loesche. 1990. Estimation of sensitivity and specificity of site-specific
diagnostic tests. Journal of Periodontal Research 25: 193–196. https://doi.org/10.1111/j.1600-0765.1990.tb00903.x.
O’Brien, K. L., L. H. Moulton, R. Reid, R. Weatherholt, J. Oski, L. B. Brown, G. Kumar, A. Parkinson, D. Hu,
J. Hackell, I. Chang, R. Kohberger, G. Siber, and M. Santosham. 2003. Efficacy and safety of seven-valent
conjugate pneumococcal vaccine in American Indian children: Group randomised trial. Lancet 362: 355–361.
https://doi.org/10.1016/S0140-6736(03)14022-6.
Also see
[R] bitest Binomial probability test
[R] proportion Estimate proportions
[R] ttest t tests (mean-comparison tests)
[MV] hotelling Hotelling’s $T^2$ generalized means test
[PSS-2] power oneproportion Power analysis for a one-sample proportion test
[PSS-2] power oneproportion, cluster Power analysis for a one-sample proportion test, CRD
[PSS-2] power twoproportions Power analysis for a two-sample proportions test
[PSS-2] power twoproportions, cluster Power analysis for a two-sample proportions test, CRD
Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and
Stata Press are registered trademarks with the World Intellectual Property Organization
of the United Nations. StataNow and NetCourseNow are trademarks of StataCorp
LLC. Other brand and product names are registered trademarks or trademarks of their
respective companies. Copyright © 1985–2023 StataCorp LLC, College Station, TX,
USA. All rights reserved.
For suggested citations, see the FAQ on citing Stata documentation.