prtest — Tests of proportions



Description

prtest performs tests on the equality of proportions using large-sample statistics. The test can be

performed for one sample against a hypothesized population value or for no difference in population

proportions estimated from two samples. Clustered data are supported.

prtesti is the immediate form of prtest; see [U] 19 Immediate commands.

Quick start

One-sample test that the proportion of 1s in v is equal to 0.1

prtest v == 0.1

Same as above, but using the 90% confidence level and adjusting for clustering with clusters defined by cvar and an intraclass correlation of 0.5

prtest v == 0.1, level(90) cluster(cvar) rho(0.5)

Test that the proportion of 1s in v is equal between two groups defined by catvar

prtest v, by(catvar)

Same as above, and adjust for clustering with clusters defined by cvar and an intraclass correlation of 0.5 in the two groups

prtest v, by(catvar) cluster(cvar) rho(0.5)

Test equality of proportions between v1 and v2

prtest v1 == v2

Test $p_1 = p_2$ if $\hat{p}_1 = 0.10$, $\hat{p}_2 = 0.17$, $n_1 = 29$, and $n_2 = 36$

prtesti 29 0.10 36 0.17

Menu

prtest

Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test

prtesti

Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test calculator


Syntax

One-sample test of proportion

    prtest varname == #p [if] [in] [, onesampleopts]

Two-sample test of proportions using groups

    prtest varname [if] [in], by(groupvar) [twosamplegropts]

Two-sample test of proportions using variables

    prtest varname1 == varname2 [if] [in] [, level(#)]

Immediate form of one-sample test of proportion

    prtesti #obs1 #p1 #p2 [, level(#) count]

Immediate form of two-sample test of proportions

    prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]

onesampleopts        Description
------------------------------------------------------------
Main
  level(#)           confidence level; default is level(95)
  cluster(varname)   variable defining the clusters
  rho(#)             intraclass correlation

twosamplegropts      Description
------------------------------------------------------------
Main
* by(groupvar)       variable defining the groups
  level(#)           confidence level; default is level(95)
  cluster(varname)   variable defining the clusters
  rho(#)             common intraclass correlation
  rho1(#)            intraclass correlation for group 1
  rho2(#)            intraclass correlation for group 2

* by(groupvar) is required.

by is allowed with prtest, and collect is allowed with prtest and prtesti; see [U] 11.1.10 Prefix commands.

Options for prtest

Main

by(groupvar) specifies a numeric variable that contains the group information for a given observation. This variable must have only two values. Do not confuse the by() option with the by prefix; both may be specified.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [U] 20.8 Specifying the width of confidence intervals.

cluster(varname) specifies the variable that identifies clusters. The cluster() option is required to adjust the computation for clustering.

rho(#) specifies the intraclass correlation for a one-sample test or the common intraclass correlation for a two-sample test. The rho() option is required to adjust the computation for clustering for a one-sample test.

rho1(#) specifies the intraclass correlation of the first group for a two-sample test using groups. The rho() option or both rho1() and rho2() options are required to adjust the computation for clustering.

rho2(#) specifies the intraclass correlation of the second group for a two-sample test using groups. The rho() option or both rho1() and rho2() options are required to adjust the computation for clustering.

Options for prtesti

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [U] 20.8 Specifying the width of confidence intervals.

count specifies that integer counts instead of proportions be used in the immediate forms of prtest. In the first syntax, prtesti expects that #obs1 and #p1 are counts (with #p1 ≤ #obs1) and #p2 is a proportion. In the second syntax, prtesti expects that all four numbers are integer counts, that #obs1 ≥ #p1, and that #obs2 ≥ #p2.
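To make the effect of the count option concrete, here is a minimal Python sketch (not Stata code) of the conversion it implies. The helper name count_to_proportion is hypothetical and introduced only for illustration; it enforces the constraint described above that a count cannot exceed the number of observations.

```python
def count_to_proportion(n_obs: int, n_success: int) -> float:
    """Convert an integer count of successes into a proportion.

    Mirrors the constraint stated for the count option: the count
    must not exceed the number of observations.
    """
    if n_obs <= 0:
        raise ValueError("number of observations must be positive")
    if not 0 <= n_success <= n_obs:
        raise ValueError("count must lie between 0 and the number of observations")
    return n_success / n_obs


# 26 successes out of 50 observations correspond to a proportion of 0.52
print(count_to_proportion(50, 26))
```

Under this reading, typing prtesti 50 26 0.70, count should describe the same data as prtesti 50 0.52 0.70.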

Remarks and examples

Remarks are presented under the following headings:

Tests of proportions

Adjust for clustering

Immediate form

Tests of proportions

The prtest output follows the layout of ttest output. Each proportion is presented along with a confidence interval. The appropriate one- or two-sample test is performed, and the two-sided and both one-sided results are included at the bottom of the output. For a two-sample test, the calculated difference is also presented with its confidence interval. This command may be used for both large-sample testing and large-sample interval estimation. For one-sample tests of proportions with small sample sizes and to obtain exact p-values, researchers should use bitest; see [R] bitest.

Example 1: One-sample test of proportion

In the first form, prtest tests whether the proportion in the sample (the mean of a 0/1 variable) is equal to a known constant. Assume that we have a sample of 74 automobiles. We wish to test whether the proportion of automobiles that are foreign is different from 40%.

. use https://www.stata-press.com/data/r18/auto

(1978 automobile data)


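. prtest foreign == 0.4

One-sample test of proportion                   Number of obs      =        74

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |   .2972973   .0531331                      .1931583    .4014363
------------------------------------------------------------------------------
    p = proportion(foreign)                                       z =  -1.8034
H0: p = 0.4

     Ha: p < 0.4                 Ha: p != 0.4                   Ha: p > 0.4
 Pr(Z < z) = 0.0357         Pr(|Z| > |z|) = 0.0713          Pr(Z > z) = 0.9643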

The test indicates that we cannot reject the hypothesis that the proportion of foreign automobiles is 0.40 at the 5% significance level; the two-sided p-value is 0.0713.
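As a cross-check on the numbers above, the following Python sketch (an illustration, not Stata's implementation) recomputes the z statistic and the confidence interval from the formulas in Methods and formulas. The function name one_sample_prtest is hypothetical, and the count of 22 foreign cars is inferred from the reported mean, .2972973 × 74.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prtest(p_hat, n, p0, level=95.0):
    """Large-sample z statistic and confidence interval for one proportion."""
    se0 = sqrt(p0 * (1 - p0) / n)        # standard error under H0: p = p0
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error used for the interval
    z = (p_hat - p0) / se0
    z_crit = NormalDist().inv_cdf(1 - (1 - level / 100) / 2)
    return z, se, (p_hat - z_crit * se, p_hat + z_crit * se)


# 22 of the 74 cars are foreign (22/74 = .2972973)
z, se, (lb, ub) = one_sample_prtest(22 / 74, 74, 0.4)
print(f"z = {z:.4f}, se = {se:.7f}, CI = [{lb:.7f}, {ub:.7f}]")
```

Note that the test statistic uses the standard error under the null hypothesis, whereas the confidence interval uses the standard error based on the observed proportion, so the two need not agree about "significance" in borderline cases.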

Example 2: Two-sample test of proportions

We have two headache remedies that we give to patients. Each remedy’s effect is recorded as 0

for failing to relieve the headache and 1 for relieving the headache. We wish to test the equality of

the proportion of people relieved by the two treatments.

. use https://www.stata-press.com/data/r18/cure

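. prtest cure1 == cure2

Two-sample test of proportions                 cure1: Number of obs =       50
                                               cure2: Number of obs =       59

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       cure1 |        .52   .0706541                      .3815205    .6584795
       cure2 |   .7118644   .0589618                      .5963013    .8274275
-------------+----------------------------------------------------------------
        diff |  -.1918644   .0920245                      -.372229   -.0114998
             |  under H0:   .0931155    -2.06   0.039
------------------------------------------------------------------------------
        diff = prop(cure1) - prop(cure2)                          z =  -2.0605
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0197         Pr(|Z| > |z|) = 0.0394          Pr(Z > z) = 0.9803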

We find that the proportions are statistically different from each other at any significance level greater than 3.94% (two-sided p-value = 0.0394).
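The two standard errors in the diff rows serve different purposes: the pooled one ("under H0") drives the z statistic, and the unpooled one drives the confidence interval. The Python sketch below (an illustration, not Stata's implementation) recomputes both. The function name two_sample_prtest is hypothetical, and the success counts 26 and 42 are inferred from the reported means (.52 × 50 and .7118644 × 59).

```python
from math import sqrt
from statistics import NormalDist


def two_sample_prtest(x1, n1, x2, n2, level=95.0):
    """Pooled z test and unpooled confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                  # common proportion under H0
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se0
    z_crit = NormalDist().inv_cdf(1 - (1 - level / 100) / 2)
    diff = p1 - p2
    return z, se0, se_diff, (diff - z_crit * se_diff, diff + z_crit * se_diff)


# Counts implied by the output: .52 * 50 = 26 and .7118644 * 59 = 42
z, se0, se_diff, (lb, ub) = two_sample_prtest(26, 50, 42, 59)
print(f"z = {z:.4f}, se0 = {se0:.7f}, se_diff = {se_diff:.7f}")
print(f"CI = [{lb:.7f}, {ub:.7f}]")
```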

Adjust for clustering

When observations are not independent and can be grouped into clusters, we need to adjust for clustering in a proportion test. For example, in a cluster randomized design, groups of individuals are randomized instead of individuals. To adjust for clustering, we need to specify the cluster identifier variable in the cluster() option. For a one-sample proportion test, we also need to specify the intraclass correlation in the rho() option. For a two-sample proportions test, we also need to specify either the common population intraclass correlation in the rho() option or the group-specific population intraclass correlations in the rho1() and rho2() options.


Example 3: One-sample test of proportion, adjusting for clusters

Consider data from Hujoel, Moulton, and Loesche (1990) on the accuracy of an enzymatic diagnostic test (EDT) of bacterial infections for 29 patients with multiple sites. The EDT was conducted on each site, a specific area in a patient’s mouth, to determine infection by two strains of bacteria. A separate reference test was also conducted on each site with an antibody assay against the two strains of bacteria. The data record whether there was a positive EDT result at each infected site, that is, a true positive result.

We want to test whether the proportion of infected sites that were correctly diagnosed by the EDT is different from 0.6. Because we have multiple infections per patient, we cluster by the patient-identifier subject and use a value of 0.2 from Ahn, Heo, and Zhang (2015, 33) for the intrapatient correlation. To perform the test, we specify the cluster(subject) and rho(0.2) options:

. use https://www.stata-press.com/data/r18/infection

(Target infections detected by EDT (Hujoel, Moulton, and Loesche 1990))

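. prtest infect == 0.6, cluster(subject) rho(0.2)

One-sample test of proportion                   Number of obs      =       142
Cluster variable: subject                       Number of clusters =        29
                                                Avg. cluster size  =      4.90
                                                CV cluster size    =    0.2419
                                                Intraclass corr.   =    0.2000

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
   infection |   .6619718   .0537974                      .5565308    .7674129
------------------------------------------------------------------------------
    p = proportion(infection)                                     z =   1.1123
H0: p = 0.6

     Ha: p < 0.6                 Ha: p != 0.6                   Ha: p > 0.6
 Pr(Z < z) = 0.8670         Pr(|Z| > |z|) = 0.2660          Pr(Z > z) = 0.1330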

We do not find statistical evidence to reject the null hypothesis of $H_0\colon P_{\text{infection}} = 0.6$ versus the two-sided alternative $H_a\colon P_{\text{infection}} \neq 0.6$ at the 5% significance level; the p-value = 0.2660 > 0.05.
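The size of the clustering adjustment can be checked from the summary values printed in the header. The Python sketch below (an illustration, not Stata's implementation) assumes the adjustment factor given in Methods and formulas, $C_{\text{adj}} = \sqrt{1 + \rho(\bar{M} - 1) + \rho \bar{M}\, CV_{cl}^2}$, and applies it to the unadjusted standard errors. The function name cluster_adjustment is hypothetical, and because the header reports the CV rounded to four digits, agreement is only up to rounding.

```python
from math import sqrt


def cluster_adjustment(rho, mean_size, cv):
    """C_adj = sqrt(1 + rho*(Mbar - 1) + rho*Mbar*CV^2)."""
    return sqrt(1 + rho * (mean_size - 1) + rho * mean_size * cv ** 2)


# Summary values read from the output above (CV is rounded to 4 digits there)
n, k, rho, cv = 142, 29, 0.2, 0.2419
p_hat, p0 = 0.6619718, 0.6

c_adj = cluster_adjustment(rho, n / k, cv)
se_cl = c_adj * sqrt(p_hat * (1 - p_hat) / n)   # adjusted standard error
se0_cl = c_adj * sqrt(p0 * (1 - p0) / n)        # adjusted standard error under H0
z = (p_hat - p0) / se0_cl
print(f"C_adj = {c_adj:.4f}, se = {se_cl:.7f}, z = {z:.4f}")
```

With an intraclass correlation of 0.2 and about five sites per patient, the standard errors are inflated by roughly a third relative to an analysis that treats all 142 sites as independent.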

Example 4: Two-sample test of proportions using groups, adjusting for clusters

Consider a dataset provided by Hayes and Moulton (2009), which contains a random subsample

of the original participants in a cluster randomized trial of a pneumococcal conjugate vaccine in

American Indian populations in the southwestern United States. There are two groups of infants with

18 clusters in each group. The control group received a meningococcal C conjugate vaccine (MnCC),

and the experimental group received the seven-valent pneumococcal conjugate vaccine (PnCRM7). The

two groups are identified by the vaccine variable, and the pneumonia variable records 1 if an infant

had at least one bacterial pneumonia episode and 0 otherwise. These data are originally from O’Brien

et al. (2003).

We want to test the equality of the proportion of cases of pneumonia in the two vaccine groups.

We assume a common known intraclass correlation of 0.02. To perform the test, we type


. use https://www.stata-press.com/data/r18/pneumoniacrt

(Bacterial pneumonia episodes data from CRT (Hayes and Moulton 2009))

. prtest pneumonia, by(vaccine) cluster(cluster) rho(0.02)

Two-sample test of proportions
Cluster variable: cluster

Group: MnCC                             Group: PnCRM7
Number of obs      =       238          Number of obs      =       211
Number of clusters =        18          Number of clusters =        18
Avg. cluster size  =     13.22          Avg. cluster size  =     11.72
CV cluster size    =    0.9605          CV cluster size    =    0.7976
Intraclass corr.   =    0.0200          Intraclass corr.   =    0.0200

------------------------------------------------------------------------------
       Group |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        MnCC |   .2226891   .0329017                      .1582029    .2871753
      PnCRM7 |   .1658768   .0299027                      .1072686     .224485
-------------+----------------------------------------------------------------
        diff |   .0568123     .04446                     -.0303278    .1439524
             |  under H0:   .0447641     1.27   0.204
------------------------------------------------------------------------------
        diff = prop(MnCC) - prop(PnCRM7)                          z =   1.2691
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.8978         Pr(|Z| > |z|) = 0.2044          Pr(Z > z) = 0.1022

We do not find statistical evidence to reject the null hypothesis of $H_0\colon P_{\text{diff}} = 0$ versus the two-sided alternative $H_a\colon P_{\text{diff}} \neq 0$ at the 5% significance level; the p-value = 0.2044 > 0.05.
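The "under H0" standard error in this output combines the pooled proportion with a separate clustering adjustment for each group. The Python sketch below (an illustration, not Stata's implementation) assumes that the adjusted null standard error takes the form $\sqrt{\hat{p}_p \hat{q}_p\,(C_{\text{adj},1}^2/n_1 + C_{\text{adj},2}^2/n_2)}$, which reproduces the reported value up to rounding. The helper name cluster_adjustment_sq is hypothetical, and the event counts 53 and 35 are inferred from the reported proportions.

```python
from math import sqrt


def cluster_adjustment_sq(rho, mean_size, cv):
    """Squared adjustment: 1 + rho*(Mbar - 1) + rho*Mbar*CV^2."""
    return 1 + rho * (mean_size - 1) + rho * mean_size * cv ** 2


# Values read from the output above; 53 and 35 are the event counts implied
# by the reported proportions (.2226891 * 238 and .1658768 * 211).
x1, n1, k1, cv1 = 53, 238, 18, 0.9605   # MnCC
x2, n2, k2, cv2 = 35, 211, 18, 0.7976   # PnCRM7
rho = 0.02

c1_sq = cluster_adjustment_sq(rho, n1 / k1, cv1)
c2_sq = cluster_adjustment_sq(rho, n2 / k2, cv2)
pooled = (x1 + x2) / (n1 + n2)
se0_cl = sqrt(pooled * (1 - pooled) * (c1_sq / n1 + c2_sq / n2))
z = (x1 / n1 - x2 / n2) / se0_cl
print(f"se0 = {se0_cl:.7f}, z = {z:.4f}")
```

Even with a small intraclass correlation of 0.02, the highly variable cluster sizes (CVs near 0.8 to 1.0) inflate each group's variance by roughly 36% to 49%.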

Immediate form

Example 5: Immediate form of one-sample test of proportion

prtesti is like prtest, except that you specify summary statistics rather than variables as

arguments. For instance, we are reading an article that reports the proportion of registered voters

among 50 randomly selected eligible voters as 0.52. We wish to test whether the proportion is 0.7:

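. prtesti 50 0.52 0.70

One-sample test of proportion                       x: Number of obs =      50

------------------------------------------------------------------------------
             |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |        .52   .0706541                      .3815205    .6584795
------------------------------------------------------------------------------
    p = proportion(x)                                             z =  -2.7775
H0: p = 0.7

     Ha: p < 0.7                 Ha: p != 0.7                   Ha: p > 0.7
 Pr(Z < z) = 0.0027         Pr(|Z| > |z|) = 0.0055          Pr(Z > z) = 0.9973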


Example 6: Immediate form of two-sample test of proportions

To judge teacher effectiveness, we wish to test whether the same proportion of people from two classes will answer an advanced question correctly. In the first classroom of 30 students, 40% answered the question correctly, whereas in the second classroom of 45 students, 67% answered the question correctly.

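. prtesti 30 0.4 45 0.67

Two-sample test of proportions                     x: Number of obs =       30
                                                   y: Number of obs =       45

------------------------------------------------------------------------------
             |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |         .4   .0894427                      .2246955    .5753045
           y |        .67   .0700952                       .532616     .807384
-------------+----------------------------------------------------------------
        diff |       -.27   .1136368                     -.4927241   -.0472759
             |  under H0:   .1169416    -2.31   0.021
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -2.3088
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0105         Pr(|Z| > |z|) = 0.0210          Pr(Z > z) = 0.9895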

Stored results

One-sample prtest and prtesti store the following in r():

Scalars
    r(N)            sample size
    r(P)            sample proportion
    r(se)           standard error of sample proportion
    r(lb)           lower confidence bound of sample proportion
    r(ub)           upper confidence bound of sample proportion
    r(z)            z statistic
    r(p_l)          lower one-sided p-value
    r(p)            two-sided p-value
    r(p_u)          upper one-sided p-value
    r(level)        confidence level

Cluster-adjusted one-sample prtest also stores the following in r():

Scalars
    r(K)            number of clusters K
    r(M)            cluster size M
    r(rho)          intraclass correlation
    r(CV_cluster)   coefficient of variation for cluster sizes

Two-sample prtest and two-sample prtesti store the following in r():

Scalars
    r(N1)           sample size of population one
    r(N2)           sample size of population two
    r(P1)           sample proportion for population one
    r(P2)           sample proportion for population two
    r(P_diff)       difference of proportions
    r(se1)          standard error of population-one sample proportion
    r(se2)          standard error of population-two sample proportion
    r(se_diff)      standard error of the difference of proportions
    r(se_diff0)     standard error of the difference of proportions under H0
    r(lb1)          lower confidence bound of population-one sample proportion
    r(ub1)          upper confidence bound of population-one sample proportion
    r(lb2)          lower confidence bound of population-two sample proportion
    r(ub2)          upper confidence bound of population-two sample proportion
    r(lb_diff)      lower confidence bound of the difference of proportions
    r(ub_diff)      upper confidence bound of the difference of proportions
    r(z)            z statistic
    r(p_l)          lower one-sided p-value
    r(p)            two-sided p-value
    r(p_u)          upper one-sided p-value
    r(level)        confidence level

Cluster-adjusted two-sample prtest using the by() option also stores the following in r():

Scalars
    r(K1)           population-one number of clusters K1
    r(K2)           population-two number of clusters K2
    r(M1)           population-one cluster size M1
    r(M2)           population-two cluster size M2
    r(rho)          common intraclass correlation
    r(rho1)         population-one intraclass correlation
    r(rho2)         population-two intraclass correlation
    r(CV_cluster1)  population-one coefficient of variation for cluster sizes
    r(CV_cluster2)  population-two coefficient of variation for cluster sizes

Methods and formulas

Remarks are presented under the following headings:

One-sample test

Two-sample test

For all the tests below, the test statistic $z$ has an asymptotic standard normal distribution, and the p-value is computed as

$$
p = \begin{cases}
1 - \Phi(z) & \text{for an upper one-sided test} \\
\Phi(z) & \text{for a lower one-sided test} \\
2\{1 - \Phi(|z|)\} & \text{for a two-sided test}
\end{cases}
$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution and $|z|$ is the absolute value of $z$.
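The three cases translate directly into code. The Python sketch below (an illustration; the function name z_test_pvalues is hypothetical) evaluates them with the standard normal cdf from the standard library and applies them to the z statistic reported in example 1.

```python
from statistics import NormalDist


def z_test_pvalues(z):
    """Return (lower, two-sided, upper) p-values for a standard normal z."""
    phi = NormalDist().cdf
    lower = phi(z)                       # Pr(Z < z)
    upper = 1 - phi(z)                   # Pr(Z > z)
    two_sided = 2 * (1 - phi(abs(z)))    # Pr(|Z| > |z|)
    return lower, two_sided, upper


# z statistic reported in example 1
lower, two_sided, upper = z_test_pvalues(-1.8034)
print(f"{lower:.4f} {two_sided:.4f} {upper:.4f}")
```

The two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller of them.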

See Acock (2023, 158–164) for additional examples of tests of proportions using Stata.

One-sample test

Let $n$ be the number of observations, $\hat{p}$ be the observed proportion, and $\hat{q} = 1 - \hat{p}$.

The one-tailed and two-tailed tests of a population proportion use an asymptotically normally distributed test statistic calculated as

$$
z = \frac{\hat{p} - p_0}{s_0}
$$

where $p_0$ is the hypothesized proportion, $q_0 = 1 - p_0$, and $s_0 = \sqrt{p_0 q_0/n}$ is the standard error of $\hat{p}$ under the null hypothesis of $p = p_0$.

A large-sample $100(1 - \alpha)\%$ confidence interval for a proportion $p$ is

$$
\hat{p} \pm z_{1-\alpha/2}\, s
$$

where $s = \sqrt{\hat{p}\hat{q}/n}$ and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$th quantile of the standard normal distribution.

With clustered data, suppose that there are $K$ clusters, each of size $M_i$, such that $n = \sum_{i=1}^{K} M_i$. Let $\rho$ be the intraclass correlation. Following Ahn, Heo, and Zhang (2015), we assume that the cluster sizes $M_i$ are independent and identically distributed. Let $C_{\text{adj}}$ be the adjustment to the standard error for clustered data,

$$
C_{\text{adj}} = \sqrt{\sum_{i=1}^{K} M_i \{1 + \rho(M_i - 1)\}/n}
$$

such that $s_{0,cl} = C_{\text{adj}}\, s_0$ and $s_{cl} = C_{\text{adj}}\, s$. $C_{\text{adj}}$ can be equivalently written as

$$
C_{\text{adj}} = \sqrt{1 + \rho(\bar{M} - 1) + \rho \bar{M}\, CV_{cl}^2}
$$

where $\bar{M} = \sum_{i=1}^{K} M_i/K$ is the average cluster size and $CV_{cl}$ is the coefficient of variation for cluster sizes:

$$
CV_{cl} = \frac{\sqrt{\sum_{i=1}^{K} (M_i - \bar{M})^2/K}}{\bar{M}}
$$

To adjust the test statistic $z$ and the confidence interval for clustering, replace $s_0$ with $s_{0,cl}$ and $s$ with $s_{cl}$ in the corresponding formulas. In the presence of clustering, the test statistic $z$ is asymptotically normally distributed conditional on the empirical distribution of the $M_i$’s.
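The equivalence of the two expressions for the adjustment factor can be checked numerically. The Python sketch below (an illustration, not Stata's implementation) computes the factor once from the individual cluster sizes and once from the average cluster size and the coefficient of variation, assuming the CV uses the divisor K. The function names and the cluster sizes are hypothetical.

```python
from math import isclose, sqrt


def c_adj_from_sizes(sizes, rho):
    """Adjustment factor computed from the individual cluster sizes M_i."""
    n = sum(sizes)
    return sqrt(sum(m * (1 + rho * (m - 1)) for m in sizes) / n)


def c_adj_from_summary(sizes, rho):
    """Adjustment factor from the average cluster size and the CV of sizes."""
    k = len(sizes)
    mean_size = sum(sizes) / k
    # The CV uses the divisor K, which is what makes the two forms identical
    cv = sqrt(sum((m - mean_size) ** 2 for m in sizes) / k) / mean_size
    return sqrt(1 + rho * (mean_size - 1) + rho * mean_size * cv ** 2)


sizes = [3, 5, 4, 6, 7]   # hypothetical cluster sizes, for illustration only
rho = 0.2
a = c_adj_from_sizes(sizes, rho)
b = c_adj_from_summary(sizes, rho)
print(a, b, isclose(a, b))
```

The summary form is convenient because it needs only three numbers (the intraclass correlation, the average cluster size, and the CV), which is why those are the cluster quantities reported in the output header.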

Two-sample test

Let $n_1$ be the number of observations in population one and $n_2$ be the number of observations in population two, $\hat{p}_1$ be the observed proportion in population one and $\hat{p}_2$ be the observed proportion in population two, and $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$. Let $x_1$ and $x_2$ be the total number of successes in the two populations.

A test of the difference of two proportions uses an asymptotically normally distributed test statistic calculated as

$$
z = \frac{\hat{p}_1 - \hat{p}_2}{s_{d0}}
$$

where $s_{d0} = \sqrt{\hat{p}_p \hat{q}_p (1/n_1 + 1/n_2)}$ is the standard error of $\hat{p}_1 - \hat{p}_2$ under the null hypothesis of $p_1 = p_2$, with $\hat{p}_p = (x_1 + x_2)/(n_1 + n_2)$ and $\hat{q}_p = 1 - \hat{p}_p$.

The $100(1 - \alpha)\%$ confidence interval for the difference of two proportions is given by

$$
(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2} \sqrt{s_1^2 + s_2^2}
$$

where $s_1 = \sqrt{\hat{p}_1 \hat{q}_1/n_1}$ and $s_2 = \sqrt{\hat{p}_2 \hat{q}_2/n_2}$ are the standard errors of the two sample proportions and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$th quantile of the standard normal distribution.

With clustered data, suppose that there are $K_1$ and $K_2$ clusters in population one and population two with the corresponding average cluster sizes of $\bar{M}_1$ and $\bar{M}_2$. Let $\rho_1$ and $\rho_2$ be the intraclass correlations and $CV_{cl,1}$ and $CV_{cl,2}$ be the coefficients of variation for cluster sizes for population one and population two. Let $C_{\text{adj},1}$ and $C_{\text{adj},2}$ be the adjustments to standard errors of the two sample proportions for clustered data, defined analogously to $C_{\text{adj}}$ in One-sample test for each population.

Let

$$
s_{d0,cl} = \sqrt{\hat{p}_p \hat{q}_p \left( \frac{C_{\text{adj},1}^2}{n_1} + \frac{C_{\text{adj},2}^2}{n_2} \right)}
$$

be the standard error of $\hat{p}_1 - \hat{p}_2$ under the null hypothesis of $p_1 = p_2$ adjusted for clustered data. Also, let $s_{1,cl} = C_{\text{adj},1}\, s_1$ and $s_{2,cl} = C_{\text{adj},2}\, s_2$ be the standard errors of $\hat{p}_1$ and $\hat{p}_2$ adjusted for clustered data. To adjust the two-sample test statistic and the confidence interval for clustering, replace $s_{d0}$ with $s_{d0,cl}$, $s_1$ with $s_{1,cl}$, and $s_2$ with $s_{2,cl}$ in the corresponding formulas.

References

Acock, A. C. 2023. A Gentle Introduction to Stata. Rev. 6th ed. College Station, TX: Stata Press.

Ahn, C., M. Heo, and S. Zhang. 2015. Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research. Boca Raton, FL: CRC Press.

Hayes, R. J., and L. H. Moulton. 2009. Cluster Randomised Trials. Boca Raton, FL: CRC Press.

Hayes, R. J., and L. H. Moulton. 2017. Cluster Randomised Trials. 2nd ed. Boca Raton, FL: CRC Press.

Hujoel, P. P., L. H. Moulton, and W. J. Loesche. 1990. Estimation of sensitivity and specificity of site-specific diagnostic tests. Journal of Periodontal Research 25: 193–196. https://doi.org/10.1111/j.1600-0765.1990.tb00903.x.

O’Brien, K. L., L. H. Moulton, R. Reid, R. Weatherholt, J. Oski, L. B. Brown, G. Kumar, A. Parkinson, D. Hu, J. Hackell, I. Chang, R. Kohberger, G. Siber, and M. Santosham. 2003. Efficacy and safety of seven-valent conjugate pneumococcal vaccine in American Indian children: Group randomised trial. Lancet 362: 355–361. https://doi.org/10.1016/S0140-6736(03)14022-6.

Also see

[R] bitest — Binomial probability test

[R] proportion — Estimate proportions

[R] ttest — t tests (mean-comparison tests)

[MV] hotelling — Hotelling’s T² generalized means test

[PSS-2] power oneproportion — Power analysis for a one-sample proportion test

[PSS-2] power oneproportion, cluster — Power analysis for a one-sample proportion test, CRD

[PSS-2] power twoproportions — Power analysis for a two-sample proportions test

[PSS-2] power twoproportions, cluster — Power analysis for a two-sample proportions test, CRD

Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations. StataNow and NetCourseNow are trademarks of StataCorp LLC. Other brand and product names are registered trademarks or trademarks of their respective companies. Copyright © 1985–2023 StataCorp LLC, College Station, TX,
