STATISTICS PAPER SERIES
NO 3 / SEPTEMBER 2013
QUALITY MEASURES
IN NON-RANDOM SAMPLING
MFI INTEREST RATE STATISTICS
In 2013 all ECB publications feature a motif taken from the €5 banknote.
NOTE: This Statistics Paper should not be reported as representing the views of the European Central Bank (ECB). The views expressed are those of the authors and do not necessarily reflect those of the ECB.
TECHNICAL EXPERT GROUP
ON MFI INTEREST RATE STATISTICS
© European Central Bank, 2013
Address Kaiserstrasse 29, 60311 Frankfurt am Main, Germany
Postal address Postfach 16 03 19, 60066 Frankfurt am Main, Germany
Telephone +49 69 1344 0
Internet http://www.ecb.europa.eu
All rights reserved.
ISSN 2314-9248 (online)
EU Catalogue No QB-BF-13-003-EN-N (online)
Any reproduction, publication and reprint in the form of a different publication, whether printed or produced electronically, in whole
or in part, is permitted only with the explicit written authorisation of the ECB or the authors.
Information on all of the papers published in the ECB Statistics Paper Series can be found on the ECB’s website:
http://www.ecb.europa.eu/pub/scientic/stats/html/index.en.html
Technical Expert Group on MFI interest rate statistics
This report has been prepared by the participants of the ESCB Technical Expert Group on sampling issues on MFI interest rate statistics.
ECB Statistics Paper Series No 3 / September 2013
CONTENTS
ABSTRACT
NON-TECHNICAL SUMMARY
1 INTRODUCTION
2 ASSESSING THE SAMPLING QUALITY IN MFI INTEREST RATE STATISTICS ON THE BASIS OF MAE MEASURES
2.1 Construction of the MAE indicator
2.2 Selection of the error estimator
2.3 A synthetic indicator based on the MAE
2.4 Results for the synthetic indicator at the national level
3 ASSESSING SAMPLING QUALITY IN MIR ON THE BASIS OF DEPOSIT AND LOANS BUSINESS VOLUMES
4 CONCLUSIONS
REFERENCES
TABLES AND CHARTS
MEMBERS OF THE TECHNICAL EXPERT GROUP
ABSTRACT
Traditional literature on sampling techniques focuses mainly on statistical samples and covers
non-random (non-statistical) samples only marginally. Nevertheless, there has been a recent
revival of interest in non-statistical samples, given their widespread use in certain fields like
government surveys and marketing research, or for audit purposes. This paper attempts to set up
common rules for non-statistical samples in which only data on the largest institutions within
each stratum are collected. This is done by focusing on the statistics compiled by the European
System of Central Banks (ESCB) on the interest rates of monetary financial institutions (MFIs)
in countries of the European Union. The paper concludes by proposing a way of establishing
common rules for non-statistical samples based on a synthetic measurement of a mean of
absolute errors.
JEL codes
C42, E43
Keywords
sampling, interest rates and non-statistical samples
List of country abbreviations
AT Austria
DE Germany
ES Spain
FR France
GR Greece
IE Ireland
IT Italy
LT Lithuania
NL Netherlands
PL Poland
NON-TECHNICAL SUMMARY
Traditional literature on sampling focuses on statistical samples and covers non-random (non-statistical) samples only marginally. Sampling manuals stress that only in the case of random samples is it possible not only to extrapolate features of the sample to the whole population, but also to assign to those estimators a certain degree of uncertainty, which represents the quality of the estimation and the sample.
Nevertheless, non-random samples are commonly used in several fields, such as US federal surveys, market research, auditing and tax inspections. This is also the case for the statistics produced by the European System of Central Banks (ESCB) on the interest rates applied by monetary financial institutions (MFIs), the so-called MFI interest rate (MIR) statistics, which refer to a range of deposits from and loans to households and non-financial corporations.
The interest rate statistics are collected on the basis of harmonised definitions, which ensure data quality and enable meaningful cross-country comparisons.
MIR statistics are crucial for monetary policy purposes. They provide important insights into the transmission of central bank interest rate impulses to the real economy, in particular to consumption and investment expenditure and, indirectly, to price developments. In fact, the bank interest rate pass-through process is an important link in the monetary policy transmission process. Central banks exert a dominant influence on money market conditions and thereby steer money market interest rates. Changes in money market interest rates in turn affect long-term market interest rates and bank interest rates. Bank decisions regarding the yields applied to their assets and liabilities have an impact on consumption and investment expenditure through the behaviour of deposit holders and borrowers, and thus on economic activity. In other words, a quicker and more complete pass-through of official and market interest rates to bank interest rates strengthens monetary policy transmission. MIR statistics are published for all EU countries but are especially relevant for the euro area. For this reason the paper generally refers to the EU, pointing to the euro area where appropriate.
This paper contributes to the renewed interest in non-statistical samples by exploring how to establish a possible common quality measure for MFI interest rate (MIR) statistics. The motivation for this investigation is that data on MFI interest rates are collected at the national level, i.e. by each national central bank in the EU, on the basis of different national stratifications of the potential MFI reporting population and different selections of the actual reporting institutions. In order to select the actual reporting agents within each stratum, national central banks (NCBs) can include all institutions in the stratum, carry out random sampling or select the largest institutions per stratum. In the case of random sampling, the institutions within each stratum are drawn either with equal probability or with probability proportional to size.
In order to compile MIR statistics, a majority of the EU countries select the largest institutions within each stratum as the actual reporting population. The prevalence of selecting the largest institutions in combination with stratification reflects two factors: the NCBs' good knowledge of their respective financial systems, and therefore of how the reporting population should be grouped, and the cost savings and good coverage implied by selecting the largest institutions within each stratum. A random sampling methodology is therefore deemed not feasible in these cases, since the selection of small institutions is not cost-effective. Interest rates are then compiled by weighting them by the respective business volumes of the loans and deposits involved.
A minimum national sample size to ensure data quality is compulsory in all cases. This should guarantee that the maximum random error for interest rates, on average over all instrument categories, does not exceed 10 basis points at a confidence level of 90%. However, in view of the difficulty of calculating that measure, alternative minimum requirements exist in terms of the number of institutions sampled (30%) or coverage in terms of euro-denominated loans or deposits (75%). Nevertheless, the question remains whether a measure of quality beyond these coverage indicators could be applied to data compiled through the selection of the largest institutions.
On that basis, the paper assumes that the stratification already provides groups of institutions with similar features within each stratum and that the selection of the largest institutions is therefore, to some extent, representative of the whole stratum. However, the problem remains as to how good this representation can be considered to be and how a minimum quality threshold can be established to ensure sufficient quality and homogeneity in the compilation of these statistics across the EU, which should also permit the computation of meaningful euro area aggregates.
In order to establish a common measure of quality, the paper examines an estimate of the mean absolute error (MAE), calculated by way of a three-step approach. First, it is assumed that each stratum could theoretically be divided into two substrata, namely a substratum from which all institutions are sampled (the “take-all substratum”) and a substratum from which no institution is sampled (the “take-none substratum”). It is then assumed that a measure of dispersion for the take-all substratum can serve to estimate the possible divergence of the take-none substratum from the take-all substratum within each stratum and for each specific MIR statistical indicator. Several measures of dispersion are obtained for the sampled data, namely the maximum, minimum, first and third quartiles, standard deviation and a recombination of business volumes and rates. The different dispersion measures are used to estimate alternative scenarios for the rates of the take-none substrata and to obtain different MAEs for each stratum, weighting sampled and non-sampled estimated rates by the respective business volumes. In a second step, the alternative MAE scenarios for each MIR statistical indicator in each stratum are combined to form alternative scenarios for each MIR statistical indicator at the level of the whole population, and the most appropriate scenario, namely a combination of the first and third quartiles, is chosen. The third step consists of establishing a formula (a synthetic MAE) that combines several MIR statistical indicators into a single measure, on which a threshold replicating a confidence interval can be established. Finally, the paper discusses whether a threshold for the synthetic MAE could be complemented by a threshold on business volume coverage quotas.
Subject to the usual caveats on the non-representativeness of non-statistical samples, our empirical findings are that, under the conditions described above, it is possible to establish a common measure of the quality of non-statistical samples for MIR statistics on the basis of a synthetic MAE.
1 INTRODUCTION
This paper is the outcome of the work undertaken by the Technical Expert Group on sampling
issues on MFI interest rate statistics of the European System of Central Banks (ESCB) from
October 2011 to June 2012.
Traditional literature on sampling focuses on statistical samples and only marginally covers non-random (non-statistical) samples. Sampling manuals, e.g. Cochran (1977) or Kish (1995), stress that only in the case of statistical samples is it possible not only to extrapolate features of the sample to the whole population, but also to assign to those estimators a certain degree of uncertainty, which represents the quality of the estimation and the sample.
Nevertheless, non-random samples are commonly used in several fields, for example in US federal surveys¹, market research, and audit and tax inspections. The provisions of the U.S. Office of Management and Budget (2006) for designing surveys, for instance, specify that the use of non-probability sampling methods is permissible if it is statistically justified and if estimation errors can be measured. Market research sampling techniques are usually less strict and include, for example, convenience sampling, judgemental sampling, quota sampling and snowball sampling.²
The widespread use of non-random samples has recently revived interest in the theoretical
properties of the results obtained through these methods. In Guarte (2006), for instance, a
theoretical exercise is performed on purposive samples of the central part of a population
distribution by bootstrapping different distribution functions. The results show that purposive
sampling can produce reliable results even in severely heterogeneous populations. Given its
popularity, cut-off sampling has also received some attention (see, for example, Yorgason et al.
(2011), Landry (2011), Benedetti et al. (2010), or Haziza et al. (2010)).
A cut-off-style non-statistical sample is also used by several countries for the statistics collected by the ESCB on the MFI interest rates applied to a range of deposits from and loans to households and non-financial corporations.³
Interest rates applied by monetary financial institutions (essentially banks in this context) are highly important for the monetary transmission mechanism and the pass-through of interest rates to households and non-financial corporations. As a consequence, MIR statistics are very relevant for monetary analysis and policy, as they reflect how central bank decisions on key interest rates are transmitted by the banking sector to the rates applied on loans to and deposits from the real economy. MIR statistics are compiled for all EU countries but are especially relevant for the euro area. For this reason the paper generally refers to the EU, pointing to the euro area where appropriate.
In line with their relevance, MIR data are carefully compiled in the EU. Data on MFI interest rates are collected at the national level, i.e. by each national central bank in the EU, on the basis of different national stratifications of the potential MFI reporting population and different selections of the actual reporting institutions. In order to select the actual reporting agents within each stratum, national central banks (NCBs) can include all institutions in the stratum, carry out random sampling or select the largest institutions per stratum. In the case of random sampling, the institutions within each stratum are drawn either with equal probability or with probability proportional to size.
¹ See, for instance, Yorgason et al. (2011).
² See, among others, Fox (2010), Goodman (1961) and Babbie (1999).
³ For comprehensive information on MIR statistics, see Regulation (EC) No 63/2002.
The current system for the compilation of MIR statistics in most EU countries consists of first stratifying the potential reporting population (all MFIs) into homogeneous strata and then selecting the largest institutions within each stratum. Currently, 101 different interest rates are collected from each reporting institution, divided into MIR on outstanding amounts, which refer to interest rates applied to the current stock of loans and deposits, and MIR on new business, which refer to interest rates applied to new or renegotiated loans and deposits. In MIR statistics, given the cost of covering smaller MFIs in particular, it is not possible to replace the current non-statistical sample by either a census or a statistical sample. The smallest MFIs are likely to contribute very little to overall lending and total deposits held, and the burden imposed on them would be excessive. Moreover, reporting for MIR statistics requires an adaptation of the MFI’s own reporting system, with a high one-off implementation cost. MFIs randomly selected for the purposes of MIR statistics would thus face significant and measurable start-up costs, which limits the possibility of rotating samples. In that respect, the case of MIR statistics, given the selection of the largest institutions, resembles the cut-off sampling techniques applied in US federal surveys, as described in Yorgason et al. (2011). The reasons for using cut-off sampling include physical efficiencies, limiting costs and reporting burdens, and ensuring data quality. As in the case of MIR statistics, important aspects of cut-off sampling are the sample selection, stratification, the optimisation of the cut-off points, the addition of new reporting units and the key issue of assessing the quality of estimates. As shown in Yorgason et al. (2011), it is possible to construct boundaries and quasi-confidence intervals to assess the quality of estimates.
In line with some of the recent literature described above, this paper explores the issue of how to
establish a possible common measure of quality in MIR statistics. The starting point for the
analysis contained in this paper is the fact that data on MFI interest rates are collected at
national level, i.e. by each national central bank in the EU on the basis of different national
stratifications of the potential MFI reporting population and by selecting the largest institutions
within each stratum as the actual reporting population. In particular, Regulation ECB/2001/18
stipulates that stratification criteria “should allow the subdivision of the potential reporting
population into homogeneous strata. Strata are considered homogeneous if the sum of the intra-
stratum variances of the sampling variables is substantially lower than the total variance in the
entire actual reporting population”.⁴ In addition, a minimum national sample size to ensure data quality is compulsory for the design of MIR statistics, and it should guarantee that the maximum random error for interest rates, on average over all instrument categories, does not exceed 10 basis points at a confidence level of 90%. However, in view of the difficulty of calculating that measure, alternative minimum requirements exist in terms of the number of institutions sampled (30%) or coverage in terms of euro-denominated loans or deposits (75%).
Interest rates are then compiled by weighting them by the respective business volumes of the loans and deposits involved. On that basis, this paper assumes that the stratification results in strata composed of institutions with similar features and that the selection of the largest institutions is therefore unbiased. However, it is considered necessary to go one step further and define a common method to ensure that every EMU and EU country fulfils a minimum quality threshold. In particular, when the largest institutions are selected, it is not possible to apply standard sampling theory to calculate a maximum error and confidence interval that comply with the requirement above.
⁴ ECB Regulations are directly applicable to the euro area countries. Non-euro area countries generally follow statistical ECB regulations on a voluntary basis.
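To make the 10 basis point / 90% requirement concrete, the Python sketch below computes it for the textbook case in which standard sampling theory does apply: a simple random sample of institutions drawn without replacement, unweighted rates in percent and a normal approximation. These simplifications, the helper names and the data are assumptions for illustration only; actual MIR compilation weights rates by business volume, and reading the two coverage quotas as alternatives where either one suffices is likewise an assumption.

```python
import math
from statistics import stdev

# Two-sided 90% standard normal quantile (normal approximation assumed).
Z_90 = 1.6449

def max_random_error_bp(rates_pct, population_size):
    """Half-width, in basis points, of a 90% confidence interval for the mean
    rate of one instrument category, assuming simple random sampling of
    institutions without replacement (hypothetical helper)."""
    n = len(rates_pct)
    fpc = math.sqrt(1 - n / population_size)  # finite population correction
    standard_error = stdev(rates_pct) / math.sqrt(n) * fpc
    return Z_90 * standard_error * 100  # percentage points -> basis points

def average_max_error_bp(samples_by_category, population_size):
    """Average the per-category maximum random errors (hypothetical helper)."""
    errors = [max_random_error_bp(r, population_size)
              for r in samples_by_category.values()]
    return sum(errors) / len(errors)

def meets_alternative_requirement(n_sampled, n_population,
                                  volume_sampled, volume_total):
    """Assumption: satisfying either coverage quota is sufficient."""
    return (n_sampled / n_population >= 0.30
            or volume_sampled / volume_total >= 0.75)

# Hypothetical sampled rates (in percent) for two instrument categories.
samples = {
    "loans_to_households": [2.0, 2.2, 2.4, 2.6],
    "deposits_from_nfcs": [0.8, 0.9, 1.0, 1.1],
}
average_error = average_max_error_bp(samples, population_size=40)
print(round(average_error, 2), average_error <= 10.0)
```

With only four institutions per category, the hypothetical sample misses the 10 basis point target. The sketch also shows why this calculation presupposes a random design, which is exactly what the selection of the largest institutions lacks.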
In order to establish a common measure of quality, the paper proposes to estimate the mean
absolute error (MAE) as follows.
First, it is assumed that each stratum can theoretically be divided into two substrata, namely a substratum from which all institutions are sampled (the “take-all substratum”) and a substratum from which no institution is sampled (the “take-none substratum”). This approach is common practice in business surveys (see, for example, Landry (2011)). It is further assumed that a measure of dispersion for the take-all substratum can serve to estimate the possible divergence of the take-none substratum from the take-all substratum within each stratum and for each particular MIR statistical indicator. A similar approach is used in Benedetti et al. (2010). However, there is a crucial difference. While Benedetti et al. (2010) apply the cut-off on the basis of the variable to be measured, the cut-off in the case of MFI interest rates is applied in terms of total loans and deposits (or the total balance sheet), while the data collected refer to interest rates.
Several measures of dispersion for the actually sampled MIR data were obtained for each
stratum, namely maximum, minimum, first and third quartiles, standard deviation and a
recombination of business volumes and rates. The dispersion measures, each based on a
different underlying proxy for dispersion, are used to estimate alternative scenarios for the rates
estimated for the take-none substrata and to obtain different MAEs for each stratum, weighting
sampled and non-sampled estimated rates by the respective business volumes.
In a second step, the alternative MAE scenarios for each MIR statistical indicator in each stratum are combined to form alternative scenarios for each MIR statistical indicator at the level of the whole population, and the most appropriate scenario for the whole set of MIR indicators and countries was chosen. This most appropriate scenario is a combination of the first and third quartiles.⁵
The third step consists of establishing a formula (synthetic MAE) that combines several MIR statistical indicators into a single measure, on which a threshold can be established. A threshold is proposed on the basis of the data.
⁵ Section 2.4 gives more details about this choice.
Subject to the usual caveats on the non-representativeness of non-statistical samples,⁶ our empirical findings are that, under the conditions described above, it is possible to establish a common measure of the quality of samples on the basis of a synthetic MAE.
⁶ See, for example, Cochran (1977) or Kish (1995).
2 ASSESSING THE SAMPLING QUALITY IN MFI INTEREST RATE STATISTICS ON THE BASIS OF MAE MEASURES
As in the case of statistical samples, a measure of quality when the largest institutions are selected is intended to provide a quantitative answer to the question of the possible error due to the deviation of the non-sampled part of the population from the sampled part. The approach proposed in this paper tries to measure the size of the potential error in the estimated interest rate for the whole population, relative to the magnitude of the interest rates, by using some reasonable assumptions on the expected behaviour of the non-sampled part of the population. The ultimate purpose is to obtain a sufficiently reliable measure of quality that provides robust control against a “worst-case” scenario, in which the non-sampled population deviates significantly (e.g. due to outliers) from the sampled population.
Another relevant aspect of the approach proposed is that the error measure of the interest rates is calculated by weighting rates by their corresponding business volumes. This is consistent with the compilation of MIR statistics, in which interest rates are weighted by business volumes at each level of the compilation process. The ultimate rationale is that MIR statistics aim at calculating average interest rates by giving each euro the same weight and, consequently, by weighting institutions differently depending on their respective business volumes. Alternative approaches using unweighted measures could also be considered but are not explored further in this paper.
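A minimal Python sketch of this weighting principle, with hypothetical rates and volumes, shows how a volume-weighted average can differ from an unweighted one when one institution dominates the business. The function name is illustrative, not part of any MIR compilation system.

```python
def volume_weighted_rate(rates, volumes):
    """Aggregate institution-level rates so that each euro of business
    carries the same weight (rates in percent, volumes in any currency unit)."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total business volume must be positive")
    return sum(r * b for r, b in zip(rates, volumes)) / total

# Hypothetical data: a large bank lending at 2% and a small bank at 4%.
rates = [2.0, 4.0]
volumes = [300.0, 100.0]
print(volume_weighted_rate(rates, volumes))  # 2.5 (volume-weighted)
print(sum(rates) / len(rates))               # 3.0 (unweighted, for comparison)
```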
One further consideration, partially related to the weighting, is that the approach discussed is conditioned by the data available at NCB level, which are restricted to the average rates per MIR statistical category applied by each reporting institution, without any intra-institution dispersion. In other words, there is no information available on the dispersion of, for example, the rates applied to the loans granted by the same institution. As a result, the approach proposed is not independent of the number of institutions or the concentration of the markets. Further investigations in that direction would be valuable if more granular information on interest rates on loans (e.g. through credit registers) and deposits were collected in the future.
The rest of this section describes in further detail the alternative chosen for measuring the
overall data quality in sampling for MFI interest rate statistics. As already mentioned, this
approach consists in the construction of a synthetic indicator on the basis of an estimated mean
absolute error (MAE) for a particular estimator and for each MIR statistical indicator, i.e. for
each of the different interest rates reported by reporting agents. For a given country, this
synthetic indicator would provide an aggregated measure of quality for all reported series of the
different strata in which the data is collected under some assumptions of the data distribution of
non-sampled institutions. Finally, the values of the synthetic indicator applied to the data
reported by the NCBs give possible thresholds for considering the data reported to be of good
quality.
2.1 CONSTRUCTION OF THE MAE INDICATOR
A population that is divided into strata j can theoretically be further subdivided, stratum by stratum, into two substrata, namely $j_0$ for the non-reporting institutions, i.e. what is known as the take-none substratum, and $j_1$ for the reporting institutions, i.e. the so-called take-all substratum of stratum j. The interest rate of the whole population in stratum j would ideally be obtained as:

$$ i_j = \frac{B_{j1}\, i_{j1} + B_{j0}\, i_{j0}}{B_{j1} + B_{j0}} \tag{1} $$

where $i_j$ is the mean interest rate for stratum j, calculated as the average of the mean interest rate for the take-all substratum, $i_{j1}$, and the mean interest rate for the take-none substratum, $i_{j0}$, weighted by the business volumes of the take-all and take-none substrata, $B_{j1}$ and $B_{j0}$ respectively.
Given that the interest rate for the take-none substratum is not reported in the sampling context, a number of assumptions are needed to estimate the average interest rate for the stratum and to subsequently calculate the estimated error. The core assumption is that the stratification, as required by Regulation ECB/2001/18, results in homogeneous strata, so that the selection of the largest institutions does not lead to any bias. Therefore, the results obtained from the take-all substratum are applicable to the take-none substratum.
In order to formalise this idea, a theoretical construction along the lines of a super-population approach is needed, as is usually the case in the literature. In particular, both the take-all and the take-none substrata can be considered samples taken from a theoretical super-population. Consequently, statistics based on the take-all substratum can be used as estimators for the take-none substratum.
Under this assumption, the reported interest rate for the take-all substratum, $i_{j1}$, is used as the best estimate of the unreported interest rate for the take-none substratum, $i_{j0}$:

$$ \hat{i}_{j0} = i_{j1} \tag{2} $$

where $\hat{i}_{j0}$ is the estimated interest rate for the take-none substratum, which results in:

$$ \hat{i}_j = \frac{B_{j1}\, i_{j1} + B_{j0}\, \hat{i}_{j0}}{B_{j1} + B_{j0}} \tag{3} $$

where $\hat{i}_j$ is the estimated interest rate of the whole stratum j.
The actual error of the estimator $\hat{i}_j$ of the interest rate would be the difference between the real and the estimated value of the interest rate within stratum j:

$$ \mathrm{error}(\hat{i}_j) = i_j - \hat{i}_j = \frac{B_{j1}\, i_{j1} + B_{j0}\, i_{j0}}{B_{j1} + B_{j0}} - i_{j1} = \frac{B_{j0}}{B_{j1} + B_{j0}}\,(i_{j0} - i_{j1}) \tag{4} $$

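A quick numerical check of equations (1) to (4) in Python follows. It uses a hypothetical stratum whose take-none rate is treated as known purely for illustration (in practice $i_{j0}$ is not reported), and it confirms that the direct difference $i_j - \hat{i}_j$ matches the closed form of equation (4). The function names are illustrative.

```python
def stratum_rate(b1, i1, b0, i0):
    """Equation (1): true rate of stratum j from both substrata."""
    return (b1 * i1 + b0 * i0) / (b1 + b0)

def estimated_stratum_rate(b1, i1, b0):
    """Equations (2)-(3): the take-none rate is replaced by the take-all rate."""
    i0_hat = i1
    return (b1 * i1 + b0 * i0_hat) / (b1 + b0)

def stratum_error_closed_form(b1, i1, b0, i0):
    """Right-hand side of equation (4)."""
    return b0 / (b1 + b0) * (i0 - i1)

# Hypothetical stratum: reporting banks hold 800 at 3.0%, while the
# (in practice unobserved) non-reporting banks hold 200 at 3.5%.
b1, i1, b0, i0 = 800.0, 3.0, 200.0, 3.5
direct = stratum_rate(b1, i1, b0, i0) - estimated_stratum_rate(b1, i1, b0)
closed = stratum_error_closed_form(b1, i1, b0, i0)
print(direct, closed)
```

Both values are 0.1 percentage points, up to floating-point rounding. The error scales with the take-none volume share, 20% here, times the rate gap of 0.5 points.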
The total error within stratum j gives an approximation of the error in calculating the interest rate $i_j$, given the business volume estimated for the non-reporting institutions, $B_{j0}$. In cases where $i_{j0}$ is known and coincides with $i_{j1}$, or where $B_{j0}$ is zero, i.e. where there are no non-reporting institutions, the total error within stratum j, i.e. $\mathrm{error}(\hat{i}_j)$, is zero. Here, it is assumed that the business volume associated with the interest rates of the non-reporting institutions is known (for the volume related to MIR outstanding amounts), or can at least be estimated with a negligible error (for the volume related to MIR new business).
However, given that the interest rate for the take-none substratum, $i_{j0}$, is not reported, the error of the estimator $\hat{i}_j$ cannot be calculated directly and must instead be estimated. It is only possible to obtain an estimate of the error on the basis of the take-all distribution function, which in turn is assumed to be representative of the super-population distribution. In other words, for each value $\delta_j$, it is possible to calculate the weighted number of observations in the take-all substratum included in an interval around the estimated average, e.g. $[i_{j1} - \delta_j,\, i_{j1} + \delta_j)$, or in another interval around the estimated mean. Conversely, it is also possible to first define a desired level of confidence, apply it to the take-all substratum and calculate the value of $\delta_j$ that complies with that confidence level. $\delta_j$ would then be input into the estimated error equation for the stratum:

$$ \widehat{\mathrm{error}}(\hat{i}_j) = \frac{B_{j1}\, i_{j1} + B_{j0}\, \delta_j}{B_{j1} + B_{j0}} - \hat{i}_j = \frac{B_{j0}}{B_{j1} + B_{j0}}\,(\delta_j - i_{j1}) \tag{5} $$

The estimated errors for the different strata would be aggregated into a single MAE by weighting the estimated error for each stratum by the business volume of that stratum:

$$ \mathrm{MAE}(\delta) = \frac{\sum_j \left|\widehat{\mathrm{error}}(\hat{i}_j)\right| (B_{j0} + B_{j1})}{B} \tag{6} $$

where $\widehat{\mathrm{error}}(\hat{i}_j)$ is as defined in equation (5), $B_{j1}$ and $B_{j0}$ are the volumes previously defined in equation (1) and $B = \sum_j (B_{j0} + B_{j1})$ is the total volume of all institutions in the whole population. The MAE can be interpreted as a measure in which all the individual differences per stratum are weighted by the volume within each stratum.
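The following Python sketch implements equations (5) and (6) for a list of strata. The stratum values and the choice of $\delta_j$ are hypothetical; in the paper, candidate values of $\delta_j$ come from the dispersion measures discussed in Section 2.2. Substituting (5) into (6) also shows that the MAE reduces to $\sum_j B_{j0}\,|\delta_j - i_{j1}| / B$, so only the take-none volumes and the assumed rate gaps drive the result.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    b1: float     # business volume of the take-all (reporting) substratum
    i1: float     # volume-weighted rate of the take-all substratum, in percent
    b0: float     # business volume of the take-none substratum (assumed known)
    delta: float  # rate assumed for the take-none substratum (error estimator)

def estimated_error(s: Stratum) -> float:
    """Equation (5): estimated error of the stratum rate."""
    return s.b0 / (s.b1 + s.b0) * (s.delta - s.i1)

def mae(strata: list[Stratum]) -> float:
    """Equation (6): absolute stratum errors weighted by stratum volume."""
    total = sum(s.b0 + s.b1 for s in strata)
    return sum(abs(estimated_error(s)) * (s.b0 + s.b1) for s in strata) / total

# Hypothetical strata: one error is positive and one negative, and the
# absolute value keeps them from cancelling out.
strata = [
    Stratum(b1=900.0, i1=2.0, b0=100.0, delta=2.5),
    Stratum(b1=400.0, i1=4.0, b0=100.0, delta=3.0),
]
print(mae(strata))
```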
2.2 SELECTION OF THE ERROR ESTIMATOR
In order to carry out the estimations described in the previous section, values need to be found that could be used for the estimator $\delta_j$, taking into account the implied pseudo-confidence interval attached to it, in order to measure the MAE in a particular stratum j. For that purpose, the interest rates per institution, weighted by their corresponding business volumes, are put together, resulting in a frequency distribution. On the basis of that distribution, several possible values for the error estimator are selected. The possible values initially considered for $\delta_j$ in a stratum j include:
• the minimum error estimator, i.e. $\delta_j = \min_{j1}$, defined as the lowest interest rate reported for the MIR statistical category by the institutions in the stratum;
• the maximum error estimator, i.e. $\delta_j = \max_{j1}$, defined as the highest interest rate reported for the MIR statistical category by the institutions in the stratum;
• the first and third quartiles, i.e. $\delta_j = Q1_{j1}$ and $\delta_j = Q3_{j1}$ respectively, defined as the interest rates reported for the MIR statistical category below which 25% (respectively 75%) of the reported interest rates lie. The first and third quartiles are calculated by first weighting the rates by the business volumes in that category and stratum;
• the 2-sigma standard deviation, i.e. $\delta_j = i_{j1} \pm 2\sigma_{j1}$, obtained by adding two times the standard deviation of the rates reported for the MIR statistical category by the institutions in the stratum to, or subtracting it from, the stratum-weighted average interest rate for the MIR statistical category;
• the lower and upper bounds recombining rates and volumes, i.e. $\delta_j = LT_{j1,up}$ and $\delta_j = LT_{j1,low}$, calculated by independently ranking the rates and volumes, and weighting the rates by the ranked volumes in direct or reverse order.
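A possible Python implementation of the candidate values listed above, for a single stratum and MIR category, is sketched below. Several conventions here are assumptions rather than the paper's documented choices:

- the weighted quartile is taken as the smallest rate at which the cumulative volume share reaches 25% or 75%;
- the standard deviation is volume-weighted;
- the LT bounds pair the sorted rates with the volumes sorted in the same order (upper bound) or in reverse order (lower bound).

Under that last reading, the rearrangement inequality guarantees that the volume-weighted average rate lies between the two bounds.

```python
import math

def weighted_quantile(rates, volumes, q):
    """Smallest rate whose cumulative volume share reaches q
    (one possible convention, assumed here)."""
    pairs = sorted(zip(rates, volumes))
    total = sum(volumes)
    cumulative = 0.0
    for rate, volume in pairs:
        cumulative += volume
        if cumulative >= q * total:
            return rate
    return pairs[-1][0]

def candidate_deltas(rates, volumes):
    """Candidate values for delta_j in one stratum (hypothetical helper)."""
    total = sum(volumes)
    mean = sum(r * b for r, b in zip(rates, volumes)) / total
    # Assumption: volume-weighted standard deviation.
    sd = math.sqrt(sum(b * (r - mean) ** 2 for r, b in zip(rates, volumes)) / total)
    # Assumed reading of the LT bounds: rank rates and volumes independently,
    # then pair them in direct (upper) or reverse (lower) order.
    rates_sorted = sorted(rates)
    volumes_asc = sorted(volumes)
    lt_up = sum(r * b for r, b in zip(rates_sorted, volumes_asc)) / total
    lt_low = sum(r * b for r, b in zip(rates_sorted, reversed(volumes_asc))) / total
    return {
        "min": min(rates),
        "max": max(rates),
        "q1": weighted_quantile(rates, volumes, 0.25),
        "q3": weighted_quantile(rates, volumes, 0.75),
        "mean-2sd": mean - 2 * sd,
        "mean+2sd": mean + 2 * sd,
        "lt_low": lt_low,
        "lt_up": lt_up,
    }

# Hypothetical stratum with four reporting institutions.
rates = [1.0, 2.0, 3.0, 4.0]
volumes = [10.0, 20.0, 30.0, 40.0]
deltas = candidate_deltas(rates, volumes)
print(deltas)
```

Each candidate can then be fed into equation (5) as the assumed take-none rate, giving one MAE scenario per estimator.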
2.3 A SYNTHETIC INDICATOR BASED ON THE MAE
The MAE defined in Section 2.1 depends on the volatility and the magnitude of each series. Some series could have a higher MAE simply because of the magnitude of their interest rates, rather than because of their relative level of dispersion. Moreover, since each individual series has a different MAE, it could be very difficult to establish an overall boundary that would be representative for each particular country and series. In addition, series with a high MAE but a low volume might distort the overall interpretation. A possible way of overcoming the problem of having an individual MAE for each particular series, while still providing a single MAE figure, is to construct a synthetic MAE by dividing the MAE of each series by its interest rate and weighting it by its respective volume.
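More precisely, the synthetic MAE, $MAE_S(\hat{\theta})$, for a given estimator $\hat{\theta}$ in a particular period can be defined as:

$$MAE_S(\hat{\theta}) = \frac{\sum_{j=1}^{k} MAE(\hat{\theta}_j) \cdot B_j \cdot \frac{1}{i_{j1} + 1/(1+i_{j1})}}{\sum_{j=1}^{k} B_j} \qquad (7)$$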
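or, equivalently,

$$MAE_S(\hat{\theta}) = \frac{1}{\sum_{j=1}^{k} B_j} \sum_{j=1}^{k} B_j \, \frac{MAE(\hat{\theta}_j)}{i_{j1} + \dfrac{1}{1+i_{j1}}} \qquad (8)$$

where $MAE(\hat{\theta}_j)$ is the MAE defined in Section 2.1 for each series j, k is the number of series, $B_j = B_{j0} + B_{j1}$ is the total volume reported for this series and $i_{j1}$ is the (reported) aggregated interest rate of this series in the particular period.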
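Equations (7) and (8) amount to a volume-weighted average of the series MAEs, each divided by the modified interest rate. A minimal Python sketch, assuming rates in percentage points and MAEs already computed per series as in Section 2.1; the function names and the example figures are hypothetical.

```python
def modified_rate(i):
    """Modified interest rate i + 1/(1 + i): equal to 1 at i = 0, close to i for high rates."""
    return i + 1.0 / (1.0 + i)


def synthetic_mae(maes, volumes, rates):
    """Volume-weighted average of MAE(theta_j) / modified_rate(i_j1), as in equation (8).

    maes    -- MAE of the chosen estimator for each series j
    volumes -- total volume B_j = B_j0 + B_j1 of each series
    rates   -- reported aggregated interest rate i_j1 of each series (in %)
    """
    if not (len(maes) == len(volumes) == len(rates)):
        raise ValueError("maes, volumes and rates must have the same length")
    total_volume = sum(volumes)
    weighted = sum(b * m / modified_rate(i) for m, b, i in zip(maes, volumes, rates))
    return weighted / total_volume


# Hypothetical figures for three series.
print(round(synthetic_mae([2.0, 5.0, 1.0], [500, 100, 400], [0.0, 2.0, 4.0]), 4))  # prints 1.3095
```

Because both the numerator and the normalising sum are in volume units, scaling all volumes by a common factor leaves the result unchanged; only the relative business volumes matter.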
The synthetic MAE aggregates the MAEs for individual MIR statistical indicators by first expressing them relative to the level of interest rates and then weighting them by their relative business volumes. To express the MAEs relative to the rate level, a modified interest rate, $i + \frac{1}{1+i}$, is used in the denominator instead of simply i, in order to avoid an excessively large effect for rates very close to zero (see footnote 7). As shown in Figure 1, the modified interest rate approaches the original interest rate at high interest rate levels but does not fall below 1. In this way, any disproportionate impact of very low interest rates on the synthetic MAE is avoided.
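The smoothing can be checked numerically. The sketch below compares the modified rate i + 1/(1 + i) with the more general form (1 + i^β)^{1/β} mentioned in footnote 7, using β = 1.72, and with max(1, i), the limit of the general form as β grows; the function names are illustrative.

```python
def modified_rate(i):
    """Modified interest rate used in the synthetic MAE denominator."""
    return i + 1.0 / (1.0 + i)


def smoothed_rate(i, beta):
    """General smoothing form (1 + i**beta)**(1/beta), with beta > 1."""
    return (1.0 + i ** beta) ** (1.0 / beta)


print(" rate  modified  beta=1.72  max(1, i)")
for i in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0]:
    print(f"{i:5.1f}  {modified_rate(i):8.3f}  {smoothed_rate(i, 1.72):9.3f}  {max(1.0, i):9.1f}")
```

Both curves start at 1 for a zero rate and stay close to each other over the 0-12 range plotted in Figure 1, while max(1, i) shows the sharp kink that the smoothing avoids.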
Figure 1 Original and modified interest rate used in the formula for a synthetic MAE
[Figure 1: original and modified interest rates (vertical axis, 0-12) plotted against the interest rate (horizontal axis, 0-12)]
This synthetic construction thus provides an efficient way of condensing the detailed information on sampling errors for each estimator and series into a single figure. It should be noted that the synthetic MAE is expressed in pure units, as the formula includes both rates and volumes in the numerator and the denominator.

7 A slightly more general form of this smoothing formula is $(1 + i^{\beta})^{1/\beta}$ with $\beta > 1$, which approximates the formula described above for $\beta \approx 1.72$ and which converges on $\max(1, i)$ as $\beta \to \infty$.
2.4 RESULTS FOR THE SYNTHETIC INDICATOR AT THE NATIONAL LEVEL
Each of the NCBs in the European Union that participated in the exercise organised by the Technical Expert Group calculated the MAE defined in Section 2.1, for the error estimators described in Section 2.2, for the 43 series set out for new business and outstanding amounts in Regulation ECB/2001/18, and then aggregated the series to obtain national figures. The calculations were carried out for five periods: September 2010, December 2010, March 2011, June 2011 and September 2011. A posteriori, these calculations were used to construct a synthetic MAE, as defined in Section 2.3, for new business and for outstanding amounts, based on the first and third quartile estimators (Q1 and Q3 respectively), in view of their theoretical and empirical robustness. From a theoretical point of view, measures such as quartiles, which are little influenced by extreme values, seem the most appropriate. Furthermore, the distribution of interest rates is most probably not symmetric for most MIR statistical indicators, which supports using the average of the two quartiles rather than Q1 or Q3 separately. Empirical results from the EU countries that performed the exercise (see Tables 1 and 2) also support these theoretical considerations, yielding stable indicators that show no sign of large disparities across countries or of volatility over time.
The reasons for discarding the remaining indicators depended on the estimator under consideration. As outliers often play a significant role in the construction of statistics, the minimum and maximum indicators should be interpreted as conceptually extreme terms of reference for the MAEs and not be used as actual measures of accuracy. Similar caution should be exercised when interpreting the two-standard-deviation indicator, which shifts the average interest rate by two standard deviations of the weighted distribution. The resulting interest rate may then lie outside the distribution of the reported rates, which could imply a very large MAE value for a particular stratum. This effect could be magnified when the calculation refers to strata comprising a relatively small number of reporting agents.
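The risk described for the two-standard-deviation indicator is easy to see in a stratum with very few reporting agents. In the hypothetical sketch below, two institutions of equal size report 2% and 3%: the weighted standard deviation is then exactly half the gap between the two rates, so mean ± 2σ lies outside the reported range on both sides. The function name is illustrative.

```python
import math


def weighted_mean_sd(rates, volumes):
    """Volume-weighted mean and standard deviation of the reported rates."""
    total = sum(volumes)
    mean = sum(r * v for r, v in zip(rates, volumes)) / total
    variance = sum(v * (r - mean) ** 2 for r, v in zip(rates, volumes)) / total
    return mean, math.sqrt(variance)


# Hypothetical stratum with only two reporting agents of equal size.
rates = [2.0, 3.0]
volumes = [50.0, 50.0]
mean, sd = weighted_mean_sd(rates, volumes)
lower, upper = mean - 2 * sd, mean + 2 * sd
print(mean, sd, lower, upper)  # 2.5 0.5 1.5 3.5
print(lower < min(rates), upper > max(rates))  # True True
```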
Tables 1 and 2 show the synthetic MAE for the first quartile (Q1) and third quartile (Q3) estimators, and for their mean, in each country. The figures are given in pure units, as the synthetic MAE does not have a particular unit of measurement.
Table 1 Synthetic MAEs for new business (average of the five periods)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  1.43  1.40  0.67  1.41  0.14  0.70  1.02  0.51  1.19  0.46
Q3                  1.35  1.11  0.60  1.17  0.16  0.88  1.02  0.37  0.56  0.81
Mean of Q1 and Q3   1.39  1.25  0.63  1.29  0.15  0.79  1.02  0.44  0.88  0.64

Table 2 Synthetic MAEs for outstanding amounts (average of the five periods)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  2.40  1.98  0.45  0.87  0.16  1.11  1.87  0.19  0.58  3.76
Q3                  2.18  1.95  0.46  0.58  0.18  1.06  1.62  0.23  0.56  2.73
Mean of Q1 and Q3   2.29  1.97  0.46  0.73  0.17  1.09  1.74  0.21  0.57  3.25
It is important to note that the results in Tables 1 and 2 are not meaningful in isolation. To interpret them, a threshold must be expressed in the same units by means of the synthetic MAE formula. A threshold for the synthetic MAE can be calculated by assuming that the largest MAE in each stratum is not larger than 10 basis points, which corresponds to the maximum deviation of 10 basis points underlying the current minimum sample size requirement in Regulation ECB/2001/18 (see footnote 8). In that case, the formula specified above can be rewritten as follows:
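$$MAE_S(\hat{\theta}) = \frac{\sum_{j=1}^{k} 0.1 \cdot B_j \cdot \frac{1}{i_{j1} + 1/(1+i_{j1})}}{\sum_{j=1}^{k} B_j} = 0.1 \cdot \frac{\sum_{j=1}^{k} B_j \Big/ \left(i_{j1} + \frac{1}{1+i_{j1}}\right)}{\sum_{j=1}^{k} B_j} \qquad (9)$$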
The expression in equation (9) can then be used as a possible threshold for assessing the sample quality in each country. The resulting thresholds at the national level are presented for both new business and outstanding amounts in Tables 3 and 4.
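Equation (9) is equation (7) with every series MAE replaced by the same bound, so the threshold depends only on volumes and rates. A minimal sketch; the bound is passed as a parameter because it has to be expressed in the same units as the series MAEs (equation (9) writes it as 0.1), and the function names and example figures are hypothetical.

```python
def modified_rate(i):
    """Modified interest rate i + 1/(1 + i)."""
    return i + 1.0 / (1.0 + i)


def synthetic_threshold(volumes, rates, bound):
    """Equation (9): the synthetic MAE obtained if every series had MAE equal to `bound`."""
    total_volume = sum(volumes)
    return bound * sum(b / modified_rate(i) for b, i in zip(volumes, rates)) / total_volume


def meets_threshold(synthetic_value, volumes, rates, bound):
    """True if a country's synthetic MAE does not exceed its own threshold."""
    return synthetic_value <= synthetic_threshold(volumes, rates, bound)


# Hypothetical country with two series, at rates of 1% and 4%.
print(synthetic_threshold([600, 400], [1.0, 4.0], bound=0.1))
```

Because the modified rate rises with the interest rate, a country whose business is concentrated at higher rates obtains a lower threshold for the same bound.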
8 Regulation ECB/2001/18, Annex I, Part 1, Section IV.
Table 3 Synthetic MAE thresholds for new business (average of the five periods)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Threshold           4.54  4.26  3.82  3.80  2.43  4.07  3.71  4.81  4.47  2.47

Table 4 Synthetic MAE thresholds for outstanding amounts (average of the five periods)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Threshold           4.67  4.07  3.90  3.92  3.60  3.68  4.78  4.98  3.42  2.64
Comparing Tables 3 and 4 with Tables 1 and 2 shows that the threshold lies well above the actual synthetic MAE in all cases except outstanding amounts in PL, where the mean synthetic MAE (3.25) exceeds the threshold (2.64). In other words, almost all countries would comply with their individual threshold estimated using the limit of 10 basis points currently set out in the Regulation.
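The comparison of Tables 1 and 2 with Tables 3 and 4 can be checked mechanically from the rounded figures published in those tables. The short sketch below (variable and function names are illustrative) lists, for each category, the countries whose mean of Q1 and Q3 exceeds their threshold; on these rounded figures, only outstanding amounts for PL (3.25 against 2.64) do.

```python
COUNTRIES = ["AT", "DE", "ES", "FR", "GR", "IE", "IT", "LT", "NL", "PL"]

# Mean of Q1 and Q3 (Tables 1 and 2) and thresholds (Tables 3 and 4), rounded as published.
new_business_mean = [1.39, 1.25, 0.63, 1.29, 0.15, 0.79, 1.02, 0.44, 0.88, 0.64]
new_business_threshold = [4.54, 4.26, 3.82, 3.80, 2.43, 4.07, 3.71, 4.81, 4.47, 2.47]
outstanding_mean = [2.29, 1.97, 0.46, 0.73, 0.17, 1.09, 1.74, 0.21, 0.57, 3.25]
outstanding_threshold = [4.67, 4.07, 3.90, 3.92, 3.60, 3.68, 4.78, 4.98, 3.42, 2.64]


def exceedances(means, thresholds):
    """Countries whose synthetic MAE is above their own threshold."""
    return [c for c, m, t in zip(COUNTRIES, means, thresholds) if m > t]


def headroom(means, thresholds):
    """Threshold minus synthetic MAE, per country (negative means the threshold is exceeded)."""
    return {c: round(t - m, 2) for c, m, t in zip(COUNTRIES, means, thresholds)}


print(exceedances(new_business_mean, new_business_threshold))  # []
print(exceedances(outstanding_mean, outstanding_threshold))  # ['PL']
print(headroom(outstanding_mean, outstanding_threshold)["PL"])  # -0.61
```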
A single common threshold could be established for the whole EU/euro area by taking an average or a figure in the range of 3-5 units. This measure could potentially replace the current requirements on minimum national sample size in MIR statistics.
3 ASSESSING SAMPLING QUALITY IN MIR ON THE BASIS OF DEPOSIT AND LOANS BUSINESS VOLUMES
From a different perspective, the synthetic MAE is ultimately a function of interest rates and the corresponding business volumes. Although rates and volumes are combined at the stratum level, the relationship between the synthetic MAE, as defined above, and the overall coverage in terms of business volumes (reported separately by NCBs) can also be assessed.
As shown in Figure 2, there is a strong negative correlation between overall volume coverage and the synthetic MAE estimator for the euro area countries participating in the exercise (R² = 0.68). This relationship suggests that there should be no reason for the synthetic MAE to exceed a certain threshold unless the reported volume coverage decreases considerably.
Figure 2 Country volume coverage versus MAE
[Figure 2: volume coverage (percentages, vertical axis, 70%-100%) against the mean of MAE(Q1) and MAE(Q3) (all periods; basis points, horizontal axis, 0.00-1.80) for GR, ES, IE, NL, IT, AT, DE and FR; fitted line y = -0.1987x + 1.0475, R² = 0.6804]
The annex also includes synthetic MAEs for different categories of series, in particular for loans to, and deposits from, households and non-financial corporations.
A possible alternative would be to focus only on the coverage of the total volume, which could be implemented by defining a certain volume threshold. Although such a measure would be very easy to calculate, its main disadvantage is that it would ignore interest rate dispersion and some features of the sample. As Table 5 shows, for the EU countries that participated in the exercise the percentage of the total volume covered in MIR statistics is generally high, but in practice this volume is that of the largest credit institutions in each country. Using the threshold of 3 units for the synthetic MAE, which broadly corresponds to the largest MAE observed in the previous section, the regression line shown in Figure 2 implies coverage of about 45%, i.e. broadly 50%, of the total volume.
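The coverage figure can be read off the regression line reported in Figure 2 (y = -0.1987x + 1.0475, where x is the synthetic MAE and y the volume coverage as a fraction). A minimal sketch with illustrative function names; note that the horizontal axis of Figure 2 only extends to 1.8 units, so evaluating the line at 3 units is an extrapolation.

```python
# Fitted line reported in Figure 2 (coverage as a fraction of total volume).
SLOPE = -0.1987
INTERCEPT = 1.0475


def implied_coverage(synthetic_mae):
    """Volume coverage implied by the Figure 2 regression line for a given synthetic MAE."""
    return SLOPE * synthetic_mae + INTERCEPT


def mae_for_coverage(coverage):
    """Inverse reading: synthetic MAE at which the line reaches a given coverage."""
    return (coverage - INTERCEPT) / SLOPE


print(f"{implied_coverage(3.0):.1%}")  # 45.1%
print(f"{mae_for_coverage(0.5):.2f}")
```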
Table 5 Country coverage in terms of business volumes and credit institutions sampled
                                          AT    DE    ES    FR    GR    IE    IT    NL
Total volume covered                     84%   75%   92%   72%   99%   93%   81%   95%
Share of credit institutions sampled     15%   12%   35%   38%   56%   20%   14%   18%
Figure 3 shows that the number of credit institutions is negatively correlated with the share of the total volume reported: countries with fewer credit institutions have larger volume coverage than those with more credit institutions.
Figure 3 Total volume sampled and total number of credit institutions, by country (as of May 2012)
[Figure 3: volume coverage (percentages, vertical axis, 60%-100%) against the number of credit institutions (horizontal axis, 0-2000) for GR, ES, IE, NL, IT, AT, DE and FR; fitted line y = -0.0001x + 0.9713, R² = 0.7306]
If it is decided to assess the quality of business volumes in MIR statistics, a threshold should be defined that would be feasible for all countries under the current reporting scheme. One possibility might be to define this threshold on the basis of the figures obtained in this exercise.
4 CONCLUSIONS
MIR statistics refer to interest rates applied by monetary financial institutions (MFIs) to deposits and loans vis-à-vis households and non-financial corporations. These statistics are important for monetary policy, in particular because these interest rates influence consumption and investment expenditure and thereby, indirectly, price developments.
MFI interest rate statistics are collected by NCBs on the basis of harmonised definitions, which ensure data quality and enable meaningful cross-country comparisons. They are collected by dividing the potential reporting population into strata and selecting the largest institutions within each stratum.
This paper has revisited the quality of sampling for MIR statistics from a new perspective: rather than applying sampling theory strictly, it relies on some simpler assumptions about the possible estimation errors. This approach is somewhat similar to the methods used for other cut-off samples in the recent literature. In contrast to these methods, however, there is no information
available in the case of MIR statistics on the variable studied, namely the interest rate, for the
overall population. In order to address that issue, a number of assumptions are used in this
paper, namely that the stratification in MIR statistics results in homogeneous strata, that there is
no correlation within each stratum between the interest rate and the size of the institution, and
that the possible error due to the use of estimated new business volumes is small and need not
be considered. On that basis, the paper finds that the selection of the largest institutions can be
accepted in a scheme with two substrata, a take-all and a take-none. Furthermore, both substrata
can be deemed to be samples of a super population and, therefore, the statistics obtained from
the take-all substratum can be applied to the take-none substratum. Several possible measures of
dispersion obtained from the take-all substratum were selected and applied to national data.
These measures were applied at the level of each interest rate indicator, expressed relative to the
level of interest rates and then aggregated by weighting them by the respective business volume
into a single indicator for the main categories, new business and outstanding amounts.
The results showed (a) the usability of the proposed synthetic MAE to measure the quality of non-random samples; (b) its applicability to MIR statistics, in particular that the version based on the combined use of the first and third quartiles behaves most robustly both over time and across countries; (c) confirmation of the quality of the current MIR data, given that the empirical results on the MAE compare favourably with a deviation of the order of 10 basis points across all strata; and (d) the possibility of establishing a common minimum requirement for the quality of the sample in terms of the synthetic MAE (e.g. 3-5 units). It would also be conceivable to apply that threshold in combination with other measures (e.g. on business coverage) in order to ensure that the current coverage in terms of business volumes is maintained. Finally, the paper notes that the assumptions made in order to reach these conclusions should be the subject of further research.
REFERENCES
Babbie, E. (1999), The basics of social research, Thomson, Wadsworth.
Baillargeon, S. and Rivest, L.P. (1995), A general algorithm for univariate stratification,
Université Laval, Quebec City.
Benedetti, R., Bee, M. and Espa, G. (2010), “A framework for cut-off sampling business survey
design”, Journal of Official Statistics, Vol. 26, No 4, pp. 651-671.
Cochran, W. (1977), Sampling techniques, John Wiley.
Fox, R.J. (2010), “Non-probability sampling”, Wiley International Encyclopaedia of Marketing.
Goodman, L.A. (1961), “Snowball sampling”, Annals of Mathematical Statistics, Vol. 32, No 1, pp. 148-170.
Guarte, J.M. and Barrios, E.B. (2006), “Estimation under purposive sampling”, Communications in Statistics – Simulation and Computation, Vol. 35, pp. 277-284.
Haziza, D., Chauvet, G. and Deville, J.C. (2010), “Sampling and estimation in the presence of cut-off sampling”, Australian & New Zealand Journal of Statistics, Vol. 52, No 3, pp. 303-319.
Kish, L. (1995), Survey sampling, John Wiley & Sons.
Landry, S. (2011), “Managing response burden by controlling sample selection and survey
coverage”, 2011 Joint Statistical Meetings, American Statistical Association.
Tallarini, T.D. Jr. (2000), “Risk-sensitive real business cycles”, Journal of Monetary
Economics, Vol. 45, No 3, pp. 507-532.
Regulation (EC) No 63/2002 of the European Central Bank of 20 December 2001 concerning statistics on interest rates applied by monetary financial institutions to deposits and loans vis-à-vis households and non-financial corporations (ECB/2001/18).
Regulation (EC) No 290/2009 of the European Central Bank of 31 March 2009 amending Regulation (EC) No 63/2002 (ECB/2001/18) concerning statistics on interest rates applied by monetary financial institutions to deposits and loans vis-à-vis households and non-financial corporations (ECB/2009/7).
U.S. Office of Management and Budget (2006), Standards and Guidelines for Statistical
Surveys, September.
Yorgason, D., Bridgman, B., Cheng, Y., Dorfman, A.H., Lent, J., Liu, Y.K., Miranda, J. and
Rumburg, S. (2011), “Cutoff Sampling in Federal Establishment Surveys: An Inter-Agency
Review”, 2011 Joint Statistical Meetings, American Statistical Association.
TABLES AND CHARTS
SYNTHETIC MIR SERIES, WEIGHTED BY VOLUME AND BY INTEREST RATE
(total new business)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  1.43  1.40  0.67  1.41  0.14  0.70  1.02  0.51  1.19  0.46
Q3                  1.35  1.11  0.60  1.17  0.16  0.88  1.02  0.37  0.56  0.81
Mean of Q1 and Q3   1.39  1.25  0.63  1.29  0.15  0.79  1.02  0.44  0.88  0.64
SYNTHETIC MIR SERIES, WEIGHTED BY VOLUME AND BY INTEREST RATE
(deposits received from households)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  1.93  2.02  0.67  0.86  0.11  0.06  2.08  0.92  1.84  0.43
Q3                  2.36  1.65  0.77  0.43  0.12  0.07  2.77  0.65  1.70  0.95
Mean of Q1 and Q3   2.14  1.83  0.72  0.65  0.12  0.07  2.43  0.78  1.77  0.69
[Charts: Q1, Q3 and mean of Q1 and Q3 by country (AT, DE, ES, FR, GR, IE, IT, LT, NL, PL) for total new business and for deposits received from households]
SYNTHETIC MIR SERIES, WEIGHTED BY VOLUME AND BY INTEREST RATE
(deposits received from non-financial corporations)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  0.80  0.37  1.13  2.09  0.17  1.92  0.07  0.00  2.49  0.44
Q3                  0.72  0.28  0.74  1.51  0.25  2.51  0.07  0.00  0.95  0.82
Mean of Q1 and Q3   0.76  0.33  0.93  1.80  0.21  2.22  0.07  0.00  1.72  0.63
SYNTHETIC MIR SERIES, WEIGHTED BY VOLUME AND BY INTEREST RATE
(loans to households)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  3.41  1.58  0.77  0.79  0.10  0.11  1.03  0.86  0.09  0.73
Q3                  2.78  1.38  0.67  0.79  0.12  0.13  0.96  0.63  0.11  1.02
Mean of Q1 and Q3   3.10  1.48  0.72  0.79  0.11  0.12  0.99  0.74  0.10  0.87
[Charts: Q1, Q3 and mean of Q1 and Q3 by country (AT, DE, ES, FR, GR, IE, IT, LT, NL, PL) for deposits received from non-financial corporations and for loans to households]
SYNTHETIC MIR SERIES, WEIGHTED BY VOLUME AND BY INTEREST RATE
(loans to non-financial corporations)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  0.97  1.79  0.43  1.69  0.29  0.22  0.75  0.12  0.31  0.66
Q3                  0.75  1.35  0.44  1.52  0.26  0.21  0.56  0.16  0.24  0.59
Mean of Q1 and Q3   0.86  1.57  0.43  1.61  0.27  0.21  0.66  0.14  0.28  0.62
SYNTHETIC MIR SERIES, WEIGHTED BY VOLUME AND BY INTEREST RATE
(total outstanding amounts)
                      AT    DE    ES    FR    GR    IE    IT    LT    NL    PL
Q1                  2.40  1.98  0.45  0.87  0.16  1.11  1.87  0.19  0.58  3.76
Q3                  2.18  1.95  0.46  0.58  0.18  1.06  1.62  0.23  0.56  2.73
Mean of Q1 and Q3   2.29  1.97  0.46  0.73  0.17  1.09  1.74  0.21  0.57  3.25
[Charts: Q1, Q3 and mean of Q1 and Q3 by country (AT, DE, ES, FR, GR, IE, IT, LT, NL, PL) for loans to non-financial corporations and for total outstanding amounts]
MEMBERS OF THE TECHNICAL EXPERT GROUP
European Central Bank Piotr Bojaruniec
Javier Huerga
Sébastien Pérez-Duarte
Josep Maria Puigvert
Patrick Sandars
Danmarks Nationalbank Justyna Anna Wijas-Jensen
Rasmus Kofoed Mandsberg
Deutsche Bundesbank Christiane Hofer
Jörg Reddig
Central Bank of Ireland Jean Goggin
Bank of Greece Starida Eleni
Vasilis Georgakopoulos
Stamatina Nega
Banco de España Antonio Casado
Banque de France Jérémi Montornes
Banca d’Italia Maria Rosaria Buzzi
Massimiliano Stacchini
Lietuvos Bankas Tomas Švedas
De Nederlandsche Bank Wim Goes
Oesterreichische Nationalbank Martin Bartmann
Narodowy Bank Polski Norbert Ciesla
Bank of England Fenella Maitland-Smith
Anisha Tibrewal
This paper is the outcome of the work undertaken by the Technical Expert Group on sampling issues on MFI interest rate statistics of the European System of Central Banks (ESCB) from October 2011 to June 2012 (see footnote 9). The Technical Expert Group was chaired by Javier Huerga. The paper was edited by Javier Huerga, Sébastien Pérez-Duarte and Josep Maria Puigvert on behalf of the group (see footnote 10).

9 The Technical Expert Group was a temporary group reporting to the Working Group on Monetary and Financial Statistics (WG MFS) chaired by Jean-Marc Israël. The Technical Expert Group would like to express its thanks to the WG MFS and its Chair for their support and trust.

10 The editors are grateful for the comments provided by Aurel Schubert, Jean-Marc Israël and Patrick Sandars. The editors would also like to thank Piotr Bojaruniec for his technical assistance, Henry Meyer for the language editing, and Beatriz Sanz and the Editorial Board of the European Central Bank Statistics Paper Series (SPS) for their comments.