In Applied Statistics, the term reconciliation usually refers to adjusting several time series, corresponding to sectoral or geographically disaggregated areas, to their aggregated total. Time series reconciliation is often combined with another concept known as temporal benchmarking, which provides an adjustment of the estimated high-frequency series to known low-frequency temporal series.
Even though the problem was first posed over seventy years ago (Deming & Stephan, 1940), new tools, and in particular new information technology tools, have sparked revived interest in the matter. A non-exhaustive list of earlier key references includes Di Fonzo and Filosa (1987), Cholette (1988), Chen and Dagum (1997), and Di Fonzo (1994, 2003).
Mention should be made of two highly innovative contributions. First, Di Fonzo and Marini (2005, 2011, 2015) used the so-called Denton’s movement preservation principle (Denton, 1971) and Causey and Trager’s (1981; available as an Appendix in Bozik & Otto) growth rates preservation principle to solve several reconciliation problems, including temporal or contemporaneous aggregation constraints, one- or two-way time series systems, and marginal benchmarking problems.
Second, the book by Dagum and Cholette (2006) analyses and solves benchmarking, calendarisation or reconciliation problems by using regression-based models. Said authors also derive solutions based on autoregressive integrated moving average (ARIMA) and structural time series models. Their approach includes distribution or interpolation problems. As in the Di Fonzo and Marini (2005) papers, the authors deal with one and two-way problems.
Present-day interest in the main topics concerning benchmarking and reconciliation for time series is evidenced by two recent publications. First, the new version of the Quarterly National Accounts (QNA) Manual (International Monetary Fund [IMF], 2018) dedicates chapter 6 to a systematic review of benchmarking methods relevant for compiling Quarterly National Accounts, although these methods may also prove helpful in addressing other problems. Second, a special issue of Statistica Neerlandica (Chen, Di Fonzo, & Mushkudiani, 2018) is devoted to the state of the art of these topics. Several papers address specific aspects and tools (for example, Chen, Di Fonzo, Howells, & Marini, 2018; Guerrero & Corona, 2018; or Bisio & Moauro, 2018, among others), whilst others offer a compilation of methods or procedures. For example, Quilis’ (2018) paper analyses different benchmarking procedures "in terms of practical feasibility, ease of use, and availability of dedicated software" (p. 448).
In this paper, the authors propose a method applicable to problems simultaneously involving both reconciliation and temporal benchmarking. The technique is herein applied to a Laspeyres-type volume index by employing a method derived from a proposal by Rojo and Sanz (2017), modified for use when cross-sectional restrictions employ weighted sums with time-varying weights. The solution is one-step, thereby optimising benchmarking and reconciliation simultaneously. The relevance of the problem is evident, for example, in the QNA Manual (IMF, 2018, pp. 194-195), which explores the possible inconsistencies between aggregate QNA series and their disaggregated estimates, these inconsistencies deriving from the non-additivity of the Annual-Overlapping method. Said manual suggests mitigating this inconvenience by presenting only percentage measures of the components’ contribution to the aggregated variable. The transversal non-additivity of the popular Annual-Overlapping method is a key challenge in the cross-sectoral reconciliation of quarterly volume index estimates.
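The non-additivity at issue can be illustrated with a toy chain-linking exercise (all prices and quantities below are hypothetical, and the annual mechanics stand in for the quarterly annual-overlap calculation): once a second link enters the chain, components chained separately no longer add up to the chained aggregate.

```python
import numpy as np

# Toy data: 2 products, annual quantities and prices for 3 years.
p = np.array([[1.0, 2.0],   # year 0 prices (product A, product B)
              [1.5, 2.0],   # year 1
              [2.0, 2.0]])  # year 2
q = np.array([[10.0, 10.0],
              [12.0,  9.0],
              [15.0,  8.0]])

def chained_levels(p, q):
    """Chain-linked volume levels, reference year 0 (previous-year prices)."""
    levels = [p[0] @ q[0]]  # year 0: current-price value is the volume level
    for y in range(1, len(q)):
        link = (p[y-1] @ q[y]) / (p[y-1] @ q[y-1])  # Laspeyres volume link
        levels.append(levels[-1] * link)
    return np.array(levels)

# Chain each component separately and the aggregate.
cv_A = chained_levels(p[:, :1], q[:, :1])
cv_B = chained_levels(p[:, 1:], q[:, 1:])
cv_total = chained_levels(p, q)

print(cv_A[1] + cv_B[1], cv_total[1])  # one link: still additive
print(cv_A[2] + cv_B[2], cv_total[2])  # two links: additivity breaks
```

After a single link the components still add up because all series are valued at the same base-year prices; from the second link onwards each series carries its own chaining history, and the discrepancy appears.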
The proposed method allows several indicators to be used and does not require them to be approximations of the value to be estimated. It should also be pointed out that the stochastic nature of the model proposed by the authors enables the dispersion of the solution obtained to be estimated, thereby providing Bayesian confidence intervals for said solution (see Zellner, 1971, pp. 27-28), and also yields the expression of the linear model which relates the indicators to the high-frequency series to be estimated.
Di Fonzo and Marini (2015) and Dagum and Cholette (2006) propose alternative methods, albeit focusing on the case in which (only) one indicator approximates the high-frequency series to be estimated. Cuevas et al. (2011, 2015) designed a method focused on benchmarking and reconciliation for National and Regional Accounts. These authors resolve the two aspects separately and, therefore, do not ensure that the final solution respects the full set of restrictions. Other methodological differences concern the use of indicators. Although the Cuevas et al. (2015) method does allow for the use of multiple indicators, it initially combines them by means of dynamic factor analysis, such that in methodological terms it is a single-indicator procedure like the previous ones.
In the following section, a detailed description is given of the assumptions and development of the proposed methodology, obtaining the explicit expression of the solution. The third section applies the proposed solution to obtain Quarterly Regional Accounts (QRA) for Spanish regions, with both the Annual National Accounts (ANA) and QNA for the whole of Spain being known. The method may obviously be applied to any country’s QNA and ANA. This example uses the same data as in the Cuevas et al. (2011) proposal, albeit over a longer time period. The final section summarises the most relevant findings. The work includes Supplementary Materials that contain colour tables and illustrations of collateral importance.
Bayesian Benchmarking and Reconciliation in the Context of Time-Varying Aggregation
Let be a non-stochastic annual variable, observed for consecutive years and for disaggregated entities (typically, the disaggregation is linked to a sectoral or geographic classification).
We also assume the annual variables resulting from the disaggregation over the classification, , to be known. Even though in the simplest applications the aggregation is achieved through either the sum or the mean, we consider a more general scheme in which the aggregate series is a ‘linear combination’ of the disaggregated ones, with time-varying combination coefficients. Specifically, we suppose
The high or sub-annual frequency (usually monthly or quarterly)1 aggregated series is also assumed to be known,
where refers to period of year , and where the number of periods for a year will be denoted by . There may be an additional ‘incomplete’ year, in other words, with sub-annual periods to be estimated, , for .
The aim is to estimate the sub-annual series for each disaggregated area,
including, if any, the high-frequency values relative to the incomplete year, , with .
We assume that the cross-aggregation links at the sub-annual level are the same as are used for the annual series; that is to say
considering, if appropriate, the same relation for the incomplete year.
Finally, we should establish the temporal aggregation scheme for both levels of cross-aggregation. The arithmetic mean has been taken, although other classical schemes (sum, first or last values) are developed in an analogous fashion. Specifically, we assume
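As a minimal sketch of this temporal aggregation scheme (function and variable names are ours, quarterly frequency assumed), each annual benchmark equals the arithmetic mean of its year's sub-annual values:

```python
import numpy as np

def annual_means(x_quarterly, s=4):
    """Temporal aggregation by arithmetic mean: one annual value per
    block of s consecutive sub-annual observations."""
    x = np.asarray(x_quarterly, dtype=float)
    n_years = len(x) // s
    return x[: n_years * s].reshape(n_years, s).mean(axis=1)

# A quarterly series consistent with the annual benchmarks [100, 104]:
xq = np.array([98.0, 99.0, 101.0, 102.0, 103.0, 103.5, 104.5, 105.0])
print(annual_means(xq))  # → [100. 104.]
```

The sum, first-value or last-value schemes mentioned above would simply replace `.mean(axis=1)` with the corresponding reduction.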
Before building the relevant distributions, we should stack the relevant series in matrix form. Specifically, we denote by the column vector including the sub-annual chained series for the -th individual area, i.e., for , , if the last year is complete, and for incomplete last year schemes. We also denote by the contemporaneously aggregated chain series or for both schemes.
In addition, we denote by the column vector of annual chained series for the aggregated area, and by the column vector of annual chained series for the -th individual area, i.e., . The cross-disaggregated series may then be stacked into the column vectors
Before continuing, one clarification concerning the dimensions of the matrices involved should be made; we denote by the column dimension of , the column dimension of , and by the number of high-frequency periods. These dimension values are, respectively, , and for incomplete year schemes. Obviously, . By using this notation, we can compactly write sub-annual data as
In sum, we have the annual and sub-annual series for the aggregated area, respectively, and . We also have the annual series for the different disaggregated areas, . The main aim is to estimate the sub-annual series for those disaggregated units, , combined with the appropriate measures for the accuracy of the estimation.
As usual, Bayesian strategy initially states the prior distributions for the parameters and variables involved. These distributions include a certain number of non-random hyperparameters, whose values shall be stated by using allocation procedures. In addition, we then establish a behavioural linear model explaining the sub-annual variables as a function of one or more relevant indicators or approximated series. This linear model allows the likelihood function to be established and, consequently, the posterior distribution for the quarterly series and other relevant parameters to be derived.
Joint Prior Distribution
obeying restrictions (Equation 2) and (Equation 3), where is the average vector for , is the precision matrix (the inverse of the variance-covariance matrix), and , with being the identity matrix of order and the first-difference matrix2 (with rank ). A greater degree of smoothness can be achieved by substituting for , with being the matrix of rank that provides second order differences. Although only first order differences are used in the formal derivation of the estimates, the results for second order differences are obtained by simply replacing with in the following formulae.
We assume a gamma3 prior distribution for ,
In sum, we obtain a normal-gamma prior joint distribution for and ,
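A normal-gamma pair of this kind can be simulated directly; the sketch below uses generic placeholder hyperparameters (`a`, `b`, `m`, `P`), not the paper's notation: the scalar precision is drawn from a gamma distribution and the normal component is then drawn conditionally on it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameters for a 3-dimensional illustration.
a, b = 4.0, 2.0          # gamma shape and rate for the scalar precision
m = np.zeros(3)          # prior mean vector
P = np.eye(3)            # prior precision matrix (up to the gamma factor)

def sample_normal_gamma(rng, m, P, a, b):
    """Draw (x, lam) with lam ~ Gamma(a, rate=b) and
    x | lam ~ N(m, (lam * P)^(-1))."""
    lam = rng.gamma(shape=a, scale=1.0 / b)
    x = rng.multivariate_normal(m, np.linalg.inv(lam * P))
    return x, lam

draws = [sample_normal_gamma(rng, m, P, a, b) for _ in range(5000)]
lams = np.array([lam for _, lam in draws])
print(lams.mean())   # should be close to a / b = 2
```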
The prior distributions include a large number of parameters and, generally, a limited amount of data. These parameters are and (from prior distribution for ), matrices and (involved in the prior for ), matrices and from the distribution for and, finally, matrix from the likelihood (see below).
Some authors suggest the use of Jeffreys’ rule to assign ‘vague’ or ‘non-informative’ priors. Examples include Box and Tiao (1973), Zellner (1971) or Gamerman (1997), among others. Other authors, Young (1996), Spiegelhalter et al. (1999), Broemeling (1985) or Rojo and Sanz (1999), proposed using ‘almost’ or ‘approximately’ non-informative distributions as priors.
Non-informative priors are appropriate for problems in which large samples can be taken, allowing the likelihood to dominate the priors. By contrast, in many macroeconomic and psychological studies, or in numerous analyses related to the natural sciences, only a small amount of statistical information is available and, consequently, the prior distributions dominate the likelihood. The authors have chosen the option of using ‘informative’ priors, thus allocating reasonable values to the hyperparameters by using the information included in the marginal prior for the relevant parameters and variables. Readers can find the method in Rojo and Sanz (2005).
Likelihood Function and Posterior Joint Distribution
In order to establish the likelihood function, we assume a linear model relating the disaggregated sub-annual series with indicators (possibly differing in number across disaggregated entities)
The model has a similar expression if the last year is incomplete.
We now write a matrix version of the proposed likelihood, which allows more compact reasoning. We denote with the parameters of the likelihood model for each disaggregated area, and we include the values of the indicators in the matrix , a block diagonal matrix of size whose elements are, for , being .
Denoting the perturbations for the models as , with the linear model could be written as
The likelihood is completely established by defining the prior for given , as , assuming that and , with the hyperparameter being a block diagonal matrix, .
Hence, the likelihood function is given by
being the annual data vector for the disaggregated areas previously defined.
We assume a normal prior distribution for , given ,
with , and being the precision matrix, i.e., the conditional prior for is a .
A well-known result (Zellner, 1971, p. 30) states that for a quadratic loss function
with being a positive definite matrix, the minimum of the quadratic risk function (the average of the quadratic loss function under the posterior distribution) is achieved by taking as the estimate the mean of that posterior joint distribution. The solution, therefore, involves obtaining said posterior means.
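With generic symbols (θ for the unknowns, μ for their posterior mean, Q for the positive definite weight matrix; these placeholders are ours, not the paper's notation), the standard decomposition behind this result is:

```latex
\begin{aligned}
R(a) &= \mathbb{E}\bigl[(\theta - a)'Q(\theta - a)\bigr] \\
     &= \mathbb{E}\bigl[(\theta - \mu)'Q(\theta - \mu)\bigr]
        + (\mu - a)'Q(\mu - a),
\end{aligned}
```

where the cross term vanishes because $\mathbb{E}[\theta - \mu] = 0$. Since $Q$ is positive definite, the second term is non-negative and equals zero only at $a = \mu$, so the posterior mean minimises the quadratic risk.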
Note, however, that the restricted distribution for is a degenerate one. The posterior mean for may be obtained by including the temporal and contemporaneous restrictions in the posterior distribution. This substitution removes several components of . We then obtain the posterior mean for the ‘active’ parameters and variables and, taking into account the linearity of the restrictions, derive the outcome for the remaining parameters.
Specifically, our aim is to write , where and are non-random matrix and vector, respectively, and will group the components that are not subject to restrictions for the sub-annual series. The temporal constraints (Equation 1) lead to one sub-annual period being excluded for each year (specifically, we have excluded the last sub-annual period for each year). Furthermore, the transversal aggregation constraint (Equation 2) implies the linear dependence of one disaggregated sub-annual series. Thus, the -th series has also been excluded.
Denoting by ( for ‘restricted’) the column vector whose components are the ‘independent’ values for the chained series corresponding to the -th disaggregated area ( or , depending on the presence of an incomplete last year), these independent values can be grouped into the column vector . Note that the dimension of is equal to .
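The bookkeeping behind the restricted vector can be sketched as follows (toy dimensions, stacking order and function names are our assumptions): for each of the first m-1 areas, keep every sub-annual period except the last one of each year.

```python
import numpy as np

# Assumed toy dimensions: m_areas areas, N complete years, s quarters per
# year; the stacked sub-annual vector is assumed to have m_areas*N*s entries,
# ordered area by area.
m_areas, N, s = 3, 2, 4

def free_indices(m_areas, N, s):
    """Indices of the 'free' components: drop the last sub-annual period
    of every year (temporal constraint) and the whole m-th area
    (cross-sectional constraint)."""
    keep = []
    for i in range(m_areas - 1):          # the m-th area is dropped entirely
        for y in range(N):
            for t in range(s - 1):        # last period of each year dropped
                keep.append(i * N * s + y * s + t)
    return np.array(keep)

idx = free_indices(m_areas, N, s)
print(len(idx))   # (m-1) * N * (s-1) = 2 * 2 * 3 = 12
```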
Note S1 in Supplementary Materials provides the linear relation linking and
Now, integrating density (Equation 8), we obtain the posterior marginal distributions for and for ; substituting restriction (Equation 9) then provides the restricted posterior distributions for both and ,
being a square matrix, with , , for and scalars, and a vector, being .
Posterior density (Equation 10) is a multivariate Student-t (henceforth, MS-t)4 with degrees of freedom. The scale matrix is and the position vector is . Thus, the posterior variance-covariance matrix for can be written as .
The posterior first and second order moments for are then
Posterior density for is obtained in a similar manner. We obtain
with a square matrix with dimension for , a singular square matrix with dimension and rank and being , and with
and being the column vector .
The posterior distribution (Equation 13) for is, thus, an MS-t with degrees of freedom, position and scale matrix . The variance-covariance matrix is then equal to .
An Example From the Quarterly Regional Accounts
We now present an example, which illustrates the above method. The resulting estimated variables are the quarterly regional chained volume series for Spanish regional Gross Domestic Product (GDP)5.
Temporal restrictions impose consistency among the regional estimations (quarterly regional chained volume series) and annual chained volume series provided by the annual regional accounts (ARA) from official Spanish regional statistics.
We also force the transversal consistency of these quarterly regional series with the Spanish national chained series provided by the QNA, a consistency which states that the QNA chained series is a weighted sum of the regional chained volume indices.
Many national statistics institutes have used annual chain-linking series for ANA and corresponding quarterly series for QNA. Specifically, the US Bureau of Economic Analysis (BEA) has used quarterly chain-linking volume series since 1996. In the European Union (EU), a European task force was set up in 2007, co-chaired by Eurostat and the European Central Bank. The growing popularity of chain-linking series for both QNA and ANA has led to the need for efficient tools in the reconciliation and benchmarking of quarterly chain-linking series. Cuevas et al. (2011, 2015), for example, proposed a two-step-method for use with the annual overlap derivation of chain-linked QNA, a very popular technique frequently used in the EU (see Eiglsperger, 2008 for a detailed listing of its dissemination).
Chen, Di Fonzo, Howells, and Marini (2018) developed approaches for reconciling annual (preliminary) estimates of US national accounts aggregates subject to quinquennial benchmarks available from detailed input-output tables. Furthermore, in an update of the 2001 version of the IMF QNA manual (Shrestha, 2013, a document prepared by M. Marini and Th. Alexander), the authors suggest that "the compilation of consistent quarterly estimates satisfying both low frequency benchmarks and accounting identities at the quarterly level has become more and more challenging for compilers" (Shrestha, 2013, p. 7). Although the UN’s System of National Accounts (SNA) recommends use of the Fisher-type index, the authors of the manual recognise that Laspeyres volume indices are an acceptable alternative in national accounts.
More specifically, the Spanish National Statistics Institute (INE) provides the annual GDP (chain-linked volumes, reference year 20106) for the 17 autonomous regions and for the two autonomous cities (hereinafter, 19 regions) for the nineteen-year period 2000-2018. The INE also estimates total quarterly GDP (from QNA), both at market prices (in euros) and by the volume-chained series (the annual overlap method is used here). In both cases, the raw series as well as the seasonally and working-day adjusted series (SA) are presented. At the time of writing this paper, the quarterly series was composed of 76 quarters7, from the first quarter of 2000 to the fourth quarter of 2018.
As mentioned before, the authors’ aim is to estimate the 19 quarterly regional chained series, all being consistent with the annual regional chained series and with the total quarterly one. We only estimate the SA regional series, with the estimation for the raw series following a similar development. We apply the procedure obtained in the second section for the period from quarter 2000:1 to 2018:4.
Cuevas et al. (2011) solve this problem by using a multivariate two-step extension of the Denton (1971) method, following a procedure proposed by Di Fonzo and Marini (2005). The former authors use total employment (regional social security contributors [SSC]) as the high-frequency indicator, among others, owing to its close linkage to real output. Although the widespread debate concerning the extent of cyclical synchrony between employment and output is well known, we do not lead or lag the SSC series, in order to replicate their data selection. As already pointed out, Cuevas et al. (2011) use a two-stage procedure that is not simultaneous, unlike the one proposed in this work.
Taking into account that the estimated quarterly regional series will be seasonally adjusted, the SSC series have first been seasonally adjusted using the X12 method, as implemented in the EViews (Version 6) software. The procedure proposed by the authors was implemented in MATLAB (Version R2012b).
Before the method can be applied, an extra adjustment is needed. This is due to the lack of consistency among the whole set of annual regional chained-series and the total Spanish one, resulting from the existence of ‘extra-regio’ territories (extra-regional economic activities such as embassies, military or scientific bases or resource deposits in international waters, among others).
To sum up, we have regional series concerning the annual GDP for the 19 regions for the period 2000-2018, both at current prices and in terms of volume. We have also estimated Spanish quarterly GDP (without extra-regio), both raw series and SA series, also at current prices and in chained-linking terms. Finally, we know the raw SSC series for each region, and have previously derived the corresponding SA series. As pointed out earlier, our objective is to estimate the Quarterly Regional chained volume series for Spanish regional GDP.
We then apply the method obtained in the second section, whose notations we now describe.
The INE provides regional annual GDP at market prices, in euros, , for the 17 autonomous regions and the two autonomous cities ( disaggregated areas) and for years (from 2000 to 2018). National annual GDP at market prices is obtained by aggregation. The INE also provides the annual volume chain series, at the national and regional level, respectively. Furthermore, the QNA provides the national chained volume series, , seasonally and working-day adjusted. Finally, the Labour, Migration & Social Security Ministry provides the indicator used, , the social security contributors (SSC), seasonally adjusted by the authors.
The ‘annual overlap’ method employed by the INE establishes the links between high frequency (quarterly) and low-frequency (annual) volume indices by using arithmetic means, both for national and for regional volume index. We thus impose temporal benchmarking
Furthermore, the cross-restrictions are
at the annual level and
at the quarterly one, being
with , . We are, therefore, in the context foreseen in section 2, such that the method proposed allows us to estimate the chained volume series , grouped in vector , and to obtain its variance-covariance matrix.
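The two families of restrictions can be checked numerically. The sketch below (toy dimensions, random placeholder data; in the application the weights would come from current-price GDP shares) verifies that, with arithmetic-mean temporal aggregation and time-varying weights, aggregating over quarters first or over regions first yields the same annual totals, which is the consistency the method imposes.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, s = 3, 2, 4   # assumed toy dimensions: regions, years, quarters

# Hypothetical time-varying weights w[y, i] (rows sum to 1) and candidate
# quarterly regional series X[i, y, t].
w = rng.dirichlet(np.ones(m), size=N)
X = rng.uniform(90, 110, size=(m, N, s))

annual_regional = X.mean(axis=2)                    # temporal: annual = mean
national_quarterly = np.einsum('yi,iyt->yt', w, X)  # cross: weighted sum

# Aggregating either way first gives the same annual totals (linearity).
total_a = (w * annual_regional.T).sum(axis=1)
total_b = national_quarterly.mean(axis=1)
print(np.allclose(total_a, total_b))  # → True
```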
Some representative tables and charts for the regional chain-linking SA series are shown in the Supplementary Materials. The authors have also obtained the results for the regional raw series, although these are not displayed in the work for reasons of length. Readers may compare our tables and charts with those obtained by Cuevas et al. (2011), shown in said reference, for a different time interval.
Table S1 in the Supplementary Materials presents the Pearson correlations between the SSC and estimated series, and also between the growth rates (annual growth rates, and quarterly growth rates, ) of both series. Broadly speaking, the highest values are obtained for annual growth rates, except for Ceuta and Melilla8 (the two autonomous cities). One surprising result is the low correlation in levels for Galicia, the Canary Islands, and the Comunidad Valenciana.
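The correlation measures in Table S1 are straightforward to reproduce for any pair of series; the sketch below (hypothetical numbers, quarterly data with s = 4, function names ours) computes Pearson correlations in levels and for quarterly and annual growth rates.

```python
import numpy as np

def quarterly_growth(x):
    """Quarter-on-quarter growth rates, x[t] / x[t-1] - 1."""
    x = np.asarray(x, float)
    return x[1:] / x[:-1] - 1.0

def annual_growth(x, s=4):
    """Year-on-year growth of annual means built from s sub-annual periods."""
    a = np.asarray(x, float).reshape(-1, s).mean(axis=1)
    return a[1:] / a[:-1] - 1.0

def pearson(u, v):
    return float(np.corrcoef(u, v)[0, 1])

# Hypothetical indicator and estimated series over 3 years of quarters.
ssc = np.array([100, 101, 103, 104, 106, 107, 109, 111, 112, 114, 115, 117.0])
est = np.array([200, 203, 205, 209, 212, 215, 218, 221, 225, 228, 230, 234.0])
print(pearson(ssc, est))                                        # levels
print(pearson(quarterly_growth(ssc), quarterly_growth(est)))    # quarterly
print(pearson(annual_growth(ssc), annual_growth(est)))          # annual
```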
Tables S2 to S4 in the Supplementary Materials provide an overview of the results obtained for the Spanish regions, showing in particular regional growth behaviour during the recent Great Recession.
For their part, Figures S1 and S2 offer a graphic overview of said regional growth path.
Table S5 in the Supplementary Materials presents the comparison between the cyclical signal of each region and the turning points for the aggregate Spanish reference by using ratios, as defined in Abad and Quilis (2004). Only the turning points are taken into account, and no specific attention is paid to the numerical values for the two series.
The first and second columns show the so-called ‘conformity ratio’, comparing the paired9 turning points of each regional cyclical signal with the individual turning points. Rx thus expresses the paired turning points as a percentage of the turning points of the regional cycle, and Ry expresses them as a percentage of the national ones. The conformity ratio varies between 0% and 100%, showing the extent to which the paired turning points reflect the overall cyclical signal of the region.
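A possible implementation of the pairing count behind Rx and Ry is sketched below; the tolerance window and the greedy pairing rule are our assumptions, for illustration only.

```python
def conformity_ratios(regional_tp, national_tp, window=2):
    """Rx and Ry conformity ratios (in %): a regional turning point is
    'paired' with a national one when they lie within `window` quarters.
    Greedy pairing; each national turning point is used at most once."""
    paired = 0
    used = set()
    for r in regional_tp:
        for j, n in enumerate(national_tp):
            if j not in used and abs(r - n) <= window:
                paired += 1
                used.add(j)
                break
    rx = 100.0 * paired / len(regional_tp)   # % of regional turning points
    ry = 100.0 * paired / len(national_tp)   # % of national turning points
    return rx, ry

# Turning points expressed as quarter indices (hypothetical).
print(conformity_ratios([8, 20, 33, 47], [9, 21, 40, 46]))  # → (75.0, 75.0)
```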
Readers may note that, with some exceptions, the agreement between regional and national cycles shown by the indices Rx and Ry is considerable, the exceptions being the Spanish regions of Ceuta and Castilla-La Mancha.
The third column shows the global median delay (GMD) between regional and national cycles. A series is classified as coincident when the GMD is small (we take values between -1 and +1), as lagged when it is positive, and as leading when it is negative. It should be noted that the Balearic Islands lead the national economy by two and a half quarters, and that Cantabria lags the national economy by two quarters.
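The classification rule can be stated compactly (function name ours; thresholds taken from the text):

```python
def classify_gmd(gmd):
    """Classify a regional cycle by its global median delay (GMD) with
    respect to the national cycle: coincident for small values,
    lagged for positive, leading for negative values."""
    if -1.0 <= gmd <= 1.0:
        return "coincident"
    return "lagged" if gmd > 0 else "leading"

print(classify_gmd(-2.5))  # → leading  (e.g., the Balearic Islands)
print(classify_gmd(2.0))   # → lagged   (e.g., Cantabria)
print(classify_gmd(0.5))   # → coincident
```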
The last column shows an index of cyclical coincidence10 between regional and national series. Values near 1 suggest that the regional and national series are highly procyclical, and values near -1 that they are highly countercyclical. For values near zero, we conclude that the regional series is non-classifiable relative to the national one. It should be noted that all of the values are positive, but that only Andalusia, Cataluña, the Comunidad Valenciana, Madrid and the Basque Country display high values.
We also performed the estimation with incomplete years. Now (between 2000 and 2017). The regions involved do not change, and we take ; in other words, we estimate the chained volume series for the first three quarters of 2018. The estimates are shown only for one region (Andalucía) in Figure S3 in the Supplementary Materials. Table S6 shows the Bayesian confidence intervals at 95% for this estimation (see Supplementary Materials).
In this article, the authors propose a method to obtain explicit solutions for simultaneous benchmarking and reconciliation problems for a system of time series when the cross-restrictions use time-varying coefficients. The method provides explicit solutions to the estimation problem and deals with concurrently solving temporal restrictions (benchmarking annual and sub-annual frequency series) and contemporaneous ones (reconciliation among disaggregated and aggregated sub-annual frequency series).
The Bayesian model involved belongs to the frequently used normal-gamma family and minimizes a risk function derived from a quadratic loss function. In addition, the design of the method allows users to include one or several performance indicators through the likelihood model, and to estimate quarterly values for incomplete years.
The stochastic nature of the proposed model allows Bayesian confidence intervals to be obtained for each of the values of the high-frequency series estimated. These intervals are particularly interesting when estimating incomplete years.
Comparisons with alternative methods are not, broadly speaking, feasible since the methods have a different statistical base (mathematical methods compared to statistical methods) and are not nested models (none of them is a generalisation of the other). Nevertheless, certain differences may be pointed out.
Compared to the proposal put forward by Cuevas et al. (2011, p. 6; 2015, p. 631), it is worth highlighting that our procedure allows several indicators to be used when estimating the sub-annual disaggregated series; said indicators need not even be the same for all of them. In contrast, the Cuevas et al. proposal first synthesises the indicators into a single indicator through dynamic factor analysis. In addition, these authors’ procedure initially resolves temporal benchmarking and, subsequently, the reconciliation of the disaggregated areas with the total aggregate. It is therefore (at best) a sequentially optimal procedure, whereas the one proposed in this paper is globally optimal.
As regards the procedure of Di Fonzo and Marini (2011, 2015), we highlight that the design of their model implies using a single indicator. In fact, said authors follow the original idea of Denton (1971), such that the problem is one of adjustment through a single indicator which is an approximation of the target series. For example, in Di Fonzo and Marini (2011) they state that: "This procedure performs the constrained optimization of an objective function according to which the proportionate difference between the benchmarked and the original series ... must be as constant as possible through time" (emphasis ours; p. 148). In the words of IMF (2018), "the objective is to combine the quarterly movements of the indicator with the annual levels of the ANA variables" (p. 87). Strictly speaking, therefore, what is involved is not an indicator but rather an approximation to the sub-annual series to be estimated.
Since it is an approximation, Di Fonzo and Marini (2011) suggest that one of the advantages of their method is the similarity of the estimated series, whether one or more, to the approximation (indicator) used. This apparent advantage proves to be a drawback when the indicator used as an approximation is too volatile, as tends to happen for small disaggregated areas. As seen in the second section, the proposed inclusion of an additional factor in the prior distribution seeks to correct, at least partially, that volatility, should it occur. The authors have simulated examples at different levels of volatility for the indicators, and have shown the ability of the Bayesian proposal to obtain smooth estimates. The comparison is carried out only for the temporal benchmarking procedure and not for the reconciliation, given that Di Fonzo and Marini did not provide solutions for the reconciliation when time-varying links are present.
An example related to Spanish quarterly accounts, combining national and regional volume series, is presented, and evidences the method’s feasibility and appropriateness. In addition, individual regional behaviour during the recent twin economic crisis is analysed using the estimated quarterly volume series.
As previously pointed out (and illustrated in Figure S3 and Table S6 in the Supplementary Materials), the proposed method allows, as in those of Di Fonzo and Marini (2011) and Dagum and Cholette (2006), the target series to be estimated when the final year is incomplete. The additional advantage of the proposal put forward is that it enables Bayesian confidence bands to be built based on the posterior distribution of the estimated series.