
Although measures such as sensitivity and specificity are used in the study of diagnostic test accuracy, they are not appropriate for integrating heterogeneous studies. It is therefore essential to assess all related aspects in detail before integrating a set of studies, so that the correct model can be selected. This work describes a scheme for making decisions regarding the use of the R, STATA and SAS statistical programs. We used the Meta-Analysis of Diagnostic Accuracy package for R to determine the correlation between sensitivity and specificity. This package considers fixed, random and mixed effects models, provides summary estimates, and assesses heterogeneity. For meta-analyses in which studies use various cutoff points, we used the STATA module for meta-analytical integration of diagnostic test accuracy studies, which produces bivariate outputs for heterogeneity.

Diagnostic accuracy plays a central role in the evaluation of diagnostic tests, where accuracy can be expressed as sensitivity, specificity, positive predictive value, negative predictive value, and likelihood ratios. However, predictive values depend directly on the prevalence of the disease in question and, therefore, cannot be directly compared across different settings. By contrast, test sensitivity and specificity are believed not to vary with the prevalence of disease.

This is also the case for likelihood ratios. Since they depend on sensitivity and specificity, they are believed to remain constant, although variability with regard to prevalence does exist. However, some studies (

Several studies have indicated that the variability of sensitivity and specificity may be related to differences in thresholds (

In situations of low prevalence, where the test being employed yields a high number of true negatives and a small number of true positives, the percentage of cases correctly classified does not allow different tests to be compared. This is because that percentage will be very high, driven by the true negatives, even when the number of false positives is equal to or greater than the number of true positives, which is a situation that can cause the test to be rejected and declared inefficient.
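The following minimal sketch illustrates this point with hypothetical counts (the two tests and all numbers are invented for illustration and do not come from any study). At 1% prevalence, a test that misses most diseased subjects can still show a higher percentage of correctly classified cases than a far more sensitive test.

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of correctly classified cases."""
    return (tp + tn) / (tp + fp + fn + tn)


def sensitivity(tp, fn):
    """Proportion of diseased subjects with a positive result."""
    return tp / (tp + fn)


# Hypothetical data: 1,000 subjects, 10 of them diseased (1% prevalence).
# Test A detects 9 of the 10 cases but produces 50 false positives.
test_a = {"tp": 9, "fp": 50, "fn": 1, "tn": 940}
# Test B detects only 2 of the 10 cases and produces 10 false positives.
test_b = {"tp": 2, "fp": 10, "fn": 8, "tn": 980}

acc_a = accuracy(**test_a)
acc_b = accuracy(**test_b)
sens_a = sensitivity(test_a["tp"], test_a["fn"])
sens_b = sensitivity(test_b["tp"], test_b["fn"])

print(f"Test A: accuracy={acc_a:.3f}, sensitivity={sens_a:.2f}")
print(f"Test B: accuracy={acc_b:.3f}, sensitivity={sens_b:.2f}")
```

In both hypothetical tests the false positives outnumber the true positives, yet overall accuracy stays above 94% because it is dominated by the true negatives.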

Meta-Analysis of Diagnostic Accuracy (MADA) libraries are among the statistical packages available that can be used with the most relevant models (

This paper describes the main models used in this context, as well as the available software. Since it is not always easy for researchers to decide on the most appropriate model for their study or to choose the correct software for interpreting their results, we have created a guide for carrying out a meta-analysis of diagnostic tests. Depending on the assumptions that the analyzed data fulfill, we indicate the most suitable model, the software that allows this model to be fitted, and how the results obtained can be interpreted.

To investigate the effect of a cutoff point on sensitivity and specificity, the results have been presented in the form of a receiver operating characteristic (ROC) curve. In addition, one way to summarize the behavior of a diagnostic test from multiple studies is to calculate the mean sensitivity and specificity (

In diagnostic tests, the assumption of methodological homogeneity in studies is not met and thus it becomes important to evaluate heterogeneity. Assessing the possible presence of statistical heterogeneity in the results can be done (in a classical way) by presenting the sensitivity and specificity of each study in a forest plot.

A characteristic source of heterogeneity is that which arises because the studies included in the analysis may have considered different thresholds for defining positive results; this effect is known as the threshold effect.

The most robust statistical methods proposed for meta-analysis take this threshold effect into account and do so by estimating a summary ROC curve (SROC) of the studies being analyzed. However, on some occasions the results of the primary studies are homogeneous and the presence of both threshold effect and other sources of heterogeneity can be ruled out. This statistical modelling can be done using either a fixed-effect model or a random-effects model, depending on the magnitude of heterogeneity. Several statistical methods for estimating the SROC curve have been proposed. The first, proposed by (

The discriminatory capacity of a test is commonly expressed in terms of two measures (sensitivity and specificity) and there is usually an inverse relationship between the two due to the variability of the thresholds. Some of the recommended methods for meta-analysis of diagnostic tests, such as the bivariate model, focus on estimating a summary sensitivity and specificity at a common threshold. The HSROC model, on the other hand, focuses on estimating a summary curve from studies that have used different thresholds.

The test can be based on a biomarker or a more complex diagnostic procedure. However, the value of the index that the test provides may not be completely reliable. The starting information is a 2×2 table showing the concordance between the test results in binary form and information associated with disease (see

Test result | Disease state: D+ | Disease state: D− | Total |
---|---|---|---|
T+ | TP | FP | TP + FP |
T− | FN | TN | FN + TN |
Total | n_{1} | n_{2} | |

n_{1} = patients who actually have the disease; n_{2} = patients who are disease free. T+ = a positive result; T− = a negative result; TP = true positives; FP = false positives; TN = true negatives; FN = false negatives (
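As an illustration of how the usual accuracy measures follow from the cells of the 2×2 table, the sketch below computes them for a single hypothetical study. The function name and the example counts are assumptions made for illustration; the formulas are the standard definitions.

```python
def two_by_two_summary(tp, fp, fn, tn):
    """Standard accuracy measures from the cells of a 2x2 table."""
    diseased = tp + fn        # column total for D+
    disease_free = fp + tn    # column total for D-
    sens = tp / diseased
    spec = tn / disease_free
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_pos": sens / (1 - spec),      # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,      # negative likelihood ratio
        "dor": (tp * tn) / (fp * fn),     # diagnostic odds ratio
    }


# Hypothetical study: 100 diseased and 200 disease-free subjects.
example = two_by_two_summary(tp=90, fp=20, fn=10, tn=180)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Note that the diagnostic odds ratio equals the ratio of the two likelihood ratios, and that the formulas for LR+, LR− and DOR are undefined when a cell is zero; a continuity correction is commonly applied in that case.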

The results of a meta-analysis of diagnostic tests are usually reported as a pair, representing both sensitivity and specificity. However, some attempts have been made to consolidate the result as a single number. The most common approach is the use of diagnostic odds ratio (DOR;

Let

A consequence of the overlapping of the distributions of

The objective of this model is to transform true positive rate (TPR) and false positive rate (FPR) so that the relationship becomes linear; thus making an adjustment for the points given (
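A minimal sketch of this linearization is given below, assuming the usual Moses–Littenberg formulation: D = logit(TPR) − logit(FPR) is regressed on S = logit(TPR) + logit(FPR) by ordinary least squares, D = a + bS, and the fitted line is back-transformed to the ROC space. The study counts, the 0.5 continuity correction, and the function names are illustrative assumptions rather than the implementation of any particular package.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def moses_fit(studies):
    """Unweighted least-squares fit of D = a + b*S.

    studies: list of (tp, fp, fn, tn) tuples.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        # 0.5 continuity correction keeps the logits finite.
        tpr = (tp + 0.5) / (tp + fn + 1)
        fpr = (fp + 0.5) / (fp + tn + 1)
        d_vals.append(logit(tpr) - logit(fpr))   # log diagnostic odds ratio
        s_vals.append(logit(tpr) + logit(fpr))   # proxy for the threshold
    n = len(studies)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Back-transform the fitted line: solve D = a + b*S for logit(TPR)."""
    return expit(a / (1 - b) + (1 + b) / (1 - b) * logit(fpr))


# Hypothetical studies given as (TP, FP, FN, TN).
studies = [(45, 10, 5, 90), (30, 20, 10, 80), (60, 5, 20, 95), (25, 15, 5, 60)]
a, b = moses_fit(studies)
for fpr in (0.05, 0.10, 0.20):
    print(f"FPR={fpr:.2f} -> TPR on SROC curve={sroc_tpr(fpr, a, b):.3f}")
```

When the slope b is zero, D does not depend on S and the curve corresponds to a constant diagnostic odds ratio.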

Various useful statistical methods have been proposed to summarize a SROC curve. The most common is the area under the curve (AUC), which summarizes the diagnostic performance of the test in a single number (
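To make the AUC concrete, the sketch below integrates an SROC curve numerically with the trapezoidal rule. For illustration it assumes the special case of a symmetric curve with constant DOR, for which a closed-form AUC is known, so the two values can be compared. The DOR value and the function names are hypothetical.

```python
import math


def symmetric_sroc_tpr(fpr, dor):
    """TPR on a symmetric SROC curve: logit(TPR) = log(DOR) + logit(FPR)."""
    return dor * fpr / (1 + (dor - 1) * fpr)


def auc_trapezoid(curve, steps=10000):
    """Area under curve(x) for x in [0, 1] using the trapezoidal rule."""
    h = 1 / steps
    total = 0.0
    for i in range(steps):
        total += 0.5 * (curve(i * h) + curve((i + 1) * h)) * h
    return total


def auc_symmetric_closed_form(dor):
    """Closed-form AUC for a constant-DOR curve."""
    if dor == 1:
        return 0.5
    return dor / (dor - 1) ** 2 * ((dor - 1) - math.log(dor))


dor = 81.0  # hypothetical summary diagnostic odds ratio
auc_numeric = auc_trapezoid(lambda x: symmetric_sroc_tpr(x, dor))
auc_exact = auc_symmetric_closed_form(dor)
print(f"numeric AUC={auc_numeric:.4f}, closed-form AUC={auc_exact:.4f}")
```

An uninformative test (DOR = 1) gives an AUC of 0.5, and the AUC approaches 1 as the DOR grows.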

The Moses model does present some limitations. On one hand, it does not take into account the different levels of precision with which sensitivity and specificity are estimated in each study, nor does it incorporate heterogeneity between studies. To overcome these limitations, more complex regression models have been proposed. The first of these is a bivariate random effects model (

It should be noted that the SROC model does not quantify the error in _{k}_{k}

Alternatively, the covariance can be parameterized by the correlation coefficient ρ and standard errors in such a way that

Means,

Variances

Covariance
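The role of these parameters can be sketched with a small simulation. The code below assumes the usual bivariate formulation, in which the study-level logit sensitivity and logit specificity follow a bivariate normal distribution whose covariance is the product of ρ and the two standard deviations. All symbol names (mu_a, mu_b, sd_a, sd_b) and parameter values are hypothetical choices for illustration.

```python
import math
import random


def expit(x):
    return 1 / (1 + math.exp(-x))


def covariance_matrix(sd_a, sd_b, rho):
    """2x2 covariance matrix with covariance = rho * sd_a * sd_b."""
    cov = rho * sd_a * sd_b
    return [[sd_a ** 2, cov], [cov, sd_b ** 2]]


def draw_study(mu_a, mu_b, sd_a, sd_b, rho, rng):
    """Draw one (logit sensitivity, logit specificity) pair."""
    z1 = rng.gauss(0, 1)
    z2 = rng.gauss(0, 1)
    logit_sens = mu_a + sd_a * z1
    # Mixing z1 and z2 induces correlation rho between the two logits.
    logit_spec = mu_b + sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return logit_sens, logit_spec


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
sigma = covariance_matrix(sd_a=0.5, sd_b=0.7, rho=-0.6)
draws = [draw_study(1.5, 2.0, 0.5, 0.7, -0.6, rng) for _ in range(5000)]

# Summary operating point implied by the means on the logit scale.
summary_sens = expit(1.5)
summary_spec = expit(2.0)
print(sigma, round(summary_sens, 3), round(summary_spec, 3))
```

A negative ρ reproduces the trade-off between sensitivity and specificity that arises when studies use different thresholds.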

The inclusion of covariates in the sensitivity or specificity, or both, is done by replacing one or both means

The authors parameterized the sensitivities and specificities as follows (

where _{k}

Finally, the specification of the hierarchical model is completed by choosing a priori the distributions of the parameters. In short, the model has five parameters (

the mean and variance of the cutoff points

the mean and variance of the accuracy

the shape parameter

A value of β = 0 would represent a symmetric curve in the ROC space (_{k}

The above expression is equivalent to

Further details can be found in some related papers (
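The effect of the shape parameter can be illustrated numerically. The sketch below assumes the commonly cited form of the HSROC summary curve, logit(sensitivity) = Λ·exp(−β/2) + exp(−β)·logit(1 − specificity), where Λ denotes the mean accuracy parameter; the symbol Λ and the parameter values are assumptions introduced here for illustration.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def hsroc_sensitivity(fpr, accuracy, beta):
    """Sensitivity on the summary curve at a given false-positive rate."""
    return expit(accuracy * math.exp(-beta / 2) + math.exp(-beta) * logit(fpr))


def log_dor_on_curve(fpr, accuracy, beta):
    """Log diagnostic odds ratio implied by the curve at this point."""
    return logit(hsroc_sensitivity(fpr, accuracy, beta)) - logit(fpr)


accuracy = 3.0  # hypothetical mean accuracy parameter
for beta in (0.0, 0.5):
    values = [log_dor_on_curve(f, accuracy, beta) for f in (0.1, 0.3, 0.7)]
    print(f"beta={beta}: log DOR along the curve = "
          + ", ".join(f"{v:.3f}" for v in values))
```

With β = 0 the log DOR is the same at every point of the curve, which corresponds to a symmetric curve; with β ≠ 0 it changes along the curve, which makes the curve asymmetric.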

More generally, the mean sensitivity and specificity can be modeled through linear regressions of study-level covariates (

where the coefficients γ and ν quantify the weight of the covariate

The MADA package for the statistical program R is a tool for carrying out the meta-analysis of diagnostic tests. Although there are many methods for diagnostic meta-analysis, it is still not a routine procedure; one reason may be the complexity of the bivariate approach. The MADA package offers several current approaches to diagnostic meta-analysis, as well as functions that compute descriptive statistics for a data set, including sensitivity, specificity, true/false positives, true/false negatives, and their DOR (

the Mantel–Haenszel (MH) method, for a fixed effect model (

the model is formulated in terms of DOR logarithms and is a weighted estimator;

the proportional hazards model (

In the meta-analysis of diagnostic tests, the relationship between sensitivity and specificity is typically negative. Since these quantities are related to each other, the bivariate approach to the meta-analysis of diagnostic accuracy has been well received. This model can be fitted using the reitsma function in the MADA library. Finally, the HSROC library, which contains the HSROC function, is used to fit the hierarchical HSROC model.

The MIDAS package is a comprehensive set of statistical and graphical routines for the meta-analysis of diagnostic tests in STATA, a statistical software package created by StataCorp in 1985. It provides statistical and graphical functions for studying the accuracy of diagnostic tests. The primary data are modeled through a bivariate mixed-effects binary regression. Model fitting, estimation, and prediction are performed by adaptive quadrature. Using the values of the coefficients and the variance-covariance matrices, the sensitivity and specificity are estimated with their respective confidence and prediction regions in the ROC space (

MetaDas is a SAS program that fits bivariate and HSROC models to analyze the accuracy of diagnostic tests using the nonlinear mixed models procedure (PROC NLMIXED;

Once the systematic review of the diagnostic tests has been performed, it is necessary to integrate the results using the approaches described above. For this reason, we propose the following four steps:

Perform a descriptive statistical analysis of the studies in R using the MADA and META libraries together with the madad and mslSROC functions, respectively, which provide the following results and graphs.

Sensitivity per study with their respective confidence intervals (CI)

Specificity per study, CI

DOR per study, CI

Chi-square test that allows comparing the sensitivity and specificity of the studies

LR+ and LR−

Correlation between sensitivity (Se) and specificity (Sp)

False-positive rate (FPR) per study, CI

Forest plot for sensitivity and specificity

Crosshair and RocEllipse chart

SROC Curve of Moses model
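One way to examine the correlation listed above is a Spearman rank correlation between the per-study sensitivities and false-positive rates; a strong positive value (equivalently, a strong negative correlation between sensitivity and specificity) is commonly read as a sign of a threshold effect. The sketch below is a plain-Python illustration with hypothetical counts, not the output of the madad function.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical studies given as (TP, FP, FN, TN).
studies = [(40, 5, 10, 95), (45, 15, 5, 85), (35, 3, 15, 97), (48, 30, 2, 70)]
sens = [tp / (tp + fn) for tp, fp, fn, tn in studies]
fpr = [fp / (fp + tn) for tp, fp, fn, tn in studies]
spec = [1 - f for f in fpr]

rho_threshold = spearman(sens, fpr)
print(f"Spearman correlation between Se and FPR: {rho_threshold:.2f}")
```

In this invented data set, studies with higher sensitivity also have higher false-positive rates, which is the pattern expected when studies differ mainly in their positivity threshold.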

If sensitivity and specificity are independent, a univariate analysis is performed using the madauni and phm functions of the MADA library in R. This analysis uses the Mantel–Haenszel (fixed effects) and DerSimonian-Laird (random effects) models and the proportional hazards approach (fixed and random effects), which generate the following results.

DOR and DOR logarithm with their respective confidence interval

Forest plot for sensitivity and specificity with their respective confidence intervals

τ^{2} with confidence interval

I^{2}

AUC

Forest plot with summary measures for DOR, LR+ and LR− log

Chi-square test of homogeneity between studies

Chi-square test of heterogeneity between studies

SROC curve with RocEllipse
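To show where quantities such as the pooled DOR, τ^{2} and the homogeneity test come from, the sketch below pools log DORs with the DerSimonian-Laird random-effects estimator. It is a simplified illustration with hypothetical counts: the fixed-effect step uses inverse-variance weights rather than Mantel–Haenszel weights, and it is not the code behind the madauni function.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn, correction=0.5):
    """Log DOR and its approximate variance, with a continuity correction."""
    tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return math.log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(ws * y for ws, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical studies given as (TP, FP, FN, TN).
studies = [(45, 10, 5, 90), (30, 20, 10, 80), (60, 5, 20, 95), (25, 15, 5, 60)]
pairs = [log_dor_and_variance(*s) for s in studies]
effects = [p[0] for p in pairs]
variances = [p[1] for p in pairs]

result = dersimonian_laird(effects, variances)
low = result["pooled"] - 1.96 * result["se"]
high = result["pooled"] + 1.96 * result["se"]
print(f"pooled DOR={math.exp(result['pooled']):.1f} "
      f"(95% CI {math.exp(low):.1f} to {math.exp(high):.1f}), "
      f"tau^2={result['tau2']:.3f}, I^2={100 * result['i2']:.0f}%")
```

The confidence interval is computed on the log scale and then exponentiated, which is the usual practice for ratio measures.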

If sensitivity and specificity are related, i.e., there are different cutoff points in the meta-analysis, and the data fit a bivariate normal distribution, a bivariate analysis is performed in R and STATA using the MADA and MIDAS libraries. Note that the bivariate approach in R is fitted with the reitsma function. This bivariate analysis generates the following results, see

R Language | Stata Language |
---|---|
Logit of summary sensitivity with confidence interval | Forest plot for sensitivity, with and without summary measure, and confidence intervals |
Logit of summary false-positive rate with confidence interval | Forest plot for specificity, with and without summary measure, and confidence intervals |
Summary sensitivity with confidence interval | Summary DOR, LR+, and LR− with their respective confidence intervals |
Summary false-positive rate with confidence interval | I^{2} |
SROC curve with summary sensitivity and false-positive rate | AUC |
Between-study variance matrix | Summary sensitivity and specificity with their respective confidence intervals |
Correlation matrix | SROC curve with summary sensitivity and specificity and confidence intervals |
HSROC model parameters | Fagan plot |
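The Fagan plot listed among the outputs is a graphical form of Bayes' theorem on the odds scale: the post-test odds equal the pre-test odds multiplied by the likelihood ratio. The sketch below performs the same conversion numerically; the pre-test probability and likelihood ratios are hypothetical values chosen for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


pre_test = 0.01   # hypothetical low-prevalence setting
lr_pos = 9.0      # hypothetical summary positive likelihood ratio
lr_neg = 0.11     # hypothetical summary negative likelihood ratio

after_positive = post_test_probability(pre_test, lr_pos)
after_negative = post_test_probability(pre_test, lr_neg)
print(f"after a positive result: {after_positive:.4f}")
print(f"after a negative result: {after_negative:.4f}")
```

With a 1% pre-test probability, even a positive likelihood ratio of 9 raises the probability of disease only to about 8%, which shows why likelihood ratios must be read together with prevalence.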

If the effect of study characteristics on the threshold, accuracy, and shape of the SROC curve must be determined, the hierarchical HSROC approach should be used. The data are fitted to this hierarchical model using the HSROC and MetaDas packages in R and SAS, respectively, which generate the following main outputs, see

R Language | SAS Language |
---|---|
Prior values of the model parameters | Information on covariates |
Posterior values of the model parameters | Starting values of the model, convergence status, and model fit |
Sensitivity and specificity per study with their respective confidence intervals | Summary sensitivity, specificity, DOR, LR+, LR− |
Summary sensitivity and specificity with their respective confidence intervals | Confidence intervals and prediction of model parameters |
SROC curve with summary sensitivity and specificity and their confidence intervals | Predicted values of sensitivity and specificity for each study; histograms and normal probability plots of empirical Bayes estimates of the random effects |

A graphic representation of the above is detailed in

The Moses model uses the logits of the true and false positive rates to build a linear regression model in which the response variable (test accuracy) is explained by the proportion of positive test results (related to the threshold). The SROC curve is symmetric when accuracy does not depend on the threshold, i.e., when the DOR is constant. This modeling is characterized as fixed effect, since the variation is attributed to the threshold and the sampling error. This model generates errors, which makes the statistical inference invalid (

Hierarchical models capture the stochastic relationship between sensitivity, specificity, and variability of test accuracy in all studies by incorporating random effects into the modeling. Bivariate and HSROC models differ in their parameterization but are mathematically equivalent when covariates are not included (

The bivariate model uses random effects to estimate the summary sensitivity and specificity, as well as to construct 95% credible intervals. The model assumes that the logit-transformed sensitivity and specificity follow a bivariate normal distribution. The correlation parameter is estimated from the posterior means of sensitivity and specificity (

The HSROC model is a reference in the study of diagnostic test accuracy and can be seen as a generalization of the Moses SROC approach, in which TPR and FPR are modeled directly. (

The HSROC model and the bivariate model are different parameterizations of the same underlying model, and both approaches can be used to calculate estimates of the SROC curve and random effects. However, they differ in the software that can fit them. While the HSROC model requires a non-linear mixed model program such as NLMIXED in SAS, the bivariate model only requires a linear mixed model program and can be fitted in R and Stata.

Since the bivariate model is parameterized in terms of the mean logit sensitivity and specificity, it is often claimed to be the preferred model for estimating mean operating points. However, in practice, estimates of both the average operating point and the summary ROC curve can be obtained from either model. Therefore, whether an average operating point should be estimated depends on the homogeneity of the thresholds included in the analysis, not on the choice of the statistical model. The bivariate model allows covariates to be included in sensitivity and/or specificity, while the HSROC model facilitates the inclusion of covariates that affect threshold and/or accuracy (

We suggest that meta-analysts carefully explore and inspect their data using a forest plot and an SROC curve before performing meta-analyses. These first analyses will quantify stochastic heterogeneity and the dispersion of study points in the ROC space (

The hierarchical approach can be used in different situations such as (1) the presence or absence of heterogeneity and (2) cutoff points being homogeneous among studies. This is the reason we recommend using this model in situations of low prevalence, because it better handles the variability between and within studies. Thus, this model is an approach suitable for fixed and random effects depending on the nature of the data.


The selection of the statistical model in the meta-analysis of diagnostic tests of low-prevalence diseases is essential for the integration of the study results. Regardless of the software used, the rigorous application of the decision-making scheme will help to guarantee high quality results and facilitate the analysis and interpretation of the results.

The authors have no funding to report.

The authors have declared that no competing interests exist.

The first author’s research was supported by financial assistance from Escuela Superior Politécnica del Litoral, Ecuador. The second author’s research was supported by financial assistance from Escuela Superior Politécnica del Litoral, Ecuador.