
Researchers often examine whether two continuous variables (X and Y) are linearly related. Pearson’s correlation (r) is a widely used statistic for assessing bivariate linearity. However, the accuracy of r is known to decrease when data contain outliers and/or leverage observations, a circumstance common in behavioral and social sciences research. This study compares 11 robust correlations with r and evaluates the associated bootstrap confidence intervals [bootstrap standard interval (BSI), bootstrap percentile interval (BPI), and bootstrap bias-corrected-and-accelerated interval (BCaI)] across conditions with and without outliers and/or leverage observations. The simulation results showed that the median-absolute-deviation correlation (r-MAD), median-based correlation (r-MED), and trimmed correlation (r-TRIM) consistently outperformed the other estimates, including r, when data contained outliers and/or leverage observations. This study provides easy-to-use R code for computing robust correlations and their associated confidence intervals, offers recommendations for their reporting, and discusses implications of the findings for future research.

Behavioral and social sciences researchers often examine whether or not two continuous variables (

Bivariate linear relationships can be described in equation as (

where the parameter that measures the level of linearity between

where

In theory, using ^{1}

Shevlyakov and Smirnov’s generated

To better understand how

Previous simulations investigating the performance of ^{1}

In light of this, the present study evaluates the performance of 11 robust correlations, as compared to

According to

For Case A (outlier

As noted above, a first robust approach is to directly replace the linear summation of _{CME}

A second type of robust correlation makes use of a deviation estimate, called the median absolute deviation, i.e., _{MAD}

where

A third robust correlation is considered a derivative of _{MED}

where
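The MAD-based and median-based correlations can be illustrated with a short Python sketch (the study's own code is in R, so this is an illustration only). It assumes the forms usually attributed to Shevlyakov and Smirnov: both variables are standardized by their median and √2·MAD, the standardized scores are summed (u) and differenced (v), and the estimate contrasts the robust spread of u with that of v. The function names `mad`, `r_mad`, and `r_med` are hypothetical.

```python
from math import sqrt
from statistics import median


def mad(values):
    """Unscaled median absolute deviation from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def _sum_and_difference(x, y):
    """Robustly standardize x and y, then return their sum (u) and difference (v)."""
    mx, my = median(x), median(y)
    sx, sy = sqrt(2) * mad(x), sqrt(2) * mad(y)
    if sx == 0 or sy == 0:
        raise ValueError("MAD of x or y is zero; the estimate is undefined")
    u = [(a - mx) / sx + (b - my) / sy for a, b in zip(x, y)]
    v = [(a - mx) / sx - (b - my) / sy for a, b in zip(x, y)]
    return u, v


def r_mad(x, y):
    """Assumed form: (MAD^2(u) - MAD^2(v)) / (MAD^2(u) + MAD^2(v))."""
    u, v = _sum_and_difference(x, y)
    su, sv = mad(u) ** 2, mad(v) ** 2
    return (su - sv) / (su + sv)


def r_med(x, y):
    """Assumed form: (med^2|u| - med^2|v|) / (med^2|u| + med^2|v|)."""
    u, v = _sum_and_difference(x, y)
    su = median([abs(a) for a in u]) ** 2
    sv = median([abs(a) for a in v]) ** 2
    return (su - sv) / (su + sv)
```

Because both estimates are ratios of a difference to a sum of non-negative terms, they always fall in [-1, 1], and a single extreme observation moves the medians involved very little.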

A fourth robust correlation is called the biweight midcorrelation (

where

Another approach to robust correlation involves counting the signs or ranks of

where sign refers to the sign of deviations from median, and
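A minimal Python sketch of a sign-based (quadrant) correlation follows. It assumes the usual form, the average product of the signs of the deviations of X and Y from their medians; the function names are hypothetical.

```python
from statistics import median


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def quadrant_correlation(x, y):
    """Mean product of the signs of deviations from the medians."""
    mx, my = median(x), median(y)
    total = sum(sign(a - mx) * sign(b - my) for a, b in zip(x, y))
    return total / len(x)
```

Only the quadrant in which each point falls matters, so the size of an extreme value has no effect beyond its sign.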

A second robust correlation based on counting the

where _{i}_{i}_{j}_{j}_{i}_{j}_{i}_{j}_{i}_{j}_{i}_{j}_{i}_{j}_{i}_{j}_{i}_{j}_{i}_{j}

In addition, Spearman’s correlation (_{s}

where

_{wLS}

where

This approach focuses on discarding a certain percent of the top and bottom

where _{i}_{i}

Another trim-based robust correlation is called the Winsorized correlation (_{W}^{st}) > ^{nd}) >…>
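The idea behind the Winsorized correlation can be sketched in Python as follows: the most extreme ordered values of each variable are pulled in to the nearest retained value, and Pearson's r is then computed on the Winsorized scores. The Winsorizing proportion of .20 is an assumption for illustration, as are the function names.

```python
from math import floor, sqrt


def winsorize(values, proportion=0.2):
    """Replace the g smallest and g largest values by the nearest retained value."""
    n = len(values)
    g = floor(proportion * n)
    ordered = sorted(values)
    low, high = ordered[g], ordered[n - g - 1]
    return [min(max(v, low), high) for v in values]


def pearson(x, y):
    """Pearson's product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def winsorized_correlation(x, y, proportion=0.2):
    """Pearson's r computed on separately Winsorized x and y."""
    return pearson(winsorize(x, proportion), winsorize(y, proportion))
```

For example, `winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` pulls the two smallest values up to 3 and the two largest down to 8, so the value 100 no longer dominates the sums.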

A third trimmed correlation is known as the percentage bend correlation (_{pbc}

The bootstrap CIs I used in the simulation were constructed according to the following procedures (^{2}

The

Second, for each of the

Third, given the

where ^{3}

A BSI could extend beyond the range of possible correlations (i.e., below -1 or above +1).

However, BSI was found to be non-robust to asymmetric distributions because of the equal widths above and below the point estimate as a result of the

where

A third type of bootstrap CI is the bootstrap bias-corrected-and-accelerated CI (BCaI). It has been shown to improve the accuracy of the lower and upper limits in BPI especially when the distribution of bootstrap

where

where

where
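The three bootstrap intervals can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's R implementation: pairs (X, Y) are resampled with replacement, `n_boot = 2000` and the nearest-rank quantile rule are arbitrary choices, the BSI is the point estimate plus or minus a normal quantile times the bootstrap standard deviation, and the BCaI uses the standard bias-correction (z0) and jackknife acceleration (a) adjustments. Any correlation function with the signature `stat(x, y)` can be passed in place of the `pearson` helper.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def pearson(x, y):
    """Pearson's product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def quantile(sorted_values, p):
    """Nearest-rank empirical quantile of an already sorted list."""
    index = min(len(sorted_values) - 1, max(0, int(p * len(sorted_values))))
    return sorted_values[index]


def bootstrap_cis(x, y, stat=pearson, n_boot=2000, level=0.95, seed=1):
    """Return the point estimate with BSI, BPI, and BCaI limits (x, y are lists)."""
    rng = random.Random(seed)
    n = len(x)
    estimate = stat(x, y)

    # Resample pairs with replacement and recompute the statistic.
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    boot.sort()

    norm = NormalDist()
    alpha = 1 - level
    z = norm.inv_cdf(1 - alpha / 2)

    # BSI: symmetric about the estimate, so it can fall outside [-1, 1].
    se = stdev(boot)
    bsi = (estimate - z * se, estimate + z * se)

    # BPI: percentiles of the bootstrap distribution.
    bpi = (quantile(boot, alpha / 2), quantile(boot, 1 - alpha / 2))

    # BCaI: shift the percentiles using bias correction and acceleration.
    below = sum(b < estimate for b in boot) / n_boot
    below = min(max(below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(below)
    jack = [stat(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(p):
        zp = norm.inv_cdf(p)
        return norm.cdf(z0 + (z0 + zp) / (1 - a * (z0 + zp)))

    bca = (quantile(boot, adjusted(alpha / 2)),
           quantile(boot, adjusted(1 - alpha / 2)))
    return {"estimate": estimate, "BSI": bsi, "BPI": bpi, "BCaI": bca}
```

When z0 and a are both zero, the adjusted percentiles reduce to alpha/2 and 1 - alpha/2, so the BCaI coincides with the BPI.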

Four levels (.10, .30, .50, and .80) were evaluated. The first three levels (.10, .30, .50) correspond to small, medium, and large effect sizes, respectively, which are commonly found in behavioral and social science research (

Three levels (50, 100, and 200) were examined, which correspond to relatively small, moderate, and large sample sizes frequently used in simulation studies (

A total of six conditions that correspond to 3 different proportions of outliers and/or leverage points with 2 different

where

The ideal condition in

The method used in the current simulation is based on the fact that a 5%, 10% or 20% random sample of the original

Four cases are evaluated. For Case A (uniform

In sum, the four factors were combined to produce a design with

Two criteria were used to evaluate the performance of

Regarding the performance of the BSI, BPI, and BCaI, given that 95% CIs were constructed, the intervals were expected to cover the population correlation in 950 out of 1,000 replications [i.e., coverage probability (CP) = .95]. However, because of sampling error, an observed CP will rarely equal .95 exactly. Thus, an observed CP that falls within the range (.925, .975) is considered acceptable (
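The coverage criterion can be expressed as a short Python sketch. The function names are hypothetical, and treating (.925, .975) as an open interval is an assumption.

```python
def coverage_probability(intervals, true_value):
    """Proportion of (lower, upper) intervals that contain true_value."""
    hits = sum(lower <= true_value <= upper for lower, upper in intervals)
    return hits / len(intervals)


def is_acceptable(cp, lower=0.925, upper=0.975):
    """True when the observed CP falls inside the acceptable range."""
    return lower < cp < upper
```

With 1,000 replications, for instance, 950 covering intervals give CP = .95, which is acceptable, whereas 920 give CP = .92, which is not.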

The results showed that the presence of O-LO was the most influential factor affecting the performance of

As predicted,

Correlation |
||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Distribution | Statistic | |||||||||||||

Normality | Mean | -.108 | ||||||||||||

SD | .009 | .013 | .061 | .058 | .012 | .021 | .011 | .014 | .016 | .053 | .032 | .015 | ||

Min | -.017 | -.065 | -.224 | -.210 | -.030 | -.055 | -.022 | -.065 | .006 | -.189 | -.165 | -.066 | ||

Max | .013 | -.015 | .006 | .018 | .018 | .023 | .028 | -.019 | .052 | .019 | -.058 | -.024 | ||

% | 1.000 | 1.000 | .917 | .917 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | .833 | .333 | 1.000 | ||

MAPE | .108 | |||||||||||||

Case A | Mean | -.305 | -.122 | |||||||||||

SD | .097 | .099 | .039 | .036 | .056 | .031 | .025 | .028 | .058 | .044 | .033 | .031 | ||

Min | -.495 | -.333 | -.197 | -.205 | -.060 | -.149 | -.115 | -.160 | -.220 | -.250 | -.208 | -.157 | ||

Max | -.124 | .000 | -.003 | -.015 | .186 | .102 | .012 | -.034 | .079 | -.012 | -.063 | -.033 | ||

% | .000 | .681 | .861 | .806 | .847 | .972 | .986 | .750 | .944 | .694 | .250 | .694 | ||

MAPE | .305 | .122 | ||||||||||||

Case B | Mean | -.123 | .170 | .299 | -.124 | |||||||||

SD | .033 | .092 | .035 | .039 | .135 | .026 | .015 | .031 | .217 | .039 | .031 | .034 | ||

Min | -.159 | -.431 | -.171 | -.212 | -.003 | -.119 | -.066 | -.197 | .045 | -.202 | -.231 | -.197 | ||

Max | .033 | -.034 | .104 | .041 | .539 | .070 | .056 | -.034 | 1.093 | .045 | -.063 | -.033 | ||

% | .958 | .556 | .958 | .958 | .375 | .986 | 1.000 | .681 | .125 | .931 | .222 | .722 | ||

MAPE | .123 | .170 | .299 | .124 | ||||||||||

Case C | Mean | -.776 | -.326 | -.120 | -.199 | -.294 | -.331 | -.258 | -.286 | |||||

SD | .160 | .224 | .039 | .042 | .070 | .070 | .111 | .130 | .193 | .043 | .090 | .145 | ||

Min | -.987 | -.798 | -.187 | -.213 | -.054 | -.354 | -.416 | -.519 | -.673 | -.176 | -.430 | -.543 | ||

Max | -.389 | -.104 | .055 | .036 | .245 | -.012 | -.061 | -.127 | -.030 | .025 | -.123 | -.110 | ||

% | .000 | .000 | .917 | .847 | .569 | .528 | .292 | .000 | .167 | .792 | .000 | .000 | ||

MAPE | .776 | .326 | .120 | .199 | .294 | .331 | .258 | .286 | ||||||

Case D | Mean | -1.536 | -.533 | -.236 | -.402 | -.498 | -1.170 | -.390 | -.482 | |||||

SD | .313 | .382 | .049 | .053 | .097 | .130 | .219 | .233 | .592 | .048 | .161 | .259 | ||

Min | -1.984 | -1.433 | -.210 | -.225 | -.269 | -.553 | -.751 | -.844 | -2.436 | -.226 | -.639 | -.911 | ||

Max | -.789 | -.167 | .024 | .003 | .289 | -.003 | -.121 | -.206 | -.199 | -.014 | -.174 | -.178 | ||

% | .000 | .000 | .764 | .653 | .694 | .125 | .000 | .000 | .000 | .611 | .000 | .000 | ||

MAPE | 1.536 | .533 | .236 | .402 | .498 | 1.170 | .390 | .482 |

r_{CME}, r_{MAD}, r_{MED}, r_{bm}, r_{Q}, r_{τ}, r_{s}, r_{wLS}, r_{TRIM}, r_{W}, r_{pbc}

The performance of

As predicted,

As in Case A,

Unsurprisingly, Case D was found to be the data condition most detrimental to the accuracy of correlation estimates. For

The 3 bootstrap CIs constructed for

Correlation |
||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Distribution | CI | Statistic | ||||||||||||

Normality | BSI | Mean | ||||||||||||

SD | .013 | .013 | .012 | .013 | .010 | .010 | .008 | .011 | .010 | .019 | .038 | .008 | ||

Min | .914 | .935 | .954 | .950 | .926 | .945 | .930 | .930 | .904 | .935 | .834 | .931 | ||

Max | .951 | .977 | .995 | .994 | .958 | .981 | .954 | .969 | .941 | .993 | .962 | .954 | ||

% | .833 | .917 | .833 | .833 | 1.000 | .917 | 1.000 | 1.000 | .583 | .750 | .750 | 1.000 | ||

BPI | Mean | .987 | .985 | .983 | .915 | |||||||||

SD | .009 | .013 | .009 | .007 | .010 | .010 | .007 | .008 | .008 | .008 | .059 | .011 | ||

Min | .924 | .917 | .973 | .972 | .925 | .964 | .935 | .932 | .932 | .956 | .764 | .921 | ||

Max | .951 | .964 | 1.000 | .995 | .961 | .999 | .960 | .957 | .957 | .984 | .966 | .958 | ||

% | .917 | .917 | .167 | .083 | 1.000 | .250 | 1.000 | 1.000 | 1.000 | .833 | .583 | .917 | ||

BCaI | Mean | .904 | ||||||||||||

SD | .009 | .013 | .006 | .011 | .016 | .011 | .008 | .009 | .007 | .007 | .070 | .017 | ||

Min | .929 | .909 | .935 | .913 | .900 | .903 | .937 | .928 | .931 | .927 | .730 | .906 | ||

Max | .954 | .957 | .951 | .949 | .962 | .938 | .965 | .959 | .951 | .953 | .963 | .961 | ||

% | 1.000 | .917 | 1.000 | .833 | .833 | .750 | 1.000 | 1.000 | 1.000 | 1.000 | .583 | .833 | ||

Case A | BSI | Mean | .583 | .907 | .923 | .917 | ||||||||

SD | .350 | .135 | .011 | .013 | .013 | .009 | .036 | .063 | .099 | .016 | .064 | .068 | ||

Min | .000 | .227 | .945 | .946 | .884 | .945 | .686 | .530 | .315 | .934 | .617 | .496 | ||

Max | .936 | .989 | .991 | .994 | .967 | .984 | .976 | .966 | .974 | .996 | .970 | .969 | ||

% | .014 | .736 | .764 | .806 | .972 | .917 | .903 | .833 | .903 | .819 | .611 | .847 | ||

BPI | Mean | .616 | .922 | .981 | .977 | .985 | .907 | .898 | .906 | |||||

SD | .352 | .108 | .015 | .018 | .014 | .008 | .051 | .091 | .096 | .033 | .092 | .110 | ||

Min | .000 | .292 | .912 | .890 | .854 | .966 | .604 | .417 | .259 | .805 | .498 | .253 | ||

Max | .941 | .980 | .997 | .995 | .962 | .999 | .969 | .965 | .981 | .988 | .974 | .966 | ||

% | .139 | .750 | .222 | .306 | .972 | .111 | .889 | .694 | .889 | .681 | .556 | .722 | ||

BCaI | Mean | .560 | .900 | .923 | .905 | .914 | .886 | .900 | ||||||

SD | .354 | .123 | .016 | .015 | .021 | .012 | .061 | .095 | .117 | .018 | .102 | .113 | ||

Min | .000 | .274 | .864 | .857 | .797 | .897 | .557 | .408 | .226 | .855 | .481 | .267 | ||

Max | .938 | .967 | .954 | .957 | .957 | .947 | .972 | .967 | .962 | .963 | .978 | .966 | ||

% | .056 | .792 | .861 | .750 | .778 | .444 | .861 | .667 | .847 | .875 | .486 | .667 | ||

Case B | BSI | Mean | .856 | .745 | .919 | |||||||||

SD | .051 | .069 | .010 | .011 | .039 | .008 | .011 | .023 | .193 | .016 | .056 | .021 | ||

Min | .693 | .571 | .947 | .947 | .687 | .942 | .911 | .800 | .038 | .938 | .698 | .819 | ||

Max | .932 | .993 | .989 | .991 | .994 | .982 | .958 | .966 | .940 | .996 | .971 | .971 | ||

% | .042 | .708 | .819 | .764 | .750 | .958 | .847 | .819 | .028 | .750 | .681 | .875 | ||

BPI | Mean | .909 | .986 | .986 | .983 | .847 | .904 | |||||||

SD | .016 | .043 | .007 | .006 | .071 | .007 | .008 | .036 | .167 | .009 | .077 | .034 | ||

Min | .870 | .730 | .966 | .969 | .460 | .969 | .929 | .742 | .099 | .953 | .613 | .746 | ||

Max | .943 | .980 | .998 | .995 | .970 | .996 | .966 | .962 | .954 | .991 | .969 | .964 | ||

% | .208 | .750 | .069 | .056 | .806 | .181 | 1.000 | .653 | .417 | .556 | .583 | .778 | ||

BCaI | Mean | .886 | .899 | .889 | .923 | .917 | .773 | .891 | .920 | |||||

SD | .039 | .084 | .008 | .010 | .087 | .013 | .008 | .049 | .167 | .009 | .093 | .052 | ||

Min | .790 | .563 | .921 | .902 | .371 | .893 | .935 | .691 | .130 | .924 | .557 | .680 | ||

Max | .943 | .963 | .958 | .955 | .949 | .950 | .969 | .964 | .917 | .962 | .972 | .970 | ||

% | .194 | .597 | .958 | .847 | .361 | .472 | 1.000 | .611 | .000 | .986 | .514 | .639 |

Correlation |
||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Distribution | CI | Statistic | ||||||||||||

Case C | BSI | Mean | .588 | .782 | .844 | .731 | .879 | .756 | .773 | |||||

SD | .270 | .285 | .011 | .013 | .020 | .052 | .195 | .282 | .058 | .015 | .270 | .264 | ||

Min | .000 | .000 | .946 | .943 | .869 | .659 | .035 | .000 | .684 | .933 | .001 | .001 | ||

Max | .881 | .993 | .992 | .994 | .995 | .983 | .968 | .949 | .963 | .997 | .964 | .958 | ||

% | .000 | .417 | .778 | .806 | .750 | .792 | .542 | .319 | .208 | .764 | .361 | .375 | ||

BPI | Mean | .644 | .732 | .985 | .982 | .818 | .708 | .915 | .719 | .726 | ||||

SD | .310 | .291 | .008 | .010 | .022 | .077 | .223 | .298 | .088 | .016 | .295 | .294 | ||

Min | .000 | .000 | .954 | .931 | .804 | .539 | .017 | .000 | .435 | .893 | .000 | .000 | ||

Max | .925 | .968 | .996 | .995 | .973 | .995 | .958 | .956 | .970 | .995 | .971 | .965 | ||

% | .014 | .333 | .139 | .153 | .917 | .333 | .472 | .278 | .736 | .694 | .319 | .319 | ||

BCaI | Mean | .589 | .707 | .916 | .870 | .802 | .683 | .841 | .705 | .700 | ||||

SD | .305 | .308 | .009 | .008 | .034 | .090 | .238 | .314 | .100 | .010 | .305 | .312 | ||

Min | .000 | .000 | .911 | .908 | .707 | .451 | .010 | .000 | .368 | .914 | .000 | .000 | ||

Max | .925 | .962 | .958 | .949 | .947 | .949 | .964 | .963 | .929 | .967 | .969 | .973 | ||

% | .014 | .333 | .972 | .764 | .444 | .139 | .458 | .278 | .028 | .917 | .306 | .333 | ||

Case D | BSI | Mean | .359 | .690 | .881 | .676 | .564 | .628 | .625 | .622 | ||||

SD | .323 | .346 | .015 | .015 | .020 | .177 | .333 | .371 | .313 | .020 | .351 | .358 | ||

Min | .000 | .000 | .884 | .890 | .879 | .062 | .000 | .000 | .000 | .846 | .000 | .000 | ||

Max | .822 | .991 | .993 | .994 | .993 | .981 | .979 | .960 | .973 | .997 | .962 | .961 | ||

% | .000 | .361 | .778 | .764 | .792 | .653 | .250 | .167 | .125 | .778 | .222 | .264 | ||

BPI | Mean | .444 | .585 | .977 | .878 | .648 | .549 | .701 | .590 | .572 | ||||

SD | .368 | .374 | .028 | .036 | .029 | .218 | .348 | .377 | .327 | .052 | .368 | .375 | ||

Min | .000 | .000 | .778 | .732 | .777 | .010 | .000 | .000 | .000 | .621 | .000 | .000 | ||

Max | .910 | .966 | .996 | .996 | .979 | .994 | .962 | .963 | .970 | .996 | .971 | .963 | ||

% | .000 | .236 | .222 | .319 | .889 | .319 | .306 | .194 | .333 | .625 | .222 | .208 | ||

BCaI | Mean | .389 | .572 | .923 | .919 | .784 | .629 | .529 | .620 | .580 | .553 | |||

SD | .348 | .377 | .025 | .022 | .028 | .204 | .355 | .382 | .308 | .029 | .371 | .382 | ||

Min | .000 | .000 | .761 | .780 | .799 | .028 | .000 | .000 | .000 | .743 | .000 | .000 | ||

Max | .875 | .963 | .957 | .946 | .950 | .929 | .969 | .965 | .937 | .956 | .978 | .970 | ||

% | .000 | .208 | .847 | .639 | .556 | .042 | .278 | .181 | .028 | .833 | .222 | .222 |

r_{CME}, r_{MAD}, r_{MED}, r_{bm}, r_{Q}, r_{τ}, r_{s}, r_{wLS}, r_{TRIM}, r_{W}, r_{pbc}

Given that

Although _{wLS}

Similar to Case A, the bootstrap CIs surrounding _{wLS}

In behavioral and social science research, data may contain O-LO in practice. The results of my simulation study suggest that

Given the more desirable performance of _{s}

The present simulation also found similar patterns of results, but the benefits of

Combining

Part A of the supplementary materials provides the simulation code used in the present study. Part B provides a real-world example based on

The author has no funding to report.

The author has declared that no competing interests exist.

The author has no additional (i.e., non-financial) support to report.