Despite the existence of sophisticated statistical methods, systematic reviews regularly indicate that single-case experimental designs (SCEDs) are predominantly analyzed through visual tools. For the quantitative aggregation of results, different meta-analytical techniques are available, but specific visual tools for the meta-analysis of SCEDs are lacking. The present article therefore describes the use of violin plots as visual tools to represent the raw data. We first describe the underlying rationale of violin plots and their main characteristics. We then show how the violin plots can complement the statistics obtained in a quantitative meta-analysis. The main advantages of violin plots as visual tools in meta-analysis are (a) that they preserve information about the raw data from each study, (b) that they have the ability to visually represent data from different designs in one graph, and (c) that they enable the comparison of score distributions from different experimental phases from different studies.
In single-case experimental designs (SCEDs) a single entity (e.g., a classroom) is measured repeatedly under different conditions of one or several independent variables (e.g., token economy) (
When meta-analyzing SCEDs, one can roughly distinguish between top-down and bottom-up models for effect size estimation (
However, these techniques reveal relatively little about the large amount of raw data from which the effect size was computed. In addition, these techniques are not commonly used in meta-analyses (
In contrast, “the term ‘bottom-up’ refers to an analytic strategy that proceeds from visually guided selection of individual phase contrasts (the ‘bottom’) to combining them to form a single (or a few) omnibus effect size representing the entire design (the ‘top’)” (
Following this logic, the current proposal builds on the complementarity of visual and statistical analyses for SCEDs. Visual analysis has a long-standing tradition in SCED data analysis and remains the dominant mode of analysis. At the individual study level, it has been recommended to combine visual and statistical analyses (
The violin plot was formally introduced by
Violin plots can easily be constructed with very little programming knowledge, for example using R. Consider
The upper panel of
As shown in
Similar to the interval width for density estimation in the violin plot, the width of the jitter along the x-axis can be changed. As mentioned previously, the data points are jittered along the x-axis to avoid cluttering of points (
A crucial consideration when using vase or violin plots is the setting of the interval width
Regarding the optimal interval width for violin plots, Hintze and Nelson recommend 15% of the data range as a general rule of thumb. If a violin plot is constructed in R, the default setting for the interval width uses
It is important that all data points represented on adjacent violin plots are measured on the same scale. This can be achieved in four ways. First, it is possible to ensure during the study selection process that all included studies used the same measurement unit (e.g., percentage of a certain behavior). If practically feasible, this is the preferred method. Second, it is possible to transform the scores from studies that used a different measurement unit than the majority of studies. For example, if the main interest for a SCED meta-analysis is the percentage of time spent on-task by children with conduct problems, it is possible to transform scores from studies using other measurement units to percentages. If some studies reported time spent on-task, this time can be converted to a percentage by dividing it by the total observation time and multiplying by 100. However, it should be noted that sufficient background knowledge about the studies using other measurement units is required for this approach. Thus, insufficient reporting (
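The time-to-percentage conversion just described amounts to a one-line computation. A minimal sketch in Python (the article's own examples use R; the session length and scores below are invented for illustration):

```python
# Convert time-on-task measurements (in seconds) to percentages so they can
# share a violin plot with studies that already report percentages.

def to_percentage(seconds_on_task, session_seconds):
    """Percentage of the session spent on-task: (time on-task / total time) * 100."""
    return [100.0 * s / session_seconds for s in seconds_on_task]

# Hypothetical study reporting raw time on-task in 300-second observation sessions
raw_times = [120, 180, 240, 300]
percentages = to_percentage(raw_times, session_seconds=300)
print(percentages)  # [40.0, 60.0, 80.0, 100.0]
```

After this conversion, the transformed scores can be pooled into the same violin as scores from studies that reported percentages directly.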
To demonstrate the use of violin plots in SCED meta-analysis, we make use of the meta-analysis of SCEDs conducted by
As shown by the boxplots, the median percentages of disruptive behavior are 31% for baseline data points and 1% for intervention data points. The third quartile (i.e., 75th percentile) lies at 69% for baseline measurements and 6% for intervention phase measurements. The violin plots show that by far the largest density for intervention measurements lies at 0% whereas the density is distributed more evenly over the whole range for baseline measurements. The density for the intervention data points gets narrower with higher percentages. The jitter clearly shows that there are only a handful of data points above 30%. At the same time, the jitter shows that for the baseline data points, there is also a non-negligible number of data points with 0% disruptive behavior.
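Summary statistics of the kind read off the boxplots (medians and third quartiles per phase) can be computed directly from the pooled raw scores. A minimal Python sketch with invented data (the article itself works with the Losinski et al. scores):

```python
import statistics

# Hypothetical pooled percentages of disruptive behavior per phase
baseline = [10, 25, 31, 40, 55, 69, 80, 0, 31, 62]
intervention = [0, 0, 1, 2, 6, 0, 1, 3, 10, 0]

def phase_summary(scores):
    """Median and third quartile, the two statistics read off the boxplots."""
    median = statistics.median(scores)
    q3 = statistics.quantiles(scores, n=4)[2]  # third cut point = 75th percentile
    return median, q3

base_med, base_q3 = phase_summary(baseline)
int_med, int_q3 = phase_summary(intervention)
print(f"baseline: median={base_med}, Q3={base_q3}")
print(f"intervention: median={int_med}, Q3={int_q3}")
```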
To the left of the baseline violin plot and to the right of the intervention violin plot, the mean value per study can be found, following the same color scheme as the raw data. The size of the mean squares is proportionate to the number of measurements from which they were calculated. There are only 11 baseline mean squares because one study did not include baseline data. With the exception of two baseline means, all intervention means are lower than the baseline means. However, one of those two baseline means (pink square) contains the largest number of observations, as evidenced by the size of the square. At the same time, the intervention mean is much smaller for that study and the pink intervention data points are mostly clustered around 0%. One study (brown square) already showed a relatively small mean percentage of disruptive behavior during baseline measures, which was still reduced to some extent during intervention measures. The lowest intervention mean (olive green square) lies at 0%, a reduction of 30% compared to baseline measures.
The mean effect sizes (accompanied by the standard deviation in parentheses) calculated by Losinski et al. for all studies assessing disruptive behavior were SMD: −3.08 (2.18), PND: 79% (34.84%), and IRD: 87% (18.77%). Visual representations (and summary measures) as shown above can be valuable complements to these summary measures by giving more information about the score distributions. While effect size measures are important measures of intervention effectiveness, they reduce a vast amount of raw data to a single number.
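As a concrete illustration of how much a non-overlap index condenses the raw data, consider PND, which is standardly defined as the percentage of intervention data points that fall beyond the most extreme baseline point. A Python sketch for a behavior-reduction goal, where "beyond" means below the lowest baseline point (the scores are invented, not taken from Losinski et al.):

```python
def pnd_reduction(baseline, intervention):
    """Percentage of Non-overlapping Data when the goal is to reduce behavior:
    the share of intervention points lying below the lowest baseline point."""
    lowest_baseline = min(baseline)
    non_overlapping = sum(1 for x in intervention if x < lowest_baseline)
    return 100.0 * non_overlapping / len(intervention)

# Hypothetical percentages of disruptive behavior
baseline = [31, 45, 28, 60]
intervention = [5, 0, 1, 30, 2]

print(pnd_reduction(baseline, intervention))  # 80.0: 4 of 5 points lie below 28
```

The single number 80.0 says nothing about where the five intervention points lie within the non-overlap region, which is exactly the information the violin plot retains.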
The boxplots show a median percentage of on-task behavior of 50% for baseline measurements and 94% for intervention measurements. The third quartile is 73% for baseline data points and 99% for intervention data points. This reveals that the percentage of on-task behavior was already relatively high during baseline measures. However, the boxplots also show that the spread of the scores is much higher for baseline measurements. The violin plot shows that the density of the baseline scores is relatively even over the whole range of percentages. In addition, the boxplot shows that there are no outliers for the baseline measurements, which is supported by the even vertical spread of the jitter. For intervention measurements, the density is much higher on the upper end of the scale and narrows towards 60%. The jitter supports this by showing that there are very few data points below 60%, all of which are outliers according to the boxplot.
Two studies did not include baseline data, so there are seven baseline mean squares and nine intervention mean squares. The mean squares reveal that four intervention phase means are either under or close to the highest baseline mean (light purple square). This highest baseline mean has slightly increased during intervention. The largest baseline square (light blue) lies clearly under 50% of on-task behavior, which has increased to over 75% after the intervention. The two largest intervention squares (salmon pink and olive green) are the ones showing the highest percentage of on-task behavior, and both are higher than the highest baseline mean. Four out of the seven baseline mean squares are above 50%, supporting the information from the boxplot that the percentage of on-task behavior was already relatively high during baseline measures.
The mean effect sizes (accompanied by the standard deviation in parentheses) calculated by Losinski et al. for all studies assessing on-task behavior were SMD: −2.93 (2.32), PND: 84% (30.83%), and IRD: 88% (25.46%).
The current article demonstrated the use of violin plots as visual tools in the meta-analysis of SCEDs. Complementing statistical top-down procedures and graphical representations of summary data (
The violin plots can contain data points from different designs. The integration of data points from ABAB designs (and extensions thereof) and multiple baseline designs seems straightforward: A-phase data points are represented in one violin and B-phase data points in another. One issue pertaining to meta-analyses of SCEDs is the question of how to integrate data from changing criterion designs and alternating treatments designs (
The use of violin plots, just like other SCED meta-analytical tools, requires the availability of the raw data from the included studies. For the sake of transparency and reproducibility, authors should therefore make the raw data available (
In addition, as noted by
The first limitation concerns the density estimation method used for constructing the violin plots. Depending on the density estimation method used, the violin plots may look somewhat different. Nevertheless, when the actually obtained measurements are added (jittered) to the plot, it is still possible to assess their distribution and concentration, independent of the density plot.
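How strongly the density trace depends on the estimation settings can be seen in a small numerical experiment. The sketch below hand-rolls a Gaussian kernel density estimate in Python to stay dependency-free (a real violin plot would rely on R's density() or an equivalent routine); the scores are invented:

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Gaussian kernel density estimate of the data, evaluated at point x."""
    n = len(data)
    return sum(
        math.exp(-0.5 * ((x - d) / bandwidth) ** 2) / (bandwidth * math.sqrt(2 * math.pi))
        for d in data
    ) / n

# Hypothetical phase scores with two clusters and a gap in between
scores = [0, 1, 1, 2, 6, 30, 31, 33]

# The same data yield visibly different density traces under different bandwidths:
# a narrow bandwidth leaves the gap region near 15 almost empty, a wide one fills it.
narrow = gaussian_kde(scores, x=15, bandwidth=2)
wide = gaussian_kde(scores, x=15, bandwidth=12)
print(narrow, wide)
```

This is why the jittered raw measurements are a useful safeguard: they stay identical no matter which density settings shape the violin outline.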
The second limitation relates to the number of data points contributed to the violin plots by each study. Studies with a higher number of data points may disproportionately affect the density curve. However, with the colored jittered raw scores and mean squares—whose size is proportionate to the number of data points—this limitation may be mitigated. Moreover, in meta-analysis it is expected (and desired) that studies with more measurements contribute to a greater extent to the overall quantification (i.e., have greater weight). Thus, this relation between number of data points and density is not necessarily a limitation. At the same time, it should be noted that with an increasing number of included articles, it may become difficult for the human eye to distinguish between the many different colors.
The third limitation refers to the need for all data to be expressed in the same measurement unit. This may lead to working with only part of the data. However, any kind of summary of the data within a study (e.g., representing the data from a given phase by a mean, a trend line, or an effect size) entails a loss of information, as well as the assumption that the summary is an adequate representation of the raw measurements. Four ways of addressing this potential limitation have been presented in the text (selecting studies that report the same measurement unit, converting scores to the same measurement unit, standardizing the scores, and presenting separate adjacent violin plots per measurement unit).
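The standardization option can also be sketched briefly. One simple choice, shown here in Python, is to z-score the measurements within each study; which reference mean and standard deviation is appropriate remains a substantive decision, and the study data below are invented:

```python
import statistics

def standardize(scores):
    """Z-score a study's measurements so that studies reported on
    different scales can be displayed in the same violin plot."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

study_a = [12, 15, 9, 20]     # e.g., frequency counts per session
study_b = [55.0, 70.0, 40.0]  # e.g., percentages of intervals

z_a, z_b = standardize(study_a), standardize(study_b)
print(z_a, z_b)
```

After standardization, each study contributes scores with mean 0 and standard deviation 1, so the violins show relative rather than absolute levels, which is itself a loss of interpretability to weigh against pooling more studies.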
Regarding future research, field tests may be conducted of the conclusions that applied researchers reach about a set of studies included in a meta-analysis when independently interpreting quantitative summaries (e.g., SMD or non-overlap indices), forest plots, and violin plots. It could be evaluated whether these three kinds of information lead to similar conclusions. Moreover, it is possible to study whether the conclusions of applied researchers based on quantitative summaries change after they are shown graphical summaries (e.g., forest plots) or violin plots, and whether any such changes mainly attenuate or emphasize the perceived degree of intervention effectiveness. Additionally, the social validity of the results needs to be taken into consideration (
Another avenue for future research is to assess the potential of violin plots at the individual level, for example in the context of MBDs. Given a sufficiently high number of data points, it would be possible to create side-by-side violin plots of baseline versus intervention for each setting. This would allow a quick assessment of the score distributions as a supplement to the traditional time-series graphs. In that context, the minimum number of data points required to obtain meaningful violin plots in the SCED context may be further explored as well.
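Such a side-by-side layout is straightforward to produce. The article's plots are built in R, but an equivalent sketch using matplotlib's violinplot (with hypothetical scores for a single setting of an MBD) looks like this:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import random
import matplotlib.pyplot as plt

random.seed(1)
# Hypothetical on-task percentages for one setting: low and variable at baseline,
# high and stable during intervention
baseline = [random.gauss(50, 15) for _ in range(12)]
intervention = [random.gauss(90, 5) for _ in range(15)]

fig, ax = plt.subplots()
parts = ax.violinplot([baseline, intervention], positions=[1, 2], showmedians=True)

# Overlay jittered raw scores so the actual measurements remain visible
for pos, scores in [(1, baseline), (2, intervention)]:
    xs = [pos + random.uniform(-0.08, 0.08) for _ in scores]
    ax.scatter(xs, scores, s=10, alpha=0.6)

ax.set_xticks([1, 2])
ax.set_xticklabels(["Baseline", "Intervention"])
ax.set_ylabel("Percentage of on-task behavior")
fig.savefig("violin_example.png")
```

Repeating this pair of violins once per setting would give the quick distributional overview suggested above, alongside the usual time-series graphs.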
Data are freely available at
The supplementary materials provided are the R code and datasets used in the research and can be accessed in the
The authors have no funding to report.
The authors have declared that no competing interests exist.
The authors have no additional (i.e., non-financial) support to report.