<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.2 20190208//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en">
<front>
<journal-meta><journal-id journal-id-type="publisher-id">METH</journal-id><journal-id journal-id-type="nlm-ta">Methodology</journal-id>
<journal-title-group>
<journal-title>Methodology</journal-title><abbrev-journal-title abbrev-type="pubmed">Methodology</abbrev-journal-title>
</journal-title-group>
<issn pub-type="ppub">1614-1881</issn>
<issn pub-type="epub">1614-2241</issn>
<publisher><publisher-name>PsychOpen</publisher-name></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">meth.17715</article-id>
<article-id pub-id-type="doi">10.5964/meth.17715</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Original Article</subject></subj-group>

<subj-group subj-group-type="badge">
<subject>Data</subject>
<subject>Code</subject>
<subject>Materials</subject>	
</subj-group>

</article-categories>
<title-group>
<article-title>Bayesian Versus Frequentist Approaches in Multilevel Single-Case Designs: On Type I Error Rate and Power</article-title>
<alt-title alt-title-type="right-running">Bayesian vs Frequentist Approaches in Multilevel SCEDs</alt-title>
<alt-title specific-use="APA-reference-style" xml:lang="en">Bayesian versus frequentist approaches in multilevel single-case designs: On Type I error rate and power</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0002-9173-4741</contrib-id><name name-style="western"><surname>Rodríguez-Prada</surname><given-names>Cristina</given-names></name><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0002-6700-6832</contrib-id><name name-style="western"><surname>Martínez-Huertas</surname><given-names>José Ángel</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0002-1298-6861</contrib-id><name name-style="western"><surname>Olmos</surname><given-names>Ricardo</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="editor">
<name>
	<surname>Rudas</surname>
	<given-names>Tamás</given-names>
</name>
<xref ref-type="aff" rid="aff3"/>
</contrib>
<aff id="aff1"><label>1</label><institution content-type="dept">Department of Social Psychology and Methodology, School of Psychology</institution>, <institution>Universidad Autónoma de Madrid</institution>, <addr-line><city>Madrid</city></addr-line>, <country country="ES">Spain</country></aff>
<aff id="aff2"><label>2</label><institution content-type="dept">Department of Methodology of Behavioral Sciences, School of Psychology</institution>, <institution>Universidad Nacional de Educación a Distancia</institution>, <addr-line><city>Madrid</city></addr-line>, <country country="ES">Spain</country></aff>
	<aff id="aff3"><institution>Eötvös Loránd University</institution>, <addr-line><city>Budapest</city></addr-line>, <country country="HU">Hungary</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>*</label>“Aula PDIF”, Department of Social Psychology and Methodology, Universidad Autónoma de Madrid (UAM), Madrid, Spain. <underline><email xlink:href="cristina.rodriguezp@uam.es">cristina.rodriguezp@uam.es</email></underline></corresp>
</author-notes>
<pub-date date-type="pub" publication-format="electronic"><day>27</day><month>03</month><year>2026</year></pub-date>
<pub-date pub-type="collection" publication-format="electronic"><year>2026</year></pub-date>
<volume>22</volume>
<issue>1</issue>

<fpage>52</fpage>
<lpage>76</lpage>
<history>
<date date-type="received">
<day>16</day>
<month>04</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>01</month>
<year>2026</year>
</date>
</history>
<permissions><copyright-year>2026</copyright-year><copyright-holder>Rodríguez-Prada, Martínez-Huertas, &amp; Olmos</copyright-holder><license license-type="open-access" specific-use="CC BY 4.0" xlink:href="https://creativecommons.org/licenses/by/4.0/"><ali:license_ref>https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License, CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p></license></permissions>
<abstract>
<p>Single-case experimental designs (SCEDs) assess intervention effects through repeated measurements on one or a few individuals. Multilevel models, which nest repeated measures within individuals, have gained popularity for inferential analysis in SCEDs, used in combination with the expert knowledge of clinicians and applied researchers. However, researchers often face model specification challenges because the true population model underlying their data is unknown. This study evaluates how inference conditioned on the model chosen by selection criteria (AIC, BIC, WAIC, LOO) affects statistical power and Type I error rates for intervention effects, reflecting the ecological reality in which practitioners do not know the true model. A Monte Carlo simulation generated data from AB designs varying sample size, number of measurement points, intervention effects, and random effect structures. Competing multilevel models were then fitted and compared using AIC, BIC, WAIC, and LOO to examine the impact of model selection on statistical power and Type I error rates. Results indicated that frequentist criteria performed well with simpler models in terms of power, while Bayesian approaches showed greater robustness in Type I error control. The findings provide practical insights into multilevel model selection under real-world conditions, highlighting Bayesian methods as a robust alternative for applied researchers handling small sample sizes and complex data structures.</p>
</abstract>
<kwd-group kwd-group-type="author"><kwd>single-case designs</kwd><kwd>multilevel analysis</kwd><kwd>Bayesian statistics</kwd><kwd>frequentist analysis</kwd><kwd>statistical power</kwd><kwd>Type I error rate</kwd></kwd-group>

</article-meta>
</front>
<body>
	<sec sec-type="intro" id="intro"><title/>	
		<p>Single-case experimental designs (SCEDs) provide a valuable framework for analysing intervention effects on individuals through repeated measures (<xref ref-type="bibr" rid="r7">Bono &amp; Arnau, 2014</xref>; <xref ref-type="bibr" rid="r24">Kazdin, 1982</xref>; <xref ref-type="bibr" rid="r62">Shadish &amp; Sullivan, 2011</xref>). Their objective is to establish a functional relationship between an intervention and changes in a behavioural outcome, as applied contexts often hinder meeting traditional causal criteria (logical connection, covariation, temporal precedence, and full control of confounding variables; <xref ref-type="bibr" rid="r23">Kazdin, 1977</xref>; <xref ref-type="bibr" rid="r32">Manolov et al., 2014</xref>; <xref ref-type="bibr" rid="r69">Virués-Ortega &amp; Haynes, 2005</xref>). While traditional qualitative methods, like visual inspection of time-series data, can be somewhat subjective and less effective for detecting subtle changes (<xref ref-type="bibr" rid="r12">Busk &amp; Serlin, 1992</xref>; <xref ref-type="bibr" rid="r24">Kazdin, 1982</xref>; <xref ref-type="bibr" rid="r26">Kratochwill et al., 2013</xref>; <xref ref-type="bibr" rid="r33">Manolov &amp; Moeyaert, 2017</xref>; <xref ref-type="bibr" rid="r50">Parsonson &amp; Baer, 1986</xref>; <xref ref-type="bibr" rid="r66">Van den Noortgate &amp; Onghena, 2003a</xref>), quantitative methods, particularly multilevel models, have increasingly addressed these considerations.</p>
<p>Multilevel linear models (MLMs) address the nested structure of the data, improving the precision of effect estimates and enabling the analysis of contextual variables at multiple levels (<xref ref-type="bibr" rid="r21">Hoffman, 2014</xref>; <xref ref-type="bibr" rid="r33">Manolov &amp; Moeyaert, 2017</xref>; <xref ref-type="bibr" rid="r42">Moeyaert et al., 2020</xref>; <xref ref-type="bibr" rid="r67">Van den Noortgate &amp; Onghena, 2003b</xref>). They effectively address statistical issues like autocorrelation, which are inherent to SCED data and can undermine the validity of parametric tests (<xref ref-type="bibr" rid="r7">Bono &amp; Arnau, 2014</xref>; <xref ref-type="bibr" rid="r20">Gentile et al., 1972</xref>; <xref ref-type="bibr" rid="r25">Keselman &amp; Leventhal, 1974</xref>). MLMs also complement non-parametric methods in SCED analysis, such as non-overlap indices (PND, PEM, NAP; <xref ref-type="bibr" rid="r49">Parker et al., 2011</xref>), which measure improvement across phases but have certain limitations (<xref ref-type="bibr" rid="r58">Rodríguez-Prada &amp; Olmos, 2019</xref>). MLMs provide a comprehensive framework for statistical inference in SCEDs (<xref ref-type="bibr" rid="r8">Botella &amp; Caperos, 2019</xref>).</p>
<p>MLMs can overcome some SCED challenges, such as model specification issues, limited information for estimating parameters, and incorrect standard error estimates from techniques that overlook the data’s hierarchical structure (<xref ref-type="bibr" rid="r56">Rodabaugh &amp; Moeyaert, 2017</xref>). By modelling hierarchical structures, MLMs efficiently capture trends and allow flexible error covariance specifications, enhancing robustness and reliability in statistical decisions (<xref ref-type="bibr" rid="r21">Hoffman, 2014</xref>). MLMs also analyse fixed effects, shared among participants, and random effects, which account for individual variability in treatment outcomes (<xref ref-type="bibr" rid="r33">Manolov &amp; Moeyaert, 2017</xref>). This dual-level approach is particularly well-suited for SCEDs, where individual responses are central to understanding intervention effects, and the hierarchical nature of the data must be accounted for to ensure valid conclusions (<xref ref-type="bibr" rid="r41">Moeyaert et al., 2014</xref>; <xref ref-type="bibr" rid="r66">Van den Noortgate &amp; Onghena, 2003a</xref>; <xref ref-type="bibr" rid="r67">2003b</xref>). By measuring between-individual variability and explicitly modelling moderators, researchers can assess which individual factors influence clinical outcomes and why some individuals benefit more from interventions (<xref ref-type="bibr" rid="r45">Moeyaert et al., 2024</xref>; <xref ref-type="bibr" rid="r44">Moeyaert &amp; Yang, 2021</xref>). MLMs handle complex data structures, specify covariance structures, and model variability, making them suitable for small-sample designs, including SCEDs. However, challenges remain in estimating higher-level variance components with few individuals.</p>
<p>Two main approaches to estimate MLMs are frequentist and Bayesian frameworks, differing in parameter estimation and uncertainty quantification. Frequentist methods, typically using maximum likelihood estimation (ML), view parameters as fixed but unknown. Restricted Maximum Likelihood (REML) is preferred for random effects models, as it adjusts for the loss of degrees of freedom in variance component estimation. However, these approaches depend on asymptotic assumptions that require large samples for accuracy (<xref ref-type="bibr" rid="r21">Hoffman, 2014</xref>). In small-sample contexts like SCEDs, these assumptions often fail, resulting in biased covariance estimates despite unbiased fixed effect estimates (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>; <xref ref-type="bibr" rid="r66">Van den Noortgate &amp; Onghena, 2003a</xref>).</p>
<p>Bayesian estimation has emerged as a promising alternative for small-sample designs in SCED studies (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r38">McNeish, 2016</xref>). Unlike frequentist methods, it incorporates prior information about parameters, helping to mitigate limitations of datasets with few Level-2 units (<xref ref-type="bibr" rid="r19">Gelman et al., 2013</xref>; <xref ref-type="bibr" rid="r65">van de Schoot et al., 2015</xref>). Selecting appropriate prior distributions is crucial in estimating Level-2 variance components, as weak or overly restrictive priors can impact results (<xref ref-type="bibr" rid="r5">Baek &amp; Ferron, 2020</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>). Balancing informative and weakly informative priors is essential to prevent bias and ensure robustness. Additionally, Bayesian methods use advanced algorithms like Markov Chain Monte Carlo (MCMC; <xref ref-type="bibr" rid="r9">Brooks, 1998</xref>) to estimate parameters in complex models, enhancing precision and stability (<xref ref-type="bibr" rid="r38">McNeish, 2016</xref>; <xref ref-type="bibr" rid="r55">Rindskopf, 2014</xref>).</p>
<p>Whether using frequentist or Bayesian frameworks, a major challenge in applying MLMs to SCED data is selecting the best model and its random effect structure. This choice affects the balance between Type I error rates and statistical power, influencing hypothesis testing conclusions (<xref ref-type="bibr" rid="r34">Matuschek et al., 2017</xref>). Underparameterized models may oversimplify data and miss key patterns, while overparameterized models can inflate standard errors and reduce statistical power (<xref ref-type="bibr" rid="r21">Hoffman, 2014</xref>; <xref ref-type="bibr" rid="r35">Martínez-Huertas et al., 2022</xref>; <xref ref-type="bibr" rid="r36">Martínez-Huertas &amp; Olmos, 2022</xref>). Thus, model selection is essential for valid and meaningful statistical decisions in SCED data interpretation, in combination with the expert knowledge of clinicians and applied researchers.</p>
<p>A practical approach uses information criteria to compare models, rewarding good fits and penalising overfitting. These criteria arise from Kullback-Leibler (KL) divergence and entropy, estimating model generalizability by prioritising predictive accuracy over simple goodness of fit. Frequentist methods employ the Akaike Information Criterion (AIC; <xref ref-type="bibr" rid="r1">Akaike, 1998</xref>) and Bayesian Information Criterion (BIC; <xref ref-type="bibr" rid="r61">Schwarz, 1978</xref>), with AIC favouring models that minimise information loss and BIC selecting the most probable model under a Bayesian framework with stricter complexity penalties (<xref ref-type="bibr" rid="r53">Raftery, 1995</xref>; <xref ref-type="bibr" rid="r71">Weakliem, 1999</xref>). Bayesian approaches use indices like the Watanabe-Akaike Information Criterion (WAIC; <xref ref-type="bibr" rid="r70">Watanabe, 2010</xref>) and Leave-One-Out Cross-Validation (LOO; <xref ref-type="bibr" rid="r68">Vehtari et al., 2017</xref>) to enhance model evaluation by incorporating posterior distributions. In contrast to frequentist criteria, WAIC prioritises predictive utility over determining the true population model (<xref ref-type="bibr" rid="r48">Nicenboim &amp; Vasishth, 2016</xref>), while LOO assesses model performance using cross-validation across data points, which makes it robust to outliers. Both indices select the model with the lowest value, and WAIC and LOO often outperform AIC and Deviance Information Criterion (DIC; <xref ref-type="bibr" rid="r64">Spiegelhalter et al., 2002</xref>), offering greater reliability in comparisons (<xref ref-type="bibr" rid="r19">Gelman et al., 2013</xref>). In the present simulation study, the performance of all indices is assessed according to their ability to identify the data-generating model.</p>
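<p>To make the penalisation logic concrete, the following minimal Python sketch (illustrative only; the toy log-likelihood values and parameter counts are invented for the example, not taken from the study) computes AIC and BIC for two competing models and shows how BIC penalises the richer model more heavily once ln(<italic>n</italic>) exceeds 2:</p>
<preformat>
```python
import math

def aic(log_lik, k):
    # Akaike Information Criterion: 2k - 2 * log-likelihood
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Bayesian Information Criterion: k * ln(n) - 2 * log-likelihood
    return k * math.log(n) - 2 * log_lik

# Toy values: a simpler model (k = 3 parameters) versus a richer one
# (k = 5) fitted to n = 30 observations; lower values are preferred.
n = 30
print(aic(-45.0, 3), aic(-43.5, 5))        # AIC mildly favours the simpler model here
print(bic(-45.0, 3, n), bic(-43.5, 5, n))  # BIC penalises the extra parameters harder
```
</preformat>
<p>Because ln(30) ≈ 3.4 exceeds AIC’s fixed penalty of 2 per parameter, BIC separates these two models more strongly, consistent with its stricter complexity penalty.</p>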

<sec><title>Aims of the Present Study</title>
<p>This study compares Bayesian and frequentist methods in MLMs for model selection in SCED data via simulation. It focuses on the AIC, BIC, WAIC, and LOO information criteria for selecting the true population model and on their impact on statistical power and Type I error rates. Three objectives are established. The first is to evaluate whether different Bayesian priors produce differences in statistical power and Type I error rates. Building on <xref ref-type="bibr" rid="r43">Moeyaert et al. (2017)</xref> and <xref ref-type="bibr" rid="r3">Baek et al. (2020)</xref>, we explore how prior specifications affect robustness and accuracy. The second assesses selection accuracy using frequentist (AIC, BIC) and Bayesian (WAIC, LOO) information criteria, examining their consistency within the Bayesian framework. The third examines how model selection affects statistical power and Type I error rates for intervention effects, identifying optimal estimation methods when the population model is unknown. In other words, we evaluate the operating characteristics (Type I error and power) of inference conditioned on the model selected by each criterion when the data-generating process is unknown: a common situation in applied SCED studies. This study thereby addresses the gap of analysing the impact of model selection from an ecological viewpoint, reflecting researchers’ experiences in applied settings, and aims to provide an evidence-based recommendation on which criterion offers the best trade-off between power and Type I error control under model uncertainty.</p></sec></sec>
<sec sec-type="methods"><title>Method</title>
<sec><title>Simulation Study Design</title>
<p>A Monte Carlo simulation of an SCED AB-design with baseline (A) and intervention (B) phases was conducted. The dependent variables included statistical power, Type I error rate, and the proportion of true model selection per information criterion. Statistical power indicates the rate of correctly detecting the intervention effect, while the Type I error rate measures the incorrect identification of an effect when none exists. Correct model selection quantifies how often each information criterion identified the true population model during model comparison.</p></sec>
<sec><title>Data Generation and Population Values</title>
<p>SCED data were generated from four population models varying in the number of random effects (covariance parameters). The models were:</p>
<list id="L1" list-type="simple">
<list-item>
<p>1. <bold>Minimal model</bold>: Assumes a common baseline level and a common intervention effect across individuals. This means that the dependent variable is initially absent or at a floor level for all individuals, and that the intervention effect does not vary between individuals:<disp-formula id="e"><mml:math id="m1"><mml:mrow><mml:msub><mml:mtext>Y</mml:mtext><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mtext>γ</mml:mtext><mml:mrow><mml:mn>00</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mtext>γ</mml:mtext><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>⋅</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mtext>e</mml:mtext><mml:mrow><mml:mtext>ij</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:math>,</disp-formula>
</p></list-item>
</list>
	<p>where <inline-formula><mml:math id="m2"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the response variable for individual <italic>j</italic> at observation <italic>i</italic>, <inline-formula><mml:math id="m3"><mml:mrow><mml:msub><mml:mtext>γ</mml:mtext><mml:mrow><mml:mn>00</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents baseline mean (intercept), <inline-formula><mml:math id="m4"><mml:mrow><mml:msub><mml:mtext>γ</mml:mtext><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the average intervention effect (slope), <inline-formula><mml:math id="m5"><mml:mrow><mml:msub><mml:mtext>e</mml:mtext><mml:mrow><mml:mtext>ij</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the Level-1 residual error for individual <italic>j</italic> at observation <italic>i</italic> and <inline-formula><mml:math id="m6"><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is a dummy-coded predictor indicating whether observation <italic>i</italic> for individual <italic>j</italic> belongs to the baseline phase (0) or the intervention phase (1). This model does not include random effects.</p>
<list id="L2" list-type="simple">
<list-item>
<p>2. <bold>Partial intercepts model:</bold> Includes random intercepts for baseline differences among individuals. This model is relevant in SCED contexts where individuals have different baseline levels, but the intervention is expected to produce a uniform effect:<disp-formula id="e___1"><mml:math id="m7"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>00</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>⋅</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>,</disp-formula>
</p></list-item>
</list>
<p>where <inline-formula><mml:math id="m8"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the random intercept in the baseline for individual <italic>j</italic>.</p>
<list id="L3" list-type="simple">
<list-item>
<p>3. <bold>Partial slopes model:</bold> Incorporates random slopes to represent different intervention effects among individuals, essential in SCED contexts where responses are initially low or absent, or where individual characteristics may influence intervention outcomes:<disp-formula id="e___2"><mml:math id="m9"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>00</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>⋅</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>⋅</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>,</disp-formula>
</p></list-item>
</list>
<p>where <inline-formula><mml:math id="m10"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the random slope for individual <italic>j</italic>.</p>
<list id="L4" list-type="simple">
<list-item>
<p>4. <bold>Maximal model</bold>: Includes both random intercepts and slopes, accounting for variability in both baseline scores and intervention effects. This is the most ecologically plausible model because it accommodates the variability that an individual’s learning history and personal factors can introduce:<disp-formula id="e___3"><mml:math id="m11"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>00</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>⋅</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>⋅</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>,</disp-formula>
</p></list-item>
</list>
<p>where <inline-formula><mml:math id="m12"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>0</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="m13"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> are random intercepts and slopes for individual <italic>j</italic>, respectively.</p>
<p>In all the population models, fixed effects were set at <inline-formula><mml:math id="m14"><mml:mrow><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula>, random effects followed normal distributions <inline-formula><mml:math id="m15"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mfenced><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="m16"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mfenced><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>, and residuals followed a <inline-formula><mml:math id="m17"><mml:mrow><mml:mi>N</mml:mi><mml:mfenced><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> distribution. Random effects were uncorrelated (<italic>r</italic> = 0) to minimise model complexity, as suggested by <xref ref-type="bibr" rid="r66">Van den Noortgate and Onghena (2003a)</xref> and <xref ref-type="bibr" rid="r43">Moeyaert et al. (2017)</xref>. In addition, residuals were assumed to be independent over time, and neither autocorrelation nor secular or temporal trends were simulated.</p></sec>
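<p>As an illustration of the data-generating process, the following Python sketch (the simulations themselves were run in R; this standalone version is only a schematic re-expression, and it treats the dispersion values 2 and 1 above as variances, which is an assumption of the sketch) generates AB-phase data under the maximal model; setting the random-effect variances to zero recovers the simpler models:</p>
<preformat>
```python
import random

def simulate_ab(n_ind=5, n_meas=20, gamma00=0.0, gamma10=5.0,
                var_u0=2.0, var_u1=2.0, var_e=1.0, seed=1):
    # Y_ij = gamma00 + gamma10 * condition_ij
    #        + u0j + u1j * condition_ij + e_ij,
    # with uncorrelated u0j, u1j and independent residuals,
    # mirroring the population models described above.
    rng = random.Random(seed)
    rows = []
    for j in range(n_ind):
        u0 = rng.gauss(0.0, var_u0 ** 0.5)   # random intercept for individual j
        u1 = rng.gauss(0.0, var_u1 ** 0.5)   # random slope for individual j
        for i in range(n_meas):
            cond = 1 if i >= n_meas // 2 else 0   # 0 = baseline (A), 1 = intervention (B)
            e = rng.gauss(0.0, var_e ** 0.5)
            y = gamma00 + gamma10 * cond + u0 + u1 * cond + e
            rows.append({"individual": j, "time": i, "condition": cond, "y": y})
    return rows

data = simulate_ab(n_ind=3, n_meas=10)   # 3 individuals, 10 measurements each
```
</preformat>
<p>Splitting each series at its midpoint into equally long A and B phases is a simplifying assumption of this sketch, not a condition stated in the study.</p>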
<sec><title>Simulation Conditions</title>
<p>The simulation comprised 144 scenarios from various factor combinations:</p>
<list id="L5" list-type="simple">
<list-item>
<p>1. Number of individuals (<italic>N<sub>j</sub></italic> = 3, 5, 7) (<xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>; <xref ref-type="bibr" rid="r62">Shadish &amp; Sullivan, 2011</xref>).</p></list-item></list>
<list id="L5.1" list-type="simple"><list-item>
<p>2. Number of repeated measurements (<italic>RM<sub>j</sub></italic> = 10, 20, 30, 40) (<xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>; <xref ref-type="bibr" rid="r62">Shadish &amp; Sullivan, 2011</xref>). We added two additional values (10 and 30) to increase design resolution across short-to-moderate series lengths and to assess whether performance changes monotonically with series length.</p></list-item></list>
<list id="L5.2" list-type="simple"><list-item>
<p>3. Effect size for the intervention effect <inline-formula><mml:math id="m18"><mml:mrow><mml:msub><mml:mi>γ</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>: null (<italic>d</italic> = 0), medium (<italic>d</italic> = 1.15), and large (<italic>d</italic> = 2.70), following <xref ref-type="bibr" rid="r15">Ferguson’s (2009)</xref> guidelines.</p></list-item></list>
<list id="L5.3" list-type="simple"><list-item>
<p>4. Population model structure, ranging from a Minimal model (no random effects) to increasingly complex specifications (Partial Intercepts, Partial Slopes, and Maximal models). These random structures are in line with previous simulation studies (<xref ref-type="bibr" rid="r35">Martínez-Huertas et al., 2022</xref>; <xref ref-type="bibr" rid="r36">Martínez-Huertas &amp; Olmos, 2022</xref>; <xref ref-type="bibr" rid="r34">Matuschek et al., 2017</xref>).</p></list-item></list>

<p>In each scenario, four model information criteria were evaluated: frequentist criteria (AIC, BIC) and Bayesian criteria (WAIC, LOO), allowing a comparative analysis of selection approaches. Seven different priors were tested to assess their effects on statistical power, Type I error rates for the intervention effect, and the accuracy of model selection using WAIC and LOO. In line with the default parameterisation in <italic>brms</italic>, weakly informative priors (see <xref ref-type="bibr" rid="r18">Gelman, 2006</xref>; <xref ref-type="bibr" rid="r37">McElreath, 2020</xref>) were specified on the standard deviations of the random effects (σᵤ<sub>0</sub> and σᵤ<sub>1</sub>), not on the variance components (σ<sup>2</sup>ᵤ<sub>0</sub> and σ<sup>2</sup>ᵤ<sub>1</sub>). Because standard deviations are constrained to be positive, Half-Cauchy and Half-Normal priors with scale parameters of 10, 20, and 50 were used, alongside a weakly informative Uniform(0, 100) prior. Following the approach outlined by <xref ref-type="bibr" rid="r43">Moeyaert et al. (2017)</xref>, who derived these values from reanalyses of empirical SCED studies with normally distributed continuous outcomes, our aim was to reflect plausible ranges for variance components in this context. This approach allows the priors to regularize estimation in small-sample settings while still letting the normally distributed outcome data contribute substantially to the results. Increasing the scale parameter in the Half-Cauchy and Half-Normal distributions makes the prior less informative by allowing for greater variability in the variance components. A smaller scale concentrates the prior mass closer to zero, while a larger scale spreads the distribution and assigns more weight to larger variance values, thus reducing the prior's influence. We adopted this approach because selecting an appropriate prior distribution is one of the main challenges in this context. Although future studies should explore this issue in greater depth, our aim here was to increase prior variability so that the parameterisation remains sensible while still allowing the data to provide valuable insight.</p></sec>
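The effect of the scale parameter can be illustrated with a short closed-form computation (shown here in Python purely for illustration; the study's own code is in R and available on OSF). The Half-Cauchy(0, s) distribution has CDF (2/π)·arctan(x/s), so a larger scale s leaves less prior mass near zero; the threshold of 25 below is an arbitrary example value, not taken from the study.

```python
import math

def half_cauchy_cdf(x, scale):
    """P(sigma <= x) under a Half-Cauchy(0, scale) prior on a standard deviation."""
    return (2.0 / math.pi) * math.atan(x / scale)

# Larger scales push prior mass toward larger SD values, i.e. less informative.
for scale in (10, 20, 50):
    p = half_cauchy_cdf(25, scale)
    print(f"Half-Cauchy(0, {scale}): P(sigma <= 25) = {p:.2f}")
```

Moving the scale from 10 to 50 drops the prior mass below 25 from roughly .76 to .30, which is the sense in which a larger scale makes the prior less informative.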
<sec><title>Conducting the Simulation Study</title>
	<p>Five hundred replications were conducted for each of the 144 simulation conditions in R. Frequentist models used the <italic>lme4</italic> package (<xref ref-type="bibr" rid="r10">Bates et al., 2015</xref>) for models with random effects and the <italic>nlme</italic> package (<xref ref-type="bibr" rid="r52">Pinheiro et al., 2021</xref>) for the minimal model, always using REML as the estimation method and Satterthwaite’s approximation to the denominator degrees of freedom (via <italic>lmerTest</italic>). Bayesian estimation utilised the <italic>brms</italic> package (<xref ref-type="bibr" rid="r11">Bürkner, 2017</xref>), employing the Hamiltonian Monte Carlo (HMC) algorithm with its NUTS extension for efficient sampling of complex models (<xref ref-type="bibr" rid="r22">Hoffman &amp; Gelman, 2014</xref>). Bayesian models were fitted using two chains of 1,000 iterations each, with 400 warm-up iterations per chain, yielding a total of 1,200 post–warm-up draws per model. The <italic>adapt_delta</italic> parameter was set to 0.95 to reduce divergent transitions and enhance sampling reliability. Extracted data included point estimates, standard deviations, <italic>p</italic> values (frequentist), and posterior means with 95% credible intervals (Bayesian). Fit indices based on information criteria (AIC, BIC, WAIC, LOO) and convergence diagnostics were also extracted <inline-formula><mml:math id="m19"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula>, with<inline-formula><mml:math id="m20"><mml:mrow><mml:mo> </mml:mo><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>&lt;</mml:mo><mml:mn>1.1</mml:mn></mml:mrow></mml:math></inline-formula> indicating satisfactory convergence). According to this criterion, approximately 99% of the replications showed appropriate convergence. The convergence dataset is available on our OSF project (see <xref ref-type="supplementary-material" rid="r59">Rodríguez-Prada et al., 2026a</xref>).</p></sec>
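For reference, two chains of 1,000 iterations with 400 warm-up draws each retain 2 × (1,000 − 400) = 1,200 post-warm-up draws. The diagnostic itself can be sketched with the classic Gelman-Rubin R̂ (a minimal illustration in Python; <italic>brms</italic>/Stan compute a more refined split-chain version, and the chain values below are simulated, not from the study).

```python
import random
import statistics

def rhat(chains):
    """Classic Gelman-Rubin potential scale reduction factor.

    chains: list of equal-length lists of posterior draws for one parameter.
    """
    n = len(chains[0])                   # draws per chain
    means = [statistics.fmean(c) for c in chains]
    variances = [statistics.variance(c) for c in chains]
    w = statistics.fmean(variances)      # within-chain variance
    b = n * statistics.variance(means)   # between-chain variance
    var_plus = (n - 1) / n * w + b / n   # pooled variance estimate
    return (var_plus / w) ** 0.5

# Two well-mixed chains drawn from the same distribution give R-hat near 1,
# comfortably below the 1.1 threshold used in the study.
random.seed(1)
chain1 = [random.gauss(0, 1) for _ in range(600)]
chain2 = [random.gauss(0, 1) for _ in range(600)]
print(round(rhat([chain1, chain2]), 3))
```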
<sec><title>Data Analysis</title>
	<p>Data processing and analysis were performed in R using RStudio. Power and Type I error rates were calculated based on intervention effects. Frequentist decisions used a significance threshold (<italic>p</italic> &lt; 0.05), while Bayesian ones relied on 95% credible intervals. Model selection was assessed using AIC, BIC, WAIC, and LOO by comparing each candidate model (minimal, partial intercepts, partial slopes, maximal) to the population model. In addition, analysis of variance (ANOVA) was conducted to examine the influence of simulation parameters on performance indices, and partial eta-squared (<inline-formula><mml:math id="m21"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi fontstyle="italic">p</mml:mi><mml:mo>2</mml:mo></mml:msubsup> </mml:mrow></mml:math></inline-formula>) was reported as a measure of effect size. Partial eta-squared was used as a relative indicator of the impact of simulation conditions: the aim was not to compare absolute effect sizes with previous studies but to examine differences across simulation conditions. Partial eta-squared was selected over eta-squared because it provides an unbiased estimate of the unique variance explained by each factor in multifactorial designs (<xref ref-type="bibr" rid="r54">Richardson, 2011</xref>). Nevertheless, consistent with prior methodological simulation studies (e.g., <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>), cut-off points of .01, .06, and .14 were adopted to classify small, medium, and large effects, respectively (<xref ref-type="bibr" rid="r13">Cohen, 1988</xref>). This approach allowed us to identify the most influential simulation parameters beyond statistical significance. The dataset, prepared for dissemination, is available on the Open Science Framework at <xref ref-type="bibr" rid="r59">Rodríguez-Prada et al. (2026a)</xref>. This repository contains the scripts for data processing and analysis, which are also available at <xref ref-type="bibr" rid="r57">Rodríguez-Prada (2025)</xref>.</p></sec></sec>
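The two decision rules can be summarised in a small sketch (Python, illustrative only; the actual processing scripts are in R on the OSF repository). A frequentist replication counts as a detection when p &lt; .05; a Bayesian one when the 95% credible interval excludes zero. The proportion of detections estimates power when a true effect is simulated, and the Type I error rate when the simulated effect is zero. The tuples below are hypothetical example values.

```python
def rejection_rates(results):
    """Return (frequentist, Bayesian) rejection proportions.

    results: list of (p_value, ci_low, ci_high) tuples per replication.
    Under a true nonzero effect these estimate power; under a null
    (zero) effect they estimate the Type I error rate.
    """
    n = len(results)
    freq = sum(p < 0.05 for p, lo, hi in results) / n
    bayes = sum(not (lo <= 0 <= hi) for p, lo, hi in results) / n
    return freq, bayes

# Hypothetical replication results: (p value, 95% CrI lower, upper).
reps = [(0.01, 0.2, 1.4), (0.20, -0.3, 0.9), (0.03, 0.1, 1.1), (0.40, -0.5, 0.6)]
print(rejection_rates(reps))  # → (0.5, 0.5)
```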
<sec sec-type="results"><title>Results</title>
<sec><title>Analysis of Different Priors on Power, Type I Error Rate, and Model Selection in Bayesian Methods</title>
	<p>Seven Bayesian priors were evaluated for estimating variance components across four population multilevel models under the simulated conditions (Table S1, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>). Minimal differences were observed in statistical power among the priors (<inline-formula><mml:math id="m22"><mml:mrow><mml:mi>F</mml:mi><mml:mfenced><mml:mrow><mml:mn>6</mml:mn><mml:mo>,</mml:mo><mml:mtext> </mml:mtext><mml:mn>215136</mml:mn></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mtext> </mml:mtext><mml:mn>0.76</mml:mn><mml:mo>,</mml:mo><mml:mtext> </mml:mtext><mml:mi fontstyle="italic">p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn>.001</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mi>η</mml:mi><mml:mi fontstyle="italic">p</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mn>0.002</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, Type I error rates (<italic>F</italic>(6, 107568) = 0.34, <italic>p</italic> = 0.916, <inline-formula><mml:math id="m23"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &lt; .001), and model selection accuracy (<italic>F</italic>(6, 428638) = 1.50, <italic>p</italic> = 0.174; <inline-formula><mml:math id="m24"><mml:mrow><mml:msubsup><mml:mtext>η</mml:mtext><mml:mi fontstyle="italic">p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> &lt; .001). Statistical power remained moderate (≈ 0.62), Type I error rates were conservative (≈ 0.033), and WAIC and LOO correctly identified the population model in 82% of cases.
The truncated Cauchy prior <inline-formula><mml:math id="m25"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mtext>Half</mml:mtext><mml:mo>−</mml:mo><mml:mtext>Cauchy</mml:mtext><mml:mo>∼</mml:mo><mml:mfenced><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:mfenced><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> was selected for further analyses due to slightly higher power and proximity to the nominal Type I error rate of 0.05, reflecting its suitability for this simulation context.</p></sec>
<sec><title>Model Selection Using Information Criteria</title>
	<p>Correct model selection rates for frequentist (AIC, BIC) and Bayesian (WAIC, LOO) indices showed minimal overall differences (<italic>F</italic>(1.64, 117212) = 1557.99, <italic>p</italic> &lt; .001; <inline-formula><mml:math id="m26"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.007). However, there was a significant interaction between the index and the population model (<italic>F</italic>(4.93, 117212.53) = 2883.85, <italic>p</italic> &lt; .001; <inline-formula><mml:math id="m27"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = .11). Marginal means (Table S2, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>) indicate that WAIC and LOO excel at identifying the maximal model but perform worse with simpler models, selecting them 76%–82% of the time. In contrast, frequentist indices favour simpler models: BIC overwhelmingly selects the minimal model (98%), while AIC demonstrates a more balanced performance across all population models (<xref ref-type="fig" rid="f1">Figure 1</xref>). Although the interaction effects of the information criteria with both the number of individuals (<italic>F</italic>(3.28, 117212.53) = 213.47; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m28"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.002) and repeated measures (<italic>F</italic>(4.93, 117212.53) = 88.85; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m29"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.001) were small, they reveal noteworthy patterns. More individuals and repeated measures improve correct model selection.
AIC and LOO are less sensitive to these factors, while frequentist indices, especially BIC, show greater variability. Overall, <xref ref-type="fig" rid="f1">Figure 1</xref> shows that BIC and AIC consistently outperform WAIC and LOO in simpler structures (minimal and partial–intercepts models), whereas their performance decreases when slope variability is included in the model. In contrast, WAIC and LOO provide relatively stable, though lower, accuracy across models. Thus, the results suggest that the choice of information criterion has important implications: BIC and AIC tend to favor parsimony, whereas WAIC and LOO yield more balanced but less accurate selections.</p><fig id="f1" position="anchor" fig-type="figure" orientation="portrait"><label>Figure 1</label><caption><title>Proportion of Correct Selections (Accuracy Rate) for Each Relative Fit Index Based on Population Models</title></caption><graphic xlink:href="meth.17715-f1" position="anchor" orientation="portrait"/></fig>
<p>A detailed analysis of errors in BIC, AIC, WAIC, and LOO (i.e., the model selected when the fit index fails to identify the correct one) reveals distinct error patterns. BIC tends to overly penalise complex models: when incorrect, it selects one of the simpler models (minimal or partial-intercepts) 77.8% of the time, most often favouring the partial-intercepts model (47.8%). AIC shows the most balanced error distribution, although it slightly prefers the random-intercepts model (<xref ref-type="fig" rid="f2">Figure 2</xref>). In contrast, WAIC and LOO tend to err toward more complex models, with nearly half of their errors involving the maximal specification (46.3% for LOO and 47.9% for WAIC). These results suggest that AIC and BIC are biased toward parsimony, whereas WAIC and LOO are more likely to overfit, highlighting systematic tendencies in model misclassification.</p><fig id="f2" position="anchor" fig-type="figure" orientation="portrait"><label>Figure 2</label><caption><title>Distribution of Errors: Model Selection When the Information Criterion Fails to Identify the True Model</title></caption><graphic xlink:href="meth.17715-f2" position="anchor" orientation="portrait"/></fig></sec>
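The parsimony bias of BIC relative to AIC follows directly from the penalty terms of the criteria (AIC = −2·logLik + 2k; BIC = −2·logLik + k·ln n). A minimal sketch (Python, with made-up log-likelihoods purely for illustration; model names mirror the four population structures) shows how BIC's heavier penalty can flip the selection toward a simpler model:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2*logLik + 2k."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2*logLik + k*ln(n)."""
    return -2 * loglik + k * math.log(n)

# Candidate models: (name, maximised log-likelihood, number of parameters).
# The log-likelihoods are hypothetical values chosen for illustration.
candidates = [("minimal", -210.0, 3), ("partial_intercepts", -205.0, 4),
              ("partial_slopes", -204.0, 5), ("maximal", -202.5, 6)]
n = 80  # total observations, e.g. 8 individuals x 10 measurements

best_aic = min(candidates, key=lambda m: aic(m[1], m[2]))
best_bic = min(candidates, key=lambda m: bic(m[1], m[2], n))
print(best_aic[0], best_bic[0])  # → maximal partial_intercepts
```

With the same fits, AIC's constant penalty of 2 per parameter tolerates the extra random effects, while BIC's ln(80) ≈ 4.4 per parameter pushes the choice to a simpler specification.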
<sec><title>Power and Type I Error Rate Conditioned on Model Selection</title>
<p>Power and Type I error rates conditioned on model selection were analyzed, simulating scenarios in which the true population model is unknown, as happens in ecological contexts.</p>
	<p>Regarding power, ANOVA results showed a significant main effect of the fit index (<italic>F</italic>(1.27, 60132.31) = 2702.46; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m30"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.05), with BIC yielding the highest power, followed by AIC, WAIC, and LOO (Table S3, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>). However, for BIC, this increase in power was accompanied by substantially inflated Type I error rates in more complex models. Significant effects were also observed for the population model (<italic>F</italic>(3, 47482) = 11393.98; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m31"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.42), the effect size of the intervention (<italic>F</italic>(1, 47482) = 9713.90; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m32"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.17), and the number of individuals (<italic>F</italic>(2, 47482) = 2782.18; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m33"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.10). Larger effects and sample sizes enhance power, whereas higher error variance, as in maximal models, diminishes it (Table S4, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>).</p>
	<p>The interaction between the population model and effect size (<italic>F</italic>(3, 47482) = 2309.91; <italic>p</italic> &lt; 0.001; <inline-formula><mml:math id="m34"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = 0.108) showed stable statistical power with large effect sizes across models. However, there were significant declines for moderate effects, particularly in Bayesian methods, reflecting their conservative nature (<xref ref-type="fig" rid="f3">Figure 3</xref>; Table S4, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>). Overall, across all criteria, statistical power is high and stable for simpler models (minimal and partial–intercepts) but declines sharply when slope variability and maximal structures are introduced. The drop is most pronounced for smaller effects (ES = 1.15), where power often falls below .40, particularly for LOO and WAIC. Larger effects (ES = 2.70) mitigate but do not eliminate this decline. These results indicate that model complexity disproportionately reduces power for detecting smaller effects, with Bayesian indices (LOO, WAIC) being more sensitive to this loss than frequentist ones (AIC, BIC).</p><fig id="f3" position="anchor" fig-type="figure" orientation="portrait"><label>Figure 3</label><caption>
<title>Effects of the Interaction Between Population Model, Effect Size and Fit Index on Conditioned Power</title></caption><graphic xlink:href="meth.17715-f3" position="anchor" orientation="portrait"/></fig>
	<p>The model fit index, conditioned on the selected model, significantly impacts the Type I error rate (<italic>F</italic>(1.49, 35636.95) = 511.52, <italic>p</italic> &lt; .001, <inline-formula><mml:math id="m35"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = .02). Although the effect size is small, meaningful differences emerge among indices. Frequentist indices tend to exceed the nominal Type I error rate, while the Bayesian indices demonstrate a more conservative behaviour, staying closer to the nominal value of 0.05 (Table S5, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>). The interaction between population model and model fit indices is statistically significant but small (<italic>F</italic>(4.47, 35636.95) = 165.28, <italic>p</italic> &lt; .001, <inline-formula><mml:math id="m36"><mml:mrow><mml:msubsup><mml:mi>η</mml:mi><mml:mi>p</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo> </mml:mo></mml:mrow></mml:math></inline-formula> = .02). Examining marginal means (Table S5, <xref ref-type="bibr" rid="r60">Rodríguez-Prada et al., 2026b</xref>), the BIC index shows an unacceptable Type I error rate for complex models, reaching 13% for the maximal model. The AIC index is less sensitive than BIC, with a Type I error rate of 8.9% for the maximal model. WAIC and LOO remain stable across all population models, aligning closely with the nominal level in complex models (<xref ref-type="fig" rid="f4">Figure 4</xref>). Thus, error rates remain close to the nominal .05 level for minimal and partial–intercepts models across all criteria. However, as complexity increases, AIC and BIC show inflated error rates, particularly under maximal specifications (BIC ≈ .13, AIC ≈ .09). By contrast, Bayesian fit indices (WAIC and LOO) maintain more stable control of Type I error, even in complex models.
These findings suggest that AIC and BIC may increase the risk of false positives when applied to highly parameterized structures, whereas WAIC and LOO provide more conservative performance.</p><fig id="f4" position="anchor" fig-type="figure" orientation="portrait"><label>Figure 4</label><caption>
<title>Type I Error Rate Conditioned on Model Selection as a Function of the Population Model</title></caption><graphic xlink:href="meth.17715-f4" position="anchor" orientation="portrait"/></fig>
<p>In summary, BIC exhibited the highest values for both statistical power and the Type I error rate. WAIC and LOO yielded the lowest values, remaining close to but not exceeding the nominal 0.05 level for false positives. AIC demonstrated a balanced performance between statistical power and the Type I error rate. Interaction effects highlighted the influence of between-subject factors on both power and the Type I error rate, particularly the intervention effect size, sample size, and population model for power; and only the interaction with the population model for the Type I error rate.</p></sec></sec>
<sec sec-type="discussion"><title>Discussion</title>
<p>Statistical modelling has historically been underutilised in SCEDs within clinical and psychological contexts due to inherent challenges, including small sample sizes, limited observations, and reduced data availability. However, advances in methods such as MLMs and Bayesian approaches offer promising solutions (<xref ref-type="bibr" rid="r4">Baek &amp; Ferron, 2013</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>). These methods are particularly valuable for estimating covariance parameters, enabling researchers to quantify between-individual variability and explore its causes (e.g., why some individuals benefit more from an intervention than others). At a descriptive level, multilevel models enable researchers to quantify intervention effects, estimate the magnitude and direction of changes in both levels and slopes, and assess their practical significance (<xref ref-type="bibr" rid="r16">Ferron et al., 2009</xref>). Several studies have focused on exploring estimation in terms of bias and standard errors (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r17">Ferron et al., 2010</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>). However, there has been limited focus on other statistical properties of MLMs. A key objective of this study was to compare the performance of model fit information criteria from two different frameworks (frequentist via REML estimation and Bayesian with varying degrees of weak informativeness) in a context where the true population model is unknown.<xref ref-type="fn" rid="fn1"><sup>1</sup></xref><fn id="fn1"><label>1</label>
<p>It is worth noting that ‘statistical significance’ is not presented as evidence of population generalization nor of practical relevance in this study. We use it only as an operating property to calibrate model-selection procedures, acknowledging that random-effects (mis)specification can alter Type I error, power, and uncertainty estimates (e.g., confidence intervals, effect sizes, etc.). This perspective complements prior SCED work centered on bias and precision (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r17">Ferron et al., 2010</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>) and keeps the emphasis on effect sizes with uncertainty for practical interpretation.</p></fn> Bayesian methods showed no significant differences in statistical power, Type I error rate, and model selection among the chosen priors, confirming previous studies (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>). The half-Cauchy and half-normal priors, which favour smaller variances, performed slightly better, as suggested in earlier research. However, prior selection is context-dependent, with effectiveness varying by intervention type, outcome characteristics, and research design. Commonly used priors, like half-normal or half-Cauchy distributions, are often based on simulations and may not suit empirical cases where the dependent variable has greater dispersion. Priors should be tailored to the outcome's scale and variability for robust estimation. A stronger integration between simulation and applied studies is needed to improve these decisions. As highlighted by <xref ref-type="bibr" rid="r43">Moeyaert et al. (2017)</xref>, prior calibration can be informed by reanalyses of empirical SCEDs, which helps to ensure that the scale of the priors reflects realistic ranges for variance components.
For applied researchers, however, translating prior knowledge into formal prior distributions remains challenging. More empirical work and meta-analytical evidence are needed to provide practical guidance on how to derive informative priors in SCED contexts.</p>
<p>The information criteria (AIC, BIC, WAIC, LOO) showed minor differences in their ability to identify the true data-generating model. This finding partially aligns with <xref ref-type="bibr" rid="r43">Moeyaert et al. (2017)</xref>, who reported equivalence between REML and Bayesian methods for estimating fixed effects, as both frameworks produced similar results. However, interaction effects revealed nuanced behaviours: Bayesian indices tended to favour complex models, while frequentist indices were more accurate with simpler models. This pattern suggests that Bayesian methods may lean toward overparameterisation, whereas frequentist methods might prefer underparameterised models. More recent contributions have extended the scope of MLM research in SCEDs by exploring novel approaches, including the application of generalized linear models (GLMs) and refinements in variance component estimation. <xref ref-type="bibr" rid="r29">Li et al. (2024)</xref> recommend using AIC and BIC to select an optimal model for count data with overdispersion; when there is both overdispersion and zero-inflation, they recommend methods with a lower penalty for these complex models. However, no Bayesian option was explored in <xref ref-type="bibr" rid="r29">Li et al. (2024)</xref>. Recent simulation work has further emphasized the importance of model specification for variance components in SCEDs. <xref ref-type="bibr" rid="r30">Li et al. (2022)</xref> demonstrated that biased estimates of between-case variance are particularly problematic when the true variance is small, and that unconstrained optimization methods combined with post hoc model selection procedures (e.g., bootstrap-based RLRT) can improve estimation accuracy and inference.
These findings highlight that not only the choice between frequentist and Bayesian approaches but also the technical details of variance component estimation and covariance structure specification critically affect the robustness of MLMs applied to SCEDs.</p>
<p>It is worth mentioning that there are some differences between fixed and random effects regarding ecological validity and generalization of models in SCEDs depending on their complexity. On the one hand, simpler models may have lower ecological validity despite their greater generalizability. On the other hand, overparameterised models may lead to problems of model fit and reduced predictive validity, particularly when unnecessary fixed effects are added. In the field of SCEDs, including additional fixed effects can enrich the description of treatment effects, and adding more random effects is often crucial for adequately modelling change, but this increases the estimation challenges. Our findings align with previous literature suggesting that Bayesian methods may offer a valuable alternative to handle this kind of complex model (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>; <xref ref-type="bibr" rid="r67">Van den Noortgate &amp; Onghena, 2003b</xref>). This may be particularly true considering that real SCEDs are generally more complex than those simulated in methodological studies, making the Bayesian framework potentially more suitable for real-world data. In contrast, frequentist approaches may be more appropriate for simpler scenarios as, for example, frequentist methods struggle to estimate covariance parameters in complex models (<xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>). AIC performs well across models, making it suitable for straightforward scenarios. In contrast, WAIC and LOO excel in complex models, being less affected by sample size and repeated measures, limitations often faced in SCEDs. BIC’s performance varies and is less reliable with complex models.
Each information criterion shows distinct error patterns: WAIC and LOO favour complex models, whereas AIC and BIC lean towards simpler ones, consistent with previous findings (<xref ref-type="bibr" rid="r35">Martínez-Huertas et al., 2022</xref>; <xref ref-type="bibr" rid="r36">Martínez-Huertas &amp; Olmos, 2022</xref>). This divergence may reflect frequentist methods’ limitations in estimating random effects in complex settings, highlighting the importance of understanding the unique strengths of each index in different modelling contexts.</p>
<p>The analysis of power and Type I error rates was focused on emulating real-world conditions researchers face, where the true data-generation model is unknown. When the intervention effect size was large, no significant differences in statistical power were found between Bayesian and frequentist methods, indicating that either can offer reasonable conclusions. However, frequentist methods, especially BIC, exhibited higher Type I error rates in complex scenarios. Higher power driven by inflated Type I error is not desirable. AIC showed better performance with a moderate excess of Type I error (8.9%) and a good balance with statistical power. Bayesian methods, while slightly conservative (error rate ≈ 0.04), offered more stable and consistent results across conditions. Therefore, considering all these findings, and acknowledging the limitations discussed below, Bayesian methods may be more robust in complex SCED settings than frequentist methods.</p>
<p>Model specification significantly influences statistical performance as it affects the estimation of confidence intervals, standard errors, and effect sizes. For instance, when standardising mean differences, using biased standard errors can result in biased effect size estimates. Model detectability varies based on complexity, information criteria, and estimation methods. This study analysed four models of increasing complexity, ranging from one with no random effects to one with random intercepts and random slopes. In this simulation, simpler models, which assumed no individual variability in intervention effects (an unrealistic assumption), yield lower standard errors from residual variance, enhancing detectability, but overlook stochastic dependencies in nested data by assuming independence (<xref ref-type="bibr" rid="r40">Moerbeek, 2004</xref>). While fitting well, they may not represent real scenarios. Additional variability from random slopes heightens standard errors of the intervention effect, complicating detection. Increased error variance in complex models might explain lower performance, with detectability improving only with larger intervention effect sizes. Bayesian methods excel in these scenarios, at least using weakly informative priors and fit indices like WAIC or LOO. In fact, models can incorporate relevant effects like temporal trends (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>), making them more intricate than those examined here. The MLMs simulated in this study were simpler, enabling controlled exploration of fit indices’ performance. Unreported simulations revealed that estimating the correlation between slopes and intercepts often yielded impossible values or failed. This simplicity may explain the stronger performance of frequentist methods over Bayesian approaches in some conditions.
However, as complexity increases, Bayesian methods are expected to outperform frequentist methods, especially with small sample sizes and few repeated measures.</p>
<p>Bayesian methods have distinct advantages for SCEDs, especially with complex random effects. They have been successfully applied in situations with heterogeneous Level-1 variances (<xref ref-type="bibr" rid="r4">Baek &amp; Ferron, 2013</xref>) and autocorrelated error terms with trends (<xref ref-type="bibr" rid="r46">Natesan, 2019</xref>), which are challenging for frequentist methods. Bayesian approaches’ flexibility makes them ideal for these complexities. Software like the <italic>brms</italic> package (<xref ref-type="bibr" rid="r11">Bürkner, 2017</xref>) enhances accessibility, enabling researchers to apply Bayesian methods without deep knowledge of computation. Other tools like JAGS, WinBUGS, Stan, or <italic>rstan</italic> further support users without advanced statistics expertise. The findings emphasise matching estimation methods to model complexity. AIC is suitable for simpler models due to its stability, while Bayesian methods are better for complex models or small samples. In any case, we recommend that researchers perform sensitivity analyses with multiple fit indices to ensure consistent model selection, enhancing the robustness of their findings.</p>
<sec><title>Limitations</title>
<p>The present simulation study evaluated the operating characteristics of Bayesian and frequentist methods in MLMs applied to SCEDs when the true population model is unknown. Findings are specific to simulated conditions and need empirical validation. Our findings should be interpreted conditional on the simulated data-generating mechanisms: an AB design with normally distributed outcomes and no explicit autocorrelation, phase-specific trends, or cross-level interactions. In applied SCEDs, however, outcomes are often counts and time-related structure is common. These features can change the effective model complexity and the penalty-fit trade-off, potentially altering the relative behavior of AIC/BIC versus WAIC/LOO and frequentist versus Bayesian estimation. This is consistent with recent work extending multilevel SCED models to GLMM settings and more realistic covariance structures (e.g., <xref ref-type="bibr" rid="r29">Li et al., 2024</xref>, <xref ref-type="bibr" rid="r28">2025a</xref>, <xref ref-type="bibr" rid="r31">2025b</xref>), and motivates future simulations that jointly manipulate outcome distribution, trends, and autocorrelation to evaluate whether the performance patterns reported here generalize to these more ecological scenarios. These decisions limit the scope of our findings, since the model complexity of MLM applications often arises precisely from elements such as trends (general or phase-specific) or autocorrelation (<xref ref-type="bibr" rid="r47">Natesan Batley &amp; Hedges, 2021</xref>). We deliberately opted for a simple design to keep the conditions more controlled and to examine the performance of model selection criteria in a focused way, particularly regarding random effects and the detection of random slopes. Including both baseline and intervention trends would have substantially increased the complexity of the simulation, making it more difficult to isolate the behaviour of the information criteria.
Future research should explore the behavior of information criteria in model selection within more complex designs (e.g., multiple-baseline or reversal designs) and with advanced covariance structures, such as autoregressive terms, to better reflect real-world data. Our use of dependent variables that follow normal distributions, which are uncommon in SCEDs where ceiling and floor effects and heteroscedasticity are frequent, also limits generalizability. Future studies should investigate the behavior of both frequentist and Bayesian approaches within generalized linear multilevel models (GLMMs). By incorporating variables with asymmetric distributions, such as Poisson-distributed counts, researchers can address this gap and explore small effect sizes that are frequently overlooked in qualitative analyses. Recent research has demonstrated that linear mixed models can be adapted to handle count outcomes in SCEDs, leading to more accurate estimates, particularly in cases of overdispersion or small sample sizes (<xref ref-type="bibr" rid="r14">Declercq et al., 2019</xref>). More recently, Li and colleagues have shown how GLMMs can be effectively applied to various count data scenarios, including zero-inflated and overdispersed distributions (<xref ref-type="bibr" rid="r27">Li, 2024</xref>; <xref ref-type="bibr" rid="r31">Li et al., 2025b</xref>; <xref ref-type="bibr" rid="r29">Li et al., 2024</xref>). They have also provided step-by-step tutorials designed for applied researchers (see <xref ref-type="bibr" rid="r28">Li et al., 2025a</xref>). These contributions emphasize the importance of moving beyond normally distributed outcomes to enhance the ecological validity and robustness of statistical inferences from SCED data.</p>
<p>The Bayesian fit indices WAIC and LOO, both derived from the conditional likelihood, were chosen for their accessibility and compatibility with the evaluated models. However, future work could benefit from a broader exploration of Bayesian model comparison strategies. Greater emphasis should be placed on marginal rather than conditional likelihood-based approaches, which some authors argue are more suitable for model comparison in Bayesian frameworks (<xref ref-type="bibr" rid="r2">Ariyo et al., 2022</xref>; <xref ref-type="bibr" rid="r39">Merkle et al., 2019</xref>). Additionally, recent methodological advances demonstrate the growing applicability of Bayes factors within different SCEDs, such as ABAB, alternating treatments, and changing criterion designs (<xref ref-type="bibr" rid="r72">Yamada &amp; Okada, 2024</xref>, <xref ref-type="bibr" rid="r73">2025</xref>). Incorporating Bayes factors in future analyses could offer complementary insights into model selection decisions. The best performance of these information criteria was observed under optimal conditions: more individuals, more repeated measures, and larger effect sizes, consistent with previous research (<xref ref-type="bibr" rid="r3">Baek et al., 2020</xref>; <xref ref-type="bibr" rid="r43">Moeyaert et al., 2017</xref>). While MLMs effectively capture complex realities, their mathematical and statistical demands underscore the importance of clear model specification. <xref ref-type="bibr" rid="r6">Bickel (2007)</xref> observes that these models perform best when grounded in robust theories and literature. Thus, advancing MLMs should also promote theoretical development, particularly within psychology. It is important to remember that statistical decisions should never be made without considering the wider context and the insights provided by clinical expertise.
An analysed effect might not reach the traditional thresholds of statistical significance but can still be socially relevant, aligning with the concept of social validity (<xref ref-type="bibr" rid="r23">Kazdin, 1977</xref>; <xref ref-type="bibr" rid="r63">Snodgrass et al., 2023</xref>), and establishing that the intervention preceded the observed change is fundamental to supporting claims about its efficacy (<xref ref-type="bibr" rid="r51">Perone, 1999</xref>). Future research should investigate how the decisions suggested by these models relate to those made by applied researchers.</p></sec>
<sec><title>Conclusions and Recommendations</title>
<p>MLM is a particularly versatile option for analysing SCED data, offering general and individual effect quantification, statistical testing, and the ability to model complex effects. This study highlights the strengths of Bayesian methods for highly complex models, where they maintain stable power and Type I error rates across varying effect sizes, sample sizes, and numbers of repeated measures. For simpler models, the frequentist REML approach performs equally well or better, particularly when AIC is used for model selection. When protection against Type I errors is paramount, as in assessing therapeutic change, Bayesian methods provide a reliable and robust framework, making them a valuable tool for applied researchers in SCEDs. Given that the data-generating model is typically unknown in applied SCEDs, the practical contribution of this study is to characterise the pros and cons of common model selection fit indices. Under our simulation conditions, REML estimation with AIC offered the most favourable balance between power and Type I error rates for simpler population models, whereas Bayesian estimation performed more stably as model complexity increased. Thus, our recommendation is to treat model selection as risk management under uncertainty: triangulate across plausible random-effects structures and run sensitivity analyses across AIC/BIC and WAIC/LOO to ensure that conclusions are not criterion-dependent.</p>
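<p>For readers less familiar with the Bayesian side of such a sensitivity analysis, the following sketch shows what WAIC computes, using the standard formulation (log pointwise predictive density penalised by the effective number of parameters, the sum of posterior variances of the pointwise log-likelihood). The toy data and variable names are hypothetical illustrations, not this study's simulation code; in practice packages such as <italic>brms</italic> report these quantities from the fitted model.</p>

```python
import numpy as np

def waic(loglik):
    """WAIC from an (S draws x N observations) matrix of pointwise log-likelihoods.

    lppd   = sum_i log( mean_s exp(loglik[s, i]) )  (computed stably)
    p_waic = sum_i Var_s( loglik[s, i] )            (effective number of parameters)
    WAIC   = -2 * (lppd - p_waic)
    """
    m = loglik.max(axis=0)  # shift for numerical stability before exponentiating
    lppd = np.sum(m + np.log(np.mean(np.exp(loglik - m), axis=0)))
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), p_waic

# Hypothetical toy posterior: Gaussian likelihood with draws for the mean
rng = np.random.default_rng(1)
y = rng.normal(size=30)               # 30 observations
mu = rng.normal(0.0, 0.2, size=1000)  # 1000 posterior draws (made up)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2
w, p = waic(loglik)
```

<p>Comparing this quantity across candidate random-effects structures, alongside AIC/BIC from the REML fits, is one concrete way to check that a selected model is not an artefact of a single criterion.</p>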
</sec></sec>
</body>
	<back>	
	<fn-group content-type="author-contribution">
		<fn fn-type="con">
			<p><italic>Cristina Rodríguez-Prada</italic>: Conceptualization, Data Curation, Methodology, Formal Analysis, Software, Visualization, Writing – Original draft, Writing – Review &amp; editing. <italic>José Ángel Martínez-Huertas</italic>: Conceptualization, Methodology, Writing – Review &amp; editing. <italic>Ricardo Olmos</italic>: Conceptualization, Data Curation, Methodology, Supervision, Software, Writing – Review &amp; editing.
			</p>
		</fn>
	</fn-group>
	
<fn-group><fn fn-type="financial-disclosure">
<p content-type="fn-title">CRP was supported by the “Ayudas al Fomento de la Investigación en Másteres Oficiales 2019-2020” and “Ayudas al Fomento de la Investigación en Másteres Oficiales 2020-2021” by the Universidad Autónoma de Madrid. CRP is also supported by “Contratos predoctorales para la Formación de Personal Investigador FPI-UAM 2022”, Universidad Autónoma de Madrid, Spain.</p></fn></fn-group><ack><title>Acknowledgements</title>
<p>Thanks to the Centro de Computación Científica (<ext-link ext-link-type="uri" xlink:href="https://www.ccc.uam.es/">https://www.ccc.uam.es/</ext-link>) of the Universidad Autónoma de Madrid for the resources they provided for the simulation process.</p></ack>
<ref-list><title>References</title>
<ref id="r1"><mixed-citation publication-type="book">Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In E. Parzen, K. Tanabe &amp; G. Kitagawa (Eds.), <italic>Selected papers of Hirotugu Akaike</italic> (pp. 199–213). Springer. <pub-id pub-id-type="doi">10.1007/978-1-4612-1694-0_15</pub-id></mixed-citation></ref>
	<ref id="r2"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ariyo</surname>, <given-names>O.</given-names></string-name>, <string-name name-style="western"><surname>Lesaffre</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Verbeke</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Quintero</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Model selection for Bayesian linear mixed models with longitudinal data: Sensitivity to the choice of priors.</article-title> <source>Communications in Statistics — Simulation and Computation</source>, <volume>51</volume>(<issue>4</issue>), <fpage>1591</fpage>–<lpage>1615</lpage>. <pub-id pub-id-type="doi">10.1080/03610918.2019.1676439</pub-id></mixed-citation></ref>
<ref id="r3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Baek</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Beretvas</surname>, <given-names>S. N.</given-names></string-name>, <string-name name-style="western"><surname>Van den Noortgate</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Brief research report: Bayesian versus REML estimations with noninformative priors in multilevel single-case data.</article-title> <source>Journal of Experimental Education</source>, <volume>88</volume>(<issue>4</issue>), <fpage>698</fpage>–<lpage>710</lpage>. <pub-id pub-id-type="doi">10.1080/00220973.2018.1527280</pub-id></mixed-citation></ref>
<ref id="r4"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Baek</surname>, <given-names>E. K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Multilevel models for multiple-baseline data: Modeling across-participant variation in autocorrelation and residual variance.</article-title> <source>Behavior Research Methods</source>, <volume>45</volume>(<issue>1</issue>), <fpage>65</fpage>–<lpage>74</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-012-0231-z</pub-id><pub-id pub-id-type="pmid">22806706</pub-id></mixed-citation></ref>
<ref id="r5"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Baek</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Modeling heterogeneity of the Level-1 error covariance matrix in multilevel models for single-case data.</article-title> <source>Methodology: European Journal of Research Methods for the Behavioral and Social Sciences</source>, <volume>16</volume>(<issue>2</issue>), <fpage>166</fpage>–<lpage>185</lpage>. <pub-id pub-id-type="doi">10.5964/meth.2817</pub-id></mixed-citation></ref>
<ref id="r6"><mixed-citation publication-type="book">Bickel, R. (2007). <italic>Multilevel analysis for applied research: It’s just regression!</italic> Guilford Press.</mixed-citation></ref>
<ref id="r7"><mixed-citation publication-type="book">Bono, R., &amp; Arnau, J. (2014). <italic>Diseños de caso único en ciencias sociales y de la salud [Unique case designs in social sciences and health sciences]</italic>. Síntesis.</mixed-citation></ref>
<ref id="r8"><mixed-citation publication-type="book">Botella, J., &amp; Caperos, J. M. (2019). <italic>Metodología de investigación en psicología general sanitaria [Research methodology in general health psychology]</italic>. Síntesis.</mixed-citation></ref>
<ref id="r9"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Brooks</surname>, <given-names>S.</given-names></string-name></person-group> (<year>1998</year>). <article-title>Markov chain Monte Carlo method and its application.</article-title> <source>Journal of the Royal Statistical Society: Series D</source>, <volume>47</volume>(<issue>1</issue>), <fpage>69</fpage>–<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1111/1467-9884.00117</pub-id></mixed-citation></ref>
<ref id="r10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Bates</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Mächler</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Bolker</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Walker</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2015</year>). <article-title>Fitting linear mixed-effects models using lme4.</article-title> <source>Journal of Statistical Software</source>, <volume>67</volume>(<issue>1</issue>), <fpage>1</fpage>–<lpage>48</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v067.i01</pub-id></mixed-citation></ref>
<ref id="r11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Bürkner</surname>, <given-names>P.-C.</given-names></string-name></person-group> (<year>2017</year>). <article-title>brms: An R package for Bayesian multilevel models using Stan.</article-title> <source>Journal of Statistical Software</source>, <volume>80</volume>(<issue>1</issue>), <fpage>1</fpage>–<lpage>28</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v080.i01</pub-id></mixed-citation></ref>
<ref id="r12"><mixed-citation publication-type="book">Busk, P. L., &amp; Serlin, R. C. (1992). Meta-analysis for single-case research. In T. R. Kratochwill &amp; J. R. Levin (Eds.), <italic>Single-case research design and analysis: New directions for psychology and education</italic> (pp. 187–212). Lawrence Erlbaum Associates.</mixed-citation></ref>
<ref id="r13"><mixed-citation publication-type="book">Cohen, J. (1988). <italic>Statistical power analysis for the behavioral sciences</italic> (2<sup>nd</sup> ed.). Routledge.</mixed-citation></ref>
<ref id="r14"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Declercq</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Jamshidi</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Fernández-Castilla</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Beretvas</surname>, <given-names>S. N.</given-names></string-name>, <string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Van den Noortgate</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Analysis of single-case experimental count data using the linear mixed effects model: A simulation study.</article-title> <source>Behavior Research Methods</source>, <volume>51</volume>(<issue>6</issue>), <fpage>2477</fpage>–<lpage>2497</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-018-1091-y</pub-id><pub-id pub-id-type="pmid">30105444</pub-id></mixed-citation></ref>
<ref id="r15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ferguson</surname>, <given-names>C. J.</given-names></string-name></person-group> (<year>2009</year>). <article-title>An effect size primer: A guide for clinicians and researchers.</article-title> <source>Professional Psychology, Research and Practice</source>, <volume>40</volume>(<issue>5</issue>), <fpage>532</fpage>–<lpage>538</lpage>. <pub-id pub-id-type="doi">10.1037/a0015808</pub-id></mixed-citation></ref>
<ref id="r16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name>, <string-name name-style="western"><surname>Bell</surname>, <given-names>B. A.</given-names></string-name>, <string-name name-style="western"><surname>Hess</surname>, <given-names>M. R.</given-names></string-name>, <string-name name-style="western"><surname>Rendina-Gobioff</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hibbard</surname>, <given-names>S. T.</given-names></string-name></person-group> (<year>2009</year>). <article-title>Making treatment effect inferences from multiple-baseline data: The utility of multilevel modeling approaches.</article-title> <source>Behavior Research Methods</source>, <volume>41</volume>(<issue>2</issue>), <fpage>372</fpage>–<lpage>384</lpage>. <pub-id pub-id-type="doi">10.3758/BRM.41.2.372</pub-id><pub-id pub-id-type="pmid">19363177</pub-id></mixed-citation></ref>
<ref id="r17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name>, <string-name name-style="western"><surname>Farmer</surname>, <given-names>J. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Owens</surname>, <given-names>C. M.</given-names></string-name></person-group> (<year>2010</year>). <article-title>Estimating individual treatment effects from multiple-baseline data: A Monte Carlo study of multilevel-modeling approaches.</article-title> <source>Behavior Research Methods</source>, <volume>42</volume>(<issue>4</issue>), <fpage>930</fpage>–<lpage>943</lpage>. <pub-id pub-id-type="doi">10.3758/BRM.42.4.930</pub-id></mixed-citation></ref>
<ref id="r18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gelman</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2006</year>). <article-title>Prior distributions for variance parameters in hierarchical models (Comment on article by Browne and Draper).</article-title> <source>Bayesian Analysis</source>, <volume>1</volume>(<issue>3</issue>), <fpage>515</fpage>–<lpage>534</lpage>. <pub-id pub-id-type="doi">10.1214/06-BA117A</pub-id></mixed-citation></ref>
<ref id="r19"><mixed-citation publication-type="book">Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., &amp; Rubin, D. B. (2013). <italic>Bayesian data analysis</italic> (3<sup>rd</sup> ed.). CRC Press.</mixed-citation></ref>
<ref id="r20"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gentile</surname>, <given-names>J. R.</given-names></string-name>, <string-name name-style="western"><surname>Roden</surname>, <given-names>A. H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Klein</surname>, <given-names>R. D.</given-names></string-name></person-group> (<year>1972</year>). <article-title>An analysis-of-variance model for the intrasubject replication design.</article-title> <source>Journal of Applied Behavior Analysis</source>, <volume>5</volume>(<issue>2</issue>), <fpage>193</fpage>–<lpage>198</lpage>. <pub-id pub-id-type="doi">10.1901/jaba.1972.5-193</pub-id><pub-id pub-id-type="pmid">16795340</pub-id></mixed-citation></ref>
<ref id="r21"><mixed-citation publication-type="book">Hoffman, L. (2014). <italic>Longitudinal analysis: Modeling within-person fluctuation and change</italic> (1<sup>st</sup> ed.). Routledge.</mixed-citation></ref>
<ref id="r22"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Hoffman</surname>, <given-names>M. D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Gelman</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2014</year>). <article-title>The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo.</article-title> <source>Journal of Machine Learning Research</source>, <volume>15</volume>, <fpage>1593</fpage>–<lpage>1623</lpage>.</mixed-citation></ref>
<ref id="r23"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Kazdin</surname>, <given-names>A. E.</given-names></string-name></person-group> (<year>1977</year>). <article-title>Assessing the clinical or applied importance of behavior change through social validation.</article-title> <source>Behavior Modification</source>, <volume>1</volume>(<issue>4</issue>), <fpage>427</fpage>–<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1177/014544557714001</pub-id></mixed-citation></ref>
<ref id="r24"><mixed-citation publication-type="book">Kazdin, A. E. (1982). <italic>Single-case research designs: Methods for clinical and applied settings</italic>. Oxford University Press.</mixed-citation></ref>
<ref id="r25"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Keselman</surname>, <given-names>H. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Leventhal</surname>, <given-names>L.</given-names></string-name></person-group> (<year>1974</year>). <article-title>Concerning the statistical procedures enumerated by Gentile et al.: Another perspective.</article-title> <source>Journal of Applied Behavior Analysis</source>, <volume>7</volume>(<issue>4</issue>), <fpage>643</fpage>–<lpage>645</lpage>. <pub-id pub-id-type="doi">10.1901/jaba.1974.7-643</pub-id><pub-id pub-id-type="pmid">16795485</pub-id></mixed-citation></ref>
<ref id="r26"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Kratochwill</surname>, <given-names>T. R.</given-names></string-name>, <string-name name-style="western"><surname>Hitchcock</surname>, <given-names>J. H.</given-names></string-name>, <string-name name-style="western"><surname>Horner</surname>, <given-names>R. H.</given-names></string-name>, <string-name name-style="western"><surname>Levin</surname>, <given-names>J. R.</given-names></string-name>, <string-name name-style="western"><surname>Odom</surname>, <given-names>S. L.</given-names></string-name>, <string-name name-style="western"><surname>Rindskopf</surname>, <given-names>D. M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Shadish</surname>, <given-names>W. R.</given-names></string-name></person-group> (<year>2013</year>). <article-title>Single-case intervention research design standards.</article-title> <source>Remedial and Special Education</source>, <volume>34</volume>(<issue>1</issue>), <fpage>26</fpage>–<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1177/0741932512452794</pub-id></mixed-citation></ref>
<ref id="r27"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Li</surname>, <given-names>H.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Model selection of GLMMs in the analysis of count data in single-case studies: A Monte Carlo simulation.</article-title> <source>Behavior Research Methods</source>, <volume>56</volume>(<issue>7</issue>), <fpage>7963</fpage>–<lpage>7984</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-024-02464-7</pub-id><pub-id pub-id-type="pmid">38987450</pub-id></mixed-citation></ref>
<ref id="r28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Baek</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Luo</surname>, <given-names>W.</given-names></string-name>, <string-name name-style="western"><surname>Du</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lam</surname>, <given-names>K. H.</given-names></string-name></person-group> (<year>2025</year><comment>a</comment>). <article-title>Using generalized linear mixed models in the analysis of count and rate data in single-case experimental designs: A step-by-step tutorial.</article-title> <source>Evaluation &amp; the Health Professions</source>, <volume>48</volume>(<issue>1</issue>), <fpage>143</fpage>–<lpage>155</lpage>. <pub-id pub-id-type="doi">10.1177/01632787241259500</pub-id><pub-id pub-id-type="pmid">39660841</pub-id></mixed-citation></ref>
<ref id="r29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Luo</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Baek</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Multilevel modeling in single-case studies with zero‐inflated and overdispersed count data.</article-title> <source>Behavior Research Methods</source>, <volume>56</volume>(<issue>4</issue>), <fpage>2765</fpage>–<lpage>2781</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-024-02359-7</pub-id><pub-id pub-id-type="pmid">38383801</pub-id></mixed-citation></ref>
<ref id="r30"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Luo</surname>, <given-names>W.</given-names></string-name>, <string-name name-style="western"><surname>Baek</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lam</surname>, <given-names>K. H.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Estimation and statistical inferences of variance components in the analysis of single-case experimental design using multilevel modeling.</article-title> <source>Behavior Research Methods</source>, <volume>54</volume>(<issue>4</issue>), <fpage>1559</fpage>–<lpage>1579</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-021-01691-6</pub-id><pub-id pub-id-type="pmid">34508288</pub-id></mixed-citation></ref>
<ref id="r31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Luo</surname>, <given-names>W.</given-names></string-name>, <string-name name-style="western"><surname>Baek</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Thompson</surname>, <given-names>C. G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lam</surname>, <given-names>K. H.</given-names></string-name></person-group> (<year>2025</year><comment>b</comment>). <article-title>Multilevel modeling in single-case studies with count and proportion data: A demonstration and evaluation.</article-title> <source>Psychological Methods</source>, <volume>30</volume>(<issue>4</issue>), <fpage>815</fpage>–<lpage>842</lpage>. <pub-id pub-id-type="doi">10.1037/met0000607</pub-id><pub-id pub-id-type="pmid">37603012</pub-id></mixed-citation></ref>
<ref id="r32"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Manolov</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Sierra</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Solanas</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Botella</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Assessing functional relations in single-case designs: Quantitative proposals in the context of the evidence-based movement.</article-title> <source>Behavior Modification</source>, <volume>38</volume>(<issue>6</issue>), <fpage>878</fpage>–<lpage>913</lpage>. <pub-id pub-id-type="doi">10.1177/0145445514545679</pub-id><pub-id pub-id-type="pmid">25092718</pub-id></mixed-citation></ref>
<ref id="r33"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Manolov</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Recommendations for choosing single-case data analytical techniques.</article-title> <source>Behavior Therapy</source>, <volume>48</volume>(<issue>1</issue>), <fpage>97</fpage>–<lpage>114</lpage>. <pub-id pub-id-type="doi">10.1016/j.beth.2016.04.008</pub-id><pub-id pub-id-type="pmid">28077224</pub-id></mixed-citation></ref>
<ref id="r34"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Matuschek</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Kliegl</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Vasishth</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Baayen</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bates</surname>, <given-names>D.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Balancing Type I error and power in linear mixed models.</article-title> <source>Journal of Memory and Language</source>, <volume>94</volume>, <fpage>305</fpage>–<lpage>315</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2017.01.001</pub-id></mixed-citation></ref>
<ref id="r36"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Martínez-Huertas</surname>, <given-names>J. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Olmos</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Recovering crossed random effects in mixed-effects models using model averaging.</article-title> <source>Methodology: European Journal of Research Methods for the Behavioral and Social Sciences</source>, <volume>18</volume>(<issue>4</issue>), <fpage>298</fpage>–<lpage>323</lpage>. <pub-id pub-id-type="doi">10.5964/meth.9597</pub-id></mixed-citation></ref>
<ref id="r35"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Martínez-Huertas</surname>, <given-names>J. Á.</given-names></string-name>, <string-name name-style="western"><surname>Olmos</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ferrer</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Model selection and model averaging for mixed-effects models with crossed random effects for subjects and items.</article-title> <source>Multivariate Behavioral Research</source>, <volume>57</volume>(<issue>4</issue>), <fpage>603</fpage>–<lpage>619</lpage>. <pub-id pub-id-type="doi">10.1080/00273171.2021.1889946</pub-id><pub-id pub-id-type="pmid">33635157</pub-id></mixed-citation></ref>

<ref id="r37"><mixed-citation publication-type="book">McElreath, R. (2020). <italic>Statistical rethinking: A Bayesian course with examples in R and Stan</italic> (2<sup>nd</sup> ed.). Chapman and Hall/CRC.</mixed-citation></ref>
<ref id="r38"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>McNeish</surname>, <given-names>D.</given-names></string-name></person-group> (<year>2016</year>). <article-title>On using Bayesian methods to address small sample problems.</article-title> <source>Structural Equation Modeling</source>, <volume>23</volume>(<issue>5</issue>), <fpage>750</fpage>–<lpage>773</lpage>. <pub-id pub-id-type="doi">10.1080/10705511.2016.1186549</pub-id></mixed-citation></ref>
<ref id="r39"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Merkle</surname>, <given-names>E. C.</given-names></string-name>, <string-name name-style="western"><surname>Furr</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rabe-Hesketh</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Bayesian comparison of latent variable models: Conditional versus marginal likelihoods.</article-title> <source>Psychometrika</source>, <volume>84</volume>(<issue>3</issue>), <fpage>802</fpage>–<lpage>829</lpage>. <pub-id pub-id-type="doi">10.1007/s11336-019-09679-0</pub-id><pub-id pub-id-type="pmid">31297664</pub-id></mixed-citation></ref>
<ref id="r40"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Moerbeek</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2004</year>). <article-title>The consequence of ignoring a level of nesting in multilevel analysis.</article-title> <source>Multivariate Behavioral Research</source>, <volume>39</volume>(<issue>1</issue>), <fpage>129</fpage>–<lpage>149</lpage>. <pub-id pub-id-type="doi">10.1207/s15327906mbr3901_5</pub-id><pub-id pub-id-type="pmid">26759936</pub-id></mixed-citation></ref>
<ref id="r41"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Ferron</surname>, <given-names>J. M.</given-names></string-name>, <string-name name-style="western"><surname>Beretvas</surname>, <given-names>S. N.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Van den Noortgate</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2014</year>). <article-title>From a single-level analysis to a multilevel analysis of single-case experimental designs.</article-title> <source>Journal of School Psychology</source>, <volume>52</volume>(<issue>2</issue>), <fpage>191</fpage>–<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsp.2013.11.003</pub-id><pub-id pub-id-type="pmid">24606975</pub-id></mixed-citation></ref>
<ref id="r42"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Manolov</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rodabaugh</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Meta-analysis of single-case research via multilevel models: Fundamental concepts and methodological considerations.</article-title> <source>Behavior Modification</source>, <volume>44</volume>(<issue>2</issue>), <fpage>265</fpage>–<lpage>295</lpage>. <pub-id pub-id-type="doi">10.1177/0145445518806867</pub-id><pub-id pub-id-type="pmid">30360633</pub-id></mixed-citation></ref>
<ref id="r43"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Rindskopf</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Onghena</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Van den Noortgate</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Multilevel modeling of single-case data: A comparison of maximum likelihood and Bayesian estimation.</article-title> <source>Psychological Methods</source>, <volume>22</volume>(<issue>4</issue>), <fpage>760</fpage>–<lpage>778</lpage>. <pub-id pub-id-type="doi">10.1037/met0000136</pub-id><pub-id pub-id-type="pmid">28358542</pub-id></mixed-citation></ref>
<ref id="r44"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Yang</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Assessing generalizability and variability of single-case design effect sizes using two-stage multilevel modeling including moderators.</article-title> <source>Behaviormetrika</source>, <volume>48</volume>(<issue>2</issue>), <fpage>207</fpage>–<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1007/s41237-021-00141-z</pub-id></mixed-citation></ref>
<ref id="r45"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Moeyaert</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Yang</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Xue</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Individual participant data meta-analysis including moderators: Empirical validation.</article-title> <source>Journal of Experimental Education</source>, <volume>92</volume>(<issue>4</issue>), <fpage>723</fpage>–<lpage>740</lpage>. <pub-id pub-id-type="doi">10.1080/00220973.2023.2208062</pub-id></mixed-citation></ref>
<ref id="r46"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Natesan</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Fitting Bayesian models for single-case experimental designs.</article-title> <source>Methodology: European Journal of Research Methods for the Behavioral and Social Sciences</source>, <volume>15</volume>(<issue>4</issue>), <fpage>147</fpage>–<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1027/1614-2241/a000180</pub-id></mixed-citation></ref>
<ref id="r47"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Natesan Batley</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hedges</surname>, <given-names>L. V.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Accurate models vs. accurate estimates: A simulation study of Bayesian single-case experimental designs.</article-title> <source>Behavior Research Methods</source>, <volume>53</volume>(<issue>4</issue>), <fpage>1782</fpage>–<lpage>1798</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-020-01522-0</pub-id><pub-id pub-id-type="pmid">33575987</pub-id></mixed-citation></ref>
<ref id="r48"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Nicenboim</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Vasishth</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Statistical methods for linguistic research: Foundational ideas — Part II.</article-title> <source>Language and Linguistics Compass</source>, <volume>10</volume>(<issue>11</issue>), <fpage>591</fpage>–<lpage>613</lpage>. <pub-id pub-id-type="doi">10.1111/lnc3.12207</pub-id></mixed-citation></ref>
<ref id="r49"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Parker</surname>, <given-names>R. I.</given-names></string-name>, <string-name name-style="western"><surname>Vannest</surname>, <given-names>K. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Davis</surname>, <given-names>J. L.</given-names></string-name></person-group> (<year>2011</year>). <article-title>Effect size in single-case research: A review of nine nonoverlap techniques.</article-title> <source>Behavior Modification</source>, <volume>35</volume>(<issue>4</issue>), <fpage>303</fpage>–<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1177/0145445511399147</pub-id><pub-id pub-id-type="pmid">21411481</pub-id></mixed-citation></ref>
<ref id="r50"><mixed-citation publication-type="book">Parsonson, B. S., &amp; Baer, D. M. (1986). The graphic analysis of data. In A. Poling &amp; R. W. Fuqua (Eds.), <italic>Research methods in applied behavior analysis: Issues and advances</italic> (pp. 157–186). Springer US. <pub-id pub-id-type="doi">10.1007/978-1-4684-8786-2_8</pub-id></mixed-citation></ref>
<ref id="r51"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Perone</surname>, <given-names>M.</given-names></string-name></person-group> (<year>1999</year>). <article-title>Statistical inference in behavior analysis: Experimental control is better.</article-title> <source>Behavior Analyst</source>, <volume>22</volume>(<issue>2</issue>), <fpage>109</fpage>–<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1007/BF03391988</pub-id><pub-id pub-id-type="pmid">22478328</pub-id></mixed-citation></ref>
<ref id="r52"><mixed-citation publication-type="web">Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., &amp; R Core Team. (2021). <italic>nlme: Linear and nonlinear mixed effects models</italic> (Version 3.1-152) [Software]. <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=nlme">https://CRAN.R-project.org/package=nlme</ext-link></mixed-citation></ref>
<ref id="r53"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Raftery</surname>, <given-names>A. E.</given-names></string-name></person-group> (<year>1995</year>). <article-title>Bayesian model selection in social research.</article-title> <source>Sociological Methodology</source>, <volume>25</volume>, <fpage>111</fpage>–<lpage>163</lpage>. <pub-id pub-id-type="doi">10.2307/271063</pub-id></mixed-citation></ref>
<ref id="r54"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Richardson</surname>, <given-names>J. T.</given-names></string-name></person-group> (<year>2011</year>). <article-title>Eta squared and partial eta squared as measures of effect size in educational research.</article-title> <source>Educational Research Review</source>, <volume>6</volume>(<issue>2</issue>), <fpage>135</fpage>–<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1016/j.edurev.2010.12.001</pub-id></mixed-citation></ref>
<ref id="r55"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Rindskopf</surname>, <given-names>D.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Nonlinear Bayesian analysis for single case designs.</article-title> <source>Journal of School Psychology</source>, <volume>52</volume>(<issue>2</issue>), <fpage>179</fpage>–<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsp.2013.12.003</pub-id><pub-id pub-id-type="pmid">24606974</pub-id></mixed-citation></ref>
<ref id="r56"><mixed-citation publication-type="web">Rodabaugh, E., &amp; Moeyaert, M. (2017). Multilevel modeling of single-case data: An introduction and tutorial for the applied researcher. <italic>NERA Conference Proceedings 2017(8), University of Connecticut</italic>. <ext-link ext-link-type="uri" xlink:href="https://opencommons.uconn.edu/nera-2017/8">https://opencommons.uconn.edu/nera-2017/8</ext-link></mixed-citation></ref>
<ref id="r57"><mixed-citation publication-type="web">Rodríguez-Prada, C. (2025). <italic>Bayesian versus frequentist in SCEDs — R code — v1.0.0</italic> [GitHub project page containing R code for the study simulation]. GitHub. <ext-link ext-link-type="uri" xlink:href="https://github.com/Cristrinaranjus/phdthesis/releases/tag/methodology-journal">https://github.com/Cristrinaranjus/phdthesis/releases/tag/methodology-journal</ext-link></mixed-citation></ref>
<ref id="r58"><mixed-citation publication-type="confproc">Rodríguez-Prada, C., &amp; Olmos, R. (2019). Análisis multinivel y medidas del tamaño del efecto en diseños de caso único: Un estudio piloto comparando datos de simulación y datos empíricos [Multilevel analysis and effect size measures in single-case designs: A pilot study comparing simulated and empirical data]. <italic>VIII Congreso SAVECC - Sociedad para el Avance del Estudio Científico del Comportamiento</italic> [8th SAVECC Congress: Society for the Advancement of the Scientific Study of Behavior]. Universidad Autónoma de Madrid. <pub-id pub-id-type="doi">10.13140/RG.2.2.33619.45609</pub-id></mixed-citation></ref>
<ref id="r59"><mixed-citation publication-type="data">Rodríguez-Prada, C., Olmos, R., &amp; Martínez-Huertas, J. Á. (2026a). <italic>Bayesian versus frequentist approaches in multilevel single-case designs: On Type I error rate and power</italic> [OSF project page containing the study dataset, code scripts for data simulation, and supplementary materials]. Open Science Framework. <pub-id pub-id-type="doi">10.17605/OSF.IO/K7B82</pub-id></mixed-citation></ref>
<ref id="r60"><mixed-citation publication-type="data">Rodríguez-Prada, C., Martínez-Huertas, J. Á., &amp; Olmos, R. (2026b). <italic>Supplementary materials to</italic> “Bayesian versus frequentist approaches in multilevel single-case designs: On Type I error rate and power” [Supplementary tables]. PsychOpen GOLD. <pub-id pub-id-type="doi">10.23668/psycharchives.21777</pub-id></mixed-citation></ref>
<ref id="r61"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Schwarz</surname>, <given-names>G.</given-names></string-name></person-group> (<year>1978</year>). <article-title>Estimating the dimension of a model.</article-title> <source>Annals of Statistics</source>, <volume>6</volume>(<issue>2</issue>), <fpage>461</fpage>–<lpage>464</lpage>. <pub-id pub-id-type="doi">10.1214/aos/1176344136</pub-id></mixed-citation></ref>
<ref id="r62"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Shadish</surname>, <given-names>W. R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Sullivan</surname>, <given-names>K. J.</given-names></string-name></person-group> (<year>2011</year>). <article-title>Characteristics of single-case designs used to assess intervention effects in 2008.</article-title> <source>Behavior Research Methods</source>, <volume>43</volume>(<issue>4</issue>), <fpage>971</fpage>–<lpage>980</lpage>. <pub-id pub-id-type="doi">10.3758/s13428-011-0111-y</pub-id><pub-id pub-id-type="pmid">21656107</pub-id></mixed-citation></ref>
<ref id="r63"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Snodgrass</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Cook</surname>, <given-names>B. G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Cook</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Considering social validity in special education research.</article-title> <source>Learning Disabilities Research &amp; Practice</source>, <volume>38</volume>(<issue>4</issue>), <fpage>311</fpage>–<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1111/ldrp.12326</pub-id></mixed-citation></ref>
<ref id="r64"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Spiegelhalter</surname>, <given-names>D. J.</given-names></string-name>, <string-name name-style="western"><surname>Best</surname>, <given-names>N. G.</given-names></string-name>, <string-name name-style="western"><surname>Carlin</surname>, <given-names>B. P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>van der Linde</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2002</year>). <article-title>Bayesian measures of model complexity and fit.</article-title> <source>Journal of the Royal Statistical Society, Series B: Statistical Methodology</source>, <volume>64</volume>(<issue>4</issue>), <fpage>583</fpage>–<lpage>639</lpage>. <pub-id pub-id-type="doi">10.1111/1467-9868.00353</pub-id></mixed-citation></ref>
<ref id="r65"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Van de Schoot</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Broere</surname>, <given-names>J. J.</given-names></string-name>, <string-name name-style="western"><surname>Perryck</surname>, <given-names>K. H.</given-names></string-name>, <string-name name-style="western"><surname>Zondervan-Zwijnenburg</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>van Loey</surname>, <given-names>N. E.</given-names></string-name></person-group> (<year>2015</year>). <article-title>Analyzing small data sets using Bayesian estimation: The case of posttraumatic stress symptoms following mechanical ventilation in burn survivors.</article-title> <source>European Journal of Psychotraumatology</source>, <volume>6</volume>, <elocation-id>25216</elocation-id>. <pub-id pub-id-type="doi">10.3402/ejpt.v6.25216</pub-id><pub-id pub-id-type="pmid">25765534</pub-id></mixed-citation></ref>
<ref id="r66"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Van den Noortgate</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Onghena</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2003</year><comment>a</comment>). <article-title>Combining single-case experimental data using hierarchical linear models.</article-title> <source>School Psychology Quarterly</source>, <volume>18</volume>(<issue>3</issue>), <fpage>325</fpage>–<lpage>346</lpage>. <pub-id pub-id-type="doi">10.1521/scpq.18.3.325.22577</pub-id></mixed-citation></ref>
<ref id="r67"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Van den Noortgate</surname>, <given-names>W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Onghena</surname>, <given-names>P.</given-names></string-name></person-group> (<year>2003</year><comment>b</comment>). <article-title>Hierarchical linear models for the quantitative integration of effect sizes in single-case research.</article-title> <source>Behavior Research Methods, Instruments, &amp; Computers</source>, <volume>35</volume>(<issue>1</issue>), <fpage>1</fpage>–<lpage>10</lpage>. <pub-id pub-id-type="doi">10.3758/BF03195492</pub-id><pub-id pub-id-type="pmid">12723775</pub-id></mixed-citation></ref>
<ref id="r68"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Vehtari</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Gelman</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Gabry</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.</article-title> <source>Statistics and Computing</source>, <volume>27</volume>(<issue>5</issue>), <fpage>1413</fpage>–<lpage>1432</lpage>. <pub-id pub-id-type="doi">10.1007/s11222-016-9696-4</pub-id></mixed-citation></ref>
<ref id="r69"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Virués-Ortega</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Haynes</surname>, <given-names>S. N.</given-names></string-name></person-group> (<year>2005</year>). <article-title>Functional analysis in behavior therapy: Behavioral foundations and clinical application.</article-title> <source>International Journal of Clinical and Health Psychology</source>, <volume>5</volume>(<issue>3</issue>), <fpage>567</fpage>–<lpage>587</lpage>.</mixed-citation></ref>
<ref id="r70"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Watanabe</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2010</year>). <article-title>Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory.</article-title> <source>Journal of Machine Learning Research</source>, <volume>11</volume>(<issue>116</issue>), <fpage>3571</fpage>–<lpage>3594</lpage>.</mixed-citation></ref>
<ref id="r71"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Weakliem</surname>, <given-names>D. L.</given-names></string-name></person-group> (<year>1999</year>). <article-title>A critique of the Bayesian information criterion for model selection.</article-title> <source>Sociological Methods &amp; Research</source>, <volume>27</volume>(<issue>3</issue>), <fpage>359</fpage>–<lpage>397</lpage>. <pub-id pub-id-type="doi">10.1177/0049124199027003002</pub-id></mixed-citation></ref>
<ref id="r72"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Yamada</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Okada</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Bayes factor for single-case ABAB design data.</article-title> <source>Behaviormetrika</source>, <volume>51</volume>(<issue>1</issue>), <fpage>277</fpage>–<lpage>286</lpage>. <pub-id pub-id-type="doi">10.1007/s41237-023-00206-1</pub-id></mixed-citation></ref>
<ref id="r73"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Yamada</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Okada</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2025</year>). <article-title>Bayes factor for major single-case experimental designs: Case for alternating treatment design and changing criterion design.</article-title> <source>Behaviormetrika</source>, <volume>52</volume>, <fpage>707</fpage>–<lpage>720</lpage>. <pub-id pub-id-type="doi">10.1007/s41237-025-00259-4</pub-id></mixed-citation></ref>
</ref-list>
	<sec sec-type="data-availability" id="das"><title>Data Availability</title>
		<p>The study dataset, the code scripts used for data simulation, and the supplementary materials that support the findings of this study are available in the OSF repository at <xref ref-type="supplementary-material" rid="r59">Rodríguez-Prada et al. (2026a)</xref>. Supplementary tables for this study are available at <xref ref-type="supplementary-material" rid="r60">Rodríguez-Prada et al. (2026b)</xref>.</p>
	</sec>	
	
	<sec sec-type="supplementary-material" id="sp1"><title>Supplementary Materials</title>
		<table-wrap position="anchor">
			<table frame="void" style="background-color:#f3f3f3">
				<col width="60%" align="left"/>
				<col width="40%" align="left"/>
				<thead>
					<tr>
						<th>Type of supplementary materials</th>
						<th>Availability/Access</th>
					</tr>
				</thead>
				<tbody>
					<tr>
						<th colspan="2">Data</th>						
					</tr>
					<tr>
						<td>Study dataset.</td>
						<td><xref ref-type="supplementary-material" rid="r59">Rodríguez-Prada et al. (2026a)</xref></td>
					</tr>					
					<tr style="grey-border-top-dashed">
						<th colspan="2">Code</th>
					</tr>
					<tr>
						<td>R code scripts for data simulation.</td>
						<td><xref ref-type="supplementary-material" rid="r59">Rodríguez-Prada et al. (2026a)</xref></td>
					</tr>
					<tr style="grey-border-top-dashed">
						<th colspan="2">Material</th>
					</tr>
					<tr>
						<td>Supplementary materials.</td>
						<td><xref ref-type="supplementary-material" rid="r59">Rodríguez-Prada et al. (2026a)</xref></td>
					</tr>
					<tr>
						<td>Supplementary tables.</td>
						<td><xref ref-type="supplementary-material" rid="r60">Rodríguez-Prada et al. (2026b)</xref></td>
					</tr>
					<tr style="grey-border-top-dashed">
						<th colspan="2">Study/Analysis preregistration</th>
					</tr>	
					<tr>
						<td>The study was not preregistered.</td>
						<td>&mdash;</td>
					</tr>
					<tr style="grey-border-top-dashed">
						<th colspan="2">Other</th>
					</tr>	
					<tr>
						<td>No other materials available.</td>
						<td>&mdash;</td>
					</tr>
				</tbody>
			</table>
		</table-wrap>		
	</sec>
			

<fn-group>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
</fn-group>
</back>
</article>
