Speaker
Description
In a two-group comparison, when the assumption of equal variances between groups is doubtful or the data may be skewed or ordinal, the classical t-test and an effect measure parameterized in terms of means may no longer be suitable. In such cases, it appears more appropriate to formulate the problem as the nonparametric Behrens-Fisher problem of testing H0: θ = 1/2, where θ = P(X < Y) + (1/2) P(X = Y) is the Mann-Whitney effect. This parameter offers a meaningful assessment of treatment effects regardless of the true underlying distribution.
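To make the definition of θ concrete, the following minimal sketch computes the plug-in estimate of the Mann-Whitney effect by comparing all pairs of observations, counting ties as one half. The function name `mann_whitney_effect` and the toy data are assumptions made for illustration; this is only the point estimate, not the test presented in the talk.

```python
from itertools import product


def mann_whitney_effect(x, y):
    """Plug-in estimate of theta = P(X < Y) + 0.5 * P(X = Y).

    Compares every observation in x with every observation in y.
    Ties contribute 0.5, so ordinal or discrete data are handled.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for xi, yj in product(x, y):
        if xi < yj:
            total += 1.0
        elif xi == yj:
            total += 0.5
    return total / (len(x) * len(y))


# Hypothetical ordinal scores for two groups (illustrative only).
x = [1, 2, 2, 3]
y = [2, 3, 4, 4]
theta_hat = mann_whitney_effect(x, y)
print(theta_hat)  # 0.84375
```

An estimate near 1/2 indicates no tendency for either group to produce larger values, which corresponds to the null hypothesis H0: θ = 1/2; values above 1/2 indicate that observations in the second group tend to be larger.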
While many methods exist to test this hypothesis, several impose distributional restrictions on the underlying data, such as assuming continuity and thereby excluding discrete data (for example, ordered categorical data), which limits their flexibility in practical applications. To date, the well-known Brunner–Munzel test has been regarded as the standard method for testing this hypothesis, owing to its minimal assumptions and reasonable performance with larger sample sizes. The Brunner–Munzel test, however, struggles to control the Type-1 error rate at significance levels α < 0.05. This may be considered a limitation given recent developments advocating more stringent significance thresholds and the frequent need to adjust for multiplicity, both of which lead to reduced α-levels. Moreover, the confidence intervals compatible with the Brunner–Munzel test are not range-preserving and tend to be liberal when effect sizes are large.
In this talk, we present an alternative method to address the nonparametric Behrens-Fisher problem. The test is derived by considering the ratio of the true variance of the Mann-Whitney effect estimator to its theoretical maximum, as derived from the Birnbaum-Klose inequality. Through simulations, we demonstrate that the proposed test effectively controls the Type-1 error rate under various conditions, including small and unbalanced sample sizes and different data-generating mechanisms. Notably, it provides better control of the Type-1 error rate than the widely used Brunner–Munzel test, particularly at small significance levels such as α = 0.005. We further construct compatible range-preserving confidence intervals and show that they exhibit improved coverage compared to the confidence intervals compatible with the Brunner–Munzel test. Finally, we illustrate the application of the method in a clinical trial example.