robnptests – An R package for robust two-sample location and dispersion tests

The R (R Core Team, 2022) package robnptests is a compilation of two-sample tests selected by two criteria: The tests are (i) robust against outliers and (ii) (approximately) distribution free. Criterion (ii) means that the implemented tests keep an intended significance level and provide a reasonably high power under a variety of continuous distributions. Robustness is achieved by using test statistics that are based on robust location and scale measures.


Data situation
We consider two samples of independent and identically distributed (i.i.d.) random variables $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, respectively. The underlying distributions are assumed to be continuous with cumulative distribution functions $F_X$ and $F_Y$.
The tests can be used for either of the following scenarios:
• Two-sample location problem: Assuming that both distributions are equal except that $F_X$ may be a shifted version of $F_Y$, i.e. $F_X(x) = F_Y(x + \Delta)$ for all $x \in \mathbb{R}$ and some $\Delta \in \mathbb{R}$, or equivalently $X \overset{d}{=} Y - \Delta$, the tests can be used to detect such a shift.
• Two-sample scale problem: In case of a difference only in scale, i.e. $F_X(x) = F_Y(x/\theta)$ for some $\theta > 0$, or equivalently $X \overset{d}{=} \theta \cdot Y$, a transformation of the observations makes it possible to identify differing scale parameters.
For more information see vignette("robnptests").
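As an illustration of the scale scenario, the following base-R sketch shows how a log transformation turns a scale difference into a location shift. The transformation $\log(x^2)$, the value of theta, and the simulated data are assumptions chosen for illustration; the transformation actually applied by the package is described in the vignette.

```r
# Illustrative sketch: a scale difference becomes a location shift after a
# log transformation (assumed transformation: log of squared observations).
set.seed(1)
theta <- 3                # assumed scale ratio, X = theta * Y in distribution
y <- rnorm(1000)
x <- theta * rnorm(1000)

tx <- log(x^2)
ty <- log(y^2)

# Under the scale model, log(X^2) = log(theta^2) + log(Y^2) in distribution,
# so the transformed samples differ by a location shift of log(theta^2).
shift_estimate <- median(tx) - median(ty)
shift_theory <- log(theta^2)
print(c(estimate = shift_estimate, theory = shift_theory))
```

A location test applied to the transformed samples then addresses the hypothesis about the scale ratio.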

Statement of need
A popular test for the location setting is the two-sample t-test. It is considered to be robust against deviations from the normality assumption because, if the variances are finite, it keeps the specified significance level asymptotically due to the central limit theorem. However, non-normality can result in a loss of power (Wilcox, 2003). In addition, the t-test is vulnerable to outliers (Fried & Dehling, 2011). Distribution-free tests, like the two-sample Wilcoxon rank-sum test, can be nearly as powerful as the t-test under normality and may have higher power under non-normality. Still, they can also be vulnerable to outliers, particularly for small samples (Fried & Gather, 2007). The two-sample tests in robnptests are (approximately) distribution free and, at the same time, robust against outliers. Thus, they are well suited for outlier-corrupted samples from unknown data-generating distributions. On uncontaminated samples, such tests can be nearly as powerful as popular procedures like the aforementioned t-test or the Wilcoxon test, at the cost of a somewhat longer computation time.
Figure 1 compares the power of the t-test, the Wilcoxon test and two robust tests. The HL1-test is based on the one-sample Hodges-Lehmann estimator (Hodges & Lehmann, 1963) and the Huber M-test uses Huber's M-estimator (Huber, 1964). We consider a fixed location difference between the samples and a single outlier of increasing size. The power of the t-test decreases to zero, while the loss in power of the Wilcoxon test and both robust tests is small. The robust tests provide a somewhat higher power than the Wilcoxon test, and this advantage can become larger when more outliers are involved (Fried & Dehling, 2011).
Common parametric and nonparametric tests for scale differences suffer from problems similar to those described above for the location tests. In addition, some nonparametric tests for the scale problem do not cope well with asymmetry. The package robnptests uses the idea of applying the robust location tests to transformed observations as proposed by Fried (2012). Such tests are also robust and can obtain good results in terms of power and size under both asymmetry and outlier corruption. However, these tests may be less powerful under symmetry than classical procedures like the Mood test.
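The effect of a single outlier on the power of the classical location tests can be reproduced qualitatively with a small Monte Carlo sketch in base R. It covers only the t-test and the Wilcoxon test. The number of replications, the outlier sizes, the 5% significance level, and the choice to shift the first observation of the first sample against the direction of the location difference are assumptions for illustration and need not match the setup behind Figure 1.

```r
set.seed(123)
n_rep <- 500                      # assumed number of Monte Carlo replications
outlier_sizes <- c(0, 5, 50)      # assumed outlier sizes

power_for_outlier <- function(size) {
  rejections <- replicate(n_rep, {
    x <- rnorm(10, mean = 2)      # location difference Delta = 2
    y <- rnorm(10, mean = 0)
    x[1] <- x[1] - size           # single additive outlier in the first sample
    c(t = t.test(x, y, var.equal = TRUE)$p.value < 0.05,
      wilcoxon = wilcox.test(x, y)$p.value < 0.05)
  })
  rowMeans(rejections)            # rejection rate per test
}

power <- sapply(outlier_sizes, power_for_outlier)
colnames(power) <- paste0("outlier=", outlier_sizes)
print(round(power, 2))
```

For a very large outlier, the outlier dominates both the mean difference and the pooled standard deviation, so the absolute t-statistic stays close to 1 and the t-test practically never rejects.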

Other packages with robust two-sample tests
The package WRS2 (Mair & Wilcox, 2020) contains a collection of robust two-sample location tests for the heteroscedastic setting. In robnptests, we assume homoscedasticity for the location tests. This is because estimating the within-sample dispersion for both samples separately may be unreliable when the sample sizes are small. Equal sample sizes $m = n$ can protect against a deteriorating performance in terms of size and power if we are actually in the heteroscedastic setting (Staudte & Sheather, 1990, p. 179).
The package nptest (Helwig, 2021) contains nonparametric versions of the two-sample t-test, realized by applying the permutation and randomization principles, as described in the next section, to the t-statistic. This approach has also been studied in Abbas & Fried (2017). Although the resulting tests are distribution free, the test statistic lacks robustness against outliers.

Implemented two-sample tests
The tests for a location difference are simple ratios inspired by the test statistic of the two-sample t-test. The numerator is a robust estimator for the location difference between the two populations and the denominator is a robust measure for the dispersion within the samples.
The p-value can be computed by using the permutation principle, the randomization principle, or a normal approximation. With the permutation principle, the tests hold the desired significance level exactly at the cost of large computing times even for quite small samples such as $m = n = 10$. The time can be reduced by using a randomization distribution and, even more, by taking advantage of the asymptotic normality of the location-difference estimators. The latter approach, however, is only advisable for large sample sizes $m, n > 30$.
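The randomization principle can be sketched in base R as follows. The statistic used here (the unstandardized difference of the sample medians), the number of random splits, the finite-sample correction in the p-value, and the helper name randomization_p_value are hypothetical simplifications and do not reproduce the package's implementation.

```r
# Two-sided randomization p-value for a generic two-sample statistic.
randomization_p_value <- function(x, y, statistic, n_rep = 2000) {
  observed <- statistic(x, y)
  pooled <- c(x, y)
  m <- length(x)
  null_stats <- replicate(n_rep, {
    idx <- sample(length(pooled), m)   # random reassignment to the first sample
    statistic(pooled[idx], pooled[-idx])
  })
  # The +1 terms count the observed split itself, so the p-value is never 0.
  (sum(abs(null_stats) >= abs(observed)) + 1) / (n_rep + 1)
}

set.seed(42)
x <- rnorm(10, mean = 2)
y <- rnorm(10)
med_diff <- function(x, y) median(x) - median(y)
p_value <- randomization_p_value(x, y, med_diff)
print(p_value)
```

Replacing the random splits by all choose(m + n, m) possible splits yields the permutation distribution, which for $m = n = 10$ already requires 184756 evaluations of the statistic.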
The tests based on the following estimators for the location difference are described in Fried & Dehling (2011):
• The difference of the sample medians leads to highly robust tests. However, they are not very powerful under normality due to the low efficiency of the median.
• To obtain more powerful tests under normality, one can use the difference between the one-sample Hodges-Lehmann estimators (Hodges & Lehmann, 1963). This may result in less robust tests due to the lower breakdown point.
• The two-sample Hodges-Lehmann estimator (Hodges & Lehmann, 1963) leads to robust tests with a higher power under normality than the tests based on the sample median and can achieve similar robustness.
For scaling, we use different estimators based on medians and pairwise differences, see Fried & Dehling (2011) for a detailed description.
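The three location-difference estimators listed above can be written down in a few lines of base R. The helper names and the toy data are hypothetical, and the one-sample Hodges-Lehmann estimator is taken here as the median of the pairwise averages of distinct observations, which is one common definition.

```r
# One-sample Hodges-Lehmann estimator: median of pairwise averages (i < j)
hl1 <- function(x) {
  pairs <- combn(x, 2)            # 2 x choose(n, 2) matrix of pairs
  median(colMeans(pairs))
}

med_diff <- function(x, y) median(x) - median(y)
hl1_diff <- function(x, y) hl1(x) - hl1(y)
# Two-sample Hodges-Lehmann estimator: median of all differences x_i - y_j
hl2_diff <- function(x, y) median(outer(x, y, "-"))

x <- c(2.1, 1.8, 2.5, 3.0, 1.9, 50)   # last value is an outlier
y <- c(0.2, -0.1, 0.4, 0.0, -0.3, 0.1)

estimates <- c(mean = mean(x) - mean(y),
               median = med_diff(x, y),
               hl1 = hl1_diff(x, y),
               hl2 = hl2_diff(x, y))
print(round(estimates, 2))
```

In this toy example the difference of the means is pulled far away by the single outlier, whereas the three robust estimates stay close to the difference between the bulk of the two samples.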
In addition, we implemented tests based on M-estimators. This approach to robust location estimation allows for flexibility in how outliers are treated through the specification of the tuning constants of the corresponding $\psi$-function. We focus on Huber's $\psi$-function, the bisquare function and Hampel's $\psi$-function. The measure for the dispersion within the samples is a pooled statistic derived from the asymptotic normality of the M-estimators (Maronna et al., 2019, p. 37ff). Moreover, the package contains Yuen's t-test, which uses the difference of trimmed means to estimate the location difference and a scale estimator based on the pooled winsorized variances (Yuen & Dixon, 1973).
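To illustrate how a $\psi$-function governs the treatment of outliers, the following base-R sketch computes a Huber M-estimate of location by iteratively reweighted averaging. The tuning constant k = 1.345, the use of the MAD as a fixed scale, and the helper name huber_location are assumptions for illustration; the package's own estimators and default tuning constants may differ.

```r
huber_location <- function(x, k = 1.345, tol = 1e-8, max_iter = 100) {
  s <- mad(x)                     # robust scale, kept fixed (assumed non-zero)
  mu <- median(x)                 # robust starting value
  for (i in seq_len(max_iter)) {
    r <- (x - mu) / s             # standardized residuals
    # Huber weights psi(r) / r: 1 for |r| <= k, k / |r| otherwise
    w <- pmin(1, k / abs(r))
    mu_new <- sum(w * x) / sum(w)
    if (abs(mu_new - mu) < tol * s) break
    mu <- mu_new
  }
  mu_new
}

x <- c(2.1, 1.8, 2.5, 3.0, 1.9, 50)   # last value is an outlier
est <- huber_location(x)
print(c(mean = mean(x), huber = est))
```

A smaller k downweights more observations and moves the estimate towards the median, while a larger k moves it towards the mean.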
In case of data with many ties (e.g. caused by discrete sampling), the ties may carry over to the permutation distribution. This can happen in real-world applications when the measurements are rounded or stem from discrete distributions and may lead to a loss in power or conservative tests. Additionally, the robust scale estimators may become zero, so that the test statistic cannot be calculated. Both issues can be addressed by adding random noise from a uniform distribution with a small variance to each observation ("wobbling", see Fried & Gather (2007)).
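A minimal sketch of the wobbling idea in base R, assuming observations rounded to integers and uniform noise of at most half the rounding width in absolute value (the helper name wobble and the noise range are hypothetical choices, not the package's implementation):

```r
# Add independent uniform noise on [-half_width, half_width] to each value.
wobble <- function(x, half_width) {
  x + runif(length(x), min = -half_width, max = half_width)
}

set.seed(7)
x <- round(rnorm(10, mean = 5))   # rounded measurements contain ties
y <- round(rnorm(10, mean = 5))

x_w <- wobble(x, half_width = 0.5)
y_w <- wobble(y, half_width = 0.5)

# Number of tied values before and after wobbling
print(c(before = sum(duplicated(c(x, y))),
        after = sum(duplicated(c(x_w, y_w)))))
```

Since the noise is bounded by half the rounding width, each wobbled value still rounds back to the original observation, apart from values landing exactly on a boundary.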
The following code snippet shows how the tests can be applied to a data set. Here, we use a test based on the one-sample Hodges-Lehmann estimator. By setting alternative = "two.sided" and delta = 0, we test the null hypothesis $H_0: \Delta = 0$, i.e. there is no location difference between the populations. In this example, we use method = "permutation" so that the p-value is computed with the permutation principle.
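A minimal sketch of such a call, using the arguments named above. The simulated standard-normal samples of size 10 and the seed are illustrative assumptions and are not taken from the paper.

```r
library(robnptests)

# Illustrative data (assumption): two standard-normal samples of size 10
set.seed(108)
x <- rnorm(10)
y <- rnorm(10)

# Test based on the one-sample Hodges-Lehmann estimator,
# H0: Delta = 0 against a two-sided alternative, permutation p-value
res <- hl1_test(x = x, y = y, alternative = "two.sided", delta = 0,
                method = "permutation")
print(res)
```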
In general, the functions start with the name of the underlying location-difference estimator and have several arguments to customize the test.
More examples of how to use the tests and a detailed overview of the implemented tests and corresponding test statistics can be found in vignette("robnptests").

Figure 1: Power of the two-sample t-test, the Wilcoxon rank-sum test, and two robust tests (one based on the one-sample Hodges-Lehmann estimator and one based on Huber's M-estimator) on two samples of size $m = n = 10$ from two normal distributions with unit variance, a location difference of $\Delta = 2$, and an additive single outlier of increasing size.