Micceri evaluated deviations from normality based on arbitrary cut-offs of various measures of nonnormality, including asymmetry, tail weight, outliers, and modality. More recently, Blanca et al. conducted a similar evaluation using a large number of psychological variables. However, neither Micceri nor Blanca et al. reported the skewness and kurtosis of the variables they studied. If skewness is different from 0, the distribution deviates from symmetry (Scheffé). If kurtosis is different from 0, the distribution deviates from normality in tail mass and shoulder (DeCarlo).
In order to study nonnormality, we contacted researchers and obtained responses from them; among the respondents, only three reported skewness and kurtosis in their papers. This underreporting of normality measures can be due to several reasons.
First, many researchers are still not aware of the prevalence and influence of nonnormality.
Second, not every researcher is familiar with skewness and kurtosis or their interpretation. Third, computing skewness and kurtosis requires extra work beyond the commonly reported summary statistics such as means and standard deviations. Fourth, researchers might worry about the consequences of reporting large skewness and kurtosis. This paper provides a simple and practical response to the continuing underreporting of nonnormality measures in the published literature by elucidating the problem of nonnormality and offering feasible recommendations.
We begin with an easy-to-follow introduction to univariate and multivariate skewness and kurtosis, their calculations, and interpretations. We then report on a review we conducted assessing the prevalence and severity of univariate and multivariate skewness and kurtosis in recent psychology and education publications. We also show the influence of skewness and kurtosis on commonly used statistical tests in our field using data of typical skewness, kurtosis, and sample size found in our review.
In addition, we offer a tutorial on how to compute the skewness and kurtosis measures we report here through commonly used software including SAS, SPSS, R, and a Web application. Finally, we offer practical recommendations for our readers to follow in their own research, including a guideline on how to report sample statistics in empirical research and some possible solutions for nonnormality.
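As a preview of such computations, the sketch below implements Mardia's multivariate skewness and kurtosis, one widely used formulation (an assumption on our part; Python is used here for illustration rather than the packages named above, and the function name is ours):

```python
import numpy as np

def mardia(X):
    """Mardia's multivariate skewness (b1p) and kurtosis (b2p) for an
    n-by-p data matrix, using the maximum-likelihood covariance."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    D = diff @ S_inv @ diff.T        # n x n matrix of scaled cross-products
    b1p = np.sum(D ** 3) / n ** 2    # multivariate skewness
    b2p = np.mean(np.diag(D) ** 2)   # multivariate kurtosis
    return b1p, b2p

# For multivariate normal data, b1p is near 0 and b2p is near p(p + 2).
rng = np.random.default_rng(3)
b1p, b2p = mardia(rng.normal(size=(500, 3)))
```

Large departures of b1p from 0, or of b2p from p(p + 2), signal multivariate nonnormality.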
Univariate and multivariate skewness and kurtosis

Different formulations for skewness and kurtosis exist in the literature. Joanes and Gill summarize three common formulations for univariate skewness and kurtosis, which they refer to as g1 and g2, G1 and G2, and b1 and b2. Minitab reports b1 and b2, and the R package e1071 (Meyer et al.) can compute all three formulations.
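To make the three formulations concrete, here is a minimal sketch in Python (the tutorial later in the paper covers SAS, SPSS, R, and a Web application; the function names here are ours), computing each estimator from the sample central moments:

```python
import numpy as np

def skewness_measures(x):
    """The three univariate skewness formulations summarized by
    Joanes and Gill: g1 (moment estimator), G1 (a bias-adjusted
    version), and b1 (as reported by Minitab)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m2 = np.mean((x - x.mean()) ** 2)            # second central moment
    m3 = np.mean((x - x.mean()) ** 3)            # third central moment
    g1 = m3 / m2 ** 1.5
    G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)     # bias-adjusted version of g1
    b1 = g1 * ((n - 1) / n) ** 1.5               # uses the n-1 standard deviation
    return g1, G1, b1

def kurtosis_measures(x):
    """The matching excess-kurtosis formulations g2, G2, and b2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m2 = np.mean((x - x.mean()) ** 2)
    m4 = np.mean((x - x.mean()) ** 4)
    g2 = m4 / m2 ** 2 - 3
    G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    b2 = (g2 + 3) * ((n - 1) / n) ** 2 - 3
    return g2, G2, b2

# Illustrative data with a long right tail: all three skewness values are positive.
sample = [2.1, 3.4, 2.8, 9.7, 3.0, 2.5, 3.2, 2.9]
g1, G1, b1 = skewness_measures(sample)
```

For large n the three estimators converge, so the choice matters mainly in small samples.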
The sample skewness G 1 can take any value between negative infinity and positive infinity. For a symmetric distribution such as a normal distribution, the expectation of skewness is 0.
Distributions with positive skewness have a longer right tail, and those with negative skewness have a longer left tail. A normal distribution is symmetric, and its skewness is 0.

The demonstration will utilize functions from the R software environment for both outlier detection and data analysis after removal of the outliers. It should be noted that the focus of this manuscript is not on identifying an optimal approach for dealing with outliers once they have been found, which is an area of statistics replete with research in its own right and well beyond the scope of this study.
Suffice it to say that identification of outliers is only the first step in the process, and much thought must be given to how outliers will be handled.
In the current study, outliers will be removed from the dataset in order to clearly demonstrate the differential impact of the various outlier detection methods on the data and subsequent analyses. However, this approach to handling outliers is not recommended in every situation.

Impact of outliers in multivariate analysis

Outliers can have a dramatic impact on the results of common multivariate statistical analyses.
For example, they can distort correlation coefficients (Marascuilo and Serlin; Osborne and Overbay) and create problems in regression analysis, even leading to collinearity among the set of predictor variables in multiple regression (Pedhazur). Distortions to the correlation may in turn lead to biased sample estimates, as outliers artificially impact the degree of linearity present between a pair of variables (Osborne and Overbay). In addition, methods based on the correlation coefficient, such as factor analysis and structural equation modeling, are negatively impacted by the presence of outliers in the data (Brown). Cluster analysis is particularly sensitive to outliers, with cluster results distorted when outliers serve as the center or starting point of the analysis (Kaufman and Rousseeuw). Outliers can also themselves form a cluster that is not truly representative of the broader array of values in the population.
Outliers have also been shown to detrimentally impact tests for mean differences using ANOVA by biasing the means of the groups in which they occur (Osborne and Overbay). While outliers can be problematic from a statistical perspective, it is not always advisable to remove them from the data. When these observations are legitimate members of the target population, their presence in the dataset can be quite informative regarding the nature of that population.
To remove outliers from the sample in this case would lead to a loss of information about the population at large. In such situations, outlier detection is helpful for identifying members of the target population who are unusual compared to the rest, but these individuals should not be removed from the sample (Zijlstra et al.).
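The distorting effect of a single outlier on the correlation coefficient, noted above, is easy to demonstrate with simulated data (a Python sketch; all values are illustrative):

```python
import numpy as np

# Simulated illustration: one extreme point can sharply distort the
# Pearson correlation between two otherwise strongly related variables.
rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = x + rng.normal(scale=0.5, size=30)       # true correlation about .89

r_clean = np.corrcoef(x, y)[0, 1]

# Add one observation far from the point cloud, discordant with the trend.
x_out = np.append(x, 10.0)
y_out = np.append(y, -10.0)
r_contaminated = np.corrcoef(x_out, y_out)[0, 1]
```

A single discordant case out of 31 is enough to erase, and even reverse the sign of, a strong correlation.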
Methods of multivariate outlier detection

Given the negative impact that outliers can have on multivariate statistical methods, their accurate detection is an important matter to consider prior to data analysis (Tabachnick and Fidell; Stevens). Popular multivariate statistics texts recommend the Mahalanobis distance (D2) for multivariate outlier detection, although, as described below, there are several alternatives that may prove more effective than this standard approach.
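For concreteness, the standard D2 approach can be sketched as follows (Python is used here for illustration, while the manuscript's own demonstrations use R; the data are simulated and the cutoff follows the common textbook recommendation of p < .001):

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row of X from the sample
    centroid, based on the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Simulated data with one planted multivariate outlier; cases whose D2
# exceeds the chi-square critical value (p < .001, df = number of
# variables) are flagged.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[0] = [6.0, -6.0, 6.0]                      # the planted outlier
d2 = mahalanobis_d2(X)
cutoff = chi2.ppf(0.999, df=X.shape[1])
flagged = np.where(d2 > cutoff)[0]           # includes the planted case
```

Note that because the outlier itself inflates the sample mean and covariance used in the computation, this approach can miss outliers when several are present, which motivates the robust alternatives discussed below.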
Prior to discussing these methods, however, it is important to briefly review the general qualities that make for an effective outlier detection method. Readers interested in a more detailed treatment are referred to two excellent texts by Wilcox. When thinking about the impact of outliers, perhaps the key consideration is the breakdown point of the statistical analysis in question.
The breakdown point can be thought of as the minimum proportion of a sample that can consist of outliers after which point they will have a notable impact on the statistic of interest. In other words, a statistic with a breakdown point near 0, such as the sample mean, can be distorted by even a single sufficiently extreme observation. Comparatively, a statistic with a higher breakdown point, such as the median, can tolerate a larger proportion of outlying values before being affected. Of course, it should be remembered that the degree of this impact depends on the magnitude of the outlying observation, such that a more extreme outlier will have a greater impact on the statistic than a less extreme value. A high breakdown point is generally considered to be a positive attribute. While the breakdown point is typically thought of as a characteristic of a statistic, it can also be a characteristic of a statistic in conjunction with a particular method of outlier detection.
Thus, if a researcher calculates the sample mean after removing outliers identified using a method such as D2, the breakdown point of the combination of mean and outlier detection method will differ from that of the mean by itself. Finally, although a high breakdown point is generally desirable, statistics with higher breakdown points (e.g., the median) tend to be less efficient when the data contain no outliers. Another important property for a statistical measure of location (e.g., the mean or median) is equivariance.
Location equivariance means that if a constant is added to each observation in the data set, the measure of location increases by that same constant. Scale equivariance means that if each observation in the data set is multiplied by a constant, the measure of location is multiplied by that same constant as well. In other words, the scale of measurement should not influence relative comparisons of individuals within the sample, or relative comparisons of group measures of location such as the mean.
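Both the breakdown and equivariance properties discussed above are easy to check numerically; the sketch below (Python, with illustrative values) contrasts the mean and the median under contamination and then verifies the two equivariance conditions:

```python
import numpy as np

# Breakdown behavior: one gross outlier moves the sample mean substantially,
# while the median (breakdown point 0.5) barely changes.
clean = np.array([4.8, 5.1, 4.9, 5.0, 5.2, 5.1, 4.9, 5.0, 5.0, 5.1])
contaminated = clean.copy()
contaminated[0] = 500.0                      # a single gross outlier

mean_shift = abs(contaminated.mean() - clean.mean())            # large
median_shift = abs(np.median(contaminated) - np.median(clean))  # tiny

# Location and scale equivariance: adding a constant c shifts the measure
# of location by c, and multiplying by c multiplies it by c.
c = 7.0
assert np.isclose(np.mean(clean + c), np.mean(clean) + c)       # location
assert np.isclose(np.mean(clean * c), np.mean(clean) * c)       # scale
assert np.isclose(np.median(clean + c), np.median(clean) + c)
assert np.isclose(np.median(clean * c), np.median(clean) * c)
```

Both the mean and the median satisfy the equivariance conditions; they differ sharply in breakdown behavior.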