A well-rounded understanding of statistics includes the evaluation of outliers and of methods for sampling distributions. Measures of central tendency are important tools for evaluating data. The mean, median, and mode each describe a different aspect of raw data, as does the range, which describes spread rather than central tendency. These values need to be reported in a way that conveys meaning effectively, because the choice of a specific measure can too easily be used to cherry-pick the message a researcher wants to convey (Warner, 2021). Therefore, it is important for beginning researchers to understand how these measures can be used to deceive, so that awareness leads to better communication of appropriate information.
Measures of Central Tendency and Outliers
Outliers are data points that fall considerably outside the central tendency, or typical responses (Warner, 2021). Examining frequency tables and graphs is a useful first step because it makes outliers easier to recognize. “Outliers must not be ignored” (Gibbert et al., 2021, p. 179); however, there are multiple ways to define outliers, many techniques for identifying them, and conflicting suggestions on how to handle their presence (Aguinis et al., 2013). Caution is needed when deciding what to do with outliers: because these decisions are made after the data has been gathered and while it is being analyzed, the way outliers are dealt with can be strongly influenced by the desires of the researcher (Aguinis et al., 2013). Aguinis et al. (2013) provide a detailed description of best practices, including a decision-making tree, for handling outliers. After the data is assessed and the type of outlier is determined, the way to address its presence ranges from deletion, when the outlier stems from a nonlegitimate observation, to further study of the phenomenon and the development of new theories (Aguinis et al., 2013; Gibbert et al., 2021). The concept of the outlier has even entered the mainstream through accounts of influential people whose lives fall far from the mean (Gladwell, 2008). Of even greater interest to believers is the recognition of the Son of Man, Yahusha Himself (see Appendix), as the most influential outlier the world has ever encountered, leading those who believe and love Him to the point of obedience to eternal life as well as abundant peace and love in the present life (John 3:16, 36; 14:27; 1 Corinthians 13, The Scriptures, 2008).
An example in Warner (2021) demonstrates the effect of a single outlier on the values of the mean, median, and mode. Before the outlier was introduced by changing one heart rate from 82 to 160, the sample had a mean of 73.10, a median of 73.5, and a mode of 75. After the change, the mean rose to 80.90, while the median remained at 73.5 and the mode remained at 75. The mean is therefore less robust to outliers than the median and mode (Warner, 2021).
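The pattern in this example can be checked with a short Python sketch. The ten heart rates below are invented for illustration and are not Warner's data; they were chosen only so that their mean, median, and mode match the values reported above. The function name `summarize` is likewise an assumption made for this sketch.

```python
from statistics import mean, median, mode

# Hypothetical heart rates (beats per minute); NOT the textbook's data.
# They were picked so the summary statistics match the reported values.
original = [64, 68, 70, 72, 73, 74, 75, 75, 78, 82]

# Introduce a single outlier by changing the value 82 to 160.
with_outlier = [160 if rate == 82 else rate for rate in original]


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)


before = summarize(original)
after = summarize(with_outlier)

print("before outlier (mean, median, mode):", before)
print("after outlier  (mean, median, mode):", after)
```

Only the mean moves when the single value changes; the median and mode depend on the middle and most frequent values, which the outlier does not touch.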
Deviations From the Mean
Although it is easily influenced by outliers and can sometimes misrepresent typical responses, the mean is the measure of central tendency most often included in research reports (Warner, 2021). It is also the basis of many other important statistical analyses, such as measures of variability, which build on the distance of each individual score from the mean (Warner, 2021). These deviations from the mean are used to compute the sample variance and standard deviation, which in essence describe how values in a sample differ relative to the mean (Warner, 2021). However, because the positive and negative deviations cancel, the deviations from the mean always sum to zero and their sum cannot be used for this analysis; statisticians instead sum the squared deviations, since squared numbers cannot be negative (Warner, 2021). This is called the sum of squares (SS), a value capable of providing the needed information. SS takes its minimum value of zero when all values in a data set are equal, meaning every value equals the mean and there is no deviation from it (Warner, 2021).
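The relationship among deviations, SS, variance, and standard deviation can be sketched in a few lines of Python. The scores are hypothetical, the function names are assumptions made for this illustration, and the variance is computed by dividing SS by N - 1, the usual sample estimate.

```python
from math import sqrt


def deviations_from_mean(values):
    """Return each value's signed distance from the sample mean."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def sum_of_squares(values):
    """Return SS, the sum of squared deviations from the mean."""
    return sum(d ** 2 for d in deviations_from_mean(values))


scores = [2, 4, 4, 6, 9]             # hypothetical scores, mean = 5
devs = deviations_from_mean(scores)  # [-3.0, -1.0, -1.0, 1.0, 4.0]

ss = sum_of_squares(scores)          # 9 + 1 + 1 + 1 + 16 = 28
variance = ss / (len(scores) - 1)    # sample variance: SS / (N - 1)
sd = sqrt(variance)                  # sample standard deviation

print("sum of deviations:", sum(devs))
print("SS:", ss, "variance:", variance, "SD:", round(sd, 3))
print("SS when all values are equal:", sum_of_squares([5, 5, 5]))
```

The raw deviations cancel to zero, which is why they cannot summarize variability on their own, while SS is zero only when every score equals the mean.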
Nonparametric Methods
Because they commonly rely on the assumption of a normal distribution, parametric methods are more susceptible to outliers, whereas nonparametric methods are “relatively insensitive to outlying observations” (Hollander et al., 2014, p. 1). As demonstrated previously, the median is more robust than the mean when outliers are present and the data is skewed. However, there are still situations, such as comparing medians across groups, in which this measure can yield spurious results when parametric methods are relied upon (McGreevy et al., 2009). McGreevy et al. (2009) suggest and demonstrate using nonparametric methods to adjust medians rather than means in skewed data in order to obtain unbiased results.
Sampling Distributions
A sampling distribution is obtained when values of the sample mean (M) are computed for a large number of samples drawn from a population of interest; samples are used because it is often difficult to measure every case in the population (Warner, 2021). Sampling distributions are useful for setting up confidence intervals (CI) and statistical significance tests (Warner, 2021). When the population mean (μ) and the population standard deviation (σ) are known and a normal distribution is assumed, the distance of M from μ can be expressed as a z score, which provides information about how likely an outcome is (Warner, 2021). Lower values of σ indicate shorter distances between values of M and μ, leading to less prediction error; increasing the sample size (N) also reduces the population standard error of M, which equals σ divided by the square root of N (Warner, 2021).
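As a minimal sketch of these relationships, the Python below computes the standard error of the mean and the z score for a sample mean when the population mean and standard deviation are known. All numeric values are hypothetical, and the function names are assumptions made for this illustration.

```python
from math import sqrt


def standard_error(sigma, n):
    """Population standard error of the mean: sigma / sqrt(N)."""
    return sigma / sqrt(n)


def z_score(sample_mean, mu, sigma, n):
    """Distance of a sample mean from mu, in standard-error units."""
    return (sample_mean - mu) / standard_error(sigma, n)


# Hypothetical population parameters, assumed known.
mu, sigma = 100.0, 15.0

se_n25 = standard_error(sigma, 25)    # 15 / 5  = 3.0
se_n100 = standard_error(sigma, 100)  # 15 / 10 = 1.5

# The same sample mean lies further out, in z units, when N is larger.
z_n25 = z_score(106.0, mu, sigma, 25)    # 6 / 3.0 = 2.0
z_n100 = z_score(106.0, mu, sigma, 100)  # 6 / 1.5 = 4.0

print("standard error, N = 25: ", se_n25)
print("standard error, N = 100:", se_n100)
print("z for M = 106, N = 25:  ", z_n25)
print("z for M = 106, N = 100: ", z_n100)
```

Quadrupling N halves the standard error, so sample means cluster more tightly around the population mean as the sample size grows.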
In most real-life research, the population standard deviation (σ) is not known and must be estimated from the sample standard deviation and N (Warner, 2021). Because a normal distribution can then no longer be assumed, t distributions are used; they have lower peaks and wider tails than the normal distribution, which corrects for the added uncertainty of estimating σ (Warner, 2021). The shape and critical values of a t distribution depend on N. Higher values of N produce shapes closer to the normal distribution, and for N larger than about 120 the extra sampling error introduced by estimating σ from the sample standard deviation becomes negligible (Warner, 2021). N also affects confidence intervals. The confidence level, usually set to 95%, describes what to expect if many samples are drawn from a population: about that percentage of the intervals computed from those samples would include the population mean, μ (Warner, 2021). Narrow confidence intervals are preferred and can be achieved by choosing a lower level of confidence, increasing N, and, if possible, decreasing the standard deviation (Warner, 2021). Ultimately, confidence intervals help account for the uncertainty arising from the sampling error discussed above and serve as a reminder that confidence in exact values is limited (Bruce & Bruce, 2016; Warner, 2021).
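The effect of N on the width of a 95% confidence interval can be sketched as follows. The sample mean and standard deviation are hypothetical, the function and dictionary names are assumptions made for this illustration, and the critical values of t are copied from a standard two-tailed table rather than computed, so only the three sample sizes listed are supported.

```python
from math import sqrt

# Two-tailed 95% critical values of t from a standard table,
# keyed by degrees of freedom (df = N - 1). As df grows, the
# values approach the normal-distribution value of about 1.96.
T_CRIT_95 = {9: 2.262, 29: 2.045, 120: 1.980}


def confidence_interval_95(sample_mean, sample_sd, n):
    """Return (lower, upper) limits of a 95% CI for the mean."""
    t_crit = T_CRIT_95[n - 1]           # table lookup by df
    margin = t_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample statistics, held fixed while N changes.
m, sd = 50.0, 10.0

widths = {}
for n in (10, 30, 121):
    lower, upper = confidence_interval_95(m, sd, n)
    widths[n] = upper - lower
    print(f"N = {n:3d}: 95% CI = ({lower:.2f}, {upper:.2f})")
```

Holding the mean and standard deviation fixed, the interval narrows as N increases, both because the standard error shrinks and because the critical value of t moves closer to the normal value.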
Conclusion
Important statistical concepts such as outliers and sampling distributions can be perceived from a spiritual viewpoint. Any sampling distribution taken from the population of believers who are part of The Great Outlier’s (Yahusha’s) body ought to demonstrate a consistent pattern of anomalies: people who, like Him, are not afraid to deviate from the greater population by living set-apart lives dedicated to Yahuah. Like statistical outliers, our commitment disrupts the data. Some may choose simply to delete us without considering the Truth that life is more than the physical; some did just this with Yahusha. However, our persistence in adherence to the Word, to walk as He walked, has the power to reveal the Son to a lost generation.
References
Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270–301. https://doi.org/10.1177/1094428112470848
Bruce, A., & Bruce, P. (2016). Improve your data quality using sampling distribution (1st ed.). O’Reilly Media, Inc.
Gibbert, M., Nair, L. B., Weiss, M., & Hoegl, M. (2021). Using outliers for theory building. Organizational Research Methods, 24(1), 172–181. https://doi.org/10.1177/1094428119898877
Hollander, M., Wolfe, D. A., & Chicken, E. (2014). Nonparametric statistical methods (3rd ed.). John Wiley & Sons, Inc.
Gladwell, M. (2008). Outliers: The story of success. Little, Brown and Company.
McGreevy, K. M., Lipsitz, S. R., Linder, J. A., Rimm, E., & Hoel, D. G. (2009). Using median regression to obtain adjusted estimates of central tendency for skewed laboratory and epidemiologic data. Clinical Chemistry, 55(1), 165–169. https://go.openathens.net/redirector/liberty.edu?url=https://www.proquest.com/scholarly-journals/using-median-regression-obtain-adjusted-estimates/docview/213995639/se-2
The Scriptures. (2018). Institute for Scripture Research.
Warner, R. (2021). Applied statistics I: Basic bivariate techniques (3rd ed.). SAGE Publications, Inc.
