Open Journal of Statistics
ISSN / EISSN : 2161-718X / 2161-7198
Current Publisher: Scientific Research Publishing, Inc. (10.4236)
Total articles ≅ 627
Latest articles in this journal
Open Journal of Statistics, Volume 10, pp 52-63; doi:10.4236/ojs.2020.101004
LINEX (linear exponential) is a loss function used in the analysis of statistical estimation and prediction problems; it rises exponentially on one side of zero and almost linearly on the other. It is used in both overestimation and underestimation problems. Ali Shadrokh and Hassan Pazira presented a shrinkage estimator for Gamma Type-II censored data under the LINEX loss function. In that paper, they explained how the LINEX loss function works, but gave no practical or detailed explanation in terms of changing the shape parameter and the error function. In this study, we give such an explanation, and we examine how the loss function behaves with data generated from a gamma distribution through resampling methods. We compare the performance of the LINEX loss function under the relative estimation error and the usual estimation error, using both random numbers generated from the gamma distribution (the randomization method) and bootstrap samples. The aim is to find out which resampling method performs well with the LINEX loss function. The estimators are compared using Monte Carlo simulations: random numbers are drawn from the gamma distribution, the maximum likelihood estimate of θ is found, and this estimator is used to illustrate the LINEX loss function, evaluated on either the estimation error or the relative estimation error, where c is the shape parameter and the estimate may be any estimate of the parameter θ. The shape of the loss function is determined by the value of c. In the analysis we use the shape parameter values c = -0.25, -0.50, -0.75, -1 and c = 0.25, 0.50, 0.75, 1. The same procedure is carried out with the bootstrap method, and the two methods are then compared. The relative estimation error should be used instead of the estimation error; with it, the LINEX loss function works better in both cases.
Of the two methods, the bootstrap works better: although the characteristics are the same, the bootstrap results are more dispersed than the others.
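The formulas in the abstract above were lost in extraction, so the following is only a minimal sketch under stated assumptions, not the authors' code. It assumes the commonly used LINEX form exp(c·Δ) - c·Δ - 1, a gamma distribution with known shape and unknown scale θ (so the maximum likelihood estimate of θ is the sample mean divided by the shape), and the hypothetical helper names shown. It evaluates the loss on the estimation error and on the relative estimation error, once with fresh random samples and once with bootstrap resamples.

```python
import math
import random


def linex_loss(delta, c):
    """Assumed standard LINEX form: exp(c*delta) - c*delta - 1, for shape c != 0."""
    return math.exp(c * delta) - c * delta - 1.0


def estimation_error(theta_hat, theta):
    """Usual estimation error."""
    return theta_hat - theta


def relative_error(theta_hat, theta):
    """Relative estimation error."""
    return (theta_hat - theta) / theta


def scale_mle(sample, shape_k):
    """MLE of the gamma scale when the shape is known (an assumption of this sketch)."""
    return sum(sample) / (len(sample) * shape_k)


def mean_linex(shape_k, theta, n, c, reps, error_fn, rng):
    """Average LINEX loss over `reps` fresh gamma samples (randomization)."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.gammavariate(shape_k, theta) for _ in range(n)]
        total += linex_loss(error_fn(scale_mle(sample, shape_k), theta), c)
    return total / reps


def bootstrap_linex(sample, shape_k, theta, c, reps, error_fn, rng):
    """Average LINEX loss over `reps` bootstrap resamples of one observed sample."""
    total = 0.0
    for _ in range(reps):
        resample = rng.choices(sample, k=len(sample))
        total += linex_loss(error_fn(scale_mle(resample, shape_k), theta), c)
    return total / reps
```

With c > 0 the loss penalizes overestimation more heavily than underestimation of the same size, and with c < 0 the asymmetry is reversed, which is what the two sets of shape values in the abstract explore.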
Open Journal of Statistics, Volume 10, pp 113-126; doi:10.4236/ojs.2020.101009
Least Absolute Shrinkage and Selection Operator (LASSO) is used for variable selection and for handling multicollinearity simultaneously in the linear regression model. LASSO produces estimates with high variance if the number of predictors exceeds the number of observations and if high multicollinearity exists among the predictor variables. To handle this problem, the Elastic Net (ENet) estimator was introduced by combining LASSO and the Ridge Estimator (RE). The solutions of LASSO and ENet have been obtained using the Least Angle Regression (LARS) and LARS-EN algorithms, respectively. In this article, we propose an alternative algorithm to overcome the issues in LASSO, which can combine LASSO with other existing biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator. Further, we examine the performance of the proposed algorithm using a Monte Carlo simulation study and real-world examples. The results show that the LARS-rk and LARS-rd algorithms, which combine LASSO with the r-k class estimator and the r-d class estimator, outperform the other algorithms under moderate and severe multicollinearity.
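To make the LASSO and Elastic Net penalties in the abstract above concrete, here is a small sketch. It uses cyclic coordinate descent rather than the LARS or LARS-EN algorithms the abstract refers to, and it does not implement the proposed LARS-rk or LARS-rd variants. The objective is assumed to be (1/(2n))·||y - Xβ||² + λ·(α·||β||₁ + (1 - α)/2·||β||²), with centered data, no intercept and no all-zero columns; the function names are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator that produces exact zeros (variable selection)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=200):
    """Cyclic coordinate descent; alpha=1 gives LASSO, alpha<1 adds a ridge term.

    Assumes centered X and y (no intercept) and no all-zero columns.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta for beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            diff = new - beta[j]
            if diff != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * diff
                beta[j] = new
    return beta
```

The ridge part of the penalty enters only through the denominator, which is why the Elastic Net shrinks correlated coefficients together while the L1 part can still set some of them exactly to zero.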
Open Journal of Statistics, Volume 10, pp 154-162; doi:10.4236/ojs.2020.101012
Comparing two samples with respect to corresponding parameters of their respective populations is an old and classical statistical problem. In this paper, we present a simple yet effective tool to compare two samples through their medians. We calculate the confidence of the statement “the median of the first population is strictly smaller (larger) than the median of the second.” We analyze two real data sets and empirically demonstrate the quality of the confidence for such a statement. This confidence in the order of the medians is to be seen as a pre-analysis tool that can provide useful insights for comparing two or more populations. The method is based entirely on exact distributions, with no need for asymptotic considerations. We also provide the Quor statistical software, an R package that implements the ideas discussed in this work.
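One way to see how an exact, non-asymptotic confidence about the order of two medians can arise is through order statistics. The sketch below is an illustration under assumptions and is not the Quor package's algorithm. It assumes independent samples from continuous distributions, in which case the number of observations below the population median is Binomial(n, 1/2). If the i-th smallest x lies below the j-th smallest y, then "median of X ≤ x_(i)" together with "y_(j) ≤ median of Y" implies the medians are ordered, and both probabilities are exact binomial tail sums. Taking the best (i, j) pair after seeing the data is a simplification made here.

```python
from math import comb


def binom_cdf_half(k, n):
    """P(Bin(n, 1/2) <= k), computed exactly from binomial coefficients."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, r) for r in range(k + 1)) / 2 ** n


def median_order_confidence(x, y):
    """Confidence bound for 'median(X) < median(Y)' from order statistics.

    Hypothetical helper: assumes continuous distributions and independent samples.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    best = 0.0
    for i in range(1, n + 1):
        p_x = binom_cdf_half(i - 1, n)  # P(median_X <= X_(i))
        for j in range(1, m + 1):
            if xs[i - 1] < ys[j - 1]:
                p_y = 1.0 - binom_cdf_half(j - 1, m)  # P(Y_(j) <= median_Y)
                best = max(best, p_x * p_y)
    return best
```

For two completely separated samples of size 5 this gives (31/32)², and it gives 0 when no x order statistic lies below any y order statistic.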
Open Journal of Statistics, Volume 10, pp 1-9; doi:10.4236/ojs.2020.101001
In this study, we analyze brain activity data from functional magnetic resonance imaging (fMRI) of 820 subjects, with each subject scanned at 4 different times. This multiple scanning gives us an opportunity to observe the consistency of imaging characteristics within subjects as compared to the variability across subjects. The most consistent characteristics are then used to predict subjects’ traits. We concentrate on four predictive methods (Regression, Logistic Regression, Linear Discriminant Analysis and Random Forest) to predict traits such as gender and age based on the activity observed between brain regions. The predictions are based on the adjusted communication activity among the brain regions, as assessed from the 4 scans of each subject. Because of the large number of such communications among the 116 brain regions, we performed a preliminary selection of the most promising pairs of brain regions. Logistic Regression performed best in classifying subject gender based on communication activity among the brain regions. The accuracy rate was 85.6 percent for an AIC stepwise-selected Logistic Regression model, whereas the Logistic Regression model retaining the entire set of ranked predictors achieved an 87.7 percent accuracy rate. It is interesting to point out that the model with the AIC-selected features was better at classifying males, whereas the complete ranked model was better at classifying females. The Random Forest technique performed best for predicting age (grouped into five categories as provided by the original data), with a 48.8 percent accuracy rate. Any set of between 200 and 1600 predictors gave similar accuracy rates.
Open Journal of Statistics, Volume 10, pp 31-51; doi:10.4236/ojs.2020.101003
The purpose of this article is to present an alternative method for intervention analysis of time series data that is simpler to use than the traditional method of fitting an explanatory Autoregressive Integrated Moving Average (ARIMA) model. Time series regression analysis is commonly used to test the effect of an event on a time series. An econometric modeling method, which uses a heteroskedasticity and autocorrelation consistent (HAC) estimator of the covariance matrix instead of fitting an ARIMA model, is proposed as an alternative. The method of parametric bootstrap is used to compare the two approaches for intervention analysis. The results of this study suggest that the time series regression method and the HAC method give very similar results for intervention analysis, and hence the proposed HAC method should be used for intervention analysis, instead of the more complicated method of ARIMA modeling. The alternative method presented here is expected to be very helpful in gaming and hospitality research.
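As an illustration of the HAC idea in the abstract above, the sketch below fits the simplest intervention model, y_t = b0 + b1·D_t + e_t with a 0/1 intervention dummy D_t, by ordinary least squares and then computes Newey-West standard errors with a Bartlett kernel. The choice of this particular model, the kernel, the lag truncation `max_lag` and the function name are assumptions of the sketch; the article's own specification may differ.

```python
def ols_hac(y, d, max_lag):
    """OLS of y on [1, d] with Newey-West (Bartlett kernel) standard errors.

    Returns (beta, se) for the intercept and the intervention effect.
    """
    n = len(y)
    X = [(1.0, float(di)) for di in d]
    # X'X and X'y for the two-column design
    sxx = [[sum(r[a] * r[b] for r in X) for b in range(2)] for a in range(2)]
    sxy = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    beta = [inv[a][0] * sxy[0] + inv[a][1] * sxy[1] for a in range(2)]
    resid = [yi - beta[0] * r[0] - beta[1] * r[1] for r, yi in zip(X, y)]
    # Scores g_t = x_t * e_t; the "meat" sums their weighted autocovariances
    g = [(r[0] * e, r[1] * e) for r, e in zip(X, resid)]
    S = [[sum(gt[a] * gt[b] for gt in g) for b in range(2)] for a in range(2)]
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)  # Bartlett weight
        for a in range(2):
            for b in range(2):
                g_ab = sum(g[t][a] * g[t - lag][b] for t in range(lag, n))
                g_ba = sum(g[t][b] * g[t - lag][a] for t in range(lag, n))
                S[a][b] += w * (g_ab + g_ba)
    # Sandwich covariance: (X'X)^-1 S (X'X)^-1
    tmp = [[sum(inv[a][k] * S[k][b] for k in range(2)) for b in range(2)]
           for a in range(2)]
    cov = [[sum(tmp[a][k] * inv[k][b] for k in range(2)) for b in range(2)]
           for a in range(2)]
    se = [max(cov[0][0], 0.0) ** 0.5, max(cov[1][1], 0.0) ** 0.5]
    return beta, se
```

With `max_lag=0` the estimator reduces to the heteroskedasticity-robust (White) covariance; larger values let the standard error of the intervention effect account for serial correlation without fitting an ARIMA model.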
Open Journal of Statistics, Volume 10, pp 10-30; doi:10.4236/ojs.2020.101002
When longitudinal data contain outliers, the classical least-squares approach is known not to be robust. To solve this issue, the exponential squared loss (ESL) function with a tuning parameter has been investigated for longitudinal data. However, to our knowledge, no paper has investigated a robust estimation procedure against outliers within the framework of mean-covariance regression analysis for longitudinal data using the ESL function. In this paper, we propose a robust estimation approach for the model parameters of the mean and the generalized autoregressive parameters with longitudinal data based on the ESL function. The proposed estimators can be shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm to calculate the parameter estimates, and the balance between robustness and efficiency can be achieved by choosing appropriate data-adaptive tuning parameters. Simulation studies and real data analysis are carried out to illustrate the finite-sample performance of the proposed approach.
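The link between the ESL function and IRLS can be shown on a much simpler problem than the mean-covariance model of the abstract. The sketch below assumes the usual ESL form 1 - exp(-r²/γ) and estimates a single location parameter. Setting the derivative of the summed loss to zero gives a weighted mean with weights exp(-r²/γ), which is iterated to convergence. The fixed tuning parameter `gamma` and the function name are assumptions; the paper selects its tuning parameters adaptively.

```python
import math
from statistics import median


def esl_location(data, gamma, n_iter=50, tol=1e-10):
    """Location estimate minimizing sum(1 - exp(-(x - mu)^2 / gamma)) via IRLS.

    Assumes gamma is large enough that at least one weight is nonzero.
    """
    mu = median(data)  # robust starting value
    for _ in range(n_iter):
        # Observations far from the current fit get weights close to zero
        weights = [math.exp(-((x - mu) ** 2) / gamma) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

A small `gamma` downweights outliers aggressively, while a large `gamma` makes all weights close to 1 and the estimate approaches the ordinary mean, which is the robustness versus efficiency trade-off mentioned in the abstract.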
Open Journal of Statistics, Volume 10, pp 64-73; doi:10.4236/ojs.2020.101005
This study concerns binary logistic models of home ownership among civil servants in Wukari, Nigeria. The data are primary data collected using questionnaires. The multicollinear data, as well as the data reduced using principal component analysis and stepwise regression to determine the factors that chiefly account for home ownership, were examined. Four components were selected out of six, namely grade level of respondent, cadre of institution of service of respondent, family size of respondent and age of respondent. The four selected components accounted for 87.78 percent of the variation, and four variables were selected from them. The logit model for home ownership status is obtained from the selected variables. The adequacy of the model was tested using the count R², which indicates how useful the explanatory variables are in predicting the response variable and can be regarded as a measure of effect size. In testing the significance of each factor, only age of respondent is significant in determining variability in the home ownership model.
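The count R² mentioned above is, in its usual definition, simply the share of observations whose outcome is predicted correctly when fitted probabilities are cut at a threshold. A minimal sketch, assuming a 0.5 cutoff and a hypothetical function name:

```python
def count_r2(y_true, p_hat, threshold=0.5):
    """Fraction of cases where (p_hat >= threshold) matches the observed 0/1 outcome."""
    if len(y_true) != len(p_hat) or not y_true:
        raise ValueError("y_true and p_hat must be non-empty and of equal length")
    correct = sum(1 for y, p in zip(y_true, p_hat) if (p >= threshold) == bool(y))
    return correct / len(y_true)
```

For example, with observed outcomes `[1, 0, 1, 0]` and fitted probabilities `[0.9, 0.2, 0.4, 0.6]`, two of the four cases are classified correctly, so the count R² is 0.5.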
Open Journal of Statistics, Volume 10, pp 74-86; doi:10.4236/ojs.2020.101006
Model selection is a necessary part of any statistical analysis, and in many cases the Bayes factor is one of its basic elements. For the one-sided hypothesis testing problem, we extend the agreement between frequentist and Bayesian evidence to the generalized p-value, and study the agreement between the generalized p-value and the posterior probability of the null hypothesis. For the point null hypothesis testing problem, we analyze the Bayesian evidence under the traditional Bayesian testing method, that is, the Bayes factor or the posterior probability that the point null hypothesis holds; this evidence is at odds with the classical frequentist evidence of the p-value, a phenomenon known as the Lindley paradox. Many statisticians have worked on this problem from both frequentist and Bayesian perspectives. In this paper, I focus on the Bayesian approach to model selection, starting from Bayes factors and moving on to the Lindley paradox, with a brief discussion of partial and fractional Bayes factors. My aim is to consider this paradox in a simple way. In addition, a detailed derivation of BIC and AIC is given in Section 4. The guiding principle for selecting the optimal model has two aspects: one is to maximize the likelihood function, the other is to minimize the number of unknown parameters in the model. The larger the likelihood function value, the better the model fit, but we cannot judge a model by fitting accuracy alone, since that leads to more and more unknown parameters, and a model that becomes more and more complex will overfit. Therefore, a good model should balance fitting accuracy against the number of unknown parameters.
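The trade-off described at the end of the abstract above, between a larger likelihood and fewer parameters, is what AIC and BIC encode. A minimal sketch using their standard definitions, AIC = 2k - 2·ln L and BIC = k·ln n - 2·ln L, where k is the number of estimated parameters, n the sample size and ln L the maximized log-likelihood (function names are hypothetical):

```python
import math


def aic(log_lik, k):
    """Akaike information criterion; smaller is better."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion; the penalty per parameter grows with ln(n)."""
    return k * math.log(n) - 2 * log_lik


def prefer(model_a, model_b, n):
    """Return the label of the model with the smaller BIC.

    Each model is a (label, log_lik, k) tuple.
    """
    return min((model_a, model_b), key=lambda m: bic(m[1], m[2], n))[0]
```

Because ln n exceeds 2 once n is at least 8, BIC penalizes each extra parameter more heavily than AIC for all but very small samples, so it tends to select the simpler model when the gain in log-likelihood is modest.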
Open Journal of Statistics, Volume 10, pp 127-138; doi:10.4236/ojs.2020.101010
Logistic regression is the most important tool for data analysis in various fields. The classical approach for estimating its parameters is maximum likelihood estimation; a disadvantage of this method is its high sensitivity to outlying observations. Robust estimators for logistic regression offer alternative techniques that are less sensitive to such observations. This paper presents a new class of robust techniques for logistic regression. They are weighted maximum likelihood estimators that can be regarded as Mallows-type estimators. Moreover, we compare the performance of these techniques with classical maximum likelihood and some existing robust estimators. The results are illustrated through a simulation study and real datasets. The new estimators showed the best performance relative to the other estimators.
Open Journal of Statistics, Volume 10, pp 139-153; doi:10.4236/ojs.2020.101011
Engaging non-science majors in a college-level science course can prove challenging. In turn, this can make it difficult to effectively teach science and math content. However, topics related to planetary exploration have a unique way of capturing one’s imagination and may serve to robustly engage non-science majors. In this contribution, I 1) describe a model rocketry lab module that I have created and implemented in an introductory-level planetary geology course and 2) quantify student learning gains as a result of this module. This module builds on model rocketry lesson plans for science and math coursework at the K-12 level and involves students working in groups to 1) design and build model rockets to carry out a theoretical mission that addresses a science question the students have developed, 2) launch their rockets and collect related data, 3) synthesize and evaluate their data, and 4) report their results in both oral and written forms. The tasks of building and launching the model rocket serve as a vehicle that allows students to employ the scientific process while learning about planetary mission design and applying geologic and quantitative skills useful for answering a science-related question. Quantification of student learning gains shows that through this lab module, students significantly improved their quantitative and scientific reasoning skills. Results from student questionnaires showed a significant increase in student interest and confidence in addressing scientific questions, as well as in their understanding of how planetary missions are designed and conducted.