Results in Journal Indonesian Journal of Statistics and Its Applications: 129

Sri Sulastri, Lismayani Usman, Utami Dyah Syafitri
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 228-242; doi:10.29244/ijsa.v5i2p228-242

New student admissions are held every year at all levels of education, including at IPB University. Since 2013, IPB University has kept a track record of every school that has succeeded in sending its graduates, up to the point where they completed their education at IPB University. The data cover 5,345 schools. It was necessary to group the schools into clusters so that IPB could see, based on the characteristics of the clusters, which schools were good or not good at sending their graduates to continue their education at IPB. This study used the k-prototypes algorithm because it can be applied to data consisting of both categorical and numerical variables (mixed-type data). The k-prototypes algorithm retains the efficiency of the k-means algorithm in handling large data sets while removing the limitations of k-means. The results showed that the optimal number of clusters in this study was four. The fourth cluster (421 schools) was the best cluster with respect to student admission at IPB University, whereas the third cluster (391 schools) was the worst.
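The mixed-type dissimilarity that k-prototypes relies on can be sketched as follows. This is a minimal illustration of the commonly used form (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches), not the authors' implementation; the function name, the weight `gamma`, and the example school record are all hypothetical.

```python
def k_prototypes_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Dissimilarity between one record and one cluster prototype.

    Numerical attributes contribute squared Euclidean distance;
    categorical attributes contribute gamma per mismatching value.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


# Hypothetical record: two numerical and two categorical attributes.
school_num, school_cat = [3.2, 15.0], ["public", "Java"]
proto_num, proto_cat = [3.0, 12.0], ["public", "Sumatra"]

d = k_prototypes_distance(school_num, school_cat, proto_num, proto_cat, gamma=0.5)
print(round(d, 2))  # 9.54
```

The weight `gamma` controls how much a categorical mismatch counts relative to numerical distance, so it would normally be tuned to the scale of the numerical attributes.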
Dhiar Niken Larasati, Usman Bustaman, Setia Pramana
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 333-342; doi:10.29244/ijsa.v5i2p333-342

The COVID-19 outbreak is not only a health crisis but also a social and economic crisis all over the world. In Indonesia, the outbreak has shaken almost all business sectors; however, it seems to have brought a silver lining for the e-commerce sector, since the pandemic has fostered online shopping habits. During the pandemic, the impact of COVID-19 on the Indonesian economy needs to be updated from time to time to support quick policymaking. Big data therefore plays an important role in providing information relatively fast. This paper aims to describe how big data, i.e., marketplace data, could be used to gauge the impact of the COVID-19 outbreak on micro and small retailers in Indonesia. The dataset was collected regularly from a marketplace website in Indonesia from January to June 2020. To see the change in sales during the COVID-19 period, sales before and after the implementation of the social distancing policy are compared. The results showed that the online marketplace in Indonesia is dominated by micro retailers, based on the number of products sold in the marketplace. The total revenue of micro retailers increased significantly during the pandemic, whereas for medium retailers the increase in total revenue was lower than that of micro retailers. This indicates a positive sign for the growth of micro retailers in the online marketplace.
Syalam Ali Wira Dinata, Muhammad Azka, Primadina Hasanah, Suhartono Suhartono, Moh Danil Hendry Gamal
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 243-259; doi:10.29244/ijsa.v5i2p243-259

This paper investigates a case study on short-term forecasting for East Kalimantan, with emphasis on special days such as public holidays. A time series of electricity load demand recorded at hourly intervals contains more than one seasonal pattern, so there is great appeal in a time series modelling method that is able to capture triple seasonality. The triple SARIMA model has been adapted for this purpose and is competitive for load modelling. Using the least squares method to estimate the coefficients of a triple SARIMA model, followed by model building, checking of model assumptions, and comparison of model criteria, we propose and demonstrate the triple Seasonal Autoregressive Integrated Moving Average model with AIC 290631.9 and SBC 290674.2 as the best model for this study. The triple seasonal ARIMA is one alternative strategy for producing accurate forecasts of Kalimantan electricity load data for planning, operation maintenance, and market-related activities.
Suryo Adhi Rakhmawan
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 220-227; doi:10.29244/ijsa.v5i2p220-227

South Kalimantan is a province in Indonesia with a large youth population, and it had the lowest score in the Indonesia Youth Development Index (YDI) 2017. Its lowest-scoring dimension is gender and discrimination, which cannot be analyzed fully because some indicators are not included in the dimension. To address these problems, it is necessary to build a measurement that can monitor youth development at a smaller level. In this research, the author provides a measurement that describes the level of youth development by classification for South Kalimantan in 2018. The index is built with the factor analysis method. It consists of the five dimensions used in the Indonesia YDI 2017 with some additional indicators. The results show that the index is a valid measure, given its significant correlation with the Indonesia YDI 2017. Youth living in urban areas tend to have a higher index than youth living in rural areas, and male youth tend to have a higher development index than female youth. The suggestion for the South Kalimantan government is that, to improve youth development, the development priority for every classification can start from the classification and dimension of the youth index with the lowest achievement.
E Widodo, R Maggandari
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 260-272; doi:10.29244/ijsa.v5i2p260-272

Crime is behavior that deviates from social and religious norms and causes psychological and economic harm. Stealing, ill-treatment, embezzlement, deception, deception/embezzlement, and adultery were the most common crimes in the last 9 months. To identify the types of crime in the community, we therefore need a method for seeing the tendencies of each category, namely multiple correspondence analysis. Multiple correspondence analysis is a descriptive statistical method used to describe the pattern of relationships in a contingency table, with the aim of finding tendencies between categories. The results of the correspondence analysis show that stealing and ill-treatment tend to be committed by male students under 25 years old; deception and adultery tend to be committed by women over 40 years old who do not work; and embezzlement tends to be committed by workers aged around 25 to 40 years. As for the relation between criminal incidents and the types of crime, ill-treatment and adultery are most prone to occur in shops during the vulnerable hours of 00:00-05:59 and 18:00-23:59.
J A Putri, Suhartono Suhartono, H Prabowo, N A Salehah, D D Prastyo, Setiawan Setiawan
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 284-303; doi:10.29244/ijsa.v5i2p284-303

Most research on currency inflow and outflow in Indonesia has shown that these data contain both linear and nonlinear patterns with a calendar variation effect. The goal of this research is to propose a hybrid model combining ARIMAX and a Deep Neural Network (DNN), known as hybrid ARIMAX-DNN, to improve forecast accuracy in currency prediction in East Java, Indonesia. ARIMAX is a class of classical time series models that can accurately handle linear patterns and the calendar variation effect, whereas DNN is a machine learning method that is powerful at handling nonlinear patterns. Data on 32 denominations of inflow and outflow currency in East Java are used as case studies. The best model was selected based on the smallest RMSE and sMAPE values on the testing dataset. The results showed that the hybrid ARIMAX-DNN model improved forecast accuracy and outperformed the individual ARIMAX and DNN models for 26 denominations of inflow and outflow currency. Hence, it can be concluded that hybrids of classical time series and machine learning methods tend to yield more accurate forecasts than the individual classical time series or machine learning models.
Said Al Afghani, Widhera Yoza Mahana Putra
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 369-376; doi:10.29244/ijsa.v5i2p369-376

There are several algorithms for solving problems in grouping data. Grouping data is also known as clustering, and clustering is useful for solving some problems, especially in business. In this note, we modify the clustering algorithm based on the distance principle that underlies the K-means algorithm (Euclidean distance). The Manhattan, Mahalanobis-Euclidean, and Chebyshev distances are used to modify the K-means algorithm. We compare the clustering results in terms of accuracy and find that the Mahalanobis-Euclidean distance gives the best accuracy on our experimental data; some further results are also given in this note.
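Swapping the distance function in K-means can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: the function names and toy points are hypothetical, the centroid update is kept as the plain mean for every distance, and the Mahalanobis-Euclidean variant is omitted because it needs a covariance estimate.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def kmeans(points, initial_centroids, distance, n_iter=10):
    """Lloyd-style iterations with a pluggable distance function."""
    centroids = [list(c) for c in initial_centroids]  # do not mutate the input
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the chosen distance.
        labels = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members (an assumption for all distances).
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
start = [[0.0, 0.0], [5.0, 5.0]]
for dist in (euclidean, manhattan, chebyshev):
    labels, centroids = kmeans(points, start, dist)
    print(dist.__name__, labels)
```

On well-separated toy data like this, all three distances give the same partition; differences only show up when clusters overlap or attributes are on very different scales.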
Hasna Afifah Rusyda, Fajar Indrayatna, Lienda Noviyanti
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 405-414; doi:10.29244/ijsa.v5i2p405-414

This paper discusses the risk estimation of a portfolio based on value at risk (VaR) using a copula-based asymmetric Glosten-Jagannathan-Runkle Generalized Autoregressive Conditional Heteroskedasticity (GJR-GARCH) model. Non-linear dependence among the variables leads to inaccurate VaR estimation, so we use copula functions to model the joint probability of large market movements. The data are GEV distributed; we therefore use the Block Maxima approach, which fits an extreme value distribution as the tail distribution, to calculate VaR. The results show that VaR can estimate the risk of portfolio return reasonably well because the model has captured the properties of the data: data volatility is accommodated by GJR-GARCH, the copula captures dependence between stocks, and Block Maxima accommodates the extreme tail behavior of the data.
N Cahyani, Sinta Septi Pangastuti, K Fithriasari, Irhamah Irhamah, N Iriawan
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 396-404; doi:10.29244/ijsa.v5i2p396-404

A neural network is a series of algorithms that endeavours to recognize underlying relationships in a set of data through processes that mimic the way human brains operate. In classification, this method can provide a well-fitting model through various factors, such as the choice of the optimal number of hidden nodes, the choice of relevant input variables, and the selection of optimal connection weights. One popular method for selecting optimal connection weights is the Genetic Algorithm (GA), whose basic concept is an iterative process modelled on Darwinian evolution. This research presents the Backpropagation Neural Network (BPNN) and the combined method of BPNN with GA, where GA is used to initialize and optimize the connection weights of BPNN. Based on accuracy, the BPNN method combined with GA provides the better classification, at 90.51%, in the case of Bidikmisi Scholarship classification in East Java.
Iqbal Hanif, Regita Fachri Septiani
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 377-395; doi:10.29244/ijsa.v5i2p377-395

Rating is one of the most frequently used metrics in the television industry to evaluate television programs or channels. This research is an attempt to develop a prediction model of television program ratings using rating data gathered from UseeTV (an internet-based television service from Telkom Indonesia). Two machine learning methods (Random Forest and Extreme Gradient Boosting) were tried out using a set of rating data from 20 television programs collected from January 2018 to August 2019 (training dataset) and evaluated using September 2019 rating data (testing dataset). The results show that Random Forest gives better results than Extreme Gradient Boosting based on three evaluation metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). On the training dataset, prediction using Random Forest produced lower RMSE and MAE scores than Extreme Gradient Boosting for all programs, while on the testing dataset, Random Forest produced lower RMSE and MAE scores for 16 programs. According to the MAPE score, Random Forest produced more good-quality predictions (4 programs in the training dataset, 16 programs in the testing dataset) than Extreme Gradient Boosting (1 program in the training dataset, 12 programs in the testing dataset).
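For reference, the three evaluation metrics named above can be computed as in the following sketch. It uses the standard textbook definitions; the function names and the small rating series are made up for illustration and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumes no actual value is 0)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings for one program over three days.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

print(round(rmse(actual, predicted), 3))  # 1.633
print(round(mae(actual, predicted), 3))   # 1.333
print(round(mape(actual, predicted), 1))  # 10.0
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which is why studies of this kind often report all three.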
Tigor Nirman Simanjuntak, Setia Pramana
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 304-313; doi:10.29244/ijsa.v5i2p304-313

This study aims to determine the trend of sentiment in tweets about Covid-19 in Indonesia from overseas Twitter accounts, from a big data perspective. The data were obtained from Twitter for April 2020, using the query "Indonesian Corona Virus" on tweets in English from foreign user accounts. The data were retrieved by crawling tweet text through Twitter's API (Application Programming Interface) using the Python programming language. Twitter was chosen because information spreads very quickly and easily through status updates from and among user accounts. The number of tweets obtained was 8,740 in text format, with a total engagement of 217,316. The data were sorted from the tweets with the largest engagement to those with the smallest, then cleaned of unnecessary fonts and symbols as well as misspelled words and abbreviations. Sentiment classification was carried out with analytical tools, extracting information through text mining, into positive, negative, and neutral polarity. To sharpen the analysis, the cleaned data were restricted to tweets ranging from the largest engagement down to 100 engagements, then grouped into 30 sub-topics for analysis. Interestingly, most tweets and sub-topics were dominated by negative sentiment, and some unexpected sub-topics were discussed by many users.
Siska Yosmar, S Damayanti, S Febrika
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 343-354; doi:10.29244/ijsa.v5i2p343-354

The world was shocked by the emergence of a virus that spread very quickly to several countries including Indonesia at the end of 2019. This virus infection is called Corona Virus Disease 2019 (Covid-19). The outbreak of Covid-19 not only threatens human lives but also disrupts various economic, financial, and business activities, especially in Indonesia. A stock portfolio is a collection of financial assets in a unit that is held or created by an investor, investment company, or financial institution. The Black-Litterman model of the stock portfolio is a portfolio model that involves the CAPM equilibrium return and investor views. The purpose of this study is to determine the stock portfolio with the Black-Litterman model using company data listed in the LQ45 stock index from January 2020 to June 2020. Four of the twenty-nine LQ45 stocks were selected as assets in the stock portfolio. The stock portfolio containing the four stocks, namely ICBP, KLBF, MNCN, and TLKM with the Black-Litterman model resulted in an expected return of 2.07% and a risk of 2.82%.
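The way the Black-Litterman model blends the CAPM equilibrium return with investor views is usually written as the posterior expected return below. This is the standard textbook form, given here as background rather than as the exact specification used in the study; all symbols are notation introduced for this illustration: \Pi is the vector of equilibrium returns, \Sigma the covariance matrix of returns, \tau a scaling constant, P the matrix that maps views to assets, Q the vector of view returns, and \Omega the covariance matrix of view uncertainty.

```latex
E[R] = \left[ (\tau \Sigma)^{-1} + P^{\top} \Omega^{-1} P \right]^{-1}
       \left[ (\tau \Sigma)^{-1} \Pi + P^{\top} \Omega^{-1} Q \right]
```

When the views are very uncertain (large \Omega), the second term in each bracket vanishes and E[R] reduces to the equilibrium return \Pi.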
Nadya Dwi Muchisha, Novian Tamara, Andriansyah Andriansyah, Agus M Soleh
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 355-368; doi:10.29244/ijsa.v5i2p355-368

Monitoring GDP in real time is very important because of its usefulness for policy making. We built and compared ML models to forecast Indonesia's GDP growth in real time. We used 18 variables consisting of a number of quarterly macroeconomic and financial market statistics. We evaluated the performance of six popular ML algorithms, namely Random Forest, LASSO, Ridge, Elastic Net, Neural Networks, and Support Vector Machines, in real-time forecasting of GDP growth over the period 2013:Q3 to 2019:Q4. We used the RMSE, MAD, and Pearson correlation coefficient as measures of forecast accuracy. The results showed that all these models outperformed the AR(1) benchmark. The individual model with the best performance was Random Forest. To obtain more accurate forecasts, we ran forecast combinations using equal weighting and lasso regression. The best model was obtained from the forecast combination using lasso regression with selected ML models, namely Random Forest, Ridge, Support Vector Machine, and Neural Network.
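The equal-weighting forecast combination mentioned above amounts to averaging the individual model forecasts period by period. A minimal sketch, with a hypothetical function name and made-up growth figures (none of these numbers come from the study):

```python
def equal_weight_combination(forecasts):
    """Average the forecasts of all models, period by period.

    `forecasts` maps a model name to its list of forecasts; all lists
    are assumed to cover the same periods in the same order.
    """
    per_model = list(forecasts.values())
    return [sum(values) / len(values) for values in zip(*per_model)]


# Hypothetical quarterly GDP growth forecasts (percent) from three models.
forecasts = {
    "random_forest": [5.0, 5.2],
    "ridge": [4.8, 5.0],
    "svm": [5.2, 5.4],
}

combined = equal_weight_combination(forecasts)
print([round(value, 2) for value in combined])  # [5.0, 5.2]
```

A lasso-based combination would instead regress the realized values on the individual forecasts, so that weights are estimated and some models can be dropped entirely.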
Dede Yoga Paramartha, Ana Lailatul Fitriyani, Setia Pramana
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 314-325; doi:10.29244/ijsa.v5i2p314-325

Environmental data such as pollutants, temperature, and humidity play a role in the agricultural sector in predicting rainfall conditions. Pollutant data are also commonly used as a proxy for the density of industry and transportation. Given this need, it is necessary to have automated data from outside websites that can provide data faster than satellite confirmation. Data sourced from IQair can be used as benchmark or confirmatory data for weather and environmental statistics in Indonesia. The data are taken by scraping the website, using the API available on the website. Scraping is divided into two stages: the first determines the locations in Indonesia, and the second collects statistics such as temperature, humidity, and pollutant data (AQI). The Python module used is scrapy, with crawling in effect from May 2020. The data are recorded every three hours for all regions of Indonesia and displayed directly on a Power BI-based dashboard. We also illustrate that AQI data can be used as a proxy for socio-economic activity and as an indicator for monitoring green growth in Indonesia.
Rio Pradani Putra, Dian Anggraeni, Alfian Futuhul Hadi
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 326-332; doi:10.29244/ijsa.v5i2p326-332

Rainfall forecasting has an important role in people's lives. Rainfall forecasting in Indonesia is a complex problem because the country has a tropical climate, and prediction is difficult due to the complex topography and the interactions between the oceans, land, and atmosphere. Under these conditions, an accurate rainfall forecasting model on a local scale is needed, one that takes into account the information about the global atmospheric circulation obtained from General Circulation Model (GCM) output. GCM output can be used to provide local or regional scale information by adding Statistical Downscaling (SD) techniques. SD is a regression-based model for determining the functional relationship between the response variable and the predictor variables. Rainfall observations obtained from the Meteorology Climatology and Geophysics Council (BMKG) are the response variable in this study, and the predictor variable is the global climate output from the GCM. This research was conducted in Kupang City, East Nusa Tenggara, because it has low rainfall. Projection Pursuit Regression (PPR) is used as the SD method in this study. In PPR modeling, optimization is needed, and model validation is carried out using the smallest Root Mean Square Error (RMSE) criterion. The expected result is that the pattern of the forecasts follows or approaches the observational data. The PPR model is a good model for predicting rainfall because the rainfall forecasts approach the observational data.
Salsabila Basalamah, Edy Widodo
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 273-283; doi:10.29244/ijsa.v5i2p273-283

Response Surface Method (RSM) is a collection of statistical techniques, in the form of experiments and regression, and mathematical techniques that is useful for developing, improving, and optimizing processes. In general, models in RSM are estimated by linear regression with Ordinary Least Squares (OLS) estimation. However, OLS estimation is very sensitive to data identified as outliers, so determining the RSM model requires a strong and resistant estimator, namely robust regression. One estimation method in robust regression is Method of Moment (MM) estimation. This study aims to compare OLS estimation and MM estimation in obtaining the optimal point of the response in this case study. The estimation models are compared using MSE and adjusted R^2. MM estimation gives better results for the optimal response in this case study.
Winda Nurpadilah, I Made Sumertajaya, Muhamad Nur Aidi
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 173-181; doi:10.29244/ijsa.v5i1p173-181

Spatial regression analysis is a form of regression model that considers spatial effects. Geographically weighted regression (GWR) is a spatial regression method that can be used to deal with the problem of spatial diversity. This method generates local model parameter estimates for each observation location. Spatial statistics can be applied in many areas, including the problem of poverty. Poverty can be influenced by the proximity between regions, so in determining the factors behind poverty, regional proximity cannot be ignored. West Java Province is the province with the largest population, so this study aims to model poverty data in West Java Province by incorporating spatial effects. The weighting functions used for the GWR model are fixed and adaptive kernel functions. The analysis shows that the fixed exponential kernel function has the smallest cross validation (CV) value, so the weighting matrix used in the model is determined by the exponential kernel function. The largest value and the smallest AIC value are obtained by the GWR model with an exponential kernel function. Based on the ANOVA table used to test GWR's global goodness of fit, the GWR model is more effective than global regression. Therefore, the GWR model is the best model when used for West Java’s poverty cases. The effect of each explanatory variable on the percentage of poverty varies across the districts/cities of West Java Province.
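Kernel weighting in GWR can be illustrated with the sketch below. It uses commonly cited forms of the exponential, Gaussian, and bisquare kernels and a nearest-neighbour rule for the adaptive bandwidth; the exact kernel definitions used in the study are not stated in the abstract, so these formulas, the function names, and the distances are assumptions for illustration.

```python
import math


def exponential_kernel(d, h):
    """Weight decays exponentially with distance d; h is the bandwidth."""
    return math.exp(-d / h)


def gaussian_kernel(d, h):
    return math.exp(-0.5 * (d / h) ** 2)


def bisquare_kernel(d, h):
    """Weight drops to exactly zero at and beyond the bandwidth."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0


def adaptive_bandwidth(distances, k):
    """Bandwidth set to the distance of the k-th nearest point (k counted from 1)."""
    return sorted(distances)[k - 1]


# Hypothetical distances (km) from one regression point to four locations,
# including the point itself at distance 0.
distances = [0.0, 10.0, 25.0, 40.0]

h_fixed = 20.0  # fixed kernel: same bandwidth at every regression point
fixed_weights = [exponential_kernel(d, h_fixed) for d in distances]

h_adaptive = adaptive_bandwidth(distances, k=3)  # varies by regression point
adaptive_weights = [bisquare_kernel(d, h_adaptive) for d in distances]

print([round(w, 3) for w in fixed_weights])
print(h_adaptive, [round(w, 3) for w in adaptive_weights])
```

A fixed kernel uses one bandwidth everywhere, whereas an adaptive kernel widens the bandwidth where observations are sparse; in practice the bandwidth (or k) is chosen by minimizing a criterion such as CV or AIC.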
Dia Cahya Wati, Dea Alvionita Azka, Herni Utami
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 61-74; doi:10.29244/ijsa.v5i1p61-74

Geographically Weighted Panel Regression (GWPR) is a development of the global regression model whose basic idea combines panel data with GWR. The GWPR model is built with a point approach, based on latitude and longitude coordinates, so the regression parameters take different values at each location. GWPR can accommodate spatial effects, so it can better explain the relationship between the response variable and the predictors. The purpose of this study is to compare GWPR models with the Fixed Gaussian and Adaptive Bisquare weighting functions based on the AIC value. The data used in this study are secondary data taken from the website of the Central Statistics Agency (BPS), in the form of Per-Capita Expenditure Figures in South Sumatra in 2013-2019. The results show that for the Per-Capita Expenditure Rate (AP), it is better to model with GWPR using the fixed Gaussian weighting function, for which the coefficient of determination is 95.81%, than with the adaptive bisquare function, for which it is 93.3%. The factors that influence the Per-Capita Expenditure Rate (AP) in South Sumatra fall into 6 groups under the fixed Gaussian weighting and into 2 groups under the adaptive bisquare weighting.
Angel Zushelma Hartono, Siskarossa Ika Oktora
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 182-194; doi:10.29244/ijsa.v5i1p182-194

Adolescent smoking has become a major Ministry of Health program concerning tobacco consumption. In 2016, the prevalence of smoking among adolescents aged 10-18 years reached 8.8% and was rising, contrary to the Ministry of Health's Strategic Planning 2015-2019 target of lowering adolescent smoking prevalence to 5.4%. Cigarette consumption is higher among male adolescents than among females, and high cigarette consumption in men increases the risk of impotence and lowers reproductive health quality, which affects the quality of future generations. This study aims to give a general picture of smoking behavior among Indonesia's male adolescents in 2018 and to identify the variables that affect the number of cigarettes consumed. The analytical methods used are Poisson Regression and Negative Binomial Regression. The data source is the Riskesdas 2018 raw data, with male adolescent smokers aged 10-18 years as the unit of analysis. The research indicates that most male adolescents are light smokers. Heavy smokers were predominantly older, living in rural areas, poorly educated, employed, and living with a household head who was a smoker and had low education. Age, location of residence, education level, working status, and the household head's smoking status and education level significantly affect male adolescents' smoking behavior.
Nurul Fadhilah, Erfiani Erfiani, Indahwati Indahwati
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 14-25; doi:10.29244/ijsa.v5i1p14-25

The calibration method is an alternative method that can be used to analyze the relationship between invasive and non-invasive blood glucose levels. Calibration modeling generally involves high-dimensional data and multicollinearity because, in functional data, the number of independent variables (p) is usually greater than the number of observations (p>n). Both problems can be overcome using Functional Regression (FR) and Functional Principal Component Regression (FPCR). FPCR is based on Principal Component Analysis (PCA). In FPCR, the data are transformed using a polynomial basis before data reduction. This research modeled the spectral calibration equations for the voltage values output by non-invasive blood glucose monitoring devices in order to predict blood glucose using FR and FPCR. The aim was to determine the best calibration model for measuring non-invasive blood glucose levels with FR and FPCR. The results showed that the FR model had a larger coefficient of determination (R^2) and lower Root Mean Square Error (RMSE) and Root Mean Square Error of Prediction (RMSEP) values than the FPCR model, namely 12.9%, 5.417, and 5.727, respectively. Overall, the FR model is the better calibration model for estimating blood glucose levels compared with the FPCR model.
Dwi Jayanti, Septian P Palupi, Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 195-204; doi:10.29244/ijsa.v5i1p195-204

Unemployment is a critical problem faced by developing countries. It is a complex problem which creates other social and economic problems such as poverty, economic gaps, and crimes. This paper discusses the determinant factors of unemployment rates based on empirical data using the conditional logistic regression model. The model was used to analyze matched pair data using gender, age and residence as matching factors. The result showed that household status, marriage status, as well as levels of education were the determinant factors of a person being unemployed in West Java. It is also shown that the conditional logistic regression outperformed the standard logistic regression for analyzing the cause of unemployment.
Jajang Jajang, Budi Pratikno, Mashuri Mashuri
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 130-140; doi:10.29244/ijsa.v5i1p130-140

In 2019 the number of people with tuberculosis (TB) in Banyumas, Central Java, was high (1,910 people were detected with TB). The number of people infected with TB in Banyumas is count data and also areal data. In modeling, the parameter estimation method and the characteristics of the data need to be considered. Here, we compare the Generalized Poisson (GP), negative binomial (NB), Poisson, and CAR-BYM models for TB cases in Banyumas. We use two methods for parameter estimation, maximum likelihood estimation (MLE) and Bayes: MLE is used for the GP and NB models, whereas Bayes is used for the Poisson and CAR-BYM models. The results showed that the Poisson model exhibited overdispersion: the deviance is 67.38 on 22 degrees of freedom, so the ratio of deviance to degrees of freedom is 3.06 (>1). The GP, NB, Poisson-Bayes, and CAR-BYM models are then used to model the TB data in Banyumas, and we compare their RMSE. According to the RMSE criterion, CAR-BYM is the best model for TB in Banyumas because its RMSE is the smallest.
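The overdispersion check reported above is a one-line calculation, reproduced here from the two figures given in the abstract (variable names are illustrative):

```python
# Values reported in the abstract for the Poisson model.
deviance = 67.38
degrees_of_freedom = 22

# A ratio well above 1 suggests the variance exceeds what a Poisson model allows.
dispersion_ratio = deviance / degrees_of_freedom
is_overdispersed = dispersion_ratio > 1

print(round(dispersion_ratio, 2))  # 3.06
print(is_overdispersed)            # True
```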
Rizky Zulkarnain, Tri Listianingrum, Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 161-172; doi:10.29244/ijsa.v5i1p161-172

Working children may create problems, since the issue relates to human rights as well as to the development of children, especially their access to sufficient education. This paper discusses the determinant factors of working children using conditional logistic regression for matched-pairs data. Matching is employed to adjust for confounding factors and to avoid bias. Three confounding factors are considered in this paper, i.e., residential area, gender, and income of the household head. The results showed that the conditional regression model outperformed the standard regression model. The number of household members, whether the head of household was married or single, the age of the head of household, the educational attainment of the head of household, and the work status of the head of household were the determinant factors of working children.
Muhammad Ilham Abidin, Khairil Anwar Notodiputro, Bagus Sartono
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 26-38; doi:10.29244/ijsa.v5i1p26-38

Police efforts to address hate speech on social media such as Twitter will not be sufficient if they rely solely on manual checks. It is therefore necessary to use statistical modelling, such as a classification model, to detect hate speech automatically. Classification is a type of predictive modelling that produces predictions based on labelled data. The available data are usually unlabelled, which means that a labelling process needs to be carried out beforehand. Data labelling is time-consuming, costly, and often fails to produce correct labels. This research aims to improve the performance of classification models by adding a small amount of data through the so-called active learning method. The results showed no significant difference between the performance of the logistic regression and naïve Bayes classification models in detecting hate speech. However, the results also showed that adding data through the active learning method substantially improved the performance of logistic regression in detecting hate speech compared with adding data by simple random sampling. Therefore, the performance of classification models in detecting hate speech on Twitter could be improved by using an active learning method.
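The difference between active learning and random sampling lies in how the next items to label are chosen. The sketch below shows one common selection rule, uncertainty sampling for a binary classifier; the abstract does not say which query strategy the authors used, so this rule, the function name, and the probabilities are assumptions for illustration.

```python
def most_uncertain(probabilities, n_queries):
    """Return indices of the unlabelled items the classifier is least sure about.

    `probabilities` holds the predicted probability of the positive class
    (here: hate speech) for each unlabelled item; values nearest 0.5 are
    the most uncertain.
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:n_queries]


# Hypothetical predicted probabilities for five unlabelled tweets.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]

to_label = most_uncertain(pool_probabilities, n_queries=2)
print(to_label)  # [1, 3]
```

The selected items are sent for manual labelling and added to the training set, after which the classifier is refitted and the selection is repeated.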
Ferry Kondo Lembang, Lexy Janzen Sinay, Asrul Irfanullah
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 39-49; doi:10.29244/ijsa.v5i1p39-49

Maluku Province is one of the most seismically active and earthquake-prone regions in Indonesia because it is the meeting place of three plates, namely the Eurasian, Pacific, and Australian plates. In the last 100 years, 25-30% of the tectonic earthquakes with tsunamis in Indonesia occurred in the Maluku Sea and Banda Sea. Based on this fact, this study aims to analyze the incidence of tectonic earthquakes in the Maluku region and its surroundings using the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model, which can describe time series with long memory. The analysis shows that the best model for predicting the number of tectonic earthquakes in Maluku and its surroundings is ARFIMA (0; 0.712; 1) with an MSE of 0.1156. Meanwhile, the best model for predicting the average magnitude of these earthquakes is ARFIMA (0; -3.224 × 10^-9; 1) with an MSE of 0.01237. Based on the two best models, the predictions for the next three periods are as follows: 31 tectonic earthquakes with an average magnitude of 4.38481 SR in the first period, 32 tectonic earthquakes with an average magnitude of 4.38407 in the second period, and 32 tectonic earthquakes with an average magnitude of 4.38333 in the third period.
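The long-memory part of an ARFIMA(p; d; q) model comes from the fractional differencing operator (1 - B)^d, whose expansion has slowly decaying coefficients when d is not an integer. The sketch below computes those coefficients with the standard recursion and applies them to a series. It is only an illustration: the function names and the toy series are assumptions, and the abstract does not describe how the authors estimated d.

```python
def frac_diff_weights(d, n_terms):
    """First n_terms coefficients of the expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Standard recursion: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the filter at the series start."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# Hypothetical counts, differenced with the d = 0.712 reported in the abstract.
toy_counts = [30, 28, 33, 31, 29, 32]
filtered = frac_diff(toy_counts, 0.712)
```

With d = 1 the weights reduce to ordinary first differencing, which is a convenient sanity check on the recursion.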
Yusma Yanti, Asep Saepulrohman
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 92-104; doi:10.29244/ijsa.v5i1p92-104

Determining the segmentation and positioning of lecturers is very important when selecting thesis supervisors, because with this information the supervision of thesis writing can run well. This study analyzes the segmentation and positioning of lecturers for the purpose of assigning thesis supervisors using the Clusterwise Bilinear Spatial Multidimensional Scaling Model (CBSMSM). The data come from a survey of fifth-semester bachelor students in the 2019/2020 academic year at the Department of Computer Science, Pakuan University. One hundred sixty-one students assessed 10 attributes describing the characteristics of the department's 32 lecturers. The segment coordinates, lecturer coordinates, dimensions, and attributes are estimated simultaneously using the alternating least squares (ALS) algorithm. The number of segments and dimensions is selected based on the smallest sum of squared errors (SSE) across the combinations of segments and dimensions considered. As a result, we obtain four segments and four dimensions with an SSE of 4864.003. The department can use this result to illustrate how students assess their lecturers' characteristics with regard to thesis supervision.
Zerlita Fahdha Pusdiktasari, Widiarni Ginta Sasmita, Wulaida Rizky Fitrilia, Rahma Fitriani, Suci Astutik
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 117-129; doi:10.29244/ijsa.v5i1p117-129

The Covid-19 pandemic has hit Indonesia since March 2020. The Indonesian government has issued several policies to slow the spread of Covid-19. These policies affect many areas of life, especially the various sectors of the economy. This study groups provinces whose economies are at risk of being affected by Covid-19 based on several economic indicators, namely the unemployment rate, the percentage of poor people, the provincial minimum wage, and the hotel occupancy rate, using cluster analysis. Cluster analysis was performed using several hierarchical methods, namely Single, Complete, Average, and Centroid Linkage and Ward. The cophenetic correlation coefficient (rCoph) was used to determine the best method, while the number of clusters was determined based on the Dunn, Connectivity, and Silhouette indexes. The analysis shows that Average Linkage is the best method, with two clusters. The first cluster consists of all provinces in Indonesia except Papua; their economies are highly at risk of being affected by Covid-19, characterized by a low percentage of poor people and a low provincial minimum wage, as well as high open unemployment and hotel occupancy rates. The second cluster consists of Papua Province alone, whose economy has a low risk of being affected by Covid-19. By looking at the impact of the Covid-19 disaster, the government can make recovery efforts and generalize economic recovery policies, since Covid-19 affects the economy of almost all provinces in Indonesia.
Qorry Meidianingsih, Debby Agustine
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 105-116; doi:10.29244/ijsa.v5i1p105-116

Imbalanced class classification problems are found in many real applications. Imbalance tends to cause minority class instances to be classified into the majority class. This study examined the performance of bagging applied to safe-level SMOTE with a Support Vector Machine classifier. The data consisted of three types that differ in the proportion of observations in the majority and minority classes. Each type of data has three variables: two independent variables and one dependent variable. The observations of the independent variables were generated from a multivariate normal distribution, while the dependent variable is binary. The results showed that the classifier has high accuracy and sensitivity for all types of data, both with imbalanced classes and with balanced classes (obtained by safe-level SMOTE and safe-level SMOTEBagging). Nevertheless, specificity was the main measure for assessing the classifier because it reflects accuracy in classifying the minority class observations. Specificity increased when the number of observations in the two classes was approximately balanced through safe-level SMOTE. The best performance of the Support Vector Machine in predicting minority class observations was achieved when bagging was applied to safe-level SMOTE. The specificity rates for the three types of data were 77.93 percent, 78.46 percent, and 85.69 percent, respectively.
Naima Rakhsyanda, Kusman Sadik, Indahwati Indahwati
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 50-60; doi:10.29244/ijsa.v5i1p50-60

Small area estimation can be used to predict population parameters from small sample sizes. In some cases, population units that are spatially close may be more related than units that are further apart. This research studies the use of spatial information such as geographic coordinates. Outlier contamination can also affect small area estimates. The study was conducted by simulation on generated data with six scenarios, which combine spatial effects (spatial stationarity and spatial non-stationarity) with outlier contamination (no outliers, symmetric outliers, and non-symmetric outliers). The purpose was to compare the geographically weighted empirical best linear unbiased predictor (GWEBLUP) and robust GWEBLUP (RGWEBLUP) with the direct estimator, EBLUP, and REBLUP using simulated data. The performance of the predictors is evaluated using the relative root mean squared error (RRMSE). The simulation results showed that geographically weighted predictors have the smallest RRMSE values in scenarios with spatial non-stationarity and therefore offer better prediction. In scenarios with outliers, robust predictors have smaller RRMSE values and are more efficient than non-robust predictors.
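A minimal sketch of how an RRMSE could be computed for one small area across simulation replicates is given below. The exact formula used in the paper is not stated in the abstract; this version assumes the common definition (root mean squared error divided by the true value, expressed as a percentage), and the function name and numbers are illustrative only.

```python
import math


def rrmse(estimates, true_value):
    """Relative root mean squared error (percent) of replicate estimates.

    Assumed definition: sqrt(mean((estimate - truth)^2)) / |truth| * 100.
    """
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / abs(true_value) * 100


# Hypothetical replicate predictions for an area whose true mean is 10.
example = rrmse([9.0, 11.0], 10.0)
```

A predictor with a smaller RRMSE over the replicates would be preferred under this criterion.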
Valantino Agus Sutomo, Dian Kusumaningrum, Aurellia Layvieda, Rahma Anisa
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 205-219; doi:10.29244/ijsa.v5i1p205-219

Area yield index insurance at the district level faces heterogeneous basis risk due to geographical conditions, which leads to an imprecise critical index. Clustering and a zone-based area yield scheme can reduce heterogeneous basis risk and so help determine a suitable alternative for the critical index. In previous research, we obtained 7 clusters and 2 levels of paddy productivity based on clustering assumptions from primary data in Java. The suitable clustering assumption for calculating the critical index is the cluster-based assumption, which gives homogeneous paddy productivity within the 7 clusters in Java. Therefore, our goal is to develop an area yield index at the district level (cluster based) that minimizes basis risk under certain constraints for paddy farmer productivity in Java, Indonesia. Several methods are available for calculating the critical index, such as the mean, median, winsorized mean, one sigma, two sigma, and first quartile methods, evaluated against the basis risk constraints using a confusion matrix. The two basis risk constraints are that the difference between overpayment and shortfall is not extreme, and that total basis risk does not exceed 20% of total claim occurrence. The two sigma method has the lowest basis risk, overpayment, and shortfall, but it also has the lowest pure premium, a small probability of claim, and a low range of claims. Hence, we consider the first quartile method a suitable alternative for calculating the critical index that satisfies the two basis risk constraints. In conclusion, our research provides an analytical calculation for the area yield index at the district level with a pure premium of Rp 152,151 using this method, which is sufficient to cover the total claim and consistent with the simulation.
Yopi Ariesia Ulfa, Agus M Soleh, Bagus Sartono
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 1-13; doi:10.29244/ijsa.v5i1p1-13

Based on data from the Directorate General of Disease Prevention and Control of the Ministry of Health of the Republic of Indonesia, in 2017 the number of new leprosy cases on Java Island was the highest in Indonesia compared with other islands. The purpose of this study is to compare Poisson regression with negative binomial regression for modeling the number of new leprosy cases, and to find out which explanatory variables have a significant effect on the number of new cases in Java. The results indicate that the negative binomial regression model can overcome the overdispersion present in the Poisson regression model. Based on the negative binomial regression, the variables that significantly affect the number of new leprosy cases are total population, the percentage of children under five who had been immunized with BCG, and the percentage of the population with sustainable access to clean water.
Nurafiza Thamrin, Arie Wahyu Wijayanto
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 141-160; doi:10.29244/ijsa.v5i1p141-160

The National Medium Term Development Plan 2020-2024 states that one of the visions of national development is to accelerate the distribution of welfare and justice. Cluster analysis groups objects into several smaller groups such that the objects in one group have similar characteristics. This study was conducted to find the best clustering method and to classify cities in Java by level of welfare. The cluster analysis used hard clustering methods, namely K-Means, K-Medoids (PAM and CLARA), and Hierarchical Agglomerative clustering, as well as soft clustering with Fuzzy C-Means. The elbow method, silhouette method, and gap statistic were used to determine the optimal number of clusters. Based on the silhouette coefficient, Dunn index, connectivity coefficient, and Sw/Sb ratio, the best cluster analysis was Agglomerative Ward Linkage, which produced three clusters. The first cluster consists of 27 cities with moderate welfare, the second cluster consists of 16 cities with high welfare, and the third cluster consists of 76 cities with low welfare. With these clustering results, city governments in Java should be able to make better welfare policies based on the dominant indicators found in each cluster.
Sri Astuti Thamrin, Dian Sidik, Hedi Kuswanto, Armin Lawi, Ansariadi Ansariadi
Indonesian Journal of Statistics and Its Applications, Volume 5, pp 75-91; doi:10.29244/ijsa.v5i1p75-91

The accuracy of the class labels is very important in classification with a machine learning approach. The more accurate the data sets and classes, the better the output generated by machine learning. In practice, classification often faces class imbalance, in which the classes do not make up equal portions of the data set. Class imbalance affects classification accuracy. One of the easiest ways to correct imbalanced classes is to balance them. This study aims to explore the problem of class imbalance in a medium-sized dataset and to address that imbalance. The Synthetic Minority Over-Sampling Technique (SMOTE) is used to overcome class imbalance in obesity status in the Indonesia 2013 Basic Health Research (RISKESDAS) data. The results show that the obese class makes up 13.9% of the data and the non-obese class 84.6%, which means that the class imbalance is moderate. SMOTE with 600% over-sampling increases the size of the minority class (obesity), so that the obesity status classes become balanced. Therefore, using SMOTE was better than not using SMOTE in exploring the obesity status of the Indonesia RISKESDAS 2013 data.
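The core of SMOTE is interpolation: each synthetic point is placed at a random position on the segment between a minority observation and one of its nearest minority neighbours. The sketch below illustrates that step with the standard library only. It is a simplified illustration rather than the authors' implementation; the function name, the toy points, and the choice of k are assumptions, and 600% over-sampling is read here in the usual way as six synthetic points per minority observation.

```python
import math
import random


def smote(minority, n_per_point, k=2, seed=0):
    """Generate n_per_point synthetic samples for every minority point."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x (excluding x itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_per_point):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical two-feature minority observations.
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority_points, n_per_point=6)  # 600% over-sampling
```

Because every synthetic point lies between two existing minority points, the method enlarges the minority class without simply duplicating records.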
M. Yunus, Asep Saefuddin, Agus M Soleh
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 649-660; doi:10.29244/ijsa.v4i4.724

One of the rainfall prediction techniques is Statistical Downscaling (SDS) modeling. SDS modeling is an application of modeling in which the covariates are generally numerous and not independent. The problem encountered is ill-conditioned data, i.e., multicollinearity and high correlation between variables. Highly correlated data cause the linear regression coefficient estimators to have large variance. This research builds statistical downscaling models using the lasso and group lasso for rainfall prediction. Covariate grouping scenarios are defined based on adjacent areas, high correlation between covariates, correlation between covariates and the response, and the addition of dummy variables. Scenario six (in which the covariates positively correlated with the response are divided into 3 groups and 1 individual, and the covariates negatively correlated with the response are divided into 2 groups and 1 individual) is better than the other scenarios in linear modeling without dummy variables. Linear modeling with dummy variables is better than modeling without them for both techniques. In linear modeling with dummy variables, the group lasso deserves more consideration for SDS modeling, because the differences in the RMSEP and the correlation coefficient are significant.
Windyana Pusparani, Agus M Soleh, Akbar Rizki
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 590-603; doi:10.29244/ijsa.v4i4.525

Twitter is a popular social media platform on which users communicate by writing short messages with a limited number of characters, called tweets. Extracting information from data that are unstructured and very large is usually known as text mining. Badan Nasional Penanggulangan Bencana Indonesia (@BNPB_Indonesia) is the official Twitter account of the government agency for disaster management, which uses Twitter to share information about disasters that have occurred in Indonesia. This study aims to determine the characteristics of the account's tweets and to group them by type based on the similarity of their content. The data were BNPB Indonesia's tweets collected from 6 August 2018 to 16 February 2019. The k-Means method produced 4 clusters. The first cluster contained information about weather conditions in Yogyakarta, the second cluster was about the source and magnitude of earthquakes, and the third cluster was about the occurrence of earthquakes in Lombok. The fourth cluster could not be characterized specifically because its tweets were not clearly distinct from one another.
Isna Shofia Mubarokah, Anwar Fitrianto, Farit M Affendi
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 627-637; doi:10.29244/ijsa.v4i4.709

ARCH and GARCH models are widely used on financial data to describe volatility patterns. The models assume that positive and negative return residuals have the same, symmetric influence on volatility. In reality, however, this assumption is frequently violated, which is called heteroscedasticity. Therefore, to deal with heteroscedasticity and asymmetric data, asymmetric GARCH models, namely EGARCH and GJR-GARCH, are used. This research compares symmetric and asymmetric GARCH models for modeling financial data. It uses daily data on three foreign exchange rates for IDR: IDR/CNY, IDR/JPY, and IDR/USD. The data series run from January 4, 2016, to January 20, 2020. The method starts by selecting the best mean model for each series. Based on the best mean model, the mean and variance functions are then modeled simultaneously using the GARCH model. To test whether there was an asymmetric effect in the data, a Lagrange multiplier test was applied to the residuals of the GARCH model. The results show an asymmetric effect in the IDR/CNY and IDR/JPY exchange rates. To handle this asymmetric effect, EGARCH and GJR-GARCH models were applied to the two exchange rates and then compared to find out which volatility model is better. Using AIC and BIC, we find EGARCH to be the best model for the daily return of the IDR/CNY exchange rate and GJR-GARCH to be the best model for the daily return of the IDR/JPY exchange rate.
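To make the difference between symmetric and asymmetric volatility models concrete, the sketch below evaluates the conditional variance recursion of a GJR-GARCH(1,1) model, in which negative residuals receive an extra weight gamma. Setting gamma to zero recovers the symmetric GARCH(1,1) recursion. The parameter values and residuals are hypothetical and are not estimates from the paper; parameter estimation itself is not shown.

```python
def gjr_garch_variance(residuals, omega, alpha, gamma, beta, initial_variance):
    """Conditional variances of a GJR-GARCH(1,1) model.

    sigma2[t] = omega + (alpha + gamma * I(eps[t-1] < 0)) * eps[t-1]**2
                + beta * sigma2[t-1]
    """
    variances = [initial_variance]
    for eps in residuals[:-1]:
        # Negative shocks get the additional leverage weight gamma.
        leverage = gamma if eps < 0 else 0.0
        variances.append(
            omega + (alpha + leverage) * eps ** 2 + beta * variances[-1]
        )
    return variances


# Hypothetical residuals and parameters, for illustration only.
toy_residuals = [1.0, -1.0, 0.5]
sigma2 = gjr_garch_variance(toy_residuals, omega=0.1, alpha=0.1,
                            gamma=0.2, beta=0.5, initial_variance=1.0)
```

In this toy run the negative shock raises the next variance more than a positive shock of the same size would, which is the asymmetric effect the abstract tests for.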
Choirun Nisa, Muhammad Nur Aidi, I Made Sumertajaya
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 615-626; doi:10.29244/ijsa.v4i4.689

The negative binomial distribution is one of the distributions for count data and is built on success and failure events. This study examines the negative binomial distribution in order to characterize it based on the value of the Variance Mean Ratio (VMR). Simulation data are generated from the negative binomial distribution with combinations of the parameters p and n. The VMR results for the simulated data show that the VMR becomes smaller as p becomes larger, and that the VMR becomes more stable as the sample size increases. When p≥0.5, the simulated negative binomial data begin to resemble data from the Poisson and binomial distributions. The calculated VMR can be used as a reference for detecting the distribution pattern of count data.
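The VMR is simply the sample variance divided by the sample mean. The sketch below computes it for simulated negative binomial counts. It assumes the parameterization that counts failures before the n-th success, for which the theoretical VMR is 1/p; the abstract does not state which parameterization the authors used, and the function names and settings are illustrative.

```python
import random
import statistics


def vmr(counts):
    """Variance Mean Ratio: sample variance divided by sample mean."""
    return statistics.variance(counts) / statistics.mean(counts)


def simulate_negative_binomial(n, p, size, seed=0):
    """Draw counts of failures before the n-th success (assumed form)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        successes = failures = 0
        while successes < n:
            if rng.random() < p:
                successes += 1
            else:
                failures += 1
        draws.append(failures)
    return draws


def theoretical_vmr(p):
    """VMR of the failures-before-n-successes form: variance / mean = 1 / p."""
    return 1.0 / p


sample = simulate_negative_binomial(n=5, p=0.5, size=2000)
sample_vmr = vmr(sample)  # should be in the neighbourhood of theoretical_vmr(0.5)
```

Under this parameterization the VMR falls toward 1 (the Poisson value) as p approaches 1, which is consistent with the abstract's remark that the VMR decreases as p grows.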
Evita Choiriyah, Utami Dyah Syafitri, I Made Sumertajaya
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 579-589; doi:10.29244/ijsa.v4i4.584

According to Statistics Indonesia (BPS), South Sulawesi is one of the national rice granary provinces. Three regions, Bone, Wajo, and Gowa, contribute to the high rice production in South Sulawesi. However, rice production in Indonesia, especially in South Sulawesi, often declines sharply due to climate disturbances such as drought or flood. Therefore, Indonesia's government should provide accurate forecasts of rice production to ensure the availability of food stocks as an integral part of national food security. Moreover, rainfall as a climate factor should be included to produce a forecasting model that can estimate rice production accurately. This research compares the SARIMAX and GSTARIMAX models for forecasting rice production, with rainfall as the explanatory variable. The SARIMAX model is a multivariate time series forecasting model that can accommodate seasonal components. In contrast, the GSTARIMAX model, which is equipped with an inverse distance spatial weight matrix, is a space-time forecasting model that involves interconnection between locations. The GSTARIMAX model built for rice production forecasting in Bone, Wajo, and Gowa is GSTARIMAX (2,1,0)(0,1,1)12. Rainfall as an explanatory variable was significant at each location. The comparison of rice production forecasting models for the next six periods in four locations showed that the GSTARIMAX model provided more stable forecasts than the SARIMAX model, judged by the average MAPE of the GSTARIMAX model at each location.
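An inverse distance spatial weight matrix gives nearer locations larger weights. A minimal sketch follows, assuming Euclidean distance between coordinates and row normalization so that each row sums to one; both are common conventions but are not confirmed by the abstract. The coordinates are hypothetical placeholders, not the real positions of Bone, Wajo, and Gowa.

```python
import math


def inverse_distance_weights(coords):
    """Row-normalized inverse distance weight matrix with a zero diagonal."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 / math.dist(coords[i], coords[j])
        # Normalize so the weights for location i sum to one.
        row_sum = sum(weights[i])
        weights[i] = [w / row_sum for w in weights[i]]
    return weights


# Hypothetical coordinates for three locations on a line.
toy_coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
W = inverse_distance_weights(toy_coords)
```

For the first location, the neighbour at distance 1 receives weight 0.75 and the neighbour at distance 3 receives weight 0.25.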
Beny Trianjaya, Anang Kurnia, Agus M Soleh
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 566-578; doi:10.29244/ijsa.v4i4.333

Employment data are among the important indicators of a country's development progress. Labor conditions across Indonesia can only be compared over time through the Survei Angkatan Kerja Nasional (Sakernas) data. The data generated from Sakernas and published by BPS are the numbers of employed and unemployed. The obstacle in estimating the semester unemployment rate at the regency/municipality level is the insufficient sample size. One of the indirect estimation approaches currently being developed is small area estimation (SAE). This study extends the generalized linear mixed model (GLMM) by adding cluster information and examines several modified model scenarios. The purpose of this study was to develop a prediction model based on the basic GLMM in a small area approach by adding cluster information as a fixed effect or a random effect. The simulation results show that Model-2, a model that adds a fixed k-cluster effect and also adds the mean of the estimated random area effects in the sampled areas, is the best model, with the smallest relative bias (RB) and relative root mean squared error (RRMSE). This model is better than the basic GLMM (Model-0) and Model-1 (a model that only adds the mean of the estimated random area effects in the sampled areas). Model-2 is applied to estimate the proportion of unemployed at the sub-district level in Southeast Sulawesi Province. Estimation with the calibrated Model-2 produced an aggregated unemployment proportion for Southeast Sulawesi Province of 0.0272. This result is similar to the BPS figure (0.0272). Thus, the estimated proportions of unemployment at the sub-district level from the calibrated Model-2 can be considered feasible to use.
Rahmat Kevin Praditia, Dian Agustina, Dyah Setyo Rini
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 638-648; doi:10.29244/ijsa.v4i4.716

Geographically Weighted Zero-Inflated Poisson Regression (GWZIPR) is a method that can be used when count data exhibit both a spatial factor and overdispersion. This research aimed to analyze the number of malaria cases in every regency/city of Sumatra using the GWZIPR method and to map the distribution of factors affecting the number of malaria cases. The response variable was the number of malaria cases, and the predictor variables were the percentage of households with access to proper sanitation, the percentage of households with access to proper water resources, and the percentage of public health centers. The results gave each area its own model based on its significant variables. The regencies/cities were generally divided into three groups based on the significant variables in the ln and logit models. The mapping did not show a spreading pattern across the regencies/cities within a group, because their geographical locations were close to each other. The GWZIPR method was better than the ZIP regression method in this research because it produced the smallest AIC value.
Anissa Dika Larasati, Vera Lisna
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 604-614; doi:10.29244/ijsa.v4i4.654

Economic development includes increasing economic growth and alleviating poverty. D.I. Yogyakarta is the province with the lowest economic growth and per capita income among the provinces in Java. It also has the highest poverty rate. Under these conditions, it is feared that the relatively low economic development and economic contribution of D.I. Yogyakarta will be difficult to raise. This study aims to analyze the simultaneous relationship between indicators of economic development in D.I. Yogyakarta, explore the variables that influence them, and perform policy simulations to improve economic development. The indicators used to describe economic growth are Regional Gross Domestic Product (regional GDP), household consumption, and community savings in banks, while the indicator used to reflect the poverty level is the percentage of poor people. The estimation method is a simultaneous equation system estimated by Two-Stage Least Squares (2SLS), consisting of three structural equations and one identity equation, using historical data from 2001 to 2017. The simulation results show that a 6% increase in government expenditure can raise economic growth to 5.41% and reduce the percentage of poor people by 0.41 percentage points.
Riza Indriani Rakhmalia, Agus M Soleh, Bagus Sartono
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 473-483; doi:10.29244/ijsa.v4i3.667

Rainfall prediction is one of the most challenging problems of the last century. Statistical downscaling is one of the most frequently used rainfall estimation techniques. The goal of this paper is to develop cluster-wise regression for rainfall data that follow a Tweedie distribution. The data were precipitation from the Climate Forecast System Reanalysis (CFSR) version 2 as the predictor variables and rainfall from BMKG as the response variable. Data were collected from January 2010 to December 2019 at the Bogor, Citeko, Jatiwangi, and Bandung rain posts. The best result of this study is a cluster-wise regression model with 4 clusters using the Tweedie distribution at each rain post. The best model was evaluated by the root mean squared error of prediction (RMSEP). The RMSEP is 17.11 at the Bogor rain post (three clusters), 14.85 at Citeko (two clusters), 15.26 at Jatiwangi (three clusters), and 14.33 at Bandung (two clusters). The model was able to fit and cluster daily rainfall data well.
Siswanto Siswanto, Sri Astuti Thamrin
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 498-509; doi:10.29244/ijsa.v4i3.681

In Indonesia, malaria is widespread across all islands with varying degrees and severity of infection. Based on the Annual Parasite Incidence (API), malaria has a high incidence rate in Eastern Indonesia. The three provinces with the highest APIs are Papua (42.64%), West Papua (38.44%), and East Nusa Tenggara (16.37%). Spatial aspects are important to study because the spread of disease through mosquitoes is strongly influenced by a fluctuating climate. The purpose of this study is to determine the potential factors that influence the incidence of malaria in the province of Papua in 2013 by looking at aspects that are the focus of attention in spatial epidemiology. The methods used are Simultaneous Autoregressive (SAR) and Conditional Autoregressive (CAR) models with a spatial weighting matrix up to second order. The results show that average monthly wind velocity, average monthly rainfall, and malaria treatment with ACT drugs from the government program are substantial factors in determining the number of malaria cases in Papua, based on the lowest AIC value, obtained for the second-order CAR model. The SAR model, in this case, shows no spatial influence. By knowing the potential factors that influence the incidence of malaria, Papua Province, through its Health Office, can take more effective preventive measures to reduce the number of malaria cases.
Debora Chrisinta, I Made Sumertajaya, Indahwati Indahwati
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 448-461; doi:10.29244/ijsa.v4i3.630

Most traditional clustering algorithms are designed to handle either numeric data or categorical data. Data collected in the real world often contain both numeric and categorical attributes, and it is difficult to apply traditional clustering algorithms directly to such data. This paper therefore aims to identify the better method for mixed data between the cluster ensemble and latent class clustering approaches. Cluster ensemble is a method that combines clustering results from two sub-datasets, one with the categorical variables and one with the numerical variables. Clustering algorithms designed for numerical and for categorical data are employed to produce the corresponding clusters. Latent class clustering (LCC), on the other hand, is a model-based clustering method that can be used for any type of data. The number of clusters is based on the estimated probability model. LCC is recommended as the best clustering method because it provides higher accuracy and the smallest standard deviation ratio. However, LCC and the cluster ensemble method produce evaluation values that are not much different when applied to village potential data in Bengkulu Province.
Ernawati Ernawati, Bambang Widjanarko Otok, Sutikno Sutikno
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 510-527; doi:10.29244/ijsa.v4i3.653

A study needs good randomization between the treatment and control groups so that there are no large differences in the observed covariates and the treatment effect can be estimated without bias. However, in observational studies, especially in the field of health, which relates directly to human life, it is not possible to run a Randomized Controlled Trial (RCT). Propensity Score Stratification (PSS), one of the propensity score (PS) methods, is used here with a Support Vector Machine (SVM) approach to overcome bias due to non-random assignment and unbalanced covariates. The case studied is disease complication in patients with Type 2 Diabetes Mellitus at the Regional Public Hospital of Pasuruan, with 96 patients as respondents. The analysis identified sports activity as the confounding variable. The accuracy of PSS SVM is the same for all strata, namely 65.6%. The estimated treatment effect (ATE) showed that sports activity influences disease complication (Y) in patients with Type 2 DM. Using four strata reduced bias the most, with a percent bias reduction (PBR) of 86.39%, the smallest standard error (0.103), and an estimated ATE of 0.597.
Puput Cahya Ambarwati, Indahwati Indahwati, Muhammad Nur Aidi
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 484-497; doi:10.29244/ijsa.v4i3.684

Geographically weighted regression (GWR) is one of the regression methods for spatial data. When the response variable follows the Poisson distribution, geographically weighted Poisson regression (GWPR) can be used. GWPR often does not satisfy the equidispersion assumption. The classic approach for overcoming overdispersion in the Poisson distribution is a mixture of the Poisson and gamma distributions, which resembles the negative binomial distribution function. When the response variable follows the negative binomial distribution, geographically weighted negative binomial regression (GWNBR) can be used. The data in this study are simulated data and real data. The simulation results show that GWPR still models the data adequately only when the overdispersion is close to 1, judged by the number of significant results and the average p-value. For the real data, GWNBR is a better model than GWPR for the overdispersed counts of malnourished children in East Java Province in 2017, based on a comparison of AIC values.
Tata Pacu Maulidina, Siskarossa Ika Oktora
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 528-544; doi:10.29244/ijsa.v4i3.690

Development inequality in Indonesia has led to a divide between developed and underdeveloped regions. Regional backwardness caused by high inequality must be handled properly to prevent negative impacts on national stability. In fact, however, the handling of underdeveloped regions is effective only in Western Indonesia, while in Eastern Indonesia it tends to be less than optimal. This study aims to explore regional backwardness in Indonesia and examine the factors that influence it. Based on the data, underdeveloped regions tend to cluster in Eastern Indonesia, and the independent variables vary widely between regions. This indicates spatial dependence and spatial heterogeneity. Therefore, this study applies spatial analysis using the Geographically Weighted Logistic Regression (GWLR) method. GWLR shows better performance in modeling regional backwardness in Indonesia than its global counterpart (binary logistic regression). This study provides a local model for each district/city that local governments can use to implement more effective policies based on the factors that have significant effects on regional backwardness.
Yunita Wulan Sari, Gunardi Gunardi
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 557-565; doi:10.29244/ijsa.v4i3.660

Crop insurance is a type of insurance that protects farmers who hold an insurance policy against losses due to crop failure. Extreme weather, especially rainfall, has been the main cause of crop failure. Therefore, crop insurance based on weather or rainfall must be developed and applied. This paper discusses the cash-or-nothing up-and-in barrier option approach for determining insurance premiums where the risk of loss arises from high rainfall, and then compares it with the Black-Scholes option approach. In this approach, the claim limit is based on the rainfall index, and the barrier value is determined according to the size of the extreme rainfall. We use cumulative rainfall data for the first subround in Sleman regency as a case study. The conclusions are that the barrier value has a negative effect on the insurance premium, while the claim limit value has a positive effect. Besides yielding a cheaper premium than the Black-Scholes option approach, the barrier option approach is more attractive to apply because of the added barrier value.
Lissa Octavia Wardana, Liza Kurnia Sari
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 432-447; doi:10.29244/ijsa.v4i3.616

Every child has the human right to grow and develop as a whole, both physically and mentally. The government prohibits employers from employing children in order to protect children's rights. In reality, children still begin to participate in economic activities as workers. The issue of child labor is closely tied to exploitation. This research aims to find general facts about the exploitation of child laborers and to identify the variables that influenced the exploitation of child laborers in Indonesia in 2018. Data from the National Social and Economic Survey (Susenas) in 2018 were analyzed using binary logistic regression. The results show that most child laborers in 2018 were exploited. The provinces with the highest percentage of exploited child laborers are DKI Jakarta, Banten, and Central Java. Area of residence, child labor sector, gender of the child, and education of the household head in the categories of junior high school, elementary school, or did not graduate from school significantly influence the exploitation of child labor. Child laborers who live in urban areas, are male, work in the formal sector, and have a household head who graduated from junior high school or elementary school or did not graduate at all are more likely to experience exploitation.
Agustin Faradila, Utami Dyah Syafitri, I Made Sumertajaya
Indonesian Journal of Statistics and Its Applications, Volume 4, pp 419-431; doi:10.29244/ijsa.v4i3.585

Statistics Indonesia (BPS) noted that the contribution of the industrial sector to national GDP has decreased even though the sector had provided a significant multiplier effect on national economic growth. Therefore, it is necessary to cluster the industrial subsectors based on their growth patterns so that development results can be optimized. Prediction-based clustering is part of time series clustering (TSclust) and aims to form clusters based on prediction characteristics, so it can be used to choose a cluster that will become a mainstay industry in the future. This study focused on applying prediction-based clustering to the large and medium industrial subsectors for prediction periods of 1 month, 1 quarter, and 1 semester. The data used in this study were the production index data from January 2010 to December 2018. The results showed that the best clustering consisted of 5 groups for 1 month, 4 groups for 1 quarter, and 2 groups for 1 semester. Thus, it was concluded that the food industry; leather industry, leather goods, and footwear; and the pharmaceutical industry, chemical drug products, and traditional medicine could be chosen to be the mainstay industries in the future.