Use of Classification Algorithms in Diagnosis of Hypothyroidism

31 July 2020

journal article
Published by International Journal of Informatics Technologies in Bilişim Teknolojileri Dergisi

Vol. 13 (3), 255-268
https://doi.org/10.17671/gazibtd.710728

Abstract

Disease diagnosis is one of the most important problems in medicine. A single disease can take several forms and share symptoms with other diseases, which makes it harder to diagnose. Hypothyroidism, one type of thyroid disease, is for these reasons often diagnosed late, and the delay lowers patients' quality of life. The purpose of this study is to propose a data mining-based system that increases the rate of correct hypothyroidism diagnoses by using the questions asked of patients during the diagnostic process and the results of the tests applied. A second, indirect aim is to reduce the complications that may arise from interventional tests used for diagnosis. To these ends, a data set from the UCI machine learning repository was used to predict whether new samples are hypothyroid. The data set consists of 3163 samples, 151 of which are hypothyroid and the rest non-hypothyroid. To deal with the imbalanced class distribution, different sampling techniques were applied to the data set, and diagnostic models were built with Logistic Regression, K Nearest Neighbor, and Support Vector Machine classifiers. In this respect, the study demonstrates the effect of sampling methods on the diagnosis of hypothyroidism. Among the developed models, the Logistic Regression classifier trained on the data set to which oversampling techniques were applied gave the highest performance. The best results obtained with this classifier are 97.8% for accuracy, 82.26% for F-score, 93.2% for area under the curve, and 81.8% for the Matthews correlation coefficient.
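The class counts in the abstract (151 hypothyroid samples out of 3163) help explain why the study reports F-score and the Matthews correlation coefficient alongside accuracy. The following is a minimal, standard-library-only sketch, not the authors' implementation. It computes these metrics for a trivial baseline that always predicts "not hypothyroid", and it balances the classes with plain random oversampling. The abstract does not say which oversampling techniques were used, so random oversampling is an assumption here, and all function names and the placeholder feature rows are hypothetical.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = hypothyroid."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f_score(tp, fp, fn, tn):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until both classes are the same size.

    Assumes binary labels (0/1) and at least one row of each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    deficit = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit


# Class counts taken from the abstract.
N_TOTAL, N_POS = 3163, 151
y_true = [1] * N_POS + [0] * (N_TOTAL - N_POS)

# Trivial baseline: always predict "not hypothyroid".
y_pred = [0] * N_TOTAL
baseline = confusion_counts(y_true, y_pred)
baseline_accuracy = accuracy(*baseline)  # 3012 / 3163, about 0.952
baseline_f = f_score(*baseline)          # 0.0: no hypothyroid case is found
baseline_mcc = mcc(*baseline)            # 0.0: no better than chance

# Placeholder one-column feature rows (hypothetical; real rows would hold
# the patient answers and test results from the data set).
rows = [[i] for i in range(N_TOTAL)]
balanced_rows, balanced_labels = random_oversample(rows, y_true)

print(f"baseline accuracy={baseline_accuracy:.3f} F={baseline_f} MCC={baseline_mcc}")
print(f"after oversampling: {sum(balanced_labels)} positive of {len(balanced_labels)}")
```

The baseline reaches roughly 95% accuracy without detecting a single hypothyroid case, while its F-score and Matthews correlation coefficient are both zero. In practice, oversampling should be applied only to the training portion of the data, so that duplicated samples do not leak into the evaluation set.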

Cited by 8 articles