A hybrid feature selection on AIRS method for identifying breast cancer diseases

Abstract
Breast cancer can be fatal when it is diagnosed late. A cheap and accurate tool for early detection is therefore essential to prevent deaths. In general, a cheap and less invasive way to diagnose the disease is biopsy using fine needle aspirates from breast tissue. However, rapid and accurate identification of the cancer cell pattern from the biopsy is still a challenging task. Such a diagnostic tool can be developed with machine learning by treating diagnosis as a classification problem. The performance of a classifier depends on the interrelationship between sample size, number of features, and classifier complexity, so removing irrelevant features may increase classification accuracy. In this study, a new hybrid feature selection method combining the fast correlation-based filter (FCBF) and information gain (IG) was used to select features for identifying breast cancer with the artificial immune recognition system (AIRS) algorithm. The results of 10-fold cross-validation (CV) over various AIRS seeds indicate that the proposed method achieves its best performance, with accuracy = 0.9797 and AUC = 0.9777, at k = 6 and seed = 50.
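
As an illustration of the two scores behind the feature selection step, the following minimal sketch computes information gain and symmetrical uncertainty (the correlation measure commonly used by FCBF) for discrete-valued features. It is not the authors' implementation: the function names, the toy data, and the feature names `f_informative` and `f_noisy` are hypothetical, and the abstract does not state how the two rankings are combined, so only the individual scores are shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Entropy of the labels within each feature value, weighted by frequency.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(labels)); ranges from 0 to 1."""
    denom = entropy(feature) + entropy(labels)
    if denom == 0:
        return 0.0
    return 2.0 * information_gain(feature, labels) / denom


# Hypothetical toy data (not from the study): 0 = benign, 1 = malignant.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "f_informative": [1, 1, 1, 2, 2, 2],  # separates the classes perfectly
    "f_noisy": [1, 2, 1, 2, 1, 2],        # nearly independent of the class
}

# Rank features from most to least informative by information gain.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

In this toy case the perfectly separating feature ranks first. A filter of this kind would keep the top-ranked features and pass only those to the classifier, which is then evaluated with cross-validation.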