Abstract
Cluster analysis is a principal approach for discovering unknown tumor subtypes, and innovative, effective clustering methods are of great significance for tumor diagnosis and the treatment of malignant tumors. Existing studies on the cluster analysis of tumor gene data generally suffer from defects such as unsatisfactory performance on high-dimensional, high-noise data and insufficient accuracy in selecting cluster centers. To overcome these defects, this paper performs cluster analysis on tumor gene data based on an improved density peaks clustering (DPC) algorithm. First, the paper describes the composition and storage format of the tumor tissue samples used in the experiment, presents the tumor gene expression profile data in matrix format, and introduces the preprocessing of the gene expression profile data. Then, feature selection is carried out on the tumor gene expression profile data. Finally, the density of a target gene is divided into two parts, a K-nearest neighbor local density and a neighborhood density, which improves the conventional DPC algorithm and expands its application scenarios. Experimental clustering results obtained before and after introducing the idea of Approximate Nearest Neighbor (ANN) are given, and they verify the effectiveness of the proposed algorithm.
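For readers unfamiliar with density peaks clustering, the following minimal sketch illustrates the general idea of DPC with a K-nearest-neighbor density estimate. It is not the algorithm proposed in this paper: the density formula (the exponential of the negative mean squared distance to the k nearest neighbors) is an assumed, commonly used variant, the split into local and neighborhood density is not reproduced, and no ANN step is included. The names `knn_dpc`, `k`, and `n_clusters` and the toy data are hypothetical.

```python
import math

def knn_dpc(points, k, n_clusters):
    """Density peaks clustering with an assumed KNN-based density."""
    n = len(points)
    # Pairwise Euclidean distances between samples.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density: exp(-mean squared distance to the k nearest neighbors).
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in nearest) / k))

    # Visit samples from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: pos for pos, i in enumerate(order)}

    # delta[i]: distance to the nearest sample that precedes i in `order`.
    # The densest sample has no predecessor and gets its largest distance.
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Cluster centers: the samples with the largest gamma = rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))
    centers = centers[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Each remaining sample inherits the label of its denser nearest neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Toy data: two well-separated groups of five 2-D points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = knn_dpc(points, k=3, n_clusters=2)
print(labels, centers)
```

On real expression data, each row of the preprocessed, feature-selected matrix would play the role of one entry in `points`. The all-pairs distance computation here is quadratic in the number of samples, which is the cost that approximate nearest neighbor search is usually introduced to reduce.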