This dissertation introduces an integrated computational framework for marker gene discovery and cell-type classification in transcriptomics, with an emphasis on spatial transcriptomics (ST). Combining machine learning with explainable AI (XAI), it makes three main contributions: (1) a convolutional neural network (CNN) that uses Class Activation Mapping (CAM) to identify interpretable marker genes in oral squamous cell carcinoma (OSCC) bulk RNA-seq data; (2) the Gene Spatial Integration (GSI) pipeline, which employs an autoencoder to combine spatial and gene expression features for improved ST clustering and batch effect correction; and (3) an extended GSI-based framework for marker gene authentication using ensemble classifiers and SHapley Additive exPlanations (SHAP).
The CNN model achieves high diagnostic performance while remaining interpretable. GSI outperforms conventional methods in clustering and integration tasks on the dorsolateral prefrontal cortex (DLPFC) and Mouse Brain datasets. The final component authenticates clustering labels and extracts spatially consistent marker genes, revealing both shared and unique markers across Seurat, GraphST, and GSI clusters.
Overall, this work demonstrates that spatial information and XAI enhance the interpretability, reliability, and biological relevance of transcriptomic analysis, with broad potential applications in diagnostics and precision medicine.