Cell subpopulation detection using clustering scRNAseq data

Monika Krzak, "Mauro Picone", Naples (part of the Statistics Seminar Series)

Understanding cellular heterogeneity is essential for many applications in biomedical and clinical research, including cancer and developmental biology. Single-cell RNA-sequencing (scRNAseq) technology now allows the gene expression of individual cells to be profiled, which guides the identification of known or novel cell subpopulations by computational methods. Nonetheless, accurate quantification of cellular heterogeneity remains challenging because of the technical and biological biases present in scRNAseq data. Several methods for cell population detection are currently available, based on various clustering techniques and dimension reduction strategies. These differences across methods affect clustering performance, especially under different data-driven scenarios. So far, only a limited number of studies have evaluated the methods with respect to data-driven features such as dimensionality, the number of putative cell populations, or the presence of noise. Moreover, with a growing class of single-cell RNAseq protocols that allow thousands of cells to be profiled, more accurate and faster methods continue to emerge. For this reason, our current research activity is a comprehensive evaluation of the methods on a set of simulated and real datasets, intended to give guidance on which methods to use and with which parameter settings. The main aim of our work, however, is to develop a new clustering technique that ideally overcomes most of the limitations of the existing methods.