Publication

Parallel clustering of single cell transcriptomic data with split-merge sampling on Dirichlet process mixtures

dc.contributor.author: Duan, Tiehang
dc.contributor.author: Pinto, José P.
dc.contributor.author: Xie, Xiaohui
dc.date.accessioned: 2019-04-11T19:30:20Z
dc.date.available: 2019-04-11T19:30:20Z
dc.date.issued: 2018-12-25
dc.description.abstract: Motivation: With the development of droplet-based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (1) the clustering quality still needs to be improved; (2) most models need prior knowledge of the number of clusters, which is not always available; (3) there is a demand for faster computational speed. Results: We propose to tackle these challenges with Parallel Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive clustering on huge datasets. Experimental results show that the model outperforms currently widely used models in both clustering quality and computational speed. Availability: Source code is publicly available at https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package [pt_PT]
dc.description.sponsorship: NSF DMS1763272 IIS-1715017 Simons Foundation 594598 [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.doi: 10.1093/bioinformatics/bty702 [pt_PT]
dc.identifier.issn: 1367-4803
dc.identifier.uri: http://hdl.handle.net/10400.1/12472
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Oxford University Press [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.title: Parallel clustering of single cell transcriptomic data with split-merge sampling on Dirichlet process mixtures [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.endPage: 961 [pt_PT]
oaire.citation.issue: 6 [pt_PT]
oaire.citation.startPage: 953 [pt_PT]
oaire.citation.title: Bioinformatics [pt_PT]
oaire.citation.volume: 35 [pt_PT]
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]

Files

Original bundle
Name: bty702.pdf
Size: 5.82 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.46 KB
Format: Item-specific license agreed upon to submission