CHOIR (clustering hierarchy optimization by iterative random forests) is a clustering algorithm for single-cell data. CHOIR applies a framework of permutation tests and random forest classifiers across a hierarchical clustering tree to statistically identify clusters that represent distinct populations.
Citation
CHOIR is on bioRxiv! If you use CHOIR, please cite:
Petersen et al., CHOIR improves significance-based detection of cell types and states from single-cell data. bioRxiv (2024)
Installation
CHOIR is designed to run on Unix-based operating systems such as macOS and Linux.
CHOIR installation currently requires remotes and BiocManager for installing GitHub and Bioconductor packages, respectively. Run the following commands to install the dependencies used by CHOIR:
First, install remotes (for installing GitHub packages) if it isn’t already installed:

```r
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
```
Then, install BiocManager (for installing Bioconductor packages) if it isn’t already installed:

```r
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
```
Then, install CHOIR:

```r
remotes::install_github("corceslab/CHOIR",
                        ref = "main",
                        repos = BiocManager::repositories(),
                        upgrade = "never")
```
Notes:
- Installation should complete in under 2 minutes.
- This package is supported on macOS and Linux.
- CHOIR depends heavily on the Seurat package, which has changed substantially across recent releases. CHOIR has been tested successfully with Seurat versions 4.3.0 and 5.0.1.
- Other package dependencies can be found in the “DESCRIPTION” file.
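Because compatibility depends on which Seurat release is installed, it can help to check the installed version before running CHOIR. The sketch below uses only base R (`requireNamespace()` and `utils::packageVersion()`); the helper name `seurat_is_tested` is hypothetical, and the version list simply mirrors the tested versions noted above rather than any official compatibility check in CHOIR.

```r
# Seurat versions reported above as tested with CHOIR (copied from the notes)
tested_seurat <- c("4.3.0", "5.0.1")

# Hypothetical helper: TRUE if the installed Seurat version is one of the
# tested versions, FALSE if a different version is installed, NA if absent
seurat_is_tested <- function(tested = tested_seurat) {
  if (!requireNamespace("Seurat", quietly = TRUE)) return(NA)
  as.character(utils::packageVersion("Seurat")) %in% tested
}

seurat_is_tested()
```

A FALSE result does not mean CHOIR will fail, only that the installed Seurat version is not one of the versions listed as tested.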
Usage
Please follow the vignette, which takes less than 10 minutes to run on a standard laptop.
How CHOIR works
CHOIR is a hierarchical clustering algorithm that uses permutation testing for cluster identification by statistical inference.
CHOIR identifies clusters that should be merged by using a permutation test to assess whether random forest classifiers predict cluster assignments from a normalized feature matrix more accurately than expected by chance.
CHOIR constructs and iteratively prunes a hierarchical clustering tree using statistical inference to prevent underclustering and overclustering.
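The core idea of comparing classifier accuracy against a permuted null can be illustrated with a toy example. This is a minimal sketch in base R, not CHOIR's implementation: to keep it dependency-free, a simple nearest-centroid classifier is swapped in for the random forest classifiers CHOIR uses, and the helpers `centroid_accuracy` and `permutation_test` are hypothetical. For a pair of candidate clusters, it trains on half the cells, measures accuracy on the other half, and compares that accuracy with accuracies obtained after shuffling the cluster labels.

```r
# Toy permutation test on classifier accuracy (base R only).
# ASSUMPTION: nearest-centroid classification stands in for random forests;
# this illustrates the idea only and is not CHOIR's code.

# Hypothetical helper: held-out accuracy of a nearest-centroid classifier
centroid_accuracy <- function(x, labels, train) {
  lv <- levels(labels)
  # One centroid per cluster, computed from training cells only (rows = clusters)
  centroids <- t(sapply(lv, function(l) {
    colMeans(x[train & labels == l, , drop = FALSE])
  }))
  # Squared Euclidean distance from each held-out cell to each centroid
  x_test <- x[!train, , drop = FALSE]
  d <- sapply(lv, function(l) rowSums(sweep(x_test, 2, centroids[l, ])^2))
  # Predict the closest centroid and score against the true labels
  pred <- lv[max.col(-d, ties.method = "first")]
  mean(pred == as.character(labels[!train]))
}

# Hypothetical helper: observed accuracy vs. accuracies with shuffled labels
permutation_test <- function(x, labels, n_perm = 200, train_frac = 0.5) {
  n <- nrow(x)
  train <- seq_len(n) %in% sample(n, floor(train_frac * n))
  observed <- centroid_accuracy(x, labels, train)
  # Null distribution: shuffling breaks any link between features and labels
  null <- replicate(n_perm, centroid_accuracy(x, sample(labels), train))
  # One-sided empirical p-value with the usual +1 correction
  p_value <- (1 + sum(null >= observed)) / (1 + n_perm)
  list(observed = observed, null_mean = mean(null), p_value = p_value)
}

set.seed(1)
n <- 100  # cells per cluster
labels <- factor(rep(c("A", "B"), each = n))
# Two clusters whose means differ in all 5 features
x_distinct <- rbind(matrix(rnorm(n * 5), n), matrix(rnorm(n * 5, mean = 2), n))
# Two "clusters" drawn from the same distribution
x_same <- matrix(rnorm(2 * n * 5), 2 * n)

res_distinct <- permutation_test(x_distinct, labels)
res_same <- permutation_test(x_same, labels)
res_distinct$p_value  # small: accuracy far exceeds the permuted null
res_same$p_value      # usually not small: no evidence the clusters differ
```

In this toy version, a split whose accuracy is not significantly better than the permuted null would be a candidate for merging. CHOIR applies its tests across a hierarchical clustering tree with random forest classifiers, so its actual thresholds and procedure differ from this sketch.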
CHOIR is developed and maintained by the Corces Lab at the Gladstone Institutes.