Title: | An Automated Cell Type Annotation Tool for Single-Cell RNA-Sequencing Data |
---|---|
Description: | An entirely data-driven cell type annotation tools, which requires training data to learn the classifier, but not biological knowledge to make subjective decisions. It consists of three steps: preprocessing training and test data, model fitting on training data, and cell classification on test data. See Xiangling Ji,Danielle Tsao, Kailun Bai, Min Tsao, Li Xing, Xuekui Zhang.(2022)<doi:10.1101/2022.02.19.481159> for more details. |
Authors: | Xiangling Ji [aut], Danielle Tsao [aut], Kailun Bai [ctb], Min Tsao [aut], Li Xing [aut], Xuekui Zhang [aut, cre] |
Maintainer: | Xuekui Zhang <[email protected]> |
License: | GPL-3 |
Version: | 0.3 |
Built: | 2025-02-07 05:30:32 UTC |
Source: | https://github.com/cran/scAnnotate |
calculate the F1 score of each cell population, mean of F1 score and overall accuracy
eva_cal(prediction, cell_label)
eva_cal(prediction, cell_label)
prediction |
A vector of annotate cell type labels |
cell_label |
A vector of original cell type labels |
A matrix contain the F1 score of each cell population, mean of F1 score and overall accuracy
data(predict_label) data(pbmc2) eva_cal(prediction = predict_label,cell_label = pbmc2[,1])
data(predict_label) data(pbmc2) eva_cal(prediction = predict_label,cell_label = pbmc2[,1])
A subset of human Peripheral Blood Mononuclear Cells (PBMC) scRNA-seq data that was sequenced using Drop-seq platform. The Seurat(version 4.0.5) package was used for normalized using the NormalizeData function with the "LogNormalize" method and a scale factor of 10,000. After modeling the mean-variance relationship with the FindVariableFeautre function within "vst" methods, we selected the top 2,000 highly variable genes and only used this selection going forward. The dataframe of the cell type label and a gene expression matrix of 598 cells in the row and 2,000 genes in the column.
data(pbmc1, package="scAnnotate")
data(pbmc1, package="scAnnotate")
a data frame
Ding, J.et al.(2019). Systematic comparative analysis of single cellrna-sequencing methods.bioRxiv
A subset of human PBMC scRNA-seq data that was sequenced using inDrops platform. The Seurat(version 4.0.5) package was used for normalized using the NormalizeData function with the "LogNormalize" method and a scale factor of 10,000. After modeling the mean-variance relationship with the FindVariableFeautre function within "vst" methods, we selected the top 2,000 highly variable genes and only used this selection going forward. The dataframe of the cell type label and a gene expression matrix of 644 cells in the row and 2,000 genes in the column.
data(pbmc2, package="scAnnotate")
data(pbmc2, package="scAnnotate")
a data frame
Ding, J.et al.(2019). Systematic comparative analysis of single cellrna-sequencing methods.bioRxiv
Cell type annotation of pbmc2 data that training from pbmc1 data by 'scAnnotate'.
data(predict_label, package="scAnnotate")
data(predict_label, package="scAnnotate")
a data frame
Annotate cell type labels of test data using a trained mixture model from training data
scAnnotate( train, test, distribution = c("normal", "dep"), correction = c("auto", "harmony", "seurat"), screening = c("wilcox", "t.test"), threshold = 0, lognormalized = TRUE )
scAnnotate( train, test, distribution = c("normal", "dep"), correction = c("auto", "harmony", "seurat"), screening = c("wilcox", "t.test"), threshold = 0, lognormalized = TRUE )
train |
A data frame of cell type label in the first column and a gene expression matrix where each row is a cell and each column is a gene from training data |
test |
A data matrix where each row is a cell and each column is a gene from test data |
distribution |
A character string indicates the distribution assumption on positive gene expression, which should be one of "normal"(default) or "dep". "dep" refers to depth measure, which is a non-parametric distribution estimation approach. |
correction |
A character string indicates the batch effect removal, which should be one of "auto"(default), "seurat", or "harmony". "auto" will automatically select the batch effect removal to follow our suggestion. That uses Seurat for dataset with at most one rare cell population (at most one cell population less than 100 cells) and Harmony for dataset with at least two rare cell populations (at least two cell populations less than 100 cells). |
screening |
A character string indicates the gene screening methods, which should be one of "wilcox"(default) or "t.test". |
threshold |
A numeric number indicates the threshold used for probabilities to classify cells, which should be a number from "0"(default) to "1". If there's no probability higher than the threshold associated with a cell type, the cell will be labeled as "unassigned." |
lognormalized |
A logical string indicates if both input data are log-normalized or raw matrix. TRUE (default) indicates input data are log-normalized, and FALSE indicates input data are raw data. |
A vector contain annotate cell type labels for test data
data(pbmc1) data(pbmc2) ## Not run: predict_label=scAnnotate(train=pbmc1, test=pbmc2[,-1], distribution="normal", correction ="harmony", screening ="wilcox", threshold=0, lognormalized=TRUE) ## End(Not run)
data(pbmc1) data(pbmc2) ## Not run: predict_label=scAnnotate(train=pbmc1, test=pbmc2[,-1], distribution="normal", correction ="harmony", screening ="wilcox", threshold=0, lognormalized=TRUE) ## End(Not run)