Perform peak-calling on a set of fragments or an ArchR Project

callOpenTiles is the main peak-calling function in MOCHA that serves as a wrapper function to call peaks provided a set of fragment files and an ArchR Project for meta-data purposes

Usage

callOpenTiles(
  ATACFragments,
  cellColData,
  blackList,
  genome,
  cellPopLabel,
  cellPopulations = "ALL",
  studySignal = NULL,
  generalizeStudySignal = FALSE,
  cellCol = "RG",
  TxDb,
  OrgDb,
  outDir,
  numCores = 30,
  verbose = FALSE,
  force = FALSE
)

.callOpenTiles_default(
  ATACFragments,
  cellColData,
  blackList,
  genome,
  cellPopLabel,
  cellPopulations = "ALL",
  studySignal = NULL,
  generalizeStudySignal = FALSE,
  cellCol = "RG",
  TxDb,
  OrgDb,
  outDir,
  numCores = 30,
  verbose = FALSE,
  force = FALSE
)

# S4 method for GRangesList
callOpenTiles(
  ATACFragments,
  cellColData,
  blackList,
  genome,
  cellPopLabel,
  cellPopulations = "ALL",
  studySignal = NULL,
  generalizeStudySignal = FALSE,
  cellCol = "RG",
  TxDb,
  OrgDb,
  outDir,
  numCores = 30,
  verbose = FALSE,
  force = FALSE
)

# S4 method for list
callOpenTiles(
  ATACFragments,
  cellColData,
  blackList,
  genome,
  cellPopLabel,
  cellPopulations = "ALL",
  studySignal = NULL,
  generalizeStudySignal = FALSE,
  cellCol = "RG",
  TxDb,
  OrgDb,
  outDir,
  numCores = 30,
  verbose = FALSE,
  force = FALSE
)

.callOpenTiles_ArchR(
  ATACFragments,
  cellPopLabel,
  cellPopulations = "ALL",
  studySignal = NULL,
  generalizeStudySignal = FALSE,
  TxDb,
  OrgDb,
  outDir = NULL,
  numCores = 30,
  verbose = FALSE,
  force = FALSE
)

Arguments

ATACFragments: an ArchR Project, or a GRangesList of fragments. Each GRanges in the GRanges list must have unique cell IDs in the column given by 'cellCol'.
cellColData: A DataFrame containing cell-level metadata. This must contain both a column 'Sample' with unique sample IDs and the column specified by 'cellPopLabel'.
blackList: A GRanges of blacklisted regions
genome: A BSgenome object, or the full name of an installed BSgenome data package, or a short string specifying the name of an NCBI assembly (e.g. "GRCh38", "TAIR10.1", etc...) or UCSC genome (e.g. "hg38", "bosTau9", "galGal6", "ce11", etc...). The supplied short string must refer unambiguously to an installed BSgenome data package. See getBSgenome.
cellPopLabel: string indicating which column in the ArchRProject metadata contains the cell population label.
cellPopulations: vector of strings. Cell subsets for which to call peaks. This list of group names must be identical to names that appear in the ArchRProject metadata. Optional, if cellPopulations='ALL', then peak calling is done on all cell populations in the ArchR project metadata. Default is 'ALL'.
studySignal: The median signal (number of fragments) in your study. If not set, this will be calculated using the input ArchR project but relies on the assumption that the ArchR project encompasses your whole study (i.e. is not a subset).
generalizeStudySignal: If `studySignal` is not provided, calculate the signal as the mean of the mean & median number of fragments for of individual samples within each cell population. This may improve MOCHA's ability to generalize to datasets with XXXXXX #TODO. Default is FALSE, use the median number of fragments.
cellCol: The column in cellColData specifying unique cell ids or barcodes. Default is "RG", the unique cell identifier used by ArchR.
TxDb: The exact package name of a TxDb-class transcript annotation package for your organism (e.g. "TxDb.Hsapiens.UCSC.hg38.refGene"). This must be installed. See Bioconductor AnnotationData Packages.
OrgDb: The exact package name of a OrgDb-class genome wide annotation package for your organism (e.g. "org.Hs.eg.db"). This must be installed. See Bioconductor AnnotationData Packages
outDir: is a string describing the output directory for coverage files. Must be a complete directory string. With ArchR input, set outDir to NULL to create a directory within the input ArchR project directory named MOCHA for saving files.
numCores: integer. Number of cores to parallelize peak-calling across multiple cell populations.
verbose: Set TRUE to display additional messages. Default is FALSE.
force: Optional, whether to force creation of coverage files if they already exist. Default is FALSE.

Value

tileResults A MultiAssayExperiment object containing ranged data for each tile

Examples

if (FALSE) {
# Starting from an ArchR Project:
tileResults <- MOCHA::callOpenTiles(
  ArchRProj = myArchRProj,
  cellPopLabel = "celltype_labeling",
  cellPopulations = "CD4",
  TxDb = "TxDb.Hsapiens.UCSC.hg38.refGene",
  OrgDb = "org.Hs.eg.db",
  numCores = 1
)
}
# \donttest{
# Starting from GRangesList:
if (
  requireNamespace("BSgenome.Hsapiens.UCSC.hg19") &&
  requireNamespace("TxDb.Hsapiens.UCSC.hg38.refGene") &&
  requireNamespace("org.Hs.eg.db")
) {
  tiles <- MOCHA::callOpenTiles(
    ATACFragments = MOCHA::exampleFragments,
    cellColData = MOCHA::exampleCellColData,
    blackList = MOCHA::exampleBlackList,
    genome = "BSgenome.Hsapiens.UCSC.hg19",
    TxDb = "TxDb.Hsapiens.UCSC.hg38.refGene",
    OrgDb = "org.Hs.eg.db",
    outDir = tempdir(),
    cellPopLabel = "Clusters",
    cellPopulations = c("C2", "C5"),
    numCores = 1
  )
}
#> Loading required namespace: BSgenome.Hsapiens.UCSC.hg19
#> Warning: no function found corresponding to methods exports from ‘BSgenome’ for: ‘releaseName’
#> 
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> 
#> Attaching package: ‘S4Vectors’
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> 
#> Attaching package: ‘Biostrings’
#> The following object is masked from ‘package:base’:
#> 
#>     strsplit
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Warning: GRanges object contains 1538 out-of-bound ranges located on sequences
#>   chr1 and chr2. Note that ranges located on a sequence whose length is
#>   unknown (NA) or on a circular sequence are not considered out-of-bound
#>   (use seqlengths() and isCircular() to get the lengths and circularity
#>   flags of the underlying sequences). You can use trim() to trim these
#>   ranges. See ?`trim,GenomicRanges-method` for more information.
#> Warning: GRanges object contains 2968 out-of-bound ranges located on sequences
#>   chr1 and chr2. Note that ranges located on a sequence whose length is
#>   unknown (NA) or on a circular sequence are not considered out-of-bound
#>   (use seqlengths() and isCircular() to get the lengths and circularity
#>   flags of the underlying sequences). You can use trim() to trim these
#>   ranges. See ?`trim,GenomicRanges-method` for more information.
#> Warning: GRanges object contains 2682 out-of-bound ranges located on sequences
#>   chr1 and chr2. Note that ranges located on a sequence whose length is
#>   unknown (NA) or on a circular sequence are not considered out-of-bound
#>   (use seqlengths() and isCircular() to get the lengths and circularity
#>   flags of the underlying sequences). You can use trim() to trim these
#>   ranges. See ?`trim,GenomicRanges-method` for more information.
#> Warning: GRanges object contains 5044 out-of-bound ranges located on sequences
#>   chr1 and chr2. Note that ranges located on a sequence whose length is
#>   unknown (NA) or on a circular sequence are not considered out-of-bound
#>   (use seqlengths() and isCircular() to get the lengths and circularity
#>   flags of the underlying sequences). You can use trim() to trim these
#>   ranges. See ?`trim,GenomicRanges-method` for more information.
# }