Compute a PositionCoverage for every position in @loci, allowing a half-window of @halfWindowSize.
@halfWindowSize: count bases as contributing coverage to a window that extends this many loci in either direction.
@lociBroadcast: Spark Broadcast of the set of loci to compute depths for.
If true, emit a (locus, 1) tuple for every base of every region, and let Spark perform the map-side and then reduce-side reductions. Otherwise, traverse the regions in each partition and emit one (locus, depth) tuple per locus, counting all of that partition's regions that overlap it; this folds the map-side reduction into application code as an optimization.
Returns an RDD of (Position, Coverage) tuples giving the total coverage, and the number of region-starts, at each genomic position in lociBroadcast.
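The two strategies can be compared in a small sketch. It uses plain Scala collections instead of Spark, so groupMapReduce stands in for the map-side and reduce-side reductions. Region, Position, Coverage, widen, coverageExploded and coverageFolded are hypothetical names, not the library's API. Regions are assumed to be non-empty half-open [start, end) intervals, and `starts` is taken to count regions whose widened interval begins at a locus, which is one plausible reading of "number of region-starts". Both functions should return the same map; the folded one emits one pre-summed record per queried locus instead of one record per covered base.

```scala
import scala.collection.mutable

// Hypothetical minimal types (not the library's): a region covers [start, end) on a contig.
case class Region(contig: String, start: Long, end: Long)
case class Position(contig: String, locus: Long)
// depth: widened regions covering a position; starts: widened regions beginning there.
case class Coverage(depth: Int, starts: Int) {
  def +(o: Coverage): Coverage = Coverage(depth + o.depth, starts + o.starts)
}

// Widen a region by halfWindow loci on both sides (clamped at 0).
def widen(r: Region, halfWindow: Long): Region =
  Region(r.contig, math.max(0L, r.start - halfWindow), r.end + halfWindow)

// "Explode": one (position, Coverage(1, isStart)) record per covered base, then sum per
// key, as a reduceByKey would on an RDD.
def coverageExploded(regions: Seq[Region], halfWindow: Long, loci: Set[Position]): Map[Position, Coverage] =
  regions
    .map(widen(_, halfWindow))
    .flatMap(w => (w.start until w.end).map(l => Position(w.contig, l) -> Coverage(1, if (l == w.start) 1 else 0)))
    .filter { case (p, _) => loci.contains(p) }
    .groupMapReduce(_._1)(_._2)(_ + _)

// "Fold": sweep start-sorted regions past sorted loci, keeping a min-heap of the ends of
// active regions, so each queried locus yields a single already-summed record.
def coverageFolded(regions: Seq[Region], halfWindow: Long, loci: Set[Position]): Map[Position, Coverage] = {
  val out = Map.newBuilder[Position, Coverage]
  for ((contig, rs) <- regions.map(widen(_, halfWindow)).groupBy(_.contig)) {
    val byStart = rs.sortBy(_.start).toVector
    val queried = loci.iterator.filter(_.contig == contig).map(_.locus).toVector.sorted
    val ends = mutable.PriorityQueue.empty[Long](Ordering[Long].reverse) // smallest end on top
    var i = 0
    for (locus <- queried) {
      var startsHere = 0
      // Activate every region that has started by this locus.
      while (i < byStart.length && byStart(i).start <= locus) {
        if (byStart(i).start == locus) startsHere += 1
        ends.enqueue(byStart(i).end)
        i += 1
      }
      // Drop regions that ended at or before this locus.
      while (ends.nonEmpty && ends.head <= locus) ends.dequeue()
      if (ends.nonEmpty) out += Position(contig, locus) -> Coverage(ends.size, startsHere)
    }
  }
  out.result()
}
```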
Break the input @loci into smaller LociSets such that the number of regions (with a @halfWindowSize grace-window) overlapping each set is ≤ @maxRegionsPerPartition.
First obtains the "coverage" RDD, then takes regions greedily. As a result, the end of each partition of the coverage RDD will tend to have a "remainder" LociSet holding roughly half the maximum number of regions per partition.
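A greedy cut of this kind might look like the following sketch over one contig's sorted, contiguous loci (with gaps, a region starting inside a gap would go uncounted). It assumes per-locus coverage carries both the depth and the number of regions starting at that locus; Coverage and cappedRanges are hypothetical names, and the output is a list of [first, last] locus ranges rather than LociSets.

```scala
// Hypothetical per-locus record: regions covering the locus, and regions starting at it.
case class Coverage(depth: Int, starts: Int)

// Greedily cut sorted, contiguous loci into [first, last] ranges. The regions overlapping a
// range are those covering its first locus plus those starting at any later locus in it,
// so that sum is what is capped at maxRegions. A locus deeper than maxRegions still gets
// its own single-locus range.
def cappedRanges(cov: Seq[(Long, Coverage)], maxRegions: Int): Seq[(Long, Long)] = {
  val out = Seq.newBuilder[(Long, Long)]
  var open = false
  var first = 0L
  var last = 0L
  var count = 0
  for ((locus, c) <- cov) {
    if (open && count + c.starts <= maxRegions) {
      // Extending the range only adds the regions that start at this locus.
      last = locus
      count += c.starts
    } else {
      // Close the current range (if any) and start a new one at this locus.
      if (open) out += ((first, last))
      open = true
      first = locus
      last = locus
      count = c.depth
    }
  }
  if (open) out += ((first, last))
  out.result()
}
```

Because a range is closed only when the next locus would overflow the cap, whatever loci remain at the end of the input form a final, typically under-filled range.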
Compute the depth at each locus in @rdd, then group loci into runs whose depths are uniformly at or below (true) or above (false) depthCutoff.
Useful for getting a sense of which parts of the genome have exceedingly high coverage.
Two further parameters are as described for coverage.
@depthCutoff: separates runs of loci whose depths are uniformly at or below (≤) this cutoff from runs whose depths are uniformly above (>) it.
Returns an RDD whose elements include a flag that is true for runs at or below depthCutoff and false for runs above depthCutoff, as described above.
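Grouping per-locus depths into runs can be sketched as a single pass over loci sorted by contig and position. The DepthRun shape (contig, flag, start, length) and the depthRuns name are assumptions for illustration, not the method's actual element type; in this sketch a run also breaks at a gap between consecutive loci.

```scala
// Hypothetical run record: contig, true if every depth in the run is <= the cutoff,
// first locus of the run, and number of loci in it.
case class DepthRun(contig: String, belowOrEqual: Boolean, start: Long, length: Long)

// depths must be sorted by (contig, locus).
def depthRuns(depths: Seq[(String, Long, Int)], cutoff: Int): Seq[DepthRun] = {
  val out = Seq.newBuilder[DepthRun]
  var cur: Option[DepthRun] = None
  for ((contig, locus, depth) <- depths) {
    val below = depth <= cutoff
    cur match {
      // Same contig, same side of the cutoff, and adjacent locus: extend the run.
      case Some(r) if r.contig == contig && r.belowOrEqual == below && r.start + r.length == locus =>
        cur = Some(r.copy(length = r.length + 1))
      // Otherwise emit the finished run and start a new one here.
      case _ =>
        cur.foreach(out += _)
        cur = Some(DepthRun(contig, below, locus, 1))
    }
  }
  cur.foreach(out += _)
  out.result()
}
```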
Compute the coverage depth at each locus, then aggregate loci into runs that are all above, or all at or below, depthCutoff.
Returns a tuple containing an RDD returned by partitionDepths, as well as the total numbers of loci with depth at or below depthCutoff and above depthCutoff, respectively.
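Given runs tagged with their ≤/> flag and length, the two totals reduce to one sum per flag. Run and totals are hypothetical names; on an RDD the same sums could be computed in a single pass (for example with RDD.aggregate) rather than over an in-memory Seq as here.

```scala
// Hypothetical run record: contig, true if depths are <= the cutoff, run length in loci.
case class Run(contig: String, belowOrEqual: Boolean, length: Long)

// Sum run lengths separately for the "<= cutoff" and "> cutoff" runs.
def totals(runs: Seq[Run]): (Long, Long) = {
  val (below, above) = runs.partition(_.belowOrEqual)
  (below.map(_.length).sum, above.map(_.length).sum)
}
```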
Augment an RDD[ReferenceRegion] with methods for computing coverage depth.
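In Scala, adding methods to an existing type such as RDD[ReferenceRegion] is commonly done with an implicit class. The sketch below assumes that pattern and applies it to a plain Seq instead of an RDD; Region, CoverageSyntax, RegionSeqOps and depthAt are hypothetical names.

```scala
// Hypothetical region type; the real class wraps an RDD[ReferenceRegion] instead.
case class Region(contig: String, start: Long, end: Long)

object CoverageSyntax {
  // An implicit class adds methods to Seq[Region] without changing Seq itself.
  implicit class RegionSeqOps(regions: Seq[Region]) {
    // Number of regions whose half-open [start, end) interval contains the position.
    def depthAt(contig: String, locus: Long): Int =
      regions.count(r => r.contig == contig && r.start <= locus && locus < r.end)
  }
}

// Bring the extra methods into scope, then call them as if they were defined on Seq.
import CoverageSyntax._
val regions = Seq(Region("chr1", 10, 15), Region("chr1", 12, 20))
val depth = regions.depthAt("chr1", 13)
```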