org.hammerlab.guacamole.readsets.rdd

CoverageRDD

class CoverageRDD[R <: ReferenceRegion] extends Serializable

Augment an RDD[ReferenceRegion] with methods for computing coverage depth.

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new CoverageRDD(rdd: RDD[R])(implicit arg0: ClassTag[R])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def coverage(halfWindowSize: Int, lociBroadcast: Broadcast[LociSet], explode: Boolean = false): RDD[(Position, Coverage)]

    Compute a PositionCoverage for every position in @loci, allowing a half-window of @halfWindowSize.

    halfWindowSize

    count bases as contributing coverage to a window that extends this many loci in either direction.

    lociBroadcast

    Spark Broadcast of a set of loci to compute depths for.

    explode

    If true, emit a (locus, 1) tuple for every region-base and let Spark perform the map-side, then reduce-side, reductions. Otherwise, traverse the regions and emit (locus, depth) tuples that account for all regions in a partition overlapping the current locus. This folds the map-side reduction into application code as an optimization.

    returns

    RDD of (Position, Coverage) tuples giving the total coverage, and number of region-starts, at each genomic position in lociBroadcast.
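The two strategies selected by explode can be sketched on plain local collections. This is a minimal illustration, not the library's implementation: Region, explodedDepths and traversedDepths are hypothetical names, regions are assumed to be half-open intervals on a single contig, only depth (not region-starts) is tracked, and loci with zero depth are assumed to be omitted.

```scala
import scala.collection.mutable

object CoverageSketch {
  // Hypothetical stand-in for a ReferenceRegion on one contig: half-open [start, end).
  final case class Region(start: Long, end: Long)

  // explode = true analogue: emit (locus, 1) per region-base, then sum by locus.
  def explodedDepths(regions: Seq[Region], loci: Seq[Long], halfWindowSize: Int): Map[Long, Int] = {
    val lociSet = loci.toSet
    val ones: Seq[(Long, Int)] =
      for {
        r <- regions
        // A region counts toward every locus within halfWindowSize of it (assumption).
        locus <- (r.start - halfWindowSize) until (r.end + halfWindowSize)
        if lociSet.contains(locus)
      } yield (locus, 1)
    ones.groupBy(_._1).map { case (locus, recs) => locus -> recs.map(_._2).sum }
  }

  // explode = false analogue: sweep sorted regions once, emitting (locus, depth) directly.
  def traversedDepths(regions: Seq[Region], loci: Seq[Long], halfWindowSize: Int): Map[Long, Int] = {
    val sorted = regions.sortBy(_.start)
    // Min-heap of the widened end coordinates of regions overlapping the current locus.
    val activeEnds = mutable.PriorityQueue.empty[Long](Ordering[Long].reverse)
    var next = 0
    val out = Map.newBuilder[Long, Int]
    for (locus <- loci.sorted) {
      // Admit regions whose widened start has been reached.
      while (next < sorted.length && sorted(next).start - halfWindowSize <= locus) {
        activeEnds.enqueue(sorted(next).end + halfWindowSize)
        next += 1
      }
      // Retire regions whose widened end no longer covers this locus.
      while (activeEnds.nonEmpty && activeEnds.head <= locus) activeEnds.dequeue()
      if (activeEnds.nonEmpty) out += locus -> activeEnds.size
    }
    out.result()
  }
}
```

Both functions produce the same depths. The traversal variant avoids creating one record per region-base, which is the saving the explode = false path is described as aiming for.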

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def makeCappedLociSets(halfWindowSize: Int, loci: LociSet, maxRegionsPerPartition: Int, explode: Boolean, trimRanges: Boolean): RDD[LociSet]

    Break the input @loci into smaller LociSets such that the number of regions (with a @halfWindowSize grace-window) overlapping each set is ≤ @maxRegionsPerPartition.

    First obtains the "coverage" RDD, then takes regions greedily, meaning the end of each partition of the coverage-RDD will tend to have a "remainder" LociSet that has ≈half the maximum regions per partition.
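The greedy step described above can be sketched on a local, position-sorted sequence of per-locus coverage. This is an illustration under stated assumptions, not the library's code: the Coverage case class here is a hypothetical stand-in with depth and starts fields, the number of regions overlapping a run of loci is taken to be the depth at its first locus plus the region-starts at each later locus, and the trimRanges and explode parameters are not modelled.

```scala
object CappedLociSketch {
  // Hypothetical stand-in: depth at a locus and number of regions starting there.
  final case class Coverage(depth: Int, starts: Int)

  // Greedily split sorted per-locus coverage into runs of loci, closing a run
  // when adding the next locus would push its region count above maxRegions.
  def cap(coverage: Seq[(Long, Coverage)], maxRegions: Int): Seq[Seq[Long]] = {
    val sets = Seq.newBuilder[Seq[Long]]
    var current = Vector.empty[Long]
    var regionsInCurrent = 0
    coverage.foreach { case (locus, c) =>
      // Extending a non-empty run only adds the regions that start at this locus.
      if (current.nonEmpty && regionsInCurrent + c.starts > maxRegions) {
        sets += current
        current = Vector.empty[Long]
        regionsInCurrent = 0
      }
      // A fresh run begins with every region overlapping its first locus.
      regionsInCurrent += (if (current.isEmpty) c.depth else c.starts)
      current = current :+ locus
    }
    // Whatever is left over forms the "remainder" set.
    if (current.nonEmpty) sets += current
    sets.result()
  }
}
```

In this sketch a single locus whose depth already exceeds maxRegions still gets a set of its own, since it cannot be split further.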

  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. def partitionDepths(halfWindowSize: Int, lociBroadcast: Broadcast[LociSet], depthCutoff: Int): RDD[((ContigName, Boolean), Long)]

    Compute the depth at each locus in @rdd, then group loci into runs that are uniformly below (true) or above (false) depthCutoff.

    Useful for getting a sense of which parts of the genome have exceedingly high coverage.

    halfWindowSize

    see coverage.

    lociBroadcast

    see coverage.

    depthCutoff

    Cutoff separating runs of loci whose depth is uniformly ≤ this value from runs whose depth is uniformly > this value.

    returns

    RDD whose elements have:

    • a key consisting of a contig name and a boolean indicating whether loci represented by this element have coverage depth ≤ depthCutoff, and
    • a value indicating the length of a run of loci with depth above or below depthCutoff, as described above.
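The run-grouping described above can be sketched on a local sequence of per-locus depths. The names DepthRunsSketch and runs are hypothetical, the input is assumed to be sorted by contig and locus, and a run is assumed to break only when the contig or the below-cutoff flag changes. Whether the library also breaks runs at gaps between loci is not stated here.

```scala
object DepthRunsSketch {
  // Input: (contig name, locus, depth), sorted by contig and locus.
  // Output: ((contig name, depth <= depthCutoff), run length) per maximal run.
  def runs(depths: Seq[(String, Long, Int)], depthCutoff: Int): Seq[((String, Boolean), Long)] =
    depths.foldLeft(Vector.empty[((String, Boolean), Long)]) {
      case (acc, (contig, _, depth)) =>
        val key = (contig, depth <= depthCutoff)
        acc.lastOption match {
          // Same contig and same side of the cutoff: extend the current run.
          case Some((`key`, length)) => acc.init :+ (key -> (length + 1))
          // Otherwise start a new run of length 1.
          case _ => acc :+ (key -> 1L)
        }
    }
}
```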
  20. val sc: SparkContext

  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def toString(): String

    Definition Classes
    AnyRef → Any
  23. def validLociCounts(halfWindowSize: Int, lociBroadcast: Broadcast[LociSet], depthCutoff: Int): (RDD[((ContigName, Boolean), NumLoci)], NumLoci, NumLoci)

    Compute the coverage-depth at each locus, then aggregate loci into runs that are all above or below depthCutoff.

    returns

    Tuple containing the RDD returned by partitionDepths, the total number of loci with depth ≤ depthCutoff, and the total number of loci with depth > depthCutoff.
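One way to derive the two totals from the runs is to sum run lengths on each side of the cutoff. The sketch below does this on a local sequence shaped like the elements of the partitionDepths result. ValidLociSketch and totals are hypothetical names, and this is not necessarily how the library computes the counts.

```scala
object ValidLociSketch {
  // Input: ((contig name, depth <= depthCutoff), run length) elements.
  // Output: (loci at or below the cutoff, loci above the cutoff).
  def totals(runs: Seq[((String, Boolean), Long)]): (Long, Long) =
    runs.foldLeft((0L, 0L)) {
      case ((below, above), ((_, isBelow), length)) =>
        if (isBelow) (below + length, above) else (below, above + length)
    }
}
```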

  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
