org.hammerlab.guacamole.distributed
For a given partition, step through its loci and the reads overlapping each one, applying an arbitrary function and returning its emitted objects.
result data type
this partition's regions, split by sample
this partition's loci
a margin within which reads are considered to effectively overlap a locus
function that maps a contig's loci and regions to a sequence of result objects.
Iterator[T] collected from each contig
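The stepping described above can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions: `Region`, `WindowSketch`, `overlaps`, and `windowFlatMap` are hypothetical names that do not come from the library, regions are modelled as half-open intervals on a single contig, and the user function receives a plain sequence of regions rather than a sliding-window object.

```scala
// Hypothetical stand-in for a mapped read: half-open interval [start, end).
final case class Region(start: Long, end: Long)

object WindowSketch {
  // True if the region overlaps [locus - halfWindowSize, locus + halfWindowSize].
  def overlaps(r: Region, locus: Long, halfWindowSize: Long): Boolean =
    r.start <= locus + halfWindowSize && r.end > locus - halfWindowSize

  // For each locus, pass the regions overlapping its window to `function`
  // and concatenate whatever it emits.
  def windowFlatMap[T](
      regions: Seq[Region],
      loci: Seq[Long],
      halfWindowSize: Long
  )(function: (Long, Seq[Region]) => Iterator[T]): Iterator[T] =
    loci.iterator.flatMap { locus =>
      function(locus, regions.filter(overlaps(_, locus, halfWindowSize)))
    }
}
```

The sketch rescans every region at every locus for clarity; a sliding window would instead advance incrementally over sorted regions.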
Flat-maps a function across loci and any number of RDDs of regions. At each locus, the function is passed one sliding window per RDD, each containing the regions that overlap the interval extending halfWindowSize to either side of that locus.
This function supports maintaining some state from one locus to another within a task. The state maintained is of type S. The user function will receive the current state in addition to the sliding windows, and returns a pair of (new state, result data). The state is initialized to initialState for each task, and for each new contig handled by a single task.
region data type (e.g. MappedRead)
result data type
state type
number of samples (input files) whose reads are in partitionedReads.
partitioned reads RDD; reads that straddle partition boundaries will occur more than once herein.
if true, the function is called only on loci where at least one region maps within the window around the locus. If false, the function is called at all loci in lociPartitions.
half-width of the window: a region is included at a locus if it overlaps the interval extending halfWindowSize to either side of that locus.
initial state to use for each task and each contig analyzed within a task.
function to flatmap, of type (state, sliding windows) -> (new state, result data)
RDD[T] of flatmap results
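The state-threading contract described above (state passed in, a pair of new state and results passed back, state reset to initialState for each contig) can be sketched without Spark. This is a simplified model, not the library's implementation: `StatefulFlatMapSketch` and `flatMapWithState` are hypothetical names, one task's loci are assumed to arrive grouped by contig, and the user function receives the bare locus in place of the sliding windows.

```scala
object StatefulFlatMapSketch {
  // Hypothetical model of a single task: its loci, grouped by contig name.
  def flatMapWithState[S, T](
      lociByContig: Seq[(String, Seq[Long])],
      initialState: S
  )(function: (S, Long) => (S, Iterator[T])): Iterator[T] =
    lociByContig.iterator.flatMap { case (_, loci) =>
      // The state is reset to initialState at the start of every contig.
      var state = initialState
      loci.iterator.flatMap { locus =>
        // The user function sees the current state and returns the next one.
        val (newState, results) = function(state, locus)
        state = newState
        results
      }
    }
}
```

For example, a function that counts the loci seen so far on the current contig restarts its count whenever the contig changes:

```scala
val counts = StatefulFlatMapSketch.flatMapWithState[Int, Int](
  Seq("chr1" -> Seq(10L, 11L, 12L), "chr2" -> Seq(5L, 6L)),
  initialState = 0
)((n, _) => (n + 1, Iterator(n + 1))).toList
// counts == List(1, 2, 3, 1, 2)
```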