Number of partitions to split reads into.
Long >= 1. Number of micro-partitions generated for each of the numPartitions
Spark partitions that will be computed. Higher values give a more exact but
more expensive computation.
In the extreme, setting this to greater than the number of loci (per partition)
results in an exact calculation.
Assign loci from a LociSet to partitions, where each partition overlaps approximately the same number of "regions" (reads mapped to a reference genome).
The approach we take is:
(1) chop up the loci uniformly into many genomic "micro partitions."
(2) for each micro partition, calculate the number of regions that overlap it.
(3) using these counts, assign loci to real (Spark) partitions, assuming approximately uniform depth within each micro partition.
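The three steps above can be illustrated with a small, Spark-free sketch on a single contig. Everything in it is hypothetical and for illustration only: `MicroPartitionSketch`, `Interval`, `microPartitions`, `assign`, `partitionLoci`, and the parameter name `accuracy` (used here for the micro-partitions-per-partition setting) are assumptions, not the library's API, and the counting in step (2) is done locally here rather than with Spark.

```scala
// Hypothetical sketch: names and signatures are illustrative, not the real API.
object MicroPartitionSketch {
  /** Half-open interval [start, end) on a single contig. */
  final case class Interval(start: Long, end: Long) {
    def length: Long = end - start
    def overlaps(other: Interval): Boolean = start < other.end && other.start < end
  }

  /** Step 1: chop `loci` uniformly into at most `numMicro` non-empty micro partitions. */
  def microPartitions(loci: Interval, numMicro: Int): Vector[Interval] =
    (0 until numMicro).toVector.map { i =>
      Interval(loci.start + loci.length * i / numMicro,
               loci.start + loci.length * (i + 1) / numMicro)
    }.filter(_.length > 0)

  /** Step 3: walk the micro partitions, treating depth as uniform within each one. */
  def assign(micros: Vector[Interval], counts: Vector[Long],
             numPartitions: Int): Vector[(Interval, Int)] = {
    val target = counts.sum.toDouble / numPartitions // regions wanted per partition
    val out = Vector.newBuilder[(Interval, Int)]
    var partition = 0
    var filled = 0.0 // estimated regions already placed in the current partition
    for ((micro, count) <- micros.zip(counts)) {
      val depth = count.toDouble / micro.length // estimated regions per locus
      var pos = micro.start
      while (pos < micro.end) {
        val remaining = micro.end - pos
        // Take the rest of the micro partition if it adds no weight or we are on
        // the last partition; otherwise take just enough loci to reach the target.
        val fit =
          if (depth == 0.0 || partition == numPartitions - 1) remaining
          else math.min(remaining, math.max(1L, math.ceil((target - filled) / depth).toLong))
        out += ((Interval(pos, pos + fit), partition))
        filled += fit * depth
        pos += fit
        if (target > 0 && filled >= target && partition < numPartitions - 1) {
          partition += 1
          filled = 0.0
        }
      }
    }
    out.result()
  }

  /** Steps 1 to 3 combined; `accuracy` is micro partitions per output partition. */
  def partitionLoci(loci: Interval, regions: Seq[Interval],
                    numPartitions: Int, accuracy: Int): Vector[(Interval, Int)] = {
    val micros = microPartitions(loci, numPartitions * accuracy)
    // Step 2: number of regions overlapping each micro partition (local count).
    val counts = micros.map(m => regions.count(_.overlaps(m)).toLong)
    assign(micros, counts, numPartitions)
  }
}
```

For example, `MicroPartitionSketch.partitionLoci(Interval(0, 100), Vector.fill(8)(Interval(10, 12)), numPartitions = 2, accuracy = 2)` splits the first micro partition [0, 25) between the two partitions, because all eight regions fall inside it and the sketch assumes they are spread evenly across its loci. A larger `accuracy` shrinks the micro partitions, so that uniform-depth assumption holds over shorter spans.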
Some advantages of this approach are:
accuracy
LociMap of locus -> partition assignments.