Fit a Gaussian mixture model to the distribution of variant allele frequencies
RDD of loci with variant allele frequency > 0
Number of Gaussian distributions to fit
Maximum number of iterations to run EM
Largest change in log-likelihood between EM iterations at which the fit is considered converged
The fitted GaussianMixtureModel
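As a rough illustration of how the three parameters above interact, here is a minimal pure-Python sketch of EM for a one-dimensional Gaussian mixture. It is not the implementation behind GaussianMixtureModel. The function name fit_gmm_1d, the evenly spaced initial means, and the variance floor are all assumptions made for this example.

```python
import math


def fit_gmm_1d(values, num_clusters, max_iterations=100, convergence_tol=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(values)
    lo, hi = min(values), max(values)
    grand_mean = sum(values) / n
    # Assumed initialization: evenly spaced means, shared variance, equal weights.
    means = [lo + (j + 0.5) * (hi - lo) / num_clusters for j in range(num_clusters)]
    shared_var = max(sum((x - grand_mean) ** 2 for x in values) / n, 1e-6)
    variances = [shared_var] * num_clusters
    weights = [1.0 / num_clusters] * num_clusters

    previous_ll = -math.inf
    for _ in range(max_iterations):
        # E-step: responsibility of each component for each value.
        responsibilities = []
        log_likelihood = 0.0
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = max(sum(dens), 1e-300)  # guard against log(0) on underflow
            log_likelihood += math.log(total)
            responsibilities.append([d / total for d in dens])

        # Stop once the log-likelihood changes by less than the tolerance.
        if abs(log_likelihood - previous_ll) < convergence_tol:
            break
        previous_ll = log_likelihood

        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(num_clusters):
            nj = sum(r[j] for r in responsibilities)
            if nj <= 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(responsibilities, values)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(responsibilities, values)) / nj,
                1e-6,  # assumed variance floor to avoid degenerate components
            )
    return weights, means, variances
```

The loop ends on whichever comes first: the iteration cap, or a log-likelihood change smaller than the tolerance.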
Generates a count of loci in each variant allele frequency bin
RDD of loci with variant allele frequency > 0
Number of bins to group the VAFs into
Map of rounded variant allele frequency to number of loci with that value
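The binning described above could look roughly like the following sketch. It works on a plain list of frequencies rather than an RDD, and the function name vaf_histogram is hypothetical. It also assumes that frequencies are fractions in [0, 1] and that each bin is keyed by its lower edge as an integer percentage. The original rounding rule may differ.

```python
from collections import Counter


def vaf_histogram(vafs, bins):
    """Count loci per VAF bin, keyed by the bin's lower edge in percent."""
    if bins <= 0:
        raise ValueError("bins must be positive")
    counts = Counter()
    for vaf in vafs:
        # Clamp so that a VAF of exactly 1.0 falls into the last bin.
        index = min(int(vaf * bins), bins - 1)
        counts[index * 100 // bins] += 1
    return dict(counts)
```

With 10 bins, for example, every locus with a frequency in [0.1, 0.2) is counted under the key 10.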
Find all non-reference loci in the sample
Names of the underlying samples that make up partitionedReads
Partitioned, mapped reads
Genome
Percent of non-reference loci to use for descriptive statistics
Minimum read depth required at a locus before its variant allele frequency is included
Minimum variant allele frequency to include
Whether to print descriptive statistics for the variant allele frequency distribution
RDD of VariantLocus objects, each containing a locus and its non-zero variant allele frequency
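The depth and frequency filters described above can be sketched as follows. It uses a plain list of per-locus counts instead of partitioned reads. The VariantLocus fields, the function name variant_loci, and the input tuple layout are assumptions made for this example. The minimum frequency is treated here as a fraction, which the original does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantLocus:
    contig: str
    locus: int
    vaf: float


def variant_loci(pileups, min_read_depth, min_vaf):
    """Keep loci with enough depth and a non-zero VAF of at least min_vaf.

    pileups: iterable of (contig, locus, depth, variant_depth) tuples.
    """
    result = []
    for contig, locus, depth, variant_depth in pileups:
        if depth == 0 or depth < min_read_depth:
            continue  # too few reads to trust the frequency
        vaf = variant_depth / depth
        if vaf > 0 and vaf >= min_vaf:
            result.append(VariantLocus(contig, locus, vaf))
    return result
```

Loci with no variant reads are dropped even when min_vaf is 0, which matches the "non-zero variant allele frequency" in the return description.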