The standard ContigSequence implementation, which is an Array of bases.
A ContigSequence implementation that uses a Map to store only a subset of bases.
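The difference between the two storage strategies can be illustrated with a minimal sketch. All names below (ContigSequenceSketch, ArrayBackedSketch, MapBackedSketch) are hypothetical and are not the library's actual classes; only the Array-versus-Map idea comes from the descriptions above.

```scala
// Hypothetical interface, for illustration only.
trait ContigSequenceSketch {
  def apply(locus: Int): Byte // base at the given locus
  def length: Int             // total contig length
}

// Holds every base of the contig in a single array.
final class ArrayBackedSketch(bases: Array[Byte]) extends ContigSequenceSketch {
  def apply(locus: Int): Byte = bases(locus)
  def length: Int = bases.length
}

// Holds only the loci present in the map; `length` still reports the full contig size.
final class MapBackedSketch(val length: Int, bases: Map[Int, Byte]) extends ContigSequenceSketch {
  def apply(locus: Int): Byte =
    bases.getOrElse(locus, throw new NoSuchElementException(s"Locus $locus is not stored"))
}
```

In this sketch the map-backed variant trades constant-size lookups over the whole contig for a much smaller memory footprint when only a few regions are needed.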
Load a ReferenceBroadcast, caching the result.
Local path to a FASTA file
the Spark context
whether this is a "partial fasta". Partial fastas are used in tests to load only a subset of the reference; in production runs this will usually be false.
a ReferenceBroadcast, which maps contig/chromosome names to broadcast sequences
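The text does not say how the result is cached or what the cache key is. One plausible approach, shown here as a sketch under that assumption, is to memoize the loader on its arguments. LoadCache is a hypothetical helper, not part of the documented API, and the loader is passed in as a plain function so the example runs without Spark.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical memoizing wrapper: calls `load` at most once per key
// in single-threaded use and returns the stored value afterwards.
final class LoadCache[K, V](load: K => V) {
  private val cache = TrieMap.empty[K, V]

  def get(key: K): V = cache.getOrElseUpdate(key, load(key))
}
```

A cache for the loader described above might be keyed on the pair (path, partialFasta), so that the same file loaded in both modes yields two separate entries.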
Read a regular fasta file
local path to the FASTA file
the Spark context
a ReferenceBroadcast instance containing ArrayBackedReferenceSequence objects
Read a "partial fasta" from the given path.
A "partial fasta" is a FASTA file whose reference names look like "chr1:9242255-9242454/249250621". Each name gives the contig name, the start and end loci, and the total contig size. The sequence that follows the name gives the reference bases for just the sites between start and end.
Partial fastas are used for testing to avoid distributing full reference genomes.
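The reference-name format described above can be parsed with a regular expression. The following is a minimal sketch, not the library's actual parser: PartialContigName and PartialFastaNames are hypothetical names, and the code makes no assumption about whether the end locus is inclusive, since the text does not say.

```scala
// Hypothetical holder for the four fields encoded in a partial fasta reference name.
final case class PartialContigName(contig: String, start: Long, end: Long, contigLength: Long)

object PartialFastaNames {
  // <contig>:<start>-<end>/<total contig size>
  private val NamePattern = """(.+):(\d+)-(\d+)/(\d+)""".r

  // Returns None when the name does not follow the partial fasta convention.
  def parse(name: String): Option[PartialContigName] = name match {
    case NamePattern(contig, start, end, total) =>
      Some(PartialContigName(contig, start.toLong, end.toLong, total.toLong))
    case _ => None
  }
}
```

Applied to the example name from the text, `PartialFastaNames.parse("chr1:9242255-9242454/249250621")` yields contig "chr1", start 9242255, end 9242454, and contig length 249250621; a plain name such as "chr1" yields None.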
local path to the partial fasta
the Spark context
a ReferenceBroadcast instance containing MapBackedReferenceSequence objects