If we get an unexpectedly large number of error k-mers, e.g. from the ends of bad FASTQ reads, it can throw distances way off unless the user sets `--min-abun-filter` manually. A good way to deal with this would be to allow masking out bases with bad quality scores. A default of filtering all bases below quality 20 (the "bad" cutoff in FastQC; 4 in modern FASTQs, I think?) might also be a good idea?
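To make the masking idea concrete, here is a rough sketch of what it could look like. It is not taken from the tool's code: the function names `mask_low_quality` and `valid_kmers` are hypothetical, it assumes Phred+33 quality encoding, and it assumes that any k-mer containing a masked base would simply be skipped.

```python
def mask_low_quality(seq: str, qual: str, min_qual: int = 20, offset: int = 33) -> str:
    """Replace bases whose Phred score is below min_qual with 'N'.

    Assumes Phred+33 encoding (offset=33); both names are hypothetical.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        base if ord(q) - offset >= min_qual else "N"
        for base, q in zip(seq, qual)
    )


def valid_kmers(seq: str, k: int):
    """Yield only the k-mers that contain no masked ('N') bases."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


# Example: 'I' encodes Q40, '#' encodes Q2, '5' encodes Q20 under Phred+33.
masked = mask_low_quality("ACGT", "II#5")
kmers = list(valid_kmers("ACNTGCA", 3))
```

With the default cutoff of 20, only the third base in the example is masked, and every k-mer overlapping a masked position is dropped before it can contribute to the sketch.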