Hi Roderick!
I'm developing a k-mer counting Python package for internal use, with needletail as the backend. While developing it, I noticed that Kmers and CanonicalKmers handle non-ATCG characters inconsistently: Kmers counts k-mers that contain them, while CanonicalKmers skips them (understandably so).
Because of that, my function only uses CanonicalKmers, even when counting non-canonical k-mers (I just reverse complement the sequence if the canonical boolean is true), which adds computational overhead.
I don't know if this decision was made by design, but maybe Kmers should include an argument that allows the user to choose whether non-ATCG characters should be ignored.
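To make the request concrete, here is a rough pure-Python sketch of the behaviour I have in mind. It does not use needletail at all; the function name `count_kmers` and the `skip_non_acgt` argument are hypothetical names I made up for illustration, not part of the existing API.

```python
from collections import Counter

# Bases treated as valid; anything else (N, IUPAC codes, lowercase) is "non-ATCG" here.
VALID_BASES = frozenset("ACGT")


def count_kmers(seq: str, k: int, skip_non_acgt: bool = False) -> Counter:
    """Count forward-strand k-mers in seq.

    When skip_non_acgt is True, any k-mer containing a character outside
    VALID_BASES is dropped instead of being counted.
    """
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Hypothetical flag: skip windows that contain a non-ACGT character.
        if skip_non_acgt and not VALID_BASES.issuperset(kmer):
            continue
        counts[kmer] += 1
    return counts


# Default: every window is counted, including those containing "N".
all_kmers = count_kmers("ACNGT", 2)
# With the flag: only windows made up entirely of A, C, G, T are counted.
clean_kmers = count_kmers("ACNGT", 2, skip_non_acgt=True)
```

With the flag off, this mirrors what I see from Kmers today; with it on, it mirrors how CanonicalKmers treats non-ATCG characters, just without the canonicalisation step.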
Thank you for all your work in needletail!