How to compare k-mer completeness between phased genomes #166

@desmodus1984

Hi,
My group assembled a genome from PacBio HiFi reads and then scaffolded it with Hi-C data, which gave us a pseudo-haplotype assembly and two phased haplotype assemblies.
I first ran the k-mer QV analysis on the pseudo-haplotype assembly, since it was released to us first; I received the phased assemblies yesterday.
I read somewhere about shared/private k-mer plots, and I would appreciate it if you could tell me how to make them.
We don't have parental data, since we work with an elusive species and finding parent-offspring pairs is very hard.
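To make the terms concrete, here is a minimal toy sketch of what "shared" and "private" k-mers between two haplotypes mean, and of k-mer completeness as the fraction of read k-mers recovered by an assembly. This is not Merqury's implementation: the sequences, the function names, and the stand-in read k-mer set are all hypothetical, and a real analysis would run on meryl databases rather than Python sets.

```python
# Toy illustration only: hypothetical sequences and helper names.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_kmers(seq, k):
    """Set of canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers

def partition(kmers1, kmers2):
    """Split two k-mer sets into shared and private (haplotype-only) k-mers."""
    return {
        "shared": kmers1 & kmers2,
        "hap1_only": kmers1 - kmers2,
        "hap2_only": kmers2 - kmers1,
    }

def completeness(asm_kmers, read_kmers):
    """Fraction of read k-mers that are also present in the assembly."""
    return len(asm_kmers & read_kmers) / len(read_kmers)

k = 5
hap1 = "ACGTTGCAAGGC"  # hypothetical haplotype 1
hap2 = "ACGTTGCTAGGC"  # hypothetical haplotype 2, differs by one base
k1 = canonical_kmers(hap1, k)
k2 = canonical_kmers(hap2, k)
parts = partition(k1, k2)

# Assumption for the toy: the reads contain exactly the k-mers of both haplotypes.
read_kmers = k1 | k2
comp_hap1 = completeness(k1, read_kmers)
comp_hap2 = completeness(k2, read_kmers)
comp_both = completeness(k1 | k2, read_kmers)

for name, kmers in parts.items():
    print(name, len(kmers))
print("completeness:", comp_hap1, comp_hap2, comp_both)
```

In this toy, each haplotype alone recovers only part of the read k-mers, while the two together recover all of them, which is the pattern one would hope to see for a well-phased diploid assembly.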

I am running the standard Merqury analysis on the two haplotype assemblies with this command:

merqury.sh Hlep.meryl hap1.fasta hap2.fasta Phased-haps

and I wondered about the haplotype-level analysis, since the log mentions it right away:

Merqury -haps - Job started: Tue Sep 16 13:55:49 EDT 2025
read: Hlep.meryl

No haplotype dbs provided.
Running Merqury in non-trio mode...

asm1: hap1.fasta
asm2: hap2.fasta
out : Phased-haps

Any help and tips would be much appreciated.

I also wanted to ask about one more idea.
I am curious about how genome assemblies can be improved. As I mentioned, I computed the QV for the scaffolded and non-scaffolded assemblies, and I was surprised by the result:
final_assembly 5521 2454370272 69.7014 1.07117e-07
HLep.fx 5521 2454370312 69.7014 1.07117e-07
Both 11042 4908740584 69.7014 1.07117e-07
The QV and error rate of the two assemblies are exactly the same. Isn't that unexpected?
This surprises me because the scaffolded assembly (final_assembly) is slightly longer (by 2900 bp) and has more contigs/fewer scaffolds, but also more and bigger gaps. Given these changes, I expected the scaffolded assembly to have a lower QV, but I was wrong.
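As a sanity check on the numbers above, here is a small sketch that recomputes the QV from the reported counts. It assumes the second column is the number of k-mers found only in the assembly, the third column is the total k-mer count used as the denominator, and k = 21; the k value is not stated anywhere in this post and is only inferred because it reproduces the reported error rate. The formula used is the k-mer based estimate E = 1 - (1 - K_asm_only / K_total)^(1/k), QV = -10 log10(E), as I understand it from the Merqury documentation.

```python
import math

def kmer_qv(asm_only, total, k):
    """Return (QV, per-base error rate) from assembly-only and total k-mer counts."""
    # 1 - (1 - asm_only/total)**(1/k), written with log1p/expm1 for numerical stability
    err = -math.expm1(math.log1p(-asm_only / total) / k)
    return -10 * math.log10(err), err

K = 21  # assumption: not given in the post, inferred from the reported error rate

# (assembly-only k-mers, total k-mers) copied from the two rows above
rows = {
    "final_assembly": (5521, 2454370272),
    "HLep.fx": (5521, 2454370312),
}

results = {name: kmer_qv(a, t, K) for name, (a, t) in rows.items()}
for name, (qv, err) in results.items():
    print(f"{name}\t{qv:.4f}\t{err:.5e}")
```

Under these assumptions, both rows have the same 5521 assembly-only k-mers and their totals differ by only 40 k-mers out of roughly 2.45 billion, so any difference in QV is far below the four decimals that are reported.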
Also, I wanted to ask about the impact of read error correction combined with genome polishing. Many studies mention genome polishing but not read polishing. For instance, I was thinking of using DeepConsensus (https://www.nature.com/articles/s41587-022-01435-7) to correct the reads, which should increase the number of solid k-mers, and then polishing the genome.
