Hi,
My group assembled a genome from PacBio HiFi reads and then scaffolded it with Hi-C data, which gave us a pseudo-haplotype assembly and two phased haplotype assemblies.
I first ran the k-mer QV analysis on the pseudo-haplotype assembly, since it was released to us first; I received the phased assemblies yesterday.
I read somewhere about shared/private k-mer plots, and I would appreciate it if you could tell me how to produce them.
I don't have parental data: we work with an elusive species, and finding parent-offspring pairs is very hard.
I am running the regular Merqury analysis on the two haplotype assemblies with:
merqury.sh Hlep.meryl hap1.fasta hap2.fasta Phased-haps
and I wondered about this haplotype-level analysis, since it is mentioned right away in the log:
Merqury -haps - Job started: Tue Sep 16 13:55:49 EDT 2025
read: Hlep.meryl
No haplotype dbs provided.
Running Merqury in non-trio mode...
asm1: hap1.fasta
asm2: hap2.fasta
out : Phased-haps
Any help and tips would be much appreciated.
I also wanted to ask about one more thing.
I am curious about how to improve genome assemblies. As I mentioned, I computed the QV for the scaffolded and non-scaffolded assemblies, and I was surprised by the result:
final_assembly 5521 2454370272 69.7014 1.07117e-07
HLep.fx 5521 2454370312 69.7014 1.07117e-07
Both 11042 4908740584 69.7014 1.07117e-07
The QV and error rates are exactly the same. Isn't that unexpected?
This surprises me because the scaffolded genome (final_assembly) is slightly longer (2900 bp), has more contigs and fewer scaffolds, and has more and larger gaps. Given these changes, I expected the scaffolded assembly to have a lower QV, but I was wrong.
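For reference, here is a minimal sketch of how the QV and error-rate columns above can be recomputed from the two count columns. It assumes the second column is the number of k-mers found only in the assembly, the third is the total number of k-mers in the assembly, and k = 21 (the k-mer size is not stated above; 21 is the value that reproduces the reported numbers). The function name `merqury_qv` is hypothetical, and the formula is the one described in the Merqury paper, not code taken from Merqury itself.

```python
import math


def merqury_qv(asm_only_kmers: int, total_kmers: int, k: int) -> tuple[float, float]:
    """Return (QV, per-base error rate) from k-mer counts.

    Assumed formula (from the Merqury paper):
        P_correct = (1 - asm_only / total) ** (1 / k)
        error     = 1 - P_correct
        QV        = -10 * log10(error)
    """
    frac_bad = asm_only_kmers / total_kmers
    # log1p/expm1 keep precision when frac_bad is very small.
    error = -math.expm1(math.log1p(-frac_bad) / k)
    qv = -10.0 * math.log10(error)
    return qv, error


# Counts copied from the table above; k = 21 is an assumption.
rows = {
    "final_assembly": (5521, 2454370272),
    "HLep.fx": (5521, 2454370312),
}
for name, (asm_only, total) in rows.items():
    qv, error = merqury_qv(asm_only, total, k=21)
    print(f"{name}\t{qv:.4f}\t{error:.5e}")
```

Under these assumptions, both rows have the same number of assembly-only k-mers (5521) and totals that differ by only 40 k-mers out of about 2.45 billion, so the two QV values agree at the printed precision.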
Also, I wanted to ask about the impact of read error correction combined with genome polishing. Many studies mention genome polishing but not read polishing. For instance, I was thinking of using DeepConsensus (https://www.nature.com/articles/s41587-022-01435-7) to correct the reads, which should increase the number of solid k-mers, and then polishing the genome.