Question about SNP sites retained during abundance estimation #2

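Hi,
While examining the abundance estimation step in `compute_abundances_all.py`, I noticed that SNP sites are currently filtered so that only positions with observed ALT reads in the sample are retained. The final inner merge on `['chrom', 'position']` keeps a position only if it appears in both `ref_reads` and `var_reads`: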

```python
var_reads = pd.merge(df_read_counts, df_AF,
                        left_on=['position', 'ref', 'base', 'chrom'],
                        right_on=['POS', 'REF', 'ALT', 'CHROM'],
                        how='inner')

ref_reads = pd.merge(df_read_counts, df_AF,
                        left_on=['position', 'ref', 'base', 'chrom'],
                        right_on=['POS', 'REF', 'REF', 'CHROM'],
                        how='inner')

merged_ref_var = pd.merge(ref_reads.iloc[:, :5], var_reads.iloc[:, :5], on=['chrom','position'], how='inner')
```
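To make the effect concrete, here is a minimal, self-contained sketch on toy data. The values and the `count` column are hypothetical (the snippet only implies that `df_read_counts` has at least five columns). The REF-matching merge uses a copied `REF_BASE` column instead of listing `'REF'` twice in `right_on`; this is only a stand-in to keep the example unambiguous, not the repository's code.

```python
import pandas as pd

# Hypothetical toy inputs. Column names follow the snippet above;
# the 'count' column is an assumed layout for df_read_counts.
df_read_counts = pd.DataFrame({
    "chrom":    ["c1", "c1", "c1"],
    "position": [100, 100, 200],
    "ref":      ["A", "A", "G"],
    "base":     ["A", "T", "G"],   # position 200 has only REF reads
    "count":    [12, 3, 15],
})
df_AF = pd.DataFrame({
    "CHROM": ["c1", "c1"],
    "POS":   [100, 200],
    "REF":   ["A", "G"],
    "ALT":   ["T", "C"],
})

# Rows whose observed base is the ALT allele.
var_reads = pd.merge(df_read_counts, df_AF,
                     left_on=["position", "ref", "base", "chrom"],
                     right_on=["POS", "REF", "ALT", "CHROM"],
                     how="inner")

# Rows whose observed base is the REF allele (REF_BASE is a copy of REF).
ref_reads = pd.merge(df_read_counts, df_AF.assign(REF_BASE=df_AF["REF"]),
                     left_on=["position", "ref", "base", "chrom"],
                     right_on=["POS", "REF", "REF_BASE", "CHROM"],
                     how="inner")

# Inner join: a position survives only if it has both REF and ALT rows.
merged_ref_var = pd.merge(ref_reads.iloc[:, :5], var_reads.iloc[:, :5],
                          on=["chrom", "position"], how="inner")
print(merged_ref_var["position"].tolist())  # [100]; the REF-only site 200 is dropped
```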
However, every SNP site observed in the sample can provide information, whether it shows only REF reads or also ALT reads. In particular, a site with only REF reads in the sample may still carry information about other strains that have the ALT allele at that position.  
Is this filtering intentional, or could it be a bug?
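If keeping those sites is desired, one possible approach (a sketch only, not a claim about how the tool should be implemented) is to attach the known alleles to every covered SNP site and count REF and ALT reads per site, so that REF-only sites remain with an ALT count of zero. The toy data and the `count` column are hypothetical, and the sketch assumes biallelic sites, i.e. one `df_AF` row per position.

```python
import pandas as pd

# Hypothetical toy inputs; 'count' is an assumed column of df_read_counts.
df_read_counts = pd.DataFrame({
    "chrom":    ["c1", "c1", "c1"],
    "position": [100, 100, 200],
    "base":     ["A", "T", "G"],
    "count":    [12, 3, 15],
})
df_AF = pd.DataFrame({
    "CHROM": ["c1", "c1"],
    "POS":   [100, 200],
    "REF":   ["A", "G"],
    "ALT":   ["T", "C"],
})

# Attach the REF/ALT alleles to every read-count row at a known SNP site
# (assumes one ALT per position, so rows are not duplicated).
sites = pd.merge(df_read_counts, df_AF,
                 left_on=["chrom", "position"],
                 right_on=["CHROM", "POS"], how="inner")

# Split each row's count into a REF or ALT contribution (0 otherwise).
sites["ref_count"] = sites["count"].where(sites["base"] == sites["REF"], 0)
sites["alt_count"] = sites["count"].where(sites["base"] == sites["ALT"], 0)

# One row per covered SNP site; REF-only sites keep alt_count == 0.
per_site = (sites.groupby(["chrom", "position"], as_index=False)
                 [["ref_count", "alt_count"]].sum())
print(per_site)
```

Here position 200 is retained with zero ALT reads instead of being dropped, so it can still contribute to the likelihood for strains that carry the ALT allele there.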
