Avoiding training-test contamination in LOOCV using default reference tree #415

@Abelcanc3rhack3r

Hi Robyn,

I am evaluating the accuracy of PICRUSt2 on a custom trait table. What is the recommended procedure for a rigorous leave-one-out cross-validation (LOOCV) of PICRUSt2 SC accuracy when the test sequences are already present in the default reference tree?

To ensure no "training-test contamination," I am deciding between two approaches:

1. Table removal only: keep the species/tip in the default reference tree but remove the corresponding entry from the trait table (-i input).

2. Tree pruning + table removal: prune the species/tip from the reference tree entirely and remove it from the trait table, then re-place the sequence using place_seqs.py.
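
For concreteness, the table-removal step that both approaches share could look roughly like the sketch below. It is only an illustration under assumptions: the toy table, the `assembly` ID column name, and the helper `loocv_folds` are hypothetical and not part of PICRUSt2. It does not prune the tree; in the second approach that would be a separate step per fold, before re-running place_seqs.py.

```python
import csv
import io


def loocv_folds(trait_rows, id_column="assembly"):
    """Yield (held_out_id, held_out_row, training_rows) for each tip.

    `id_column` is an assumed name for the column holding the tip IDs;
    adjust it to match the real trait table.
    """
    for i, row in enumerate(trait_rows):
        # Training table = every row except the held-out one.
        training = trait_rows[:i] + trait_rows[i + 1:]
        yield row[id_column], row, training


# Hypothetical toy trait table (tab-separated, one row per reference tip).
TOY_TABLE = "assembly\ttrait_A\ngenome1\t1\ngenome2\t0\ngenome3\t2\n"

rows = list(csv.DictReader(io.StringIO(TOY_TABLE), delimiter="\t"))
folds = list(loocv_folds(rows))

for held_out_id, held_out_row, training in folds:
    training_ids = [r["assembly"] for r in training]
    # The held-out tip must never appear in its own training table.
    assert held_out_id not in training_ids
    print(held_out_id, training_ids)
```

Each fold's training rows would then be written out as the trait table for that run, and the held-out row kept aside as the ground truth to compare against the prediction.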

Is pruning the tree necessary to accurately simulate the PICRUSt2 pipeline's performance on novel sequences?

Thank you,

Abel Tan
