ZamAI-ORG/pashto-datasets

Pashto Datasets

Curated and processed Pashto datasets published by ZamAI Labs.
This repository includes custom cleaning, normalization, and consolidation work, plus documentation and attribution to original sources.

Repository layout

  • DATASETS/ contains dataset folders. Each dataset includes:
    • SOURCE.md (where it came from)
    • LICENSE.md (original license or terms, if applicable)
    • raw/ (optional; small samples for testing when permitted)
    • processed/ (normalized/cleaned outputs ready for training)
    • notes.md (what ZamAI changed)
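
The layout above can be checked mechanically. The following is a minimal sketch, not a tool shipped with this repository: the function names `check_dataset_folder` and `check_repository` are hypothetical, and treating `SOURCE.md`, `notes.md`, and `processed/` as mandatory is an assumption (the list only marks `raw/` as optional and `LICENSE.md` as "if applicable").

```python
from pathlib import Path

# Entries named in the repository layout. Treating these three as mandatory
# is an assumption of this sketch, not a rule stated by the repository.
REQUIRED = ("SOURCE.md", "notes.md", "processed")
# raw/ is described as optional and LICENSE.md as "if applicable",
# so neither is reported when absent.
OPTIONAL = ("LICENSE.md", "raw")


def check_dataset_folder(folder):
    """Return the required entries that are missing from one dataset folder."""
    folder = Path(folder)
    return [name for name in REQUIRED if not (folder / name).exists()]


def check_repository(root):
    """Map each dataset folder under root/DATASETS to its missing entries.

    Folders with nothing missing are left out of the result.
    """
    datasets_dir = Path(root) / "DATASETS"
    report = {}
    if not datasets_dir.is_dir():
        return report
    for child in sorted(datasets_dir.iterdir()):
        if child.is_dir():
            missing = check_dataset_folder(child)
            if missing:
                report[child.name] = missing
    return report
```

Calling `check_repository(".")` from the repository root would return an empty dictionary when every dataset folder contains the three entries assumed to be required.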

Sources & attribution

All datasets in this repository originate from sources that allow redistribution. Each dataset folder includes source links and license/terms to preserve attribution and compliance.

Raw data note

Some datasets include small raw/ samples used for testing and validation. Processed datasets live under processed/.
