Shock uses a YAML-based type system to classify nodes. Types are defined in a `Types.yaml` configuration file that is loaded at server startup.

    Types:
      - ID: "type_id"
        Description: "Human-readable description"
        Priority: 0
        Data-Types:
          - extension1
          - extension2

| Field | Type | Description |
|---|---|---|
| ID | string | Unique identifier for the type (e.g. "metagenome", "temp", "default") |
| Description | string | Human-readable description of the type |
| Priority | int | Priority value (0 = lowest, 9+ = highest). Used by the migration system to determine which nodes are eligible for remote storage. Locations with a MinPriority setting will only accept nodes at or above that priority. |
| Data-Types | list | Optional list of file extensions associated with this type (e.g. fastq, fasta, bam) |

- Each node in Shock has a `type` field that references a type ID from `Types.yaml`.
- The default type is `"basic"` if no type is specified at node creation.
- The `Priority` field is central to the data migration system: it determines which nodes get migrated to remote locations. For example, a location with `MinPriority: 7` will only accept nodes whose type has `Priority >= 7`, preventing temporary or low-value files from being stored in expensive remote storage.
- The `Data-Types` list is informational and describes what file formats are expected for nodes of this type.
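
The priority rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison `Priority >= MinPriority`, not Shock's actual migration code; the `NodeType` class and both helper functions are hypothetical names, and the three sample types are copied from the example configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeType:
    """One entry from Types.yaml, reduced to the fields used here (hypothetical class)."""
    id: str
    priority: int


def accepts(min_priority: int, node_type: NodeType) -> bool:
    """Return True if a location with this MinPriority would accept the type."""
    return node_type.priority >= min_priority


# Sample types and priorities taken from the example Types.yaml.
TYPES = [
    NodeType("temp", 0),
    NodeType("cv", 7),
    NodeType("metagenome", 9),
]


def eligible_type_ids(min_priority: int, types: List[NodeType] = TYPES) -> List[str]:
    """List the IDs of all types a location with min_priority would accept."""
    return [t.id for t in types if accepts(min_priority, t)]


print(eligible_type_ids(7))  # ['cv', 'metagenome']
```

With `MinPriority: 7`, the `temp` type (priority 0) is filtered out, which is the behavior the text describes for keeping temporary files out of remote storage.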
This example is from `test/config.d-minio/Types.yaml`:

    Types:
      - ID: "default"
        Description: "default"
        Priority: 0
      - ID: "temp"
        Description: "temporary file"
        Priority: 0
      - ID: "VM"
        Description: "Virtual Machine"
        Priority: 1
      - ID: "metagenome"
        Description: "MG-RAST metagenome"
        Priority: 9
        Data-Types:
          - fa
          - fasta
          - fastq
          - fq
          - bam
          - sam
      - ID: "image"
        Description: "image file"
        Priority: 1
        Data-Types:
          - jpeg
          - jpg
          - gif
          - tif
          - png
      - ID: "cv"
        Description: "Controlled Vocabulary"
        Priority: 7
      - ID: "backup"
        Description: "Backup or Dump from another system e.g. MongoDB or MySQL"
        Priority: 9
      - ID: "metadata"
        Description: "metadata"
        Priority: 7
      - ID: "mixs"
        Description: "GSC MIxS Metadata file XLS format"
        Priority: 9
        Data-Types:
          - xls
          - xlsx
          - json
      - ID: "reference"
        Description: "reference database"
        Priority: 7

The `/types` API endpoint provides information about configured types:

    curl -s http://localhost:7445/types/mixs/info | jq .

    {
      "status": 200,
      "data": {
        "id": "mixs",
        "description": "GSC MIxS Metadata file XLS format",
        "priority": 9
      },
      "error": null
    }

See the Types API documentation for all available endpoints.
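
A client that consumes this response might unwrap the envelope as in the sketch below. It works on the sample body shown above rather than a live server. The function name `parse_type_info` is hypothetical, and the assumption that failures arrive in the same envelope with a non-200 `status` or a non-null `error` is not confirmed by this page.

```python
import json

# Sample body copied from the /types/mixs/info response shown above.
SAMPLE_BODY = """
{
  "status": 200,
  "data": {
    "id": "mixs",
    "description": "GSC MIxS Metadata file XLS format",
    "priority": 9
  },
  "error": null
}
"""


def parse_type_info(body: str) -> dict:
    """Unwrap the response envelope and return the 'data' object.

    Assumption: errors use the same envelope, with a non-200 'status'
    or a non-null 'error' field.
    """
    envelope = json.loads(body)
    if envelope.get("status") != 200 or envelope.get("error") is not None:
        raise ValueError(
            f"type lookup failed: status={envelope.get('status')} "
            f"error={envelope.get('error')}"
        )
    return envelope["data"]


info = parse_type_info(SAMPLE_BODY)
print(info["id"], info["priority"])  # mixs 9
```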
Older Shock deployments may use attribute-based types where the type is stored in node attributes rather than the dedicated `type` field. These data types are conventions rather than enforced schemas -- Shock does no validation of attribute content.

    {
      "attributes": {
        "type": "data-library",
        "name": "Solr M5NR",
        "version": "1",
        "description": "Solr M5NR v1 with Solr v4.10.3",
        "member": "1/1",
        "project": "production",
        "provenance": {
          "creation_type": "manual",
          "note": "tar -zcvf solr-m5nr_v1_solr_v4.10.3.tgz -C /mnt/m5nr_1/data/index/ ."
        }
      }
    }

Required fields:
- type=data-library -- Application scope/name
- name=<string> -- e.g. "M5NR" or "Bowtie index of human genome"
- version=<string> -- Version number, date, or similar
Optional fields:
- member=<string> -- Name for the data library member (e.g. chunk number)
- description=<string> -- Longer description
- file_format=<string> -- File format (fasta, bt2, etc.)
- provenance -- Object describing how the data was created (clone, workflow, or manual)
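
Because Shock does not validate attribute content, a client may want to check the convention itself before uploading. The sketch below is one possible client-side check and is not part of Shock; `check_data_library` is a hypothetical helper. It assumes the provenance kind (clone, workflow, or manual) is stored in the `creation_type` key, as the example object suggests.

```python
REQUIRED_FIELDS = ("type", "name", "version")
# Assumption: these values appear under provenance["creation_type"].
PROVENANCE_KINDS = {"clone", "workflow", "manual"}


def check_data_library(attributes: dict) -> list:
    """Return a list of problems; an empty list means the convention is met."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not attributes.get(field):
            problems.append(f"missing required field: {field}")
    # Only complain about the type value if one was actually supplied.
    if attributes.get("type") not in (None, "", "data-library"):
        problems.append("type must be 'data-library'")
    provenance = attributes.get("provenance")
    if provenance is not None:
        if not isinstance(provenance, dict):
            problems.append("provenance must be an object")
        elif provenance.get("creation_type") not in PROVENANCE_KINDS:
            kind = provenance.get("creation_type")
            problems.append(f"unknown provenance creation_type: {kind!r}")
    return problems


attrs = {
    "type": "data-library",
    "name": "Solr M5NR",
    "version": "1",
    "provenance": {"creation_type": "manual"},
}
print(check_data_library(attrs))  # []
print(check_data_library({"type": "data-library", "name": "M5NR"}))
# ['missing required field: version']
```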