repo_labor table taking up over half of the available database space #3736

@MoralCode

I noticed that the repo_labor table in our database is consuming 1.3 TB, according to DBeaver.

Going by the disk usage reported by sudo df -h, the whole database is 2.3 TB, so this one table accounts for more than half of it.

This table seems to be where the scc metrics get written. I can understand why it might be big if it tracks lines of code and other metrics for every file, for every change, in every tracked repository.

But since we already have several tables representing files (pull_request_files, and eventually commit_files, see #3682), why can't we store this data with a foreign key referring to existing file entries, deduplicating this table to reduce its space usage as much as possible?
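
A rough sketch of the idea, not a proposal for the actual migration. Everything here is an assumption made for illustration: the table names (files, file_metrics), their columns, and the helper functions are hypothetical and do not reflect Augur's real schema, and SQLite stands in for Postgres only so the example runs on its own. The point is that the file path is stored once, and each metrics row carries only a small integer key back to it.

```python
import sqlite3


def build_schema(conn):
    """Create a hypothetical normalized layout: one row per file, many metric rows."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(
        """
        CREATE TABLE files (
            file_id   INTEGER PRIMARY KEY,
            repo_id   INTEGER NOT NULL,
            file_path TEXT    NOT NULL,
            UNIQUE (repo_id, file_path)          -- each path is stored once per repo
        );
        CREATE TABLE file_metrics (
            file_id       INTEGER NOT NULL REFERENCES files(file_id),
            analysis_date TEXT    NOT NULL,
            code_lines    INTEGER NOT NULL,
            comment_lines INTEGER NOT NULL,
            PRIMARY KEY (file_id, analysis_date)
        );
        """
    )


def record_metrics(conn, repo_id, file_path, analysis_date, code_lines, comment_lines):
    """Store one metrics snapshot, reusing the existing file row if there is one."""
    conn.execute(
        "INSERT OR IGNORE INTO files (repo_id, file_path) VALUES (?, ?)",
        (repo_id, file_path),
    )
    (file_id,) = conn.execute(
        "SELECT file_id FROM files WHERE repo_id = ? AND file_path = ?",
        (repo_id, file_path),
    ).fetchone()
    conn.execute(
        "INSERT INTO file_metrics (file_id, analysis_date, code_lines, comment_lines) "
        "VALUES (?, ?, ?, ?)",
        (file_id, analysis_date, code_lines, comment_lines),
    )
    return file_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    # Two analysis runs over the same file: the path string is written only once.
    record_metrics(conn, 1, "src/main.py", "2024-01-01", 100, 10)
    record_metrics(conn, 1, "src/main.py", "2024-02-01", 120, 12)
    print(conn.execute("SELECT COUNT(*) FROM files").fetchone()[0])
    print(conn.execute("SELECT COUNT(*) FROM file_metrics").fetchone()[0])
```

How much space this would actually save depends on how much of each repo_labor row is repeated path and name text versus the metrics themselves, which would need to be measured on the real table.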

Labels: database (related to Augur's unified data model), tech debt