I noticed that the repo_labor table in our database is consuming 1.3 TB according to DBeaver.
Going by the disk usage reported by sudo df -h, the whole database is 2.3 TB, so this one table accounts for more than half of it.
This table seems to be where the scc metrics get written. I can understand why it might be big if it tracks the lines of code and other metrics for every file, for every change, in every tracked repository.
But since we already have several tables representing files (pull_request_files, and eventually commit_files, #3682), why can't we store this data with a foreign key referring to the existing file entries, deduplicating this table to reduce its space usage as much as possible?
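
To make the idea concrete, here is a rough sketch of what I have in mind, using an in-memory SQLite database from Python so it runs anywhere. Everything in it is hypothetical: the `files` and `file_metrics` tables, their columns, and the `content_hash` used to detect unchanged files are assumptions for illustration, not the current schema of repo_labor, pull_request_files, or commit_files.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical deduplicated layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(
        """
        -- Hypothetical stand-in for an existing file table: one row per file.
        CREATE TABLE files (
            file_id   INTEGER PRIMARY KEY,
            repo_id   INTEGER NOT NULL,
            file_path TEXT    NOT NULL,
            UNIQUE (repo_id, file_path)
        );
        -- Metrics reference the file by key instead of repeating repo and path.
        -- One row per distinct file content (content_hash is an assumption).
        CREATE TABLE file_metrics (
            file_id      INTEGER NOT NULL REFERENCES files(file_id),
            content_hash TEXT    NOT NULL,
            total_lines  INTEGER NOT NULL,
            code_lines   INTEGER NOT NULL,
            PRIMARY KEY (file_id, content_hash)
        );
        """
    )
    return conn


def record_metrics(conn, repo_id, file_path, content_hash, total_lines, code_lines):
    """Store metrics for a file, skipping rows that already exist."""
    # Reuse the existing file row if there is one.
    conn.execute(
        "INSERT OR IGNORE INTO files (repo_id, file_path) VALUES (?, ?)",
        (repo_id, file_path),
    )
    file_id = conn.execute(
        "SELECT file_id FROM files WHERE repo_id = ? AND file_path = ?",
        (repo_id, file_path),
    ).fetchone()[0]
    # Unchanged content (same hash) does not produce a second metrics row.
    conn.execute(
        "INSERT OR IGNORE INTO file_metrics VALUES (?, ?, ?, ?)",
        (file_id, content_hash, total_lines, code_lines),
    )
    return file_id
```

With a layout like this, re-scanning a file whose content has not changed adds no rows, and the path and repository are stored once per file rather than once per measurement. Whether the savings would be significant for us depends on how much of repo_labor is actually repeated data, which I have not measured.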