Implement --dump-all-indexes option #1

Open
barryhunter wants to merge 5 commits into master from feat/dump-all-indexes

@barryhunter (Owner)

This commit introduces a new command-line option --dump-all-indexes to indexdump.php.

When this option is used, the script will:

  1. Connect to the Manticore/Sphinx server.
  2. Execute SHOW TABLES to retrieve a list of all available indexes.
  3. Create a new directory named index_dumps_YYYY-MM-DD (e.g., index_dumps_2023-10-27) if it doesn't already exist.
  4. For each index found:
     a. Dump the schema and data to a new file named index_name.sql within the dated directory.
     b. Respect the existing --schema=0/1, --data=0/1, and --lock=0/1 options for each individual dump.
  5. The script provides feedback on its progress, including which index is currently being dumped and where it's being saved.
  6. Error handling is implemented to catch issues like connection failures, inability to create directories, and errors during individual index dumps. If an error occurs while dumping a specific index, the script will report it and continue with the next index.
  7. A summary is printed at the end, indicating the total number of indexes processed, how many succeeded, and how many failed.

To achieve this, the main dumping logic was refactored into a dump_index() function, which can now direct its output to a specified file handle. This function is used by both the original single-index dump mode (outputting to STDOUT) and the new --dump-all-indexes mode.
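
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `dump_dir_name()`, `dump_all_indexes()`, and the `$dumpOne` callable (standing in for `dump_index()`, with an assumed signature of index name plus file handle) are hypothetical names, and sending progress messages to STDERR is also an assumption.

```php
<?php
// Hypothetical helper: dated directory name, e.g. index_dumps_2023-10-27.
function dump_dir_name(DateTimeInterface $when): string
{
    return 'index_dumps_' . $when->format('Y-m-d');
}

/**
 * Dump every index to "$dir/<index>.sql", continuing past failures.
 * $dumpOne stands in for dump_index(); its signature is assumed here.
 * Returns a summary array: total, ok, failed.
 */
function dump_all_indexes(array $indexes, string $dir, callable $dumpOne): array
{
    // Create the directory only if it does not already exist.
    if (!is_dir($dir) && !@mkdir($dir, 0755, true)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    $summary = ['total' => count($indexes), 'ok' => 0, 'failed' => 0];

    foreach ($indexes as $index) {
        $path = $dir . '/' . $index . '.sql';
        fwrite(STDERR, "Dumping $index to $path\n");

        $fh = @fopen($path, 'w');
        if ($fh === false) {
            fwrite(STDERR, "Cannot open $path for writing\n");
            $summary['failed']++;
            continue;
        }

        try {
            $dumpOne($index, $fh);
            $summary['ok']++;
        } catch (Throwable $e) {
            // Report the failure and move on to the next index.
            fwrite(STDERR, "Failed to dump $index: " . $e->getMessage() . "\n");
            $summary['failed']++;
        } finally {
            fclose($fh);
        }
    }

    return $summary;
}
```

Passing the file handle into the per-index callable is what lets the same routine write to STDOUT in single-index mode and to a per-index file in the all-indexes mode.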

The help text and internal comments have been updated to reflect these changes.

google-labs-jules bot and others added 5 commits May 28, 2025 16:31
This commit extends the `--tsv` command-line option in `indexdump.php` to give more flexible Tab-Separated Values (TSV) output.

Key changes:

1.  **Dual Mode for `--tsv`:**
    *   **Single Index Dump:** When dumping a single index, `--tsv=filename.tsv` can be used to output the data for that index to the specified `filename.tsv`. The main SQL output (schema and INSERTs) continues to go to STDOUT or its designated SQL output.
    *   **`--dump-all-indexes` Mode:** When used with `--dump-all-indexes`, specifying just `--tsv` (as a flag, no filename needed) will now generate a separate `index_name.tsv` file for each dumped index. These `.tsv` files are created within the dated output directory (e.g., `index_dumps_YYYY-MM-DD/`). This is in *addition* to the standard `.sql` file for each index, which still contains the schema and SQL INSERT statements.

2.  **`dump_index` Function Update:**
    *   The core `dump_index` function was modified to accept an optional `$tsv_output_path` parameter.
    *   If this path is provided, the function writes the index's data (including a header row of column names) to the specified TSV file. This TSV generation is independent of and additional to its primary SQL output.

3.  **Integration:**
    *   Argument parsing for `--tsv` was updated to recognize its use as both a flag and an option requiring a value.
    *   The logic for both single index dumps and the `--dump-all-indexes` loop was updated to correctly determine and pass the appropriate TSV output path (or null) to the `dump_index` function.

4.  **Error Handling & Feedback:**
    *   Errors in TSV file operations (opening the file, writing the header, writing rows) are handled within `dump_index`. They are reported to STDERR and do not interrupt the SQL dump process.
    *   Help text and internal comments have been updated to reflect the new TSV functionality.
    *   Success messages now also indicate when TSV files have been generated.

This enhancement allows you to easily obtain both SQL dumps (for schema and full data restoration) and TSV data dumps (for easier analysis or import into other systems) simultaneously, for both single indexes and full server backups.
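
As a rough sketch of the TSV side, the function below writes a header row followed by one line per row, and reports failures to STDERR without throwing, so a caller could carry on with the SQL dump. The name `write_tsv()`, the row shape (associative arrays keyed by column name), and the backslash escaping of tabs and newlines are all assumptions made for illustration; the PR does not state how values are escaped.

```php
<?php
/**
 * Write rows to a TSV file with a header row of column names.
 * Returns true on success; on failure reports to STDERR and returns false.
 */
function write_tsv(string $path, array $columns, iterable $rows): bool
{
    $fh = @fopen($path, 'w');
    if ($fh === false) {
        fwrite(STDERR, "Cannot open TSV file: $path\n");
        return false;
    }

    // Assumed escaping: keep every record on one line, one field per tab.
    $escape = static function ($value): string {
        return strtr((string) $value, [
            "\\" => "\\\\",
            "\t" => "\\t",
            "\n" => "\\n",
            "\r" => "\\r",
        ]);
    };

    $ok = fwrite($fh, implode("\t", array_map($escape, $columns)) . "\n") !== false;

    foreach ($rows as $row) {
        if (!$ok) {
            break;
        }
        // Emit values in header order; missing columns become empty fields.
        $fields = [];
        foreach ($columns as $column) {
            $fields[] = $escape($row[$column] ?? '');
        }
        $ok = fwrite($fh, implode("\t", $fields) . "\n") !== false;
    }

    fclose($fh);
    if (!$ok) {
        fwrite(STDERR, "Error while writing TSV file: $path\n");
    }
    return $ok;
}
```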
* Fix a few syntax issues introduced by the bot (not many!)
* Fix the non-positional argument parsing, which wasn't catching limit correctly. Remove the ability for the tsv filename to be non-positional.
The refactored code didn't cope with old servers
Shouldn't dump distributed indexes (nor template)
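
One way to skip distributed and template indexes is to filter the `SHOW TABLES` result by type before dumping. The sketch below is an assumption-laden illustration rather than the code in this PR: `dumpable_indexes()` is a hypothetical name, and it assumes each row is an associative array with the name under `Table` or `Index` and the kind under `Type`. Treating rows without a `Type` column as dumpable is likewise a guess, not the fix that was applied for old servers.

```php
<?php
/**
 * Return the names of indexes worth dumping from SHOW TABLES rows.
 * Assumed row shape: ['Index' => name, 'Type' => kind] (or 'Table' => name).
 */
function dumpable_indexes(array $rows): array
{
    $skip = ['distributed', 'template'];
    $names = [];

    foreach ($rows as $row) {
        $name = $row['Table'] ?? $row['Index'] ?? null;
        $type = strtolower((string) ($row['Type'] ?? ''));

        // Skip rows with no name and index types that hold no data of their own.
        if ($name === null || in_array($type, $skip, true)) {
            continue;
        }
        $names[] = $name;
    }

    return $names;
}
```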