
Implement backup & restore #479

Merged
jhamon merged 7 commits into release-candidate/2025-04 from jhamon/backup-restore
May 14, 2025


@jhamon jhamon commented May 13, 2025

Problem

Implement backup & restore

Solution

Added new methods to Pinecone and PineconeAsyncio:

  • create_index_from_backup
  • create_backup
  • list_backups
  • describe_backup
  • delete_backup
  • list_restore_jobs
  • describe_restore_job

These can also be accessed with the new-style syntax, e.g. pc.db.index.create_from_backup, pc.db.backup.create, pc.db.restore_job.list.
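One way to picture how both spellings can coexist is a thin client whose legacy-style methods simply delegate to resource objects hanging off a `db` namespace. The sketch below is illustrative only: `Client`, `DbControl`, and the in-memory `BackupResource` are hypothetical stand-ins written for this example, not the SDK's actual classes or behavior.

```python
class BackupResource:
    """Hypothetical in-memory stand-in for a backup resource (not the SDK class)."""

    def __init__(self):
        self._backups = {}

    def create(self, *, index_name, backup_name, description=""):
        backup = {
            "source_index_name": index_name,
            "name": backup_name,
            "description": description,
        }
        self._backups[backup_name] = backup
        return backup

    def list(self):
        return [*self._backups.values()]


class DbControl:
    """Groups resource objects under one namespace, e.g. client.db.backup."""

    def __init__(self):
        self.backup = BackupResource()


class Client:
    """Exposes new-style access (client.db.backup.create) and legacy-style aliases."""

    def __init__(self):
        self.db = DbControl()

    def create_backup(self, **kwargs):
        # Legacy-style name: forwards to the new-style resource method.
        return self.db.backup.create(**kwargs)

    def list_backups(self):
        return self.db.backup.list()


client = Client()
client.create_backup(index_name="foo", backup_name="bar")
# Both call styles read from the same underlying resource.
print(client.list_backups() == client.db.backup.list())
```

Keeping the legacy names as one-line forwards means there is a single implementation to maintain while callers migrate at their own pace.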

More Details:

  • Had to re-run codegen to pull in recent spec changes
  • Organize implementation around resource-types
  • Expose legacy-style names (create_backup, create_index_from_backup) as well as new-style names such as pc.db.index.create_from_backup. In the upcoming release, both styles will be present. We still need to reorganize the methods for some less-used parts of the client (bulk imports, etc.) before transitioning fully to the new style in examples and documentation.
  • For new methods being added, begin enforcing keyword argument usage with a new @kwargs_required decorator. I will probably follow up and add this to all new methods added in the recent refactoring PR. Keyword arguments are strongly preferred over positional arguments because the keyword labels act as documentation and having the keyword labels makes them order-independent. This gives a lot of flexibility to expand the signature or change things from required to optional later without creating breaking changes for callers.
  • Wire up the code paths for new methods:
    • Pinecone > DbControl > BackupResource
    • Pinecone > DbControl > IndexResource
    • Pinecone > DbControl > RestoreJobResource
    • PineconeAsyncio > DbControlAsyncio > AsyncioBackupResource
    • PineconeAsyncio > DbControlAsyncio > AsyncioIndexResource
    • PineconeAsyncio > DbControlAsyncio > AsyncioRestoreJobResource
  • Update interface classes so that docs will show information about the new methods.
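To make the keyword-argument enforcement described above concrete, here is a minimal sketch of what a decorator like that could look like. Only the name `kwargs_required` comes from this PR; the body, the error message, and the toy `BackupResource.describe` method are assumptions for illustration and may differ from the real implementation.

```python
import functools


def kwargs_required(func):
    """Reject positional arguments so callers must label every value.

    Assumed behavior for illustration; the decorator in the PR may differ.
    """

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError(
                f"{func.__name__}() requires keyword arguments, "
                f"but got {len(args)} positional argument(s)"
            )
        return func(self, **kwargs)

    return wrapper


class BackupResource:
    """Toy resource used only to exercise the decorator."""

    @kwargs_required
    def describe(self, backup_id):
        return {"backup_id": backup_id}


resource = BackupResource()
print(resource.describe(backup_id="abc"))  # keyword call is accepted

try:
    resource.describe("abc")  # positional call is rejected
except TypeError as err:
    print(err)
```

Because callers never rely on argument order, a parameter can later be added, reordered, or made optional without breaking existing call sites.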

Usage

Initial setup

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key='key')

# First you need an index
pc.create_index(
    name='foo',
    dimension=2,
    metric='cosine',
    spec=ServerlessSpec(cloud='aws', region='us-east-1')
)

# Upsert some fake data just for demonstration purposes
import random

idx = pc.Index(name='foo')
idx.upsert(
    vectors=[
        (str(i), [random.random(), random.random()]) for i in range(1000)
    ]
)

Backups

pc.create_backup(
    index_name='foo', 
    backup_name='bar', 
    description='an example backup'
)

# Describe a backup
pc.describe_backup(backup_id='7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1')
# {
#     "backup_id": "7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1",
#     "source_index_name": "foo",
#     "source_index_id": "4c292a8a-77cc-4a37-917d-51c6051a80bf",
#     "status": "Ready",
#     "cloud": "aws",
#     "region": "us-east-1",
#     "tags": {},
#     "name": "bar",
#     "description": "",
#     "dimension": 2,
#     "record_count": 1000,
#     "namespace_count": 1,
#     "size_bytes": 289392,
#     "created_at": "2025-05-13T14:15:16.908702Z"
# }


# List backups
pc.list_backups()
# [
#     {
#         "backup_id": "7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1",
#         "source_index_name": "foo",
#         "source_index_id": "4c292a8a-77cc-4a37-917d-51c6051a80bf",
#         "status": "Ready",
#         "cloud": "aws",
#         "region": "us-east-1",
#         "tags": {},
#         "name": "bar",
#         "description": "",
#         "dimension": 2,
#         "record_count": 1000,
#         "namespace_count": 1,
#         "size_bytes": 289392,
#         "created_at": "2025-05-13T14:15:16.908702Z"
#     }
# ]

# Delete backup
pc.delete_backup(backup_id='7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1')

Creating an index from backup

# Create index from backup
pc.create_index_from_backup(
  backup_id='7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1',
  name='foo2',
  deletion_protection='enabled',
  tags={'env': 'testing'}
)
# {
#     "name": "foo2",
#     "metric": "cosine",
#     "host": "foo2-dojoi3u.svc.aped-4627-b74a.pinecone.io",
#     "spec": {
#         "serverless": {
#             "cloud": "aws",
#             "region": "us-east-1"
#         }
#     },
#     "status": {
#         "ready": true,
#         "state": "Ready"
#     },
#     "vector_type": "dense",
#     "dimension": 2,
#     "deletion_protection": "enabled",
#     "tags": {
#         "env": "testing"
#     }
# }

Restore job

# List jobs
pc.list_restore_jobs()
# {'data': [{'backup_id': 'e5957dc2-a76e-4b72-9645-569fb7ec143f',
#            'completed_at': datetime.datetime(2025, 5, 13, 14, 56, 13, 939921, tzinfo=tzutc()),
#            'created_at': datetime.datetime(2025, 5, 13, 14, 56, 4, 534826, tzinfo=tzutc()),
#            'percent_complete': 100.0,
#            'restore_job_id': '744ea5bd-7ddc-44ce-81f5-cfb876572e59',
#            'status': 'Completed',
#            'target_index_id': '572130f9-cfdd-42bf-a280-4218cd112bf8',
#            'target_index_name': 'foo2'},
#           {'backup_id': '7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1',
#            'completed_at': datetime.datetime(2025, 5, 13, 16, 27, 10, 290234, tzinfo=tzutc()),
#            'created_at': datetime.datetime(2025, 5, 13, 16, 27, 6, 130522, tzinfo=tzutc()),
#            'percent_complete': 100.0,
#            'restore_job_id': '06aa5739-2785-4121-b71b-99b73c3e3247',
#            'status': 'Completed',
#            'target_index_id': 'd3f31cd1-b077-4bcf-8e7d-d091d408c82b',
#            'target_index_name': 'foo2'}],
#  'pagination': None}

# Describe jobs
pc.describe_restore_job(job_id='504dd1a9-e3cd-420f-8756-65d5411fcb10')
# {'backup_id': '7c8e6fcf-577b-4df5-9869-3c67f0f3d6e1',
#  'completed_at': datetime.datetime(2025, 5, 13, 15, 55, 10, 108584, tzinfo=tzutc()),
#  'created_at': datetime.datetime(2025, 5, 13, 15, 54, 49, 925105, tzinfo=tzutc()),
#  'percent_complete': 100.0,
#  'restore_job_id': '504dd1a9-e3cd-420f-8756-65d5411fcb10',
#  'status': 'Completed',
#  'target_index_id': 'b5607ee7-be78-4401-aaf5-ea20413f409d',
#  'target_index_name': 'foo4'}

Type of Change

  • New feature (non-breaking change which adds functionality)

Test Plan

Describe specific steps for validating this change.

@jhamon jhamon marked this pull request as ready for review May 13, 2025 16:58
@jhamon jhamon requested a review from austin-denoble May 13, 2025 16:58

@austin-denoble austin-denoble left a comment


LGTM overall, nice work. Even with all the refactoring underneath everything still feels like a clean and easy review.

    )
    return BackupList(self._index_api.list_index_backups(**args))
else:
    return BackupList(self._index_api.list_project_backups())

Just make sure to pass args through once you've pulled in the changes to add pagination to this one.

    deletion_protection: Optional[Union["DeletionProtection", str]] = "disabled",
    tags: Optional[Dict[str, str]] = None,
    timeout: Optional[int] = None,
) -> "IndexModel":

@austin-denoble austin-denoble May 13, 2025


edit: Ignore - I saw elsewhere why this returns IndexModel 👍

This doesn't return an IndexModel. It returns a different shape as this initiates a restore job, and the index hasn't been created yet. It should be something like CreateIndexFromBackupResponse, it looks like this:

{
  restore_job_id: string
  index_id: string
}

await self.index_api.create_index_from_backup(
    backup_id=backup_id, create_index_from_backup_request=req
)
return await self.__poll_describe_index_until_ready(name, timeout)

Ah - I commented on returning IndexModel rather than the response shape from the API call, but I didn't realize you were awaiting. That's a nice bit of sugar for this.

@jhamon jhamon merged commit 5c0e37c into release-candidate/2025-04 May 14, 2025
64 of 70 checks passed
@jhamon jhamon deleted the jhamon/backup-restore branch May 14, 2025 07:17