diff --git a/docs/volume-snapshot-workflow.md b/docs/volume-snapshot-workflow.md
new file mode 100644
index 000000000000..286a677a0d71
--- /dev/null
+++ b/docs/volume-snapshot-workflow.md
@@ -0,0 +1,390 @@
+# Volume Snapshot Workflow in CloudStack
+
+This document describes the end-to-end workflow for taking volume-level snapshots from the CloudStack Management Server, organized in the sequence that CloudStack orchestrates the operation.
+
+---
+
+## Overview
+
+A volume snapshot in CloudStack captures the state of a disk at a point in time. The snapshot can be stored on primary storage, secondary storage, or replicated across zones and storage pools. The workflow involves multiple layers: API, orchestration, storage engine, and storage-specific strategy plugins.
+
+---
+
+## Step-by-Step Workflow
+
+### Step 1 — API Entry Point: `CreateSnapshotCmd.execute()`
+
+**File:** `api/src/main/java/org/apache/cloudstack/api/command/user/snapshot/CreateSnapshotCmd.java`
+
+The user (or scheduler) calls the `createSnapshot` API. The command is a `BaseAsyncCreateCmd`, meaning snapshot *allocation* and *execution* happen in two separate phases (create and execute).
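
The two-phase split can be illustrated with a minimal sketch. It is not CloudStack code: `TwoPhaseSnapshotSketch`, its in-memory map, and the `Completed` state are hypothetical stand-ins, and only the `Allocated` state name comes from this document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a two-phase async create command; not CloudStack code. */
final class TwoPhaseSnapshotSketch {
    /** `Allocated` mirrors the state named in this document; `Completed` is illustrative. */
    enum State { Allocated, Completed }

    // In-memory replacement for the database table that holds snapshot records.
    private final Map<Long, State> records = new HashMap<>();
    private long nextId = 1;

    /** Phase 1 (create): persist a placeholder record and hand back its ID. */
    long create() {
        long id = nextId++;
        records.put(id, State.Allocated);
        return id;
    }

    /** Phase 2 (execute): do the real work against the pre-allocated ID. */
    State execute(long snapshotId) {
        if (records.get(snapshotId) != State.Allocated) {
            throw new IllegalStateException("Snapshot " + snapshotId + " was not allocated");
        }
        records.put(snapshotId, State.Completed);
        return records.get(snapshotId);
    }

    State stateOf(long snapshotId) {
        return records.get(snapshotId);
    }

    public static void main(String[] args) {
        TwoPhaseSnapshotSketch cmd = new TwoPhaseSnapshotSketch();
        long id = cmd.create();
        System.out.println("after create:  " + cmd.stateOf(id));
        System.out.println("after execute: " + cmd.execute(id));
    }
}
```

Separating the phases means the caller receives an ID as soon as the record exists, while the potentially slow work runs later against that ID.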
+
+In the `execute()` phase:
+
+```java
+snapshot = _volumeService.takeSnapshot(
+    getVolumeId(), getPolicyId(), getEntityId(),
+    getAccount(), getQuiescevm(), getLocationType(),
+    getAsyncBackup(), getTags(), getZoneIds(),
+    getStoragePoolIds(), useStorageReplication());
+```
+
+Key parameters available to the caller:
+- `volumeId` – the volume to snapshot
+- `policyId` – optional snapshot policy to apply
+- `locationType` – `PRIMARY` or `SECONDARY`
+- `asyncBackup` – whether to back up to secondary asynchronously
+- `zoneIds` – destination zones to copy the snapshot to
+- `storagePoolIds` – specific primary storage pools to copy the snapshot to
+- `useStorageReplication` – use native cross-zone storage replication (StorPool)
+
+---
+
+### Step 2 — Allocation Phase: `VolumeApiServiceImpl.allocSnapshot()`
+
+**File:** `server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java`
+
+Before `execute()` is called, `create()` runs `allocSnapshot()` which:
+
+1. Verifies the caller has access to the volume.
+2. Validates resource limits (snapshot count, secondary storage quota).
+3. Generates a snapshot name in the format `__`.
+4. Creates a `SnapshotVO` record in the database in state `Allocated`.
+5. Increments resource counters for the account (snapshot count, storage size).
+
+The allocation returns the snapshot ID, which is then used by the `execute()` phase.
+
+---
+
+### Step 3 — Validation and Path Selection: `VolumeApiServiceImpl.takeSnapshotInternal()`
+
+**File:** `server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java`
+
+`takeSnapshotInternal()` performs pre-flight checks before dispatching work:
+
+1. Re-validates volume exists and is in `Ready` state.
+2. Rejects snapshots on `External` hypervisor type volumes.
+3. Resolves `zoneIds` and `poolIds` from snapshot policy details if a `policyId` is provided.
+4. Validates each destination zone exists.
+5. Checks that the caller has access to both the volume and (if attached) the VM.
+6. If the storage pool is managed and `locationType` is unset, defaults to `LocationType.PRIMARY`.
+7. Calls `snapshotHelper.addStoragePoolsForCopyToPrimary()` to resolve storage pool IDs when `useStorageReplication` is enabled.
+
+**Path selection based on VM attachment:**
+
+```
+Volume attached to running VM?
+├── YES → Serialize via VM Work Job Queue
+│         (Step 4a — job queue path)
+└── NO  → Direct execution
+          (Step 4b — direct path)
+```
+
+---
+
+### Step 4a — Serialized Execution via VM Work Job Queue
+
+**File:** `server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java`
+
+When the volume is attached to a VM, CloudStack serializes the operation using the VM Work Job queue. This prevents concurrent conflicting operations on the same VM.
+
+```java
+Outcome outcome = takeVolumeSnapshotThroughJobQueue(
+    vm.getId(), volumeId, policyId, snapshotId,
+    account.getId(), quiesceVm, locationType,
+    asyncBackup, zoneIds, poolIds);
+```
+
+A `VmWorkTakeVolumeSnapshot` work item is created and dispatched. The job framework eventually calls `orchestrateTakeVolumeSnapshot(VmWorkTakeVolumeSnapshot work)` from within the VM work job dispatcher.
+
+If the current thread is *already* running inside the job dispatcher (re-entrant case), a placeholder work record is created and `orchestrateTakeVolumeSnapshot()` is called directly to avoid deadlock.
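
The serialization and re-entrancy behaviour described above can be sketched with plain `java.util.concurrent` primitives. This is an illustration only: `VmWorkQueueSketch`, the per-VM single-thread executors, and the `ThreadLocal` flag are assumptions made for the example, not the mechanism the CloudStack job framework actually uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical per-VM work queue with a re-entrancy shortcut; not CloudStack code. */
final class VmWorkQueueSketch {
    // True only on threads that are currently executing queued VM work.
    private static final ThreadLocal<Boolean> IN_DISPATCHER = ThreadLocal.withInitial(() -> false);

    // One single-threaded executor per VM, so work for the same VM never overlaps.
    private final Map<Long, ExecutorService> queues = new ConcurrentHashMap<>();

    <T> T runSerialized(long vmId, Supplier<T> work) {
        if (IN_DISPATCHER.get()) {
            // Re-entrant case: queueing again and waiting would deadlock, so run inline.
            return work.get();
        }
        ExecutorService queue = queues.computeIfAbsent(vmId, id -> Executors.newSingleThreadExecutor());
        Future<T> result = queue.submit(() -> {
            IN_DISPATCHER.set(true);
            try {
                return work.get();
            } finally {
                IN_DISPATCHER.set(false);
            }
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for VM work", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("VM work failed", e.getCause());
        }
    }

    void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) {
        VmWorkQueueSketch queues = new VmWorkQueueSketch();
        // The inner call runs on the dispatcher thread and therefore executes inline.
        String result = queues.runSerialized(42L, () -> "outer+" + queues.runSerialized(42L, () -> "inner"));
        System.out.println(result);
        queues.shutdown();
    }
}
```

Without the inline shortcut, the nested call would wait on a queue whose only worker thread is the one doing the waiting.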
+
+**`VmWorkTakeVolumeSnapshot` carries:**
+
+```java
+// engine/components-api/src/main/java/com/cloud/vm/VmWorkTakeVolumeSnapshot.java
+new VmWorkTakeVolumeSnapshot(userId, accountId, vmId, handlerName,
+    volumeId, policyId, snapshotId, quiesceVm,
+    locationType, asyncBackup, zoneIds, poolIds);
+```
+
+---
+
+### Step 4b — Direct Execution (Volume Not Attached to VM)
+
+**File:** `server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java`
+
+When the volume is not attached to a VM, a `CreateSnapshotPayload` is built and attached directly to the volume:
+
+```java
+CreateSnapshotPayload payload = new CreateSnapshotPayload();
+payload.setSnapshotId(snapshotId);
+payload.setSnapshotPolicyId(policyId);
+payload.setAccount(account);
+payload.setQuiescevm(quiescevm);
+payload.setLocationType(locationType);
+payload.setAsyncBackup(asyncBackup);
+payload.setZoneIds(zoneIds);
+payload.setStoragePoolIds(poolIds);
+
+volume.addPayload(payload);
+return volService.takeSnapshot(volume);
+```
+
+---
+
+### Step 5 — Orchestration: `orchestrateTakeVolumeSnapshot()`
+
+**File:** `server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java`
+
+Whether coming from the job queue or directly, `orchestrateTakeVolumeSnapshot()` handles the final preparation:
+
+1. Re-validates the volume is still `Ready`.
+2. Detects whether the volume is encrypted and on a running VM; rejects such snapshots unless the storage is StorPool (which supports live encrypted volume snapshots).
+3. Builds the `CreateSnapshotPayload` with all execution parameters.
+4. Attaches the payload to the volume.
+5. Calls `volService.takeSnapshot(volume)` — delegating to `SnapshotManagerImpl`.
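
The first two checks in the list above can be written as a small guard. The sketch below is a simplification under stated assumptions: `SnapshotGuardSketch`, both enums, and the exception types are hypothetical, and the rule is reduced to the prose description (encrypted volume on a running VM is rejected unless the pool is StorPool), ignoring any configuration flags the real code consults.

```java
/** Hypothetical pre-flight guard mirroring the first two orchestration checks; not CloudStack code. */
final class SnapshotGuardSketch {
    // `Ready` and `StorPool` are named in this document; the other constants are illustrative.
    enum VolumeState { Ready, Snapshotting }
    enum PoolType { StorPool, NetworkFilesystem }

    static void check(VolumeState state, boolean encrypted, boolean vmRunning, PoolType pool) {
        // Check 1: the volume must still be Ready when orchestration starts.
        if (state != VolumeState.Ready) {
            throw new IllegalStateException("Volume is not Ready: " + state);
        }
        // Check 2: live snapshots of encrypted volumes are refused unless the pool is StorPool.
        if (encrypted && vmRunning && pool != PoolType.StorPool) {
            throw new UnsupportedOperationException(
                "Snapshot of an encrypted volume on a running VM is not supported on " + pool);
        }
    }

    public static void main(String[] args) {
        check(VolumeState.Ready, true, true, PoolType.StorPool);
        System.out.println("StorPool live encrypted snapshot accepted");
        try {
            check(VolumeState.Ready, true, true, PoolType.NetworkFilesystem);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```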
+
+**StorPool encrypted volume exception:**
+
+```java
+boolean isSnapshotOnStorPoolOnly =
+    volume.getStoragePoolType() == StoragePoolType.StorPool &&
+    SnapshotInfo.BackupSnapshotAfterTakingSnapshot.value();
+// Allow live snapshot of encrypted volumes on StorPool primary storage
+```
+
+---
+
+### Step 6 — Strategy Selection and Snapshot Execution: `SnapshotManagerImpl.takeSnapshot()`
+
+**File:** `server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java`
+
+This is the core snapshot execution method:
+
+1. Extracts `CreateSnapshotPayload` from the volume.
+2. Determines whether to use KVM file-based storage path.
+3. Checks if backup to secondary storage is needed for this zone.
+4. For KVM file-based storage with secondary backup, allocates an image store.
+5. Selects the appropriate `SnapshotStrategy` via `StorageStrategyFactory.getSnapshotStrategy(snapshot, TAKE)`.
+
+**Strategy selection priority (highest wins):**
+
+| Strategy | Priority | Handles |
+|---|---|---|
+| `StorPoolSnapshotStrategy` | HIGHEST (for DELETE/COPY) | DELETE, COPY on StorPool storage |
+| `StorageSystemSnapshotStrategy` | HIGH | Managed storage (TAKE, DELETE) |
+| `CephSnapshotStrategy` | HIGH | Ceph RBD snapshots |
+| `ScaleIOSnapshotStrategy` | HIGH | ScaleIO/PowerFlex snapshots |
+| `DefaultSnapshotStrategy` | DEFAULT | File-based hypervisor snapshots |
+
+6. Calls `snapshotStrategy.takeSnapshot(snapshot)` which returns a `SnapshotInfo` on primary storage.
+
+---
+
+### Step 7 — Primary Storage Snapshot Creation: `SnapshotServiceImpl.takeSnapshot()`
+
+**File:** `engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java`
+
+The storage engine creates the snapshot on primary storage:
+
+1. Creates a snapshot state object on the primary data store.
+2. Transitions snapshot state: `CreateRequested`.
+3. Transitions volume state: `Volume.Event.SnapshotRequested`.
+4. Issues an asynchronous command to the primary data store driver (`PrimaryDataStoreDriver.takeSnapshot()`).
+5. Waits for the async callback via `AsyncCallFuture`.
+6. On success:
+   - Updates physical size from the driver response.
+   - Publishes `EVENT_SNAPSHOT_ON_PRIMARY` usage event.
+   - Transitions volume: `Volume.Event.OperationSucceeded`.
+7. On failure:
+   - Transitions snapshot to `OperationFailed`.
+   - Transitions volume: `Volume.Event.OperationFailed`.
+
+---
+
+### Step 8 — Secondary Storage Backup Decision
+
+**File:** `server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java`
+
+After the snapshot is created on primary, CloudStack decides whether to back it up:
+
+```
+BackupSnapshotAfterTakingSnapshot == true?
+├── YES
+│   ├── KVM file-based → postSnapshotDirectlyToSecondary()
+│   │       (snapshot already on secondary — update DB reference only)
+│   └── Otherwise → backupSnapshotToSecondary()
+│       ├── asyncBackup == true  → schedule BackupSnapshotTask
+│       └── asyncBackup == false → synchronous backupSnapshot() + postSnapshotCreation()
+└── NO
+    ├── storagePoolIds provided AND asyncBackup → schedule BackupSnapshotTask for pool copy
+    └── Otherwise → markBackedUp() (snapshot stays on primary only)
+```
+
+**`BackupSnapshotTask`** (async retry runner):
+- Retries backup up to `snapshot.backup.to.secondary.retries` times.
+- On exhausting retries, calls `snapshotSrv.cleanupOnSnapshotBackupFailure()` to remove the snapshot record.
+
+---
+
+### Step 9 — StorPool Cross-Zone Snapshot Copy: `StorPoolSnapshotStrategy.copySnapshot()`
+
+**File:** `plugins/storage/volume/storpool/src/main/java/org/apache/cloudstack/storage/snapshot/StorPoolSnapshotStrategy.java`
+
+When `storagePoolIds` are provided and the storage is StorPool, the snapshot is replicated natively between clusters:
+
+1. **Export** the snapshot from the local StorPool cluster to the remote location using `snapshotExport()`.
+2. **Persist recovery information** in `snapshot_details` table with the exported name and location, so that partial cross-zone copies can be recovered.
+3. **Copy from remote** on the destination StorPool cluster using `snapshotFromRemote()`.
+4. **Reconcile** the snapshot on the remote cluster using `snapshotReconcile()`.
+5. **Update** the `snapshot_store_ref.install_path` in the database to reflect the destination path.
+6. Invoke the async callback with success or failure.
+
+**Recovery detail saved:**
+
+```java
+// Stored so incomplete exports can be cleaned up later
+String detail = "~" + snapshotName + ";" + location;
+new SnapshotDetailsVO(snapshot.getId(), SP_RECOVERED_SNAPSHOT, detail, true);
+```
+
+---
+
+### Step 10 — Post-Snapshot Processing: `postCreateSnapshot()` and Zone/Pool Copies
+
+**File:** `server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java`
+
+After snapshot creation (and optional backup):
+
+1. **`postCreateSnapshot()`**: Updates snapshot policy retention — removes the oldest snapshot if the retention count is exceeded.
+2. **`snapshotZoneDao.addSnapshotToZone()`**: Associates the snapshot with its origin zone.
+3. **Usage event**: Publishes `EVENT_SNAPSHOT_CREATE` with the physical size of the snapshot.
+4. **Resource limit correction**: For delta (incremental) snapshots, decrements the pre-allocated resource count by `(volumeSize − snapshotPhysicalSize)` since the actual snapshot is smaller than the volume.
+5. **`copyNewSnapshotToZones()`** *(synchronous backup path only)*: Copies the snapshot to secondary storage in additional destination zones.
+6. **`copyNewSnapshotToZonesOnPrimary()`** *(synchronous backup path only)*: Copies the snapshot to additional primary storage pools.
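
As a worked example of the resource limit correction in item 4: a 100 GiB volume is charged in full at allocation time, so a delta snapshot that occupies 12 GiB leads to an 88 GiB decrement. The helper below is a hypothetical sketch (`ResourceCorrectionSketch` and its clamping to zero are assumptions, not CloudStack behaviour) that only reproduces the arithmetic.

```java
/** Hypothetical helper reproducing the (volumeSize - snapshotPhysicalSize) correction. */
final class ResourceCorrectionSketch {
    static final long GIB = 1L << 30;

    /**
     * Returns how many bytes to give back to the account after a delta snapshot.
     * Clamping at zero is an assumption of this sketch, not documented behaviour.
     */
    static long correction(long volumeSizeBytes, long snapshotPhysicalSizeBytes) {
        if (volumeSizeBytes < 0 || snapshotPhysicalSizeBytes < 0) {
            throw new IllegalArgumentException("Sizes must be non-negative");
        }
        return Math.max(0L, volumeSizeBytes - snapshotPhysicalSizeBytes);
    }

    public static void main(String[] args) {
        long giveBack = correction(100 * GIB, 12 * GIB);
        System.out.println("decrement by " + (giveBack / GIB) + " GiB"); // prints "decrement by 88 GiB"
    }
}
```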
+
+---
+
+### Step 11 — Rollback on Failure
+
+**File:** `server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java`
+
+The outer `try/catch` in `takeSnapshot()` ensures resource cleanup on any failure:
+
+```java
+} catch (CloudRuntimeException | UnsupportedOperationException cre) {
+    ResourceType storeResourceType = getStoreResourceType(...);
+    _resourceLimitMgr.decrementResourceCount(snapshotOwner.getId(), ResourceType.snapshot);
+    _resourceLimitMgr.decrementResourceCount(snapshotOwner.getId(), storeResourceType, volumeSize);
+    throw cre;
+} catch (Exception e) {
+    // Same resource rollback
+    throw new CloudRuntimeException("Failed to create snapshot", e);
+}
+```
+
+**Additional cleanup methods:**
+
+| Method | Trigger | Action |
+|---|---|---|
+| `cleanupVolumeDuringSnapshotFailure()` | Snapshot creation fails completely | Removes `snapshot_store_ref` entries (non-Destroyed) and deletes the `SnapshotVO` record |
+| `cleanupOnSnapshotBackupFailure()` | Async backup exhausts all retries | Transitions snapshot state, removes async job MS_ID, deletes snapshot record |
+| `StorPoolSnapshotStrategy.deleteSnapshot()` | Snapshot DELETE operation on StorPool | Calls StorPool API `snapshotDelete`, transitions state, cleans up DB |
+
+---
+
+## Sequence Diagram (Text Form)
+
+```
+User/Scheduler
+    │
+    ▼
+CreateSnapshotCmd.create()
+    │  allocSnapshot() → SnapshotVO persisted (Allocated state)
+    ▼
+CreateSnapshotCmd.execute()
+    │
+    ▼
+VolumeApiServiceImpl.takeSnapshot()
+    │
+    ▼
+takeSnapshotInternal()
+    │  validate volume, account, zones, policies
+    │
+    ├── [Volume attached to VM] ─────────────────────────────┐
+    │       takeVolumeSnapshotThroughJobQueue()              │
+    │       VmWorkTakeVolumeSnapshot dispatched              │
+    │       ← job queue serializes VM operations →           │
+    │                                                        ▼
+    └── [Volume not attached] ──► orchestrateTakeVolumeSnapshot()
+                                      │  build CreateSnapshotPayload
+                                      │  volume.addPayload(payload)
+                                      ▼
+                              SnapshotManagerImpl.takeSnapshot()
+                                      │
+                                      │  StorageStrategyFactory.getSnapshotStrategy(TAKE)
+                                      ▼
+                              snapshotStrategy.takeSnapshot(snapshot)
+                                      │
+                                      ▼
+                              SnapshotServiceImpl.takeSnapshot()
+                                      │  PrimaryDataStoreDriver.takeSnapshot() [async]
+                                      │  ← waits on AsyncCallFuture →
+                                      │  snapshot created on primary storage
+                                      ▼
+                              Backup decision
+                                      ├── BackupSnapshotAfterTakingSnapshot=true
+                                      │       backupSnapshotToSecondary() [sync or async]
+                                      └── BackupSnapshotAfterTakingSnapshot=false
+                                              markBackedUp() / schedule pool copy
+                                      ▼
+                              postCreateSnapshot()
+                              snapshotZoneDao.addSnapshotToZone()
+                              UsageEventUtils.publishUsageEvent()
+                              _resourceLimitMgr.decrementResourceCount()
+                              copyNewSnapshotToZones() [if zoneIds]
+                              copyNewSnapshotToZonesOnPrimary() [if poolIds]
+                                      ▼
+                              Return SnapshotInfo to caller
+```
+
+---
+
+## Key Classes and Their Roles
+
+| Class | Package | Role |
+|---|---|---|
+| `CreateSnapshotCmd` | `api/.../command/user/snapshot` | API command entry point; two-phase create+execute |
+| `VolumeApiServiceImpl` | `server/.../storage` | Validates, dispatches, and orchestrates snapshot requests |
+| `VmWorkTakeVolumeSnapshot` | `engine/components-api/.../vm` | Work item for job queue; carries all snapshot parameters |
+| `SnapshotManagerImpl` | `server/.../storage/snapshot` | Core business logic; strategy selection; resource accounting |
+| `SnapshotHelper` | `server/.../snapshot` | Resolves storage pool IDs for cross-zone replication |
+| `SnapshotServiceImpl` | `engine/storage/snapshot` | Interacts with primary data store driver asynchronously |
+| `DefaultSnapshotStrategy` | `engine/storage/snapshot` | Hypervisor-based (file) snapshot implementation |
+| `StorageSystemSnapshotStrategy` | `engine/storage/snapshot` | Managed storage native snapshot implementation |
+| `StorPoolSnapshotStrategy` | `plugins/storage/volume/storpool` | StorPool native snapshot; handles DELETE and cross-zone COPY |
+| `StorageStrategyFactory` | `engine/storage` | Selects the highest-priority strategy for each operation |
+
+---
+
+## Key Configuration Parameters
+
+| Parameter | Default | Description |
+|---|---|---|
+| `backup.snapshot.after.taking.snapshot` (`BackupSnapshotAfterTakingSnapshot`) | `true` | Whether to back up snapshot to secondary storage after creation |
+| `snapshot.backup.retries` | `3` | Number of retry attempts for asynchronous snapshot backup |
+| `snapshot.backup.retry.interval` | `300` (seconds) | Interval between retry attempts for async backup |
+| `use.storage.replication` | `false` | Use native storage replication (e.g., StorPool cross-zone copy) instead of secondary storage copy |
+| `snapshot.copy.multiply.exp.backoff` | — | Exponential backoff configuration for snapshot copy retries |
+
+---
+
+## Rollback Summary
+
+CloudStack implements rollback at multiple layers to maintain consistency:
+
+1. **Resource limit rollback** — On any exception in `SnapshotManagerImpl.takeSnapshot()`, snapshot count and storage quotas are decremented back to their original values.
+2. **Volume state rollback** — `Volume.Event.OperationFailed` is fired so the volume returns to `Ready` state.
+3. **Snapshot state machine** — Snapshot transitions to `Error` or `Destroyed` so it can be cleaned up by the background expunge process.
+4. **Async backup failure cleanup** — After exhausting all retries, `cleanupOnSnapshotBackupFailure()` runs in a transaction to delete the snapshot record and associated job metadata.
+5. **StorPool cross-zone recovery** — The exported (but not yet imported) snapshot name is persisted in `snapshot_details` with the key `SP_RECOVERED_SNAPSHOT`, enabling manual or automated cleanup of partial cross-zone copies.
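
The retry-then-cleanup pattern behind the async backup failure cleanup (item 4) can be sketched with a `ScheduledExecutorService`. This is an illustration under assumptions, not the actual `BackupSnapshotTask`: `BackupRetrySketch`, its `runAndWait` helper, the five-second wait, and the choice to count the first try as one of the attempts are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical self-rescheduling backup task; not the CloudStack BackupSnapshotTask. */
final class BackupRetrySketch implements Runnable {
    private final ScheduledExecutorService scheduler;
    private final BooleanSupplier backup;   // returns true when the backup succeeded
    private final Runnable cleanup;         // invoked once when all attempts are used up
    private final int attemptsLeft;         // counts the current attempt as well
    private final long delayMillis;
    private final CountDownLatch done;

    BackupRetrySketch(ScheduledExecutorService scheduler, BooleanSupplier backup, Runnable cleanup,
                      int attemptsLeft, long delayMillis, CountDownLatch done) {
        this.scheduler = scheduler;
        this.backup = backup;
        this.cleanup = cleanup;
        this.attemptsLeft = attemptsLeft;
        this.delayMillis = delayMillis;
        this.done = done;
    }

    @Override
    public void run() {
        boolean succeeded;
        try {
            succeeded = backup.getAsBoolean();
        } catch (RuntimeException e) {
            succeeded = false; // treat an exception like a failed attempt
        }
        if (succeeded) {
            done.countDown();
        } else if (attemptsLeft > 1) {
            // Reschedule a fresh task with one attempt fewer after the retry interval.
            scheduler.schedule(
                new BackupRetrySketch(scheduler, backup, cleanup, attemptsLeft - 1, delayMillis, done),
                delayMillis, TimeUnit.MILLISECONDS);
        } else {
            cleanup.run();
            done.countDown();
        }
    }

    /** Runs the task to completion; returns false if it did not finish within five seconds. */
    static boolean runAndWait(BooleanSupplier backup, Runnable cleanup, int maxAttempts, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            scheduler.execute(new BackupRetrySketch(scheduler, backup, cleanup, maxAttempts, delayMillis, done));
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        runAndWait(() -> { attempts.incrementAndGet(); return false; },
                   () -> System.out.println("cleanup after " + attempts.get() + " attempts"),
                   3, 10L);
    }
}
```

Rescheduling a new task per attempt, rather than sleeping in a loop, keeps the worker thread free between retries.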