From a8032401ebf142bedf780e91463454b3ee7b56d5 Mon Sep 17 00:00:00 2001 From: Joe Corall Date: Thu, 16 Apr 2026 08:36:10 -0400 Subject: [PATCH] Add docs for islandora_events --- docs/alpaca/alpaca-technical-stack.md | 7 + docs/installation/manual/installing-alpaca.md | 6 + docs/technical-documentation/alpaca-tips.md | 7 + docs/technical-documentation/diagram.md | 81 +-- .../islandora-events.md | 578 ++++++++++++++++++ docs/technical-documentation/scaling.md | 416 +++++++++++++ docs/user-documentation/versioning.md | 6 +- mkdocs.yml | 2 + 8 files changed, 1066 insertions(+), 37 deletions(-) create mode 100644 docs/technical-documentation/islandora-events.md create mode 100644 docs/technical-documentation/scaling.md diff --git a/docs/alpaca/alpaca-technical-stack.md b/docs/alpaca/alpaca-technical-stack.md index 9ebc7316f..4f0566cf9 100644 --- a/docs/alpaca/alpaca-technical-stack.md +++ b/docs/alpaca/alpaca-technical-stack.md @@ -1,4 +1,11 @@ # Alpaca Technical Stack + +!!! warning "Deprecated guidance" + This page documents the legacy ActiveMQ/Alpaca stack. For current + Islandora deployments, use + [Islandora Events](../technical-documentation/islandora-events.md) + instead. + As of version 2.0.0, Alpaca contains several tools bundled into a single runnable [jar](https://en.wikipedia.org/wiki/JAR_(file_format)) file. The different tools can be enabled/disabled depending on the configuration you define. ## [Gradle](https://docs.gradle.org/current/userguide/tutorial_using_tasks.html) diff --git a/docs/installation/manual/installing-alpaca.md b/docs/installation/manual/installing-alpaca.md index 4438f9c92..067e2662b 100644 --- a/docs/installation/manual/installing-alpaca.md +++ b/docs/installation/manual/installing-alpaca.md @@ -1,5 +1,11 @@ # Installing ActiveMQ and Alpaca +!!! warning "Deprecated guidance" + This page documents the legacy ActiveMQ/Alpaca stack. 
For current + Islandora deployments, use + [Islandora Events](../../technical-documentation/islandora-events.md) + instead. + !!! note "Karaf no longer needed" You no longer need to install Karaf. We no longer do this, we just deploy the java apps. diff --git a/docs/technical-documentation/alpaca-tips.md b/docs/technical-documentation/alpaca-tips.md index fb9bec185..50cdc749a 100644 --- a/docs/technical-documentation/alpaca-tips.md +++ b/docs/technical-documentation/alpaca-tips.md @@ -1,5 +1,12 @@ # Alpaca Tips +!!! warning "Deprecated guidance" + This page documents the legacy ActiveMQ/Alpaca stack. For current + Islandora deployments, use + [Islandora Events](islandora-events.md) instead. + +[Alpaca](https://github.com/Islandora/Alpaca) is event-driven middleware based on [Apache Camel](https://camel.apache.org/) for Islandora + [Alpaca] ships with four event-driven components - [islandora-connector-derivative](#islandora-connector-derivative) diff --git a/docs/technical-documentation/diagram.md b/docs/technical-documentation/diagram.md index 67bb77a52..9fe359acf 100644 --- a/docs/technical-documentation/diagram.md +++ b/docs/technical-documentation/diagram.md @@ -88,7 +88,11 @@ You can read more about this in [Islandora's Flysystem documentation](../user-do ## Microservices -In addition to all the tools Drupal provides, Islandora extends the Drupal site's capabilities using an event-driven, distributed architecture of [Microservices]. When a repository manager creates, updates, or deletes Drupal [entities], the Islandora Drupal module generates an event message which is put on Islandora's [ActiveMQ] queue. +In addition to all the tools Drupal provides, Islandora extends the Drupal +site's capabilities using an event-driven architecture of [Microservices]. +When a repository manager creates, updates, or deletes Drupal [entities], +Islandora records durable work in Drupal and dispatches a message through +Symfony Messenger for asynchronous processing. 
There are two different types of events Islandora emits: @@ -97,21 +101,25 @@ There are two different types of events Islandora emits: ### Derivative Events -Below is a full diagram of the different microservices Islandora provides. You can see as an animation in the diagram what happens when an Islandora repository manager uploads an image to their Islandora repository. First, Drupal emits an event to generate a thumbnail for that image. That event is put on the [ActiveMQ] event queue, [alpaca] reads the message from the queue, and forwards the event to the configured service. In the case of a thumbnail, [houdini] handles generating the thumbnail for the uploaded image. [Houdini] creates the thumbnail and alpaca saves the thumbnail in Drupal. +Below is a simplified diagram of the derivative flow. Drupal queues a +derivative job, a Messenger worker receives it, and the worker invokes the +configured derivative service. By default, worker processes run in the Drupal +deployment. CPU-intensive or memory-intensive derivative work can also be +distributed to external services. See [Scaling Islandora Events](scaling.md) +for the deployment patterns and operator guidance. 
```mermaid flowchart TD drupal([Islandora Drupal Website]) - drupal e1@-->|publishes drupal entity event| activemq + drupal e1@-->|queues derivative job + ledger record| messenger - subgraph broker[Message Broker] - activemq[ActiveMQ] - alpaca[Alpaca] - activemq e2@-->|alpaca receives event| alpaca + subgraph runtime[Drupal Messenger Runtime] + messenger[Symfony Messenger + sm_ledger] + worker[Derivative Worker] + messenger e2@-->|worker receives derivative message| worker end - subgraph microservices[scyllaridae microservices] fits[FITS] homarus[Homarus] @@ -119,16 +127,17 @@ flowchart TD hypercube[Hypercube] end - alpaca --> fits - alpaca --> homarus - alpaca e3@--> houdini - alpaca --> hypercube + worker --> fits + worker --> homarus + worker e3@--> houdini + worker --> hypercube + + fits -.->|derivative streamed back| worker + homarus -.->|derivative streamed back| worker + houdini e4@-.->|derivative streamed back| worker + hypercube -.->|derivative streamed back| worker - fits -.->|derivative streamed back| alpaca - homarus -.->|derivative streamed back| alpaca - houdini e4@-.->|derivative streamed back| alpaca - hypercube -.->|derivative streamed back| alpaca - alpaca e5@-.->|alpaca saves the derivative| drupal + worker e5@-.->|worker saves derivative + updates ledger| drupal class e1 flow0; class e2 flow1; @@ -139,30 +148,37 @@ flowchart TD ### Index Events -There are two systems that are populated using Islandora Index Events: [Blazegraph] and [Fedora (Repository Software)]. - -- For [Blazegraph], [Alpaca] is fully implemented to be able to index content from Drupal directly into Blazegraph using [RDF]. -- For [Fedora (Repository Software)] an intermediate service is used called [Milliner] to store the metadata in Fedora. +Drupal dispatches indexing work through Symfony Messenger-backed workers that +write directly to [Blazegraph] and [Fedora (Repository Software)]. 
Like the +derivative flow, the worker runtime can stay in the Drupal deployment or be +scaled out by moving transport or downstream execution to separate services. +See [Scaling Islandora Events](scaling.md) for those scaling patterns. ```mermaid flowchart TD drupal([Islandora Drupal Website]) - drupal e1@-->|publishes drupal entity event| activemq + drupal e1@-->|queues index job + ledger record| messenger - subgraph broker[Message Broker] - activemq[ActiveMQ] - alpaca[Alpaca] - activemq e2@-->|alpaca receives event| alpaca + subgraph runtime[Drupal Messenger Runtime] + messenger[Symfony Messenger + sm_ledger] + worker[Index Worker] + messenger e2@-->|worker receives index message| worker end + fedora_indexer[Fedora indexer] + blazegraph_indexer[Blazegraph indexer] - alpaca e3@--> milliner + worker e3@--> fedora_indexer + worker --> blazegraph_indexer fedora[(Fedora)] - milliner e4@-.->|syncs resource to| fedora - blazegraph[(blazegraph)] - alpaca -.->|sends RDF| blazegraph + fedora_indexer e4@-.->|syncs resource to| fedora + + triplestore[(Blazegraph)] + blazegraph_indexer -.->|writes RDF to| triplestore + + worker e5@-.->|worker updates ledger| drupal class e1 flow0; class e2 flow1; @@ -182,14 +198,12 @@ The following components are microservices developed and maintained by the Islan * [Homarus] * [Houdini] * [Hypercube] -* [Milliner] (uses [Crayfish]) ### Other Open Source The following components are deployed with Islandora, but are developed and maintained by other open source projects: * [Apache] - * [ActiveMQ] * [Tomcat] * [Solr] * [Blazegraph] @@ -197,8 +211,7 @@ The following components are deployed with Islandora, but are developed and main * [Drupal] * [FITS] * [Fedora (Repository Software)] -* [MariaDB] +* [MariaDB] or [PostgreSQL] * [NGINX] -* [PostgreSQL] * [Traefik] * Triplestore - See [Blazegraph] diff --git a/docs/technical-documentation/islandora-events.md b/docs/technical-documentation/islandora-events.md new file mode 100644 index 
000000000..18384cf97 --- /dev/null +++ b/docs/technical-documentation/islandora-events.md @@ -0,0 +1,578 @@ +# Islandora Events + +`islandora_events` is the Drupal-native replacement for the legacy +ActiveMQ/Alpaca stack. It handles derivative generation, durable queueing, and +operator-visible job state through Symfony Messenger and the `sm_ledger` +module. Service-specific indexing logic can live in submodules, with +`islandora_events_fcrepo` owning direct Fedora/fcrepo indexing and +`islandora_events_blazegraph` owning direct Blazegraph indexing. + +For most sites, the point is not "use Messenger because Symfony has it". The +point is that Drupal now owns the queue, the worker, the retry model, and the +operator-visible state that used to be split across ActiveMQ, Alpaca, and +downstream callback behavior. + +Container-managed Messenger workers require the `islandora/drupal` image at +version 6.4.0 or newer. If you are enabling the s6-supervised worker services +in `docker-compose.yml`, confirm the Drupal image tag meets that minimum before +switching `DRUPAL_SM_WORKERS_MODE` to `container`. + +## What it replaces + +The old stack depended on: + +- ActiveMQ for transport +- Alpaca for routing and connector orchestration +- Java and Camel changes for many connector customizations + +`islandora_events` keeps the same high-level repository workflows, but moves +the transport, worker lifecycle, retry visibility, and operational state into +Drupal. + +## Distributed architecture translation + +Use this table when you are translating Kafka, RabbitMQ, or general +distributed-systems concepts into the Drupal and Islandora implementation. 
+ +| Distributed concept | Drupal / `islandora_events` equivalent | +| --- | --- | +| Broker topic | SM transport such as `islandora_derivatives`, `islandora_index_fedora`, or `failed` | +| Consumer group offset | SQL transport claim state through `claimed_at` and `claim_token`; some scanner cursors use Drupal `state` | +| Partition | One transport queue or table per concern; parallel workers rely on transport claiming instead of shared-memory threading | +| Message ID | Transport row `id` plus a business dedupe key | +| Producer | Drupal hooks and services such as `DerivativeQueueService` and `IndexEventService` | +| Consumer | Long-lived `drush sm:consume ...` worker process supervised by `systemd`, `supervisor`, `s6`, Kubernetes, or similar | +| Dead-letter topic | The `failed` transport and its corresponding failure handling | +| Consumer group registry | `sm_workers` worker-definition registry | +| Connector configuration | `WorkerDefinitionProviderInterface` implementations and module transport routing config | +| Schema registry | Symfony Messenger serializer plus PHP message classes | +| Offset commit / ack | Transport `ack()` after successful handling; SQL transport deletes the claimed row after claim-token validation | + +Two translation caveats matter: + +- Drupal workers are long-lived PHP CLI processes, not threads. +- `islandora_events` now relies on `drupal/sm` transport implementations + directly instead of shipping its own custom transport. + +## Core concepts + +- **Derivative runners** define how a derivative queue is executed. +- **Index targets** define how an entity event is sent to an indexing backend. +- **Ledger records** provide the durable operator-facing history for queued, + running, failed, and completed work. +- **Circuit breakers** protect downstream HTTP integrations from repeated + failures. + +!!! islandora "Lobster trap" + `index_targets` is the indexing configuration model. 
If an integration + needs a new target, add or install a Drupal service for that target rather + than relying on legacy queue-name compatibility settings. + +## Choose the right workflow + +Use the path that matches your role: + +- **Operator, no code**: configure a new HTTP derivative endpoint in + `derivative_runners`, point an Islandora action at that queue, and run the + existing derivative worker. +- **Operator, Fedora/fcrepo**: enable `islandora_events_fcrepo`, set + `index_targets.fedora.endpoint` to the Fedora REST base URL, and run the + Fedora indexing worker. +- **Operator, new indexing integration**: install the module that provides the + target service, enable the target in configuration, and run either the + shared custom-index worker or the dedicated worker shipped by that module. +- **Developer, PHP**: implement `IndexTargetInterface` and tag the service as + `islandora_events.index_target` to add a new indexing target. +- **Integrator, PHP plus transport**: add a custom message class, transport + routing, and worker definition when a workflow needs its own queue and worker + topology instead of the shared derivative or custom-index transports. + +## Configure an HTTP derivative microservice + +Use this when a derivative service is reachable over HTTP and does not need a +local command runner. + +1. Open **Configuration** >> **Web services** >> *Islandora Events settings* + (`/admin/config/services/islandora-events/settings`). +2. Expand **Derivative queue runners**. +3. Add a queue entry under `derivative_runners` keyed by the queue name used by + your Islandora action. +4. Set: + - `execution_mode: http` + - `endpoint: http://your-service:port/` + - `timeout: 300` or another appropriate value +5. Save configuration. +6. Confirm the Islandora action entity uses the same queue name. +7. 
Start or restart the derivative worker: + +```bash +drush sm:consume islandora_derivatives --time-limit=3600 +``` + +Example: + +```yaml +derivative_runners: + islandora-connector-myservice: + execution_mode: http + endpoint: 'http://myservice:8080/' + timeout: 300 +``` + +Matching action shape: + +```yaml +queue: islandora-connector-myservice +event: Generate Derivative +``` + +Validation flow: + +1. Trigger the action from Drupal. +2. Confirm a ledger row is created. +3. Confirm the derivative worker processes it. +4. Confirm the breaker appears in **Configuration** >> **Web services** >> + *SM Workers Circuit Breakers* after first use. + +!!! islandora "Config-only path" + This derivative HTTP workflow is config-driven. Adding the endpoint does + not require PHP code unless you also need a new execution mode, message + type, or dedicated transport. + +## Enable a command-mode derivative runner + +Use this when the derivative workflow must run a local binary instead of making +an HTTP request. + +1. Add or update the queue entry in `derivative_runners`. +2. Set `execution_mode: command`. +3. Set `command` and, if applicable, `config_path`. +4. In `settings.php`, explicitly allow command execution and allowlist the + binary. +5. Restart the derivative worker. + +Example `settings.php`: + +```php +$settings['islandora_events_derivative_command'] = [ + 'enabled' => TRUE, + 'allowed_binaries' => [ + '/usr/bin/scyllaridae', + ], + 'allow_insecure_args' => FALSE, +]; +``` + +Example runner: + +```yaml +derivative_runners: + islandora-connector-myservice: + execution_mode: command + command: '/usr/bin/scyllaridae' + config_path: '/opt/scyllaridae/myservice/scyllaridae.yml' + timeout: 300 +``` + +!!! note "Command mode is privileged" + Command-mode execution is disabled unless `settings.php` enables it. This + is intentional and should be treated like any other local process-execution + permission. 
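A command-mode worker is a long-lived process, so it should run under a process manager rather than be started by hand. A minimal sketch of a `systemd` unit for the derivative worker — the unit name, user, `drush` path, and working directory are assumptions for illustration:

```ini
# /etc/systemd/system/islandora-derivative-worker.service (example path)
[Unit]
Description=Islandora derivative worker (Symfony Messenger)
After=network.target

[Service]
User=www-data
WorkingDirectory=/var/www/drupal
# --time-limit makes the worker exit periodically; Restart=always brings it
# back up, which also picks up deployed code and config changes.
ExecStart=/usr/local/bin/drush sm:consume islandora_derivatives --time-limit=3600
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Equivalent definitions work under `supervisor`, `s6`, or a Kubernetes Deployment; the key property is that the supervisor restarts the worker whenever it exits.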
+ +## Add a new index target + +Adding a new index target is a code change, not just a config change. + +1. Create a class implementing `IndexTargetInterface`. +2. Tag it as `islandora_events.index_target` in your module's service + definition. +3. Add target-specific configuration under `index_targets`. +4. Clear caches so Drupal rebuilds the tagged service container. +5. In the common case, no additional transport setup is required. Custom + targets dispatch through `islandora_index_custom`. +6. Only if the target needs its own dedicated worker transport, add: + - a new `IndexEventMessage` subclass + - an `sm.routing.yml` route for that subclass + - a matching `sm.transports.yml` transport + - a dedicated worker command for that transport + +The `islandora_events_blazegraph` and `islandora_events_fcrepo` submodules are +examples of the dedicated transport pattern. + +## Configure the direct Blazegraph target + +Use this when you want Drupal to write SPARQL updates directly to Blazegraph +without a separate Alpaca triplestore indexer service. + +1. Enable `islandora_events_blazegraph`. + Run `composer install` or `composer update` first if the submodule was just + added to the codebase, so the required PHP RDF library is available. +2. Open **Configuration** >> **Web services** >> *Islandora Events settings* + (`/admin/config/services/islandora-events/settings`). +3. In the **Blazegraph indexing target** section, enable the target. +4. Set the endpoint to the Blazegraph SPARQL update URL, for example + `http://blazegraph:8080/bigdata/namespace/islandora/sparql`. +5. Optional: set a named graph URI if your repository writes repository + triples into a non-default graph. +6. 
Start or restart the Blazegraph indexing worker: + +```bash +drush sm:consume islandora_index_blazegraph --time-limit=3600 +``` + +Useful operator command: + +```bash +drush islandora-events-blazegraph:index-record 123 +``` + +## Configure the direct Fedora/fcrepo target + +Use this when you want Drupal to index directly into Fedora without a separate +Milliner service. Revision-backed Drupal updates also create Fedora mementos in +this path, so Fedora versioning no longer depends on Alpaca or Milliner. + +1. Enable `islandora_events_fcrepo`. +2. Open **Configuration** >> **Web services** >> *Islandora Events settings* + (`/admin/config/services/islandora-events/settings`). +3. In the **Fedora/fcrepo indexing target** section, enable the target. +4. Set the endpoint to the Fedora REST base URL, for example + `http://fcrepo:8080/fcrepo/rest`. +5. Start or restart the Fedora indexing worker: + +```bash +drush sm:consume islandora_index_fedora --time-limit=3600 +``` + +Useful operator command: + +```bash +drush islandora-events-fcrepo:index-record 123 +``` + +## Start workers + +When using the container-managed worker model, set +`DRUPAL_SM_WORKERS_MODE=container` in the Drupal service environment. Leave it +as `external` when workers run in a separate container or host. + +Start the worker that matches the transport you configured: + +```bash +drush sm:consume islandora_derivatives --time-limit=3600 +drush sm:consume islandora_index_fedora --time-limit=3600 +drush sm:consume islandora_index_blazegraph --time-limit=3600 +drush sm:consume islandora_index_custom --time-limit=3600 +``` + +If you add a dedicated custom transport, start a worker for that transport as +well. + +When optional scheduler-driven submodules are enabled, keep their workers +separate from the main request-triggered transports. 
Typical examples are: + +- `scheduler_islandora_events_backfill` +- `islandora_backfill` +- `scheduler_islandora_events_mergepdf` +- `islandora_mergepdf` + +Do not rely on web requests to drain queues. Run workers under a process +manager, container supervisor, or orchestration platform. + +Use `drush sm-workers:list` to inspect the canonical worker commands provided +by the enabled module set. Those commands can then be run under `systemd`, +`supervisor`, `s6`, Kubernetes, or another process manager. + +## Worker topology and scaling + +Start with one worker per core transport and scale by transport, not by one +undifferentiated worker pool. + +Recommended tuning order: + +1. Add derivative workers first. +2. Tune Fedora indexing workers separately. +3. Tune Blazegraph indexing workers separately. +4. Keep scheduler and reconciliation work isolated from normal ingest. + +Use `drush islandora-events:capacity-report --window-minutes=15` before +changing topology. In general: + +- if queue wait stays low and throughput meets your target, keep the current + worker placement +- if queue depth and queue wait stay elevated, add transport-specific workers + first +- if the database becomes the limiting factor, re-evaluate the transport + backend before changing the ledger model + +`sm_ledger` remains the durable operator projection even if transport +implementation changes later. + +## Transport notes + +`islandora_events` is transport-aware but no longer transport-owned. + +- The worker model, message handlers, and `sm_workers` definitions work with + any transport that `drupal/sm` exposes. +- The default Islandora transports in this repository now use `drupal-sql://`. +- If you switch to Redis, ActiveMQ, or another transport, the worker and + handler layers should not need redesign. The transport DSNs and deployment + topology are the main moving parts. 
+- The durable correctness guarantees live in the ledger and handler layers: + enqueue-time deduplication, `findRecentByDedupeKey()` checks, and + `isQueuedForProcessing()` guards. + +## Operational limits + +The current design is appropriate for normal Islandora deployments and moderate +parallelism. For larger fleets, keep these constraints in mind: + +- Backfill scans are now guarded by a distributed lock so multiple web heads do + not run the same scanner at once. +- `processNativeQueue()` still filters by queue name after fetching candidate + records from the ledger. That is acceptable today, but at much larger scale a + first-class queue field or another queryable selector would be better. +- Queue depth metrics can show when workers are falling behind, but autoscaling + or admission control must still be provided by the deployment platform. +- Very large SQL-backed transports may eventually need sharding or a different + transport backend rather than more PHP workers alone. + +## Potential upstream work + +Most of the removed custom transport behavior should stay removed. + +- Transport-level UPSERT deduplication is not a good upstream fit. It depended + on message-specific business keys and duplicated responsibilities that belong + in the application and ledger layers. +- Claim-token stamps were only necessary because of that transport-level + deduplication and should not be reintroduced. +- Custom dead-letter row moves are also unnecessary because `drupal/sm` + already has failure-transport handling. + +The one improvement that does look like a reasonable upstream contribution is +better concurrent claiming for SQL-backed transports, especially support for +`SKIP LOCKED` where the database supports it. That is a transport-runtime +improvement that could benefit any Drupal project using concurrent Messenger +workers. 
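For context, the `SKIP LOCKED` pattern described above lets several workers claim rows from the same SQL-backed transport table without blocking on each other's locks. A rough illustration of the claim query — the table and column names here are simplified placeholders, not the actual `drupal/sm` schema:

```sql
-- Each worker claims one available message inside a transaction.
-- Rows locked by other workers are skipped rather than waited on,
-- so concurrent consumers never serialize on the same row.
BEGIN;
SELECT id, body
FROM messenger_messages
WHERE queue_name = 'islandora_derivatives'
  AND delivered_at IS NULL
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ...process the message, delete or mark the row, then:
COMMIT;
```

PostgreSQL and recent MySQL/MariaDB releases support `SKIP LOCKED`; on databases that lack it, claiming falls back to ordinary row locking, which is where concurrent workers start to contend.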
+ +## Observe work and failures + +Use these places first: + +- **Configuration** >> **Web services** >> *SM Ledger* for durable job state +- **Reports** >> *SM Ledger* for operational views +- **Configuration** >> **Web services** >> *SM Workers Circuit Breakers* + for downstream HTTP breaker state + +Useful commands: + +```bash +drush islandora-events:capacity-report --window-minutes=15 +drush sm:failed:show +drush sm:failed:retry +``` + +If `islandora_events_metrics` is enabled, use its Prometheus-style endpoint for +fleet-level dashboards and alerting. If `islandora_events_otel` is enabled, use +trace context recorded with ledger metadata to correlate worker activity with +external telemetry systems. + +## Failure triage + +Use this order when a job is failing: + +1. Open the ledger record in Drupal. +2. Confirm status, retry count, timing, and `last_error`. +3. Note the source entity, action or plugin, and correlation key. +4. Check the matching worker logs for the same time window. +5. Check downstream service logs when the failure came from HTTP execution, + Fedora, Blazegraph, or another remote dependency. +6. Requeue only after confirming the failure was transient or corrected. + +Use the ledger to answer "what failed?" Use logs and external telemetry to +answer "why did it fail?" 
+ +Typical derivative failure sources: + +- HTTP timeouts +- connection refusals +- authentication failures +- invalid derivative payloads +- destination write-back failures +- local command denials because privileged command execution is disabled or the + binary is not allowlisted + +Typical indexing failure sources: + +- Fedora endpoint failures +- Blazegraph endpoint failures +- serialization or payload issues +- disabled or misconfigured targets + +Typical worker/runtime failure sources: + +- repeated worker exits +- database connectivity problems +- queue growth with no matching worker throughput +- redelivery or retry loops + +## Validation checklist + +Use this checklist when validating a deployment or major topology change: + +1. Confirm `islandora_events` is enabled and `drush sm:stats` shows the + expected transports. +2. Start the derivative and indexing workers. +3. Create or update content that should emit derivative and indexing work. +4. Confirm ledger records move through `queued`, `in_progress`, and either + `completed` or the expected retry/failure states. +5. Capture a baseline capacity snapshot with + `drush islandora-events:capacity-report --window-minutes=15`. +6. Force one downstream failure and confirm retry metadata, breaker behavior, + and logs all line up. +7. Requeue a failed or abandoned record and confirm it is processed correctly. + +## Deployment paths + +The root `docker-compose.yml` supports two operational paths: + +- **Path A: hybrid**. Keep the legacy HTTP microservices, but run Symfony + Messenger workers in the Drupal container by setting + `DRUPAL_SM_WORKERS_MODE=container`. +- **Path B: simplified**. Move Drupal onto in-container Messenger workers and + remove the legacy broker and HTTP derivative microservices after migration. + +For Path B, the legacy `DRUPAL_DEFAULT_BROKER_URL` setting is not used and +should be removed to avoid implying an ActiveMQ dependency in Drupal. 
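In `docker-compose.yml` terms, the Path B switch is an environment change on the Drupal service. A sketch, with the service name and image tag as assumptions (the doc above states 6.4.0 as the minimum image version for container-managed workers):

```yaml
services:
  drupal:
    image: islandora/drupal:6.4.0
    environment:
      # Run s6-supervised Messenger workers inside the Drupal container.
      DRUPAL_SM_WORKERS_MODE: container
      # DRUPAL_DEFAULT_BROKER_URL is intentionally absent: Path B has no
      # ActiveMQ dependency, so the legacy broker setting should be removed.
```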
+ +After validating Path B in your environment, the legacy stack components that +can be removed are: + +- `activemq` +- `alpaca` +- `milliner` +- `crayfits` +- `homarus` +- `houdini` +- `hypercube` +- `mergepdf` +- `fits` +- `activemq-data` +- `ACTIVEMQ_PASSWORD` +- `ACTIVEMQ_WEB_ADMIN_PASSWORD` +- `ALPACA_JMS_PASSWORD` + +`fcrepo` does not require ActiveMQ for its REST API. If you remove +`activemq`, ensure `fcrepo` depends only on its database or relies on its own +startup retry behavior. + +## Migrate a legacy HTTP/Alpaca action + +This walkthrough shows how to migrate an existing derivative action from the +legacy ActiveMQ/Alpaca path to `islandora_events`. + +### Scenario: Generate a thumbnail + +In the legacy stack, the "Generate a thumbnail image" action emitted a +JSON-LD event onto the `islandora-connector-houdini` queue, Alpaca consumed +that event, added JWT credentials, forwarded it to Houdini over HTTP, and +Houdini posted the result back to Drupal. + +Typical legacy action shape: + +- Action: Generate a Thumbnail +- Queue: `islandora-connector-houdini` +- Event: Generate Derivative +- Destination URI: `fedora://...` + +### Step 1: verify prerequisites + +Confirm `islandora_events` is enabled and the Messenger transports exist: + +```bash +drush pm:list --status=enabled | grep islandora_events +drush sm:stats +``` + +You should see `islandora_events` enabled and the `islandora_derivatives` +transport listed. + +### Step 2: choose the execution path + +For Path A, keep the existing HTTP derivative service. `islandora_events` +dispatches onto the `islandora_derivatives` transport, the derivative handler +loads the configured action, and `HttpDerivativeExecutionStrategy` calls the +existing endpoint directly. Alpaca is no longer required for that route. + +For Path B, replace the HTTP callback flow with command execution inside the +Drupal container. The required binary must exist in the image and be explicitly +allowlisted in Drupal settings. 
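For Path A, the thumbnail queue can keep pointing at the existing Houdini service through an HTTP runner entry. A sketch of that entry — the endpoint host and port are assumptions for a typical compose deployment, so match them to your actual Houdini service:

```yaml
derivative_runners:
  islandora-connector-houdini:
    execution_mode: http
    endpoint: 'http://houdini:8000/'
    timeout: 300
```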
+ +### Step 3: allowlist the command runner for Path B + +Example `settings.php` configuration: + +```php +$settings['islandora_events_derivative_command'] = [ + 'enabled' => TRUE, + 'allowed_binaries' => [ + '/usr/bin/convert', + ], + 'allow_insecure_args' => FALSE, +]; +``` + +Adjust the allowlisted binary paths to match the derivative you are migrating. + +### Step 4: verify the action entity still exists + +The queueing path still loads the Drupal action entity by machine name. Check +that the action survives the migration unchanged: + +```bash +drush ev "print_r(\Drupal::entityTypeManager()->getStorage('action')->load('generate_a_thumbnail_image'));" +``` + +As long as the action entity remains present, `islandora_events` can enqueue +and execute it without the old broker bridge. + +### Step 5: disable the legacy Alpaca route + +Remove or comment out the Alpaca route that consumes the old queue, such as +`islandora-connector-houdini`. Do not leave both stacks active for the same +action or they will race to process the same derivative requests. + +### Step 6: test end to end + +Requeue work for a known entity and inspect the ledger: + +```bash +drush islandora-events:process-derivatives --limit=1 --dry-run +drush islandora-events:capacity-report --window-minutes=15 +drush ev " + \$records = \Drupal::database()->select('sm_ledger_event_record', 'r') + ->fields('r', ['id', 'status', 'action_plugin_id', 'target_system', 'created']) + ->orderBy('r.id', 'DESC')->range(0, 10)->execute()->fetchAll(); + print_r(\$records); +" +``` + +### Step 7: remove the HTTP microservice for Path B + +Once the derivative succeeds through the command runner, remove the now-unused +HTTP microservice from `docker-compose.yml` and any associated JWT or service +configuration that only supported that legacy route. + +### Rollback + +You can roll back by re-enabling the Alpaca route, but only after disabling the +`islandora_events` path for the same action. 
The ledger records remain valid; +the main risk is duplicate execution if both systems consume the same work. + +## See also + +- [Alpaca Tips](alpaca-tips.md) (deprecated, migration reference) +- [Installing ActiveMQ and Alpaca](../installation/manual/installing-alpaca.md) + (deprecated, migration reference) +- [Alpaca Technical Stack](../alpaca/alpaca-technical-stack.md) + (deprecated, migration reference) diff --git a/docs/technical-documentation/scaling.md b/docs/technical-documentation/scaling.md new file mode 100644 index 000000000..1a4122186 --- /dev/null +++ b/docs/technical-documentation/scaling.md @@ -0,0 +1,416 @@ +# Scaling Islandora Events + +Islandora's default Islandora Events deployment keeps the worker runtime close +to the Drupal site: + +- Drupal records ledger rows and dispatches Symfony Messenger messages +- workers consume those messages from the configured transport +- workers execute derivative and indexing work +- by default, the SQL transport uses the same database as the Drupal site + +That default is intentionally simple and works well for small and moderate +deployments. It also means the Drupal stack, worker runtime, and SQL transport +can contend for the same CPU, memory, I/O, and database capacity during large +ingests or rebuilds. + +This page explains the two main scaling levers in Islandora Events: + +1. move CPU-intensive or memory-intensive execution out of the Drupal runtime +2. move the Messenger transport backend out of the Drupal database + +The goal is to help operators choose a sensible starting topology and plan +benchmarking before production ingest begins. + +## Baseline topology + +The default topology keeps all core moving parts within the normal Drupal +deployment boundary. 
+ +```mermaid +flowchart TD + drupal([Islandora Drupal Website]) + + drupal e1@-->|queues derivative or index job + ledger record| messenger + + subgraph runtime[Drupal Messenger Runtime] + messenger[Symfony Messenger + sm_ledger] + worker[Workers in Drupal deployment] + messenger e2@-->|worker receives message| worker + end + + subgraph services[Derivative and index targets] + fits[FITS] + homarus[Homarus] + houdini[Houdini] + hypercube[Hypercube] + fedora[(Fedora)] + blazegraph[(Blazegraph)] + end + + worker e3@--> houdini + worker --> fits + worker --> homarus + worker --> hypercube + worker --> fedora + worker --> blazegraph + + houdini e4@-.->|result or side effect| worker + worker e5@-.->|writes derivative or index result + updates ledger| drupal + + class e1 flow0; + class e2 flow1; + class e3 flow2; + class e4 flow3; + class e5 flow4; +``` + +## When to keep the default topology + +Start with the default topology when: + +- the repository is small or moderate in size +- ingest is occasional rather than continuous +- you want the simplest deployment and operational model +- queue wait stays low under expected load +- the database has enough headroom for both Drupal traffic and SQL-backed + transport work + +This is the recommended starting point for most new installations. + +## Scaling option 1: move heavy execution to external services + +Derivative and indexing workers can coordinate work while delegating the +heavyweight processing itself to external services. + +This is useful when: + +- image, video, OCR, or media processing is CPU-intensive +- command-mode derivative runners consume too much memory inside the Drupal + deployment +- you want to isolate worker orchestration from service-specific compute spikes + +In Islandora Events, this usually means keeping the worker in the Drupal +deployment while configuring the execution strategy so the heavy work happens +outside the Drupal container or host. 
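+
+As a hedged sketch of that split, the definition below routes one
+derivative action's heavy work to a remote service while queueing and
+ledger writes stay in Drupal. Only the `execution_mode` values come from
+the execution model documented on this page; the setting name and
+endpoint are hypothetical:
+
+```php
+// Illustrative only -- not the module's documented schema.
+$settings['islandora_events_executions']['generate_a_thumbnail_image'] = [
+  // 'http' calls a remote endpoint instead of running a local command.
+  'execution_mode' => 'http',
+  // Hypothetical Homarus-style endpoint on a separate host.
+  'endpoint' => 'http://media-workers.example.org:8000/convert',
+];
+```
+
+Swapping `execution_mode` back to `command` with an allowlisted local
+binary reverses the move without touching the transport or the ledger.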
+ +### Command and HTTP execution models + +Islandora Events supports multiple worker execution definitions: + +- `execution_mode: command` runs an approved local command, often through a + `scyllaridae` wrapper and service-specific config +- `execution_mode: http` calls a remote service endpoint directly + +Those execution definitions are transport-independent. You can keep the SQL +transport in the Drupal database while still moving derivative processing to +remote services. + +### Example: move Homarus or FFmpeg-style work out of the Drupal container + +The diagram below shows the same worker flow, but with the expensive derivative +step executed by an external service instead of inside the Drupal deployment. + +```mermaid +flowchart TD + drupal([Islandora Drupal Website]) + + drupal e1@-->|queues derivative job + ledger record| messenger + + subgraph runtime[Drupal Messenger Runtime] + messenger[Symfony Messenger + sm_ledger] + worker[Derivative Worker] + messenger e2@-->|worker receives derivative message| worker + end + + subgraph external[External derivative services] + homarus[Homarus] + ffmpeg[FFmpeg-style media service] + end + + worker e3@-->|HTTP or command execution definition| homarus + worker -->|HTTP or command execution definition| ffmpeg + + homarus e4@-.->|derivative streamed back| worker + ffmpeg -.->|derivative streamed back| worker + worker e5@-.->|worker saves derivative + updates ledger| drupal + + class e1 flow0; + class e2 flow1; + class e3 flow2; + class e4 flow3; + class e5 flow4; +``` + +### What scales out in this topology + +- derivative or indexing compute shifts away from the Drupal deployment +- worker coordination, routing, and ledger projection remain in Drupal +- the Messenger transport backend stays the same unless changed separately + +### Tradeoffs + +- reduces CPU and memory pressure on the Drupal deployment +- keeps deployment simpler than introducing a new transport backend +- still leaves transport load and queue 
persistence in the Drupal database +- still requires enough Drupal-side capacity for worker processes and ledger + writes + +## Scaling option 2: move the Messenger transport backend out of the Drupal database + +The second scaling lever is the transport backend. + +The default Islandora transports use a SQL transport in the same database as +the Drupal site. At larger scale, database-backed transport throughput or queue +contention may become the limiting factor before derivative services do. + +When that happens, the transport backend can be moved to a dedicated messaging +system such as ActiveMQ while keeping the worker, handler, and ledger model the +same. + +### Example: swap the Drupal database transport for ActiveMQ + +```mermaid +flowchart TD + drupal([Islandora Drupal Website]) + ledger[(Drupal database
ledger + site data)] + + drupal e1@-->|records ledger row + dispatches message| transport + drupal --> ledger + + subgraph runtime[Messenger Runtime] + transport[ActiveMQ transport] + worker[Workers in Drupal deployment] + transport e2@-->|worker receives message| worker + end + + subgraph services[Derivative and index targets] + houdini[Houdini] + fedora[(Fedora)] + blazegraph[(Blazegraph)] + end + + worker e3@--> houdini + worker --> fedora + worker --> blazegraph + + houdini e4@-.->|result or side effect| worker + worker e5@-.->|updates ledger in Drupal database| ledger + + class e1 flow0; + class e2 flow1; + class e3 flow2; + class e4 flow3; + class e5 flow4; +``` + +### What changes in this topology + +- the transport queue no longer shares the Drupal site database +- workers still execute the same handlers +- `sm_ledger` still stores the durable operator projection in Drupal +- derivative and indexing services do not need redesign + +### Tradeoffs + +- increases transport throughput headroom +- reduces queue contention in the Drupal database +- introduces another operational dependency to deploy, monitor, and back up +- does not eliminate the need for idempotent handlers or ledger-based operator + state + +## Combining both scaling options + +Large deployments may need both: + +- remote derivative or indexing services for compute-heavy work +- an external transport backend for queue throughput + +That combined topology keeps the same application model: + +- ledger state stays in Drupal +- Messenger still owns delivery +- workers still own execution and emit lifecycle events +- downstream services do the expensive work + +## Planning guidance before ingest + +Choose the simplest topology that matches your expected ingest volume and +performance envelope. + +### Good initial questions + +- How many objects will be ingested in the first sustained load event? +- How many concurrent users need acceptable site response times during ingest? 
+- Are derivatives mostly images and PDFs, or larger video/audio workloads?
+- Is the database already shared with other heavy Drupal workloads?
+- Do you need burst throughput for backfills and reindexing, or mostly steady
+  day-to-day ingest?
+
+### Practical starting guidance
+
+- start with the default SQL transport and in-deployment workers for small and
+  moderate repositories
+- move derivative execution to external services first when CPU or memory
+  contention is the main problem
+- move the transport backend next when queue persistence and dequeue throughput
+  become the main problem
+- scale by transport and workload type rather than building one undifferentiated
+  worker pool
+
+### Signals that the default topology is struggling
+
+- queue depth remains elevated during or after ingest
+- queue wait time remains high after adding transport-specific workers
+- Drupal response times degrade sharply during worker activity
+- the database becomes the bottleneck rather than the downstream services
+- derivative runners are starved for CPU or memory inside the Drupal runtime
+
+## Benchmarking methodology
+
+The benchmark sections below are intended to capture repeatable measurements
+for different deployment topologies. Populate them with real measurements from
+your environment; do not assume one topology is always superior.
+
+For each test run, record:
+
+- repository size before ingest
+- ingest batch size
+- object mix and derivative profile
+- worker counts per transport
+- CPU and memory available to Drupal, the database, and remote services
+- ingest duration
+- time until all queued messages finish processing
+- site response time during ingest
+
+Use the same ingest process and the same content profile for each topology so
+the results are directly comparable.
+
+### Collecting benchmark data
+
+This repository includes a benchmark harness at
+[`scripts/benchmark-islandora-events.sh`](../../scripts/benchmark-islandora-events.sh).
+The harness is intended to wrap an existing ingest script rather than replace +it. + +For each run, the harness: + +- records the current maximum ledger row ID before ingest starts +- runs the ingest script +- polls `sm_ledger_event_record` until every new row has left `queued` +- records final status counts such as `completed`, `retry_due`, and `failed` +- samples homepage response time during the run +- samples host load and available memory during the run +- captures `docker stats` snapshots when Docker is available + +Example using a Workbench ingest script: + +```bash +./scripts/benchmark-islandora-events.sh \ + --url http://islandora.local/ \ + --ingest-script ./scripts/run-workbench-ingest.sh \ + --label sql-local \ + --output-dir ./benchmark-results/sql-local +``` + +The harness writes a `summary.md` file and raw sample TSV files in the selected +output directory. Use those raw files to populate the benchmark matrices below. + +## Benchmark matrix: default SQL transport and local execution + +### Environment + +Populate this section with the actual resources used for the benchmark. 
+ +| Component | CPU | Memory | Notes | +|---|---:|---:|---| +| Drupal web + workers | TBD | TBD | | +| Database | TBD | TBD | | +| FITS / Homarus / Houdini / Hypercube | TBD | TBD | local to Drupal deployment or same host | + +### Results + +| Existing repository size | Ingest batch | Ingest duration | Time until all messages processed | Site response time during ingest | Notes | +|---:|---:|---|---|---|---| +| 10,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 100,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 500,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 1,000,000 items | 10,000 nodes/media | TBD | TBD | TBD | | + +## Benchmark matrix: SQL transport and remote service execution + +### Environment + +| Component | CPU | Memory | Notes | +|---|---:|---:|---| +| Drupal web + workers | TBD | TBD | | +| Database | TBD | TBD | | +| Remote derivative/index services | TBD | TBD | command-mode or HTTP services outside Drupal deployment | + +### Results + +| Existing repository size | Ingest batch | Ingest duration | Time until all messages processed | Site response time during ingest | Notes | +|---:|---:|---|---|---|---| +| 10,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 100,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 500,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 1,000,000 items | 10,000 nodes/media | TBD | TBD | TBD | | + +## Benchmark matrix: ActiveMQ transport + +### Environment + +| Component | CPU | Memory | Notes | +|---|---:|---:|---| +| Drupal web + workers | TBD | TBD | | +| Database | TBD | TBD | ledger + Drupal site data | +| ActiveMQ | TBD | TBD | transport backend | +| Derivative/index services | TBD | TBD | note whether execution stayed local or moved remote | + +### Results + +| Existing repository size | Ingest batch | Ingest duration | Time until all messages processed | Site response time during ingest | Notes | +|---:|---:|---|---|---|---| +| 10,000 items | 10,000 nodes/media | TBD | 
TBD | TBD | | +| 100,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 500,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 1,000,000 items | 10,000 nodes/media | TBD | TBD | TBD | | + +## Benchmark matrix: legacy Alpaca and ActiveMQ comparison + +Use this section for an apples-to-apples comparison with the previous +Alpaca-based architecture. + +### Environment + +| Component | CPU | Memory | Notes | +|---|---:|---:|---| +| Drupal web | TBD | TBD | | +| ActiveMQ | TBD | TBD | | +| Alpaca | TBD | TBD | | +| Downstream services | TBD | TBD | | + +### Results + +| Existing repository size | Ingest batch | Ingest duration | Time until all messages processed | Site response time during ingest | Notes | +|---:|---:|---|---|---|---| +| 10,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 100,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 500,000 items | 10,000 nodes/media | TBD | TBD | TBD | | +| 1,000,000 items | 10,000 nodes/media | TBD | TBD | TBD | | + +## Interpreting the benchmark results + +Look at all three metrics together: + +- ingest duration shows how long it takes to submit the workload +- time until all messages are processed shows the actual backlog drain time +- site response time during ingest shows whether the topology remains usable for + interactive users + +A topology with the fastest ingest is not always the best choice if user-facing +response times collapse during the run. + +## Related documentation + +- [Islandora Architecture](diagram.md) +- [Islandora Events](islandora-events.md) diff --git a/docs/user-documentation/versioning.md b/docs/user-documentation/versioning.md index 33215a434..082de491c 100644 --- a/docs/user-documentation/versioning.md +++ b/docs/user-documentation/versioning.md @@ -13,6 +13,6 @@ Fedora implements the [Memento](http://mementoweb.org/about/) specification for ## Basic Data Flow 1. A node or media object is created or updated in Drupal. -2. 
When an entity is revisionable, and it isn't the initial creation, it [adds a flag](https://github.com/Islandora/islandora/blob/8.x-1.x/src/EventGenerator/EventGenerator.php#L109) to the event object that gets passed to Alpaca. -3. The [islandora-indexing-fcrepo module](https://github.com/Islandora/Alpaca/tree/dev/islandora-indexing-fcrepo) of Alpaca looks for that flag and fires a call to the [versioning endpoint](https://github.com/Islandora/Crayfish/blob/dev/Milliner/src/app.php#L52) of [Milliner](https://github.com/Islandora/Crayfish/tree/dev/Milliner). -4. Milliner uses the [Chullo library](https://github.com/Islandora/chullo/blob/dev/src/FedoraApi.php#L320) to [create a version](https://github.com/Islandora/Crayfish/blob/dev/Milliner/src/Service/MillinerService.php#L551) in Fedora. +2. When a revisionable entity is updated with a new Drupal revision, `islandora_events_fcrepo` records that the Fedora indexing run should also create a version snapshot. +3. The Fedora indexing worker updates the live Fedora resource directly from Drupal-managed JSON or JSON-LD. +4. The same Drupal worker then creates a Fedora memento for that updated resource, so Fedora versioning happens without Alpaca or Milliner. diff --git a/mkdocs.yml b/mkdocs.yml index a3bff8f6b..518fb10c5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -308,6 +308,8 @@ nav: - Developer Resources: - 'Stack Overview': 'installation/component-overview.md' - 'Islandora Architecture': 'technical-documentation/diagram.md' + - 'Islandora Events': 'technical-documentation/islandora-events.md' + - 'Scaling Islandora Events': 'technical-documentation/scaling.md' - REST Documentation: - 'Introduction': 'technical-documentation/using-rest-endpoints.md' - 'Authorization': 'technical-documentation/rest-authorization.md'