feat: provision per-host Postgres user for RAG service instances#299
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can disable the changed files summary in the walkthrough.Disable the |
57345d8 to
4392b87
Compare
|
@coderabbitai review |
|
🧠 Learnings used✅ Actions performedReview triggered.
|
| defer conn.Close(ctx) | ||
|
|
||
| var exists bool | ||
| err = conn.QueryRow(ctx, |
There was a problem hiding this comment.
Non-blocking suggestion: the inverse of this query is repeated in server/internal/postgres/roles.go in CreateUserRole. We could refactor it out into its own function and reuse it here and in CreateUserRole, like:
func UserRoleNeedsCreate(name string) Query[bool] {
return Query[bool]{
SQL: "SELECT NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = @name);",
Args: pgx.NamedArgs{
"name": name,
},
}
}Then you would invert the logic here:
needsCreate, err := postgres.UserRoleNeedsCreate(r.Username).Exec(ctx, conn)
if err != nil {
// ...
}
if needsCreate {
return resource.ErrNotFound
}| logger.Info().Msg("creating RAG service user role") | ||
|
|
||
| r.Username = database.GenerateServiceUsername(r.ServiceInstanceID) | ||
| password, err := utils.RandomString(32) |
There was a problem hiding this comment.
It's not safe to rely on Create only getting called once, so we need to try and make all resource lifecycle methods idempotent. Could you change this to only generate a new password if the password is unset?:
if r.Password == "" {
password, err := utils.RandomString(32)
if err != nil {
return fmt.Errorf("failed to generate password: %w", err)
}
r.Password = password
}| ).Scan(&exists) | ||
| if err != nil { | ||
| // On query failure, assume it exists | ||
| logger.Warn().Err(err).Msg("pg_roles query failed, assuming RAG role exists") |
There was a problem hiding this comment.
This doesn't seem like a safe assumption to me. IMO we should return this error and fail the Refresh so that the user can resolve the issue and retry the operation.
| Logger() | ||
| logger.Info().Msg("deleting RAG service user from database") | ||
|
|
||
| conn, err := r.connectToColocatedPrimary(ctx, rc, logger, "postgres") |
There was a problem hiding this comment.
Change this to connect to r.Database so that Spock will replicate the user to every instance.
| return nil | ||
| } | ||
|
|
||
| // connectToColocatedPrimary finds the primary Postgres instance on the same |
There was a problem hiding this comment.
The things we're doing in connectToColocatedPrimary, resolveColocatedPrimary, colocatedInstances, and findPrimaryAmong are not necessary. You can rely on two things in this resource:
- Your lifecycle methods are being evaluated on the host with the primary instance because you're using
resource.PrimaryExecutor(r.NodeName) - As long as you connect to
r.DatabaseName, Spock will replicate whichever operations you perform on the user
So, you can use the same methods we use in subscription_resource.go:
primary, err := database.GetPrimaryInstance(ctx, rc, r.NodeName)
if err != nil {
return fmt.Errorf("failed to get primary instance: %w", err)
}
conn, err := primary.Connection(ctx, rc, r.DatabaseName)
if err != nil {
return fmt.Errorf("failed to connect to database %s on node %s: %w", subscriber.Spec.DatabaseName, s.SubscriberNode, err)
}
defer conn.Close(ctx)| // The role is created on the primary of the co-located Postgres instance | ||
| // (same HostID) and granted the pgedge_application_read_only built-in role. | ||
| type RAGServiceUserRole struct { | ||
| ServiceInstanceID string `json:"service_instance_id"` |
There was a problem hiding this comment.
We only need one of these resources per rag service, so we can remove this property.
| ServiceInstanceID string `json:"service_instance_id"` |
| } | ||
|
|
||
| func (r *RAGServiceUserRole) Identifier() resource.Identifier { | ||
| return RAGServiceUserRoleIdentifier(r.ServiceInstanceID) |
There was a problem hiding this comment.
We only need one of these per service, so this should use r.ServiceID:
| return RAGServiceUserRoleIdentifier(r.ServiceInstanceID) | |
| return RAGServiceUserRoleIdentifier(r.ServiceID) |
Summary
This PR introduces RAGServiceUserRole, a new resource that creates a dedicated PostgreSQL database user for each RAG service instance running on its co-located host.
Changes
rag_service_user_role.go— New resource keyed byserviceInstanceID(not serviceID). HandlesCreate/Delete/Refreshlifecycle. Refresh queriespg_catalog.pg_rolesto verify role existence.connectToColocatedPrimaryfilters instances by HostID before Patroni primary lookup, ensuring the role lands on the correct node.orchestrator.go — Refactored
GenerateServiceInstanceResourcesto dispatch by ServiceType. MCP logic extracted intogenerateMCPInstanceResources. NewgenerateRAGInstanceResourcesand sharedbuildServiceInstanceResourceshelper added.resources.go — Registers
ResourceTypeRAGServiceUserRole.plan_update.go — Skips
ServiceInstanceMonitorResourcefor RAG instances since no Docker container exists yet (swarm.service_instance dependency would be unsatisfied).Testing
Manual Verification:
Created Cluster
Created a database using the following command:
restish control-plane-local-1 create-database < ../demo/488/rag_create_db.json
rag_create_db.json
The database created successfully
Connect to db and confirm that rg service user created
Checklist
Notes for Reviewers
PLAT-489