Skip to content

dpd: Want a better API for bulk NAT entry operations #255

@jgallagher

Description

@jgallagher

As a part of oxidecomputer/omicron#10167, sled-agent running on each scrimlet wants to ensure the NAT entries for all Omicron services are correct on its switch zone's dpd. The information it has is a generation-numbered-guarded declarative set of "entries that should exist".

The existing dpd API makes it easy to ensure each of those entries does exist, but I don't think there's an easy way for sled-agent to remove stale Omicron service NAT entries that shouldn't exist. It doesn't know what entries might have existed in the past; only what should exist now.

I have a prototype branch that makes this work, but it's pretty gnarly: it reads all current NAT entries from dpd across a series of API calls:

  • For each IP returned by looping over all pages of nat_ipv4_addresses_list, fetch all NAT entries associated with that IP via nat_ipv4_list.
  • For each IP returned by looping over all pages of nat_ipv6_addresses_list, fetch all NAT entries associated with that IP via nat_ipv6_list.

As a heuristic for "what entries exist that shouldn't", it filters all the returned NAT entries down to just those with a VNI matching the Omicron services VNI. Now it can diff locally and decide whether any entries exist that shouldn't, and delete those.

This issue is similar in spirit to #209. In this specific case we don't necessarily need a generation number on the dpd side for correctness, since there is only one client (the local sled-agent) and it already guards changes via a generation number of its own. But I could imagine wanting a similar kind of "bulk" API if we wanted to invert the Nexus <-> dpd NAT syncing path (which we do at some point - this is #83), and it would be easy for us to pass through the generation number we do have if needed.

I'm not sure exactly what shape this should be. The features I'd like are:

  • Some way of identifying an entire set of related NAT entries. I'm using VNI to do this now - maybe that's fine / correct? Or there could be some metadata that described a group of "all Omicron service entries".
  • Either:
    • A single "ensure" API that takes a generation-guarded, identified-by-group set of NAT entries, and then dpd internally removes any stale entries that are in the same group but not in the new request, and adds/updates all entries that are in the new request. (I believe this is similar to the way BGP is configured in mgd - there's a single bgp_apply() endpoint that takes the config that should exist, then internally does whatever it needs to do to make that happen.)
    • A bulk "get" API (presumably paginated) that returns all the current entries in a group. This would be more similar to the prototype branch I have, but the filter-down-to-a-group would happen on the dpd side instead of the client needing to fetch all NAT entries. Given this, the client can do its own diffing and then use the existing add/remove APIs to make updates as needed.

I think the first of those options is more generally useful - the second is only useful if there's a single client that can't race with others, since it opens an obvious TOCTOU between the "bulk get" and the "local diff then apply changes". But either would be a big improvement for the current project, and I assume (perhaps naively) that the second would be a smaller lift.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions