As a part of oxidecomputer/omicron#10167, sled-agent running on each scrimlet wants to ensure the NAT entries for all Omicron services are correct on its switch zone's dpd. The information it has is a generation-numbered-guarded declarative set of "entries that should exist".
The existing dpd API makes it easy to ensure each of those entries does exist, but I don't think there's an easy way for sled-agent to remove stale Omicron service NAT entries that shouldn't exist. It doesn't know what entries might have existed in the past; only what should exist now.
I have a prototype branch that makes this work, but it's pretty gnarly: it reads all current NAT entries from dpd across a series of API calls:
- For each IP returned by looping over all pages of
nat_ipv4_addresses_list, fetch all NAT entries associated with that IP via nat_ipv4_list.
- For each IP returned by looping over all pages of
nat_ipv6_addresses_list, fetch all NAT entries associated with that IP via nat_ipv6_list.
As a heuristic for "what entries exist that shouldn't", it filters all the returned NAT entries down to just those with a VNI matching the Omicron services VNI. Now it can diff locally and decide whether any entries exist that shouldn't, and delete those.
This issue is similar in spirit to #209. In this specific case we don't necessarily need a generation number on the dpd side for correctness, since there is only one client (the local sled-agent) and it already guards changes via a generation number of its own. But I could imagine wanting a similar kind of "bulk" API if we wanted to invert the Nexus <-> dpd NAT syncing path (which we do at some point - this is #83), and it would be easy for us to pass through the generation number we do have if needed.
I'm not sure exactly what shape this should be. The features I'd like are:
- Some way of identifying an entire set of related NAT entries. I'm using VNI to do this now - maybe that's fine / correct? Or there could be some metadata that described a group of "all Omicron service entries".
- Either:
- A single "ensure" API that takes a generation-guarded, identified-by-group set of NAT entries, and then
dpd internally removes any stale entries that are in the same group but not in the new request, and adds/updates all entries that are in the new request. (I believe this is similar to the way BGP is configured in mgd - there's a single bgp_apply() endpoint that takes the config that should exist, then internally does whatever it needs to do to make that happen.)
- A bulk "get" API (presumably paginated) that returns all the current entries in a group. This would be more similar to the prototype branch I have, but the filter-down-to-a-group would happen on the
dpd side instead of the client needing to fetch all NAT entries. Given this, the client can do its own diffing and then use the existing add/remove APIs to make updates as needed.
I think the first of those options is more generally useful - the second is only useful if there's a single client that can't race with others, since it opens an obvious TOCTOU between the "bulk get" and the "local diff then apply changes". But either would be a big improvement for the current project, and I assume (perhaps naively) that the second would be a smaller lift.
As a part of oxidecomputer/omicron#10167, sled-agent running on each scrimlet wants to ensure the NAT entries for all Omicron services are correct on its switch zone's
dpd. The information it has is a generation-numbered-guarded declarative set of "entries that should exist".The existing
dpdAPI makes it easy to ensure each of those entries does exist, but I don't think there's an easy way for sled-agent to remove stale Omicron service NAT entries that shouldn't exist. It doesn't know what entries might have existed in the past; only what should exist now.I have a prototype branch that makes this work, but it's pretty gnarly: it reads all current NAT entries from
dpdacross a series of API calls:nat_ipv4_addresses_list, fetch all NAT entries associated with that IP vianat_ipv4_list.nat_ipv6_addresses_list, fetch all NAT entries associated with that IP vianat_ipv6_list.As a heuristic for "what entries exist that shouldn't", it filters all the returned NAT entries down to just those with a VNI matching the Omicron services VNI. Now it can diff locally and decide whether any entries exist that shouldn't, and delete those.
This issue is similar in spirit to #209. In this specific case we don't necessarily need a generation number on the
dpdside for correctness, since there is only one client (the local sled-agent) and it already guards changes via a generation number of its own. But I could imagine wanting a similar kind of "bulk" API if we wanted to invert the Nexus <-> dpd NAT syncing path (which we do at some point - this is #83), and it would be easy for us to pass through the generation number we do have if needed.I'm not sure exactly what shape this should be. The features I'd like are:
dpdinternally removes any stale entries that are in the same group but not in the new request, and adds/updates all entries that are in the new request. (I believe this is similar to the way BGP is configured inmgd- there's a singlebgp_apply()endpoint that takes the config that should exist, then internally does whatever it needs to do to make that happen.)dpdside instead of the client needing to fetch all NAT entries. Given this, the client can do its own diffing and then use the existing add/remove APIs to make updates as needed.I think the first of those options is more generally useful - the second is only useful if there's a single client that can't race with others, since it opens an obvious TOCTOU between the "bulk get" and the "local diff then apply changes". But either would be a big improvement for the current project, and I assume (perhaps naively) that the second would be a smaller lift.