
NMS-19560: Pre populate metadata for each service #8319

Draft
cgorantla wants to merge 1 commit into develop from cg/jira/NMS-19560

Conversation

@cgorantla
Contributor

Retrieve metadata once for each Poller request instead of retrieving it for each RPC invoke.
Whenever metadata updates, we will refresh the request and interpolate again.
Runtime attributes will still be evaluated for each RPC request.

External References

Retrieve metadata once for each service instead of retrieving
it for each RPC invoke.
Whenever metadata updates, we should clear parameters for the service.
Contributor Author

@cgorantla cgorantla left a comment


I haven't tested this thoroughly, which is why I kept it in Draft, but I think this should work. I guess we can achieve a similar pattern in Collectd/Thresholding.

Contributor

@christianpape christianpape left a comment


I really don't know if it's a good idea to tackle the performance problem in this way. For me, it carries too many risks of indeterminate behaviour, which is then very difficult to investigate.

My idea for optimisation is as follows. Currently, the interpolator methods work with scope instances. I would change this to ScopeProvider, i.e. the scope is not always generated, but only when a string actually contains a metadata expression. If it works as I imagine, this would eliminate all getScopeFor...() calls for the thresholds, as we do not have any metadata expressions there in the default configuration. And even for other spots in OpenNMS this will eliminate a lot of calls to the EntityScopeProvider.
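As a rough illustration of that idea (simplified, hypothetical stand-in types, not the real OpenNMS `Scope`/`ScopeProvider` API; a `Function<String, String>` stands in for a `Scope`), interpolation could accept a provider and only materialize the scope when the value actually contains a metadata expression:

```java
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyInterpolation {
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Interpolates ${...} placeholders. The scope provider (standing in for
    // an EntityScopeProvider lookup) is only invoked when the value actually
    // contains a metadata expression, so plain values cost nothing.
    static String interpolate(String value, Supplier<Function<String, String>> scopeProvider) {
        if (!value.contains("${")) {
            return value; // common case: no expression, no scope is ever built
        }
        final Function<String, String> scope = scopeProvider.get(); // built on demand
        final Matcher m = EXPR.matcher(value);
        final StringBuffer sb = new StringBuffer();
        while (m.find()) {
            final String replacement = scope.apply(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(
                    replacement != null ? replacement : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With default threshold parameters like `"5000"` the provider is never called, which is what would eliminate the `getScopeFor...()` calls.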

final Map<String, Object> raw = m_configService.getParameterMap();
if (m_entityScopeProvider != null) {
    final Scope scope = new FallbackScope(
        m_entityScopeProvider.getScopeForNode(m_service.getNodeId()),

@christianpape christianpape Feb 26, 2026


Our current implementation of a Scope retrieves, for instance, a node entity and delegates lookups to the entity's properties. So it does not fill in strings eagerly but calls the entity's methods on demand. This means DAO objects are kept inside the scope, and additional handling for node updates isn't necessary, I think. My attempt to keep MapScope objects in the cache was not as successful as I had hoped; unfortunately, that approach created additional load on the database whenever the cache was refreshed.
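A minimal sketch of that on-demand behaviour (hypothetical stand-in types, not the actual OpenNMS classes): the scope keeps a reference to the DAO entity and reads its fields only when a key is resolved, so a later change to the entity is visible without any cache refresh:

```java
import java.util.Optional;

// Entity-backed scope: holds the DAO object and resolves keys lazily via
// its getters, instead of copying values into a map up front.
public class EntityBackedScope {
    /** Minimal stand-in for a node entity such as OnmsNode. */
    static class Node {
        private String label;
        Node(String label) { this.label = label; }
        String getLabel() { return label; }
        void setLabel(String label) { this.label = label; }
    }

    private final Node node; // DAO object kept inside the scope

    EntityBackedScope(Node node) { this.node = node; }

    /** Resolves a key by calling the entity's getter on demand. */
    Optional<String> get(String key) {
        if ("node:label".equals(key)) {
            return Optional.ofNullable(node.getLabel());
        }
        return Optional.empty();
    }
}
```

Because the lookup happens at resolution time, an update to the node entity is reflected in the next interpolation automatically.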

}
intf.removeMetaData(context);
getDao().update(intf);
sendNodeMetadataUpdatedEvent(intf);
Contributor


A poller scope consists of a service scope, an interface scope, and a node scope. The node scope includes asset information and things like sys-object-id; the interface scope includes things like if-description. So just handling changes to meta-data isn't enough, I think. We also need to send events when nodes, interfaces, and services are altered via ReST. Also, a simple rescan can change some of the node's fields.

Contributor Author


Yeah, there are two events I'm keeping track of: node updated and node scan completed. Both of these can take care of all the updates. If we have a gap somewhere in our ReST API, we can add those events.

final Scope scope = new FallbackScope(
    m_entityScopeProvider.getScopeForNode(m_service.getNodeId()),
    m_entityScopeProvider.getScopeForInterface(m_service.getNodeId(), m_service.getIpAddr()),
    m_entityScopeProvider.getScopeForService(m_service.getNodeId(), m_service.getAddress(), m_service.getSvcName()),
Contributor


Even in this case, scopes and interfaces are retrieved more often than necessary. Of course, this will happen only when scheduling, but it isn't distributed over the different poller calls over time. This may have an impact on scheduled provisiond runs.

Contributor Author


Once a refresh happens, we clear the current interpolated parameters, and getParameters happens on its own schedule. We are not fetching anything that is not necessary. It's the same code from PollerRequestBuilder, moved here; instead of doing it for each service poll, we do it once at scheduling time. Now the parameters hold the actual interpolated values instead of the raw ones.
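As a rough illustration of the lifecycle being described (hypothetical names, not the actual Pollerd API): parameters are interpolated once per service and cached; a node updated / node scan completed event clears the cache entry, and the next poll re-interpolates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ServiceParameterCache {
    private final Map<Integer, Map<String, Object>> interpolated = new ConcurrentHashMap<>();
    private final Function<Integer, Map<String, Object>> interpolator;
    private int interpolationCount = 0; // illustration only: counts actual interpolations

    ServiceParameterCache(Function<Integer, Map<String, Object>> interpolator) {
        this.interpolator = interpolator;
    }

    /** Called on each poll: interpolation runs at most once per refresh cycle. */
    Map<String, Object> getParameters(int nodeId) {
        return interpolated.computeIfAbsent(nodeId, id -> {
            interpolationCount++;
            return interpolator.apply(id);
        });
    }

    /** Called on 'node updated' / 'node scan completed' events. */
    void onNodeEvent(int nodeId) {
        interpolated.remove(nodeId); // next poll re-interpolates lazily
    }

    int interpolationCount() { return interpolationCount; }
}
```

The key property is that the event handler only invalidates; the (potentially expensive) re-interpolation stays on the poller's own schedule, so DB load remains staggered.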

@cgorantla
Contributor Author

> I really don't know if it's a good idea to tackle the performance problem in this way. For me, it carries too many risks of indeterminate behaviour, which is then very difficult to investigate.
>
> My idea for optimisation is as follows. Currently, the interpolator methods work with scope instances. I would change this to ScopeProvider, i.e. the scope is not always generated, but only when a string actually contains a metadata expression. If it works as I imagine, this would eliminate all getScopeFor...() calls for the thresholds, as we do not have any metadata expressions there in the default configuration. And even for other spots in OpenNMS this will eliminate a lot of calls to the EntityScopeProvider.

Yeah, that makes sense. That should definitely help: if we don't have metadata expressions, we don't load any scopes or make any DB calls.

My current implementation in this PR moves interpolation from every poll cycle to once per service at scheduling time. On every node update and node scan completed event, we clear the parameters, and the next service schedule does the interpolation again. This doesn't introduce any new bottlenecks; DB calls are still per-service and still staggered by the scheduler. The only thing is that we need to ensure all node metadata updates are covered by events. We already have event-based lifecycle updates in Polling/Collectd and Thresholding, so we should be able to handle this.

We can test all of this more broadly and decide on next steps.
