fix: Fix segfault during shutdown when using Triton metrics in Python backend by aleksn7 · Pull Request #429 · triton-inference-server/python_backend

aleksn7 · 2026-02-10T12:15:37Z

Problem
When shutting down the Triton Inference Server with the Python backend while Triton metrics are in use, a segmentation fault occurs. This happens because Metric::Clear attempts to access the Stub singleton while the Stub destructor is still executing.

Stack trace:

#0  triton::backend::python::Metric::SaveToSharedMemory()
#1  triton::backend::python::Metric::Clear()
#2  triton::backend::python::MetricFamily::~MetricFamily()
#3  std::_Sp_counted_base<>::_M_release()
#4  pybind11::class_<MetricFamily>::dealloc()
#5  pybind11_object_dealloc()
#6  libpython3.12.so.1.0
#7  libpython3.12.so.1.0
#8  triton::backend::python::Stub::~Stub()
#9  main()

Solution
Added a new DestroyPythonObjects() method to the Stub class that explicitly releases all Python objects before the Stub destructor is invoked. This ensures the proper destruction order:

1. DestroyPythonObjects() is called first, releasing all Python-bound objects (including MetricFamily instances).
2. Stub::~Stub() is called afterward, when no Python objects depend on it.

This approach guarantees that MetricFamily and other Python objects are fully destroyed while Stub is still valid and accessible.
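A minimal C++ model of the ordering problem, without pybind11 or Python. Stub, FakeMetricFamily, g_log, and RunShutdown below are hypothetical stand-ins, not the python_backend code. FakeMetricFamily's destructor reaches back into the singleton the way Metric::Clear does in the stack trace. Without the explicit release, that destructor only runs after ~Stub() has started, when the singleton is no longer reachable; with DestroyPythonObjects() called first, it runs while Stub is still valid.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records destructor calls so the teardown order can be inspected.
std::vector<std::string> g_log;

// Hypothetical stand-in for a Python-bound MetricFamily. Like Metric::Clear()
// in the stack trace, its destructor reaches back into the Stub singleton.
struct FakeMetricFamily {
  ~FakeMetricFamily();
};

// Hypothetical stand-in for Stub: a singleton that owns the metric family.
class Stub {
 public:
  static void Create() { instance_.reset(new Stub()); }
  static Stub* Instance() { return instance_.get(); }

  // Releases dependents while the singleton is still fully valid. Moving the
  // member into a local empties it first; the local dies at scope exit.
  void DestroyPythonObjects() {
    std::shared_ptr<FakeMetricFamily> released(std::move(metric_family_));
  }

  // unique_ptr::reset() nulls the stored pointer *before* deleting the old
  // object, so code running inside ~Stub() sees Instance() == nullptr.
  static void DestroyInstance(bool release_first) {
    if (release_first) {
      instance_->DestroyPythonObjects();
    }
    instance_.reset();
  }

  // Data members are destroyed after this body runs.
  ~Stub() { g_log.push_back("~Stub"); }

 private:
  Stub() = default;
  std::shared_ptr<FakeMetricFamily> metric_family_ =
      std::make_shared<FakeMetricFamily>();
  static std::unique_ptr<Stub> instance_;
};

std::unique_ptr<Stub> Stub::instance_;

FakeMetricFamily::~FakeMetricFamily() {
  g_log.push_back(Stub::Instance() ? "~MetricFamily: stub valid"
                                   : "~MetricFamily: stub gone");
}

// Creates and tears down the singleton, returning the destructor order.
std::vector<std::string> RunShutdown(bool release_first) {
  g_log.clear();
  Stub::Create();
  Stub::DestroyInstance(release_first);
  return g_log;
}
```

RunShutdown(false) records "~Stub" followed by "~MetricFamily: stub gone", the situation that surfaced as the segfault; RunShutdown(true) records "~MetricFamily: stub valid" before "~Stub".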

To reproduce
Run Triton server with the custom metric example model, then shut it down with `kill -2 <main triton server process pid>` (SIGINT).

whoisj · 2026-04-10T18:26:28Z

@aleksn7 thank you for your contribution. Have you completed the required steps outlined in our Contributing section?

Specifically, have you submitted a signed CLA? If not, please do so.

whoisj · 2026-04-10T21:49:47Z

  // objects. If the scoped_interpreter is destroyed before the stub object,
  // this process will no longer hold the GIL lock and destruction of the stub
  // will result in segfault.
+  stub->DestroyPythonObjects();


This makes sense, but please move the call to Stub::DestroyInstance().

whoisj · 2026-04-10T21:50:07Z

            non_graceful_exit = true;

            // Destroy stub and exit.
+            stub->DestroyPythonObjects();


This makes sense, but please move this call to Stub::DestroyInstance().

Copilot

Pull request overview

Fixes a shutdown-time segmentation fault in the Python backend when Triton custom metrics are enabled by ensuring Python-bound objects are released in a safe order relative to the Stub singleton teardown.

Changes:

Added Stub::DestroyPythonObjects() to explicitly drop Python-owned objects before Stub destruction.
Updated Stub::DestroyInstance() to call DestroyPythonObjects() prior to resetting the singleton.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
src/pb_stub.h	Declares the new `DestroyPythonObjects()` API and documents intended destructor ordering.
src/pb_stub.cc	Calls `DestroyPythonObjects()` during singleton destruction and implements the cleanup routine.


Copilot · 2026-04-13T10:06:43Z

@@ -1088,6 +1088,7 @@ Stub::GetOrCreateInstance()
 void
 Stub::DestroyInstance()
 {


Stub::DestroyInstance() unconditionally dereferences stub_instance. If DestroyInstance() is called before GetOrCreateInstance() (or called twice), this will crash. Add a null check (e.g., early-return if !stub_instance) before calling DestroyPythonObjects() / reset().

Suggested change

-{
+{
+  if (!stub_instance) {
+    return;
+  }
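A hedged sketch of the guard Copilot suggests, shown on a hypothetical Singleton class rather than Stub itself (the method names mirror the PR, but the body is illustrative only). The early return makes DestroyInstance() safe to call before GetOrCreateInstance() or more than once.

```cpp
#include <cassert>
#include <memory>

// Hypothetical minimal singleton illustrating the early-return guard.
class Singleton {
 public:
  static Singleton& GetOrCreateInstance() {
    if (!instance_) {
      instance_.reset(new Singleton());
    }
    return *instance_;
  }

  static void DestroyInstance() {
    if (!instance_) {
      return;  // never created, or already destroyed: nothing to tear down
    }
    instance_->DestroyPythonObjects();  // would dereference null without the guard
    instance_.reset();
  }

  static bool Exists() { return instance_ != nullptr; }

 private:
  Singleton() = default;
  void DestroyPythonObjects() {}  // placeholder for the real cleanup
  static std::unique_ptr<Singleton> instance_;
};

std::unique_ptr<Singleton> Singleton::instance_;
```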

Copilot · 2026-04-13T10:06:43Z

+    py::gil_scoped_acquire acquire;
+    py::object async_event_loop_local(std::move(async_event_loop_));
+    py::object background_futures_local(std::move(background_futures_));
+    py::object model_instance_local(std::move(model_instance_));


DestroyPythonObjects() only clears async_event_loop_, background_futures_, and model_instance_, but Stub also owns other py::object members (deserialize_bytes_, serialize_bytes_). If those remain non-empty, they will be decref'd later during Stub destruction (potentially after py::scoped_interpreter teardown / without the GIL), which can still segfault. Consider moving/clearing all py::object members here (and ideally reuse this helper from ~Stub() to keep the cleanup logic in one place).

Suggested change

-    py::object model_instance_local(std::move(model_instance_));
+    py::object model_instance_local(std::move(model_instance_));
+    py::object deserialize_bytes_local(std::move(deserialize_bytes_));
+    py::object serialize_bytes_local(std::move(serialize_bytes_));

@whoisj What do you think about this? Should we listen to Copilot here?

If the other fields are not a problem, then it doesn't matter, honestly.
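Copilot's point (any py::object member left non-empty is decref'd later, possibly after interpreter teardown) and its suggestion to reuse the helper from ~Stub() can be modeled without pybind11. In this hypothetical sketch, Handle stands in for py::object and the g_interpreter_alive flag stands in for the interpreter still running with the GIL available; the member names mirror the PR's, but this is not its code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model of "interpreter alive" plus a count of handles released
// after the interpreter was torn down (the unsafe case).
bool g_interpreter_alive = true;
int g_unsafe_releases = 0;

// Stand-in for py::object: a move-only handle whose destructor "decrefs".
class Handle {
 public:
  Handle() = default;
  Handle(Handle&& other) noexcept : live_(std::exchange(other.live_, false)) {}
  Handle& operator=(Handle&&) = delete;
  ~Handle() {
    if (live_ && !g_interpreter_alive) {
      ++g_unsafe_releases;  // "decref" after interpreter teardown
    }
  }

 private:
  bool live_ = true;
};

// Stand-in for Stub with several Python-owned members.
class Owner {
 public:
  // Empties every handle member; the locals release at scope exit.
  void DestroyPythonObjects() {
    Handle model_instance_local(std::move(model_instance_));
    Handle deserialize_bytes_local(std::move(deserialize_bytes_));
    Handle serialize_bytes_local(std::move(serialize_bytes_));
  }

  // Reusing the helper keeps cleanup in one place; a second call is a no-op
  // because the members are already empty.
  ~Owner() { DestroyPythonObjects(); }

 private:
  Handle model_instance_;
  Handle deserialize_bytes_;
  Handle serialize_bytes_;
};
```

Calling DestroyPythonObjects() before the interpreter goes away releases all three handles safely. Relying on ~Owner() alone after teardown still releases all three unsafely: the destructor call centralizes the cleanup, but the explicit early call is what fixes the ordering.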

Copilot · 2026-04-13T10:06:43Z


+  /// Cleans up Python objects and must be called before the destructor.
+  /// This prevents problems that occur when Python object destructors
+  /// call Stub::GetOrCreate.


The comment mentions Stub::GetOrCreate, but the singleton accessor is Stub::GetOrCreateInstance(). Update the comment to match the actual API to avoid confusion when debugging shutdown/destructor-order issues.

Suggested change

-  /// call Stub::GetOrCreate.
+  /// call Stub::GetOrCreateInstance().

whoisj

LGTM. @yinggeh, can you add your review as well?

whoisj · 2026-04-14T21:32:31Z

This is still blocking:

@aleksn7 thank you for your contribution. Have you completed the required steps outlined in our Contributing section?

Specifically, have you submitted a signed CLA? If not, please do so.

aleksn7 · 2026-04-16T08:37:38Z

I'm working on it

Fix segfault in python object destruction

b24c8a7

aleksn7 force-pushed the main branch from d0b4ec6 to b24c8a7 February 10, 2026 16:30

Style fix

1547ae8

aleksn7 mentioned this pull request Feb 16, 2026

Segmentation fault on shutdown when using Python backend metrics (triton-inference-server/server#8662, Open)

Merge branch 'main' into main

3bb3db7

whoisj requested a review from yinggeh April 10, 2026 18:27

yinggeh requested review from pskiran1 and whoisj April 10, 2026 18:37

yinggeh added the bug Something isn't working label Apr 10, 2026

whoisj requested changes Apr 10, 2026

Merge branch 'triton-inference-server:main' into main

59cbe6d

yinggeh requested a review from Copilot April 13, 2026 10:02

Copilot started reviewing on behalf of yinggeh April 13, 2026 10:02

Copilot AI reviewed Apr 13, 2026

Move Stub::DestroyPythonObjects call into Stub::DestroyInstance

a7d9cb8

aleksn7 force-pushed the main branch from c8d0d54 to a7d9cb8 April 13, 2026 10:18

aleksn7 requested a review from whoisj April 13, 2026 10:19

Explicitly destroy deserialize_bytes_local & serialize_bytes_local

cda7ced

whoisj approved these changes Apr 14, 2026

Add DestroyPythonObjects call into destructor

e967f7c

4 participants