Summary
In foundationpose_decoder.cpp, the tf_to_center translation sign is inverted compared to the original Python implementation, causing a systematic translation error of 2 * R * mesh_model_center in the output pose.
Root Cause
The mesh vertices are centered during loading by subtracting mesh_model_center:
// mesh_storage.cpp:210-212
vertices.push_back(mesh->mVertices[v].x - mesh_data_->mesh_model_center[0]);
vertices.push_back(mesh->mVertices[v].y - mesh_data_->mesh_model_center[1]);
vertices.push_back(mesh->mVertices[v].z - mesh_data_->mesh_model_center[2]);
The internal pose maps centered vertices to camera coordinates:
p_cam = R * v_centered + t = R * (v_original - c) + t = R * v_original + (t - R*c)
So the API pose translation should be t_api = t - R*c, requiring tf_to_center = [I | -c].
However, the decoder uses +mesh_model_center:
// foundationpose_decoder.cpp:173-176
// Add the distance from edge to the center because
Eigen::Matrix4f tf_to_center = Eigen::Matrix4f::Identity();
tf_to_center.block<3, 1>(0, 3) = mesh_data_ptr->mesh_model_center; // should be negative
pose_matrix = pose_matrix * tf_to_center;
This produces t_api = t + R*c instead of t - R*c, introducing a 2*R*c offset.
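The sign argument above can be checked numerically. The following is a minimal sketch in plain Python, not code from either implementation; the rotation, translation, center, and vertex values are hypothetical and chosen only for illustration, and the helper names (`rot_z`, `matvec`, `compose_translation`) are made up for this example.

```python
import math

def rot_z(deg):
    """3x3 rotation about the Z axis, as nested lists."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a),  math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

def matvec(m, v):
    """Product of a 3x3 matrix and a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def compose_translation(R, t, offset):
    """Translation column of [R | t] * [I | offset], i.e. t + R * offset."""
    Ro = matvec(R, offset)
    return [t[i] + Ro[i] for i in range(3)]

# Hypothetical values (metres), for illustration only.
R = rot_z(90.0)
t = [0.10, 0.20, 0.50]          # internal pose translation
c = [0.02, 0.00, 0.01]          # mesh_model_center
v_original = [0.03, 0.01, 0.00] # a vertex before centering

t_correct = compose_translation(R, t, [-x for x in c])  # tf_to_center = [I | -c]
t_buggy = compose_translation(R, t, c)                  # tf_to_center = [I | +c]

# Camera-frame point via the internal pose applied to the centered vertex...
v_centered = [v_original[i] - c[i] for i in range(3)]
p_internal = [a + b for a, b in zip(matvec(R, v_centered), t)]
# ...and via the corrected API pose applied to the original vertex.
p_api = [a + b for a, b in zip(matvec(R, v_original), t_correct)]

offset = [b - g for b, g in zip(t_buggy, t_correct)]
expected_offset = [2.0 * x for x in matvec(R, c)]       # 2 * R * c
print(offset, expected_offset)
```

With the negative sign, `p_api` agrees with `p_internal`; with the positive sign, the translation is off by `2 * R * c`.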
Reference: Original Python Implementation
The official Python FoundationPose (estimater.py) correctly uses the negative sign:
# bundlesdf/foundation_pose/estimater.py
def get_tf_to_centered_mesh(self):
    tf_to_center = torch.eye(4, dtype=torch.float, device='cuda')
    tf_to_center[:3,3] = -torch.as_tensor(self.model_center, device='cuda', dtype=torch.float) # negative
    return tf_to_center
Impact
- Translation error magnitude: 2 * ||mesh_model_center||
- For a mesh with bbox center at ~22 mm from origin → ~44 mm systematic error
- For a given mesh, the error norm is constant across all scenes, but its XYZ components vary with camera viewpoint (because of the R * factor)
- Rotation estimation is unaffected
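To illustrate why the norm stays fixed while the components move, here is a small sketch that evaluates the error vector `2 * R * c` for a few rotations. The center `(12, 18, 4)` mm is a hypothetical value picked because its norm is exactly 22 mm; the choice of rotations about the X axis is likewise arbitrary.

```python
import math

def rot_x(deg):
    """3x3 rotation about the X axis, as nested lists."""
    a = math.radians(deg)
    return [[1.0, 0.0, 0.0],
            [0.0, math.cos(a), -math.sin(a)],
            [0.0, math.sin(a),  math.cos(a)]]

def matvec(m, v):
    """Product of a 3x3 matrix and a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# Hypothetical mesh_model_center in millimetres (norm = 22 mm).
c = [12.0, 18.0, 4.0]

errors = {}
for deg in (0, 45, 90, 180):
    e = [2.0 * x for x in matvec(rot_x(deg), c)]   # error vector 2 * R * c
    norm = math.sqrt(sum(x * x for x in e))
    errors[deg] = (e, norm)
    print(deg, [round(x, 1) for x in e], round(norm, 1))
```

Because a rotation preserves vector length, every row reports the same norm, while the individual components change with the angle.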
Proposed Fix
- tf_to_center.block<3, 1>(0, 3) = mesh_data_ptr->mesh_model_center;
+ tf_to_center.block<3, 1>(0, 3) = -mesh_data_ptr->mesh_model_center;
Reproduction
Run FoundationPose on any mesh whose bounding box center is not at the origin. Compare the C++ output pose with the Python estimater.py output on the same input. The C++ translation will differ from the Python translation by 2 * R * mesh_model_center.
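One way to automate that comparison is sketched below. The function name `matches_sign_bug`, the 4x4 nested-list pose layout, and the tolerance are assumptions made for this example, not part of either codebase, and the poses shown are synthetic rather than real FoundationPose output. It also assumes both runs produce the same rotation; on real data the two estimates may differ slightly, so the tolerance may need loosening.

```python
def matches_sign_bug(pose_cpp, pose_py, center, tol=1e-4):
    """Return True if pose_cpp's translation exceeds pose_py's by 2 * R * center.

    pose_cpp, pose_py: 4x4 row-major nested lists; center: length-3 list.
    The rotation R is read from pose_py; both poses are assumed to share it.
    """
    R = [row[:3] for row in pose_py[:3]]
    expected = [2.0 * sum(R[i][j] * center[j] for j in range(3)) for i in range(3)]
    actual = [pose_cpp[i][3] - pose_py[i][3] for i in range(3)]
    return all(abs(a - e) <= tol for a, e in zip(actual, expected))

# Synthetic poses with identity rotation (metres), for illustration only.
center = [0.012, 0.018, 0.004]
pose_py = [[1.0, 0.0, 0.0, 0.100],
           [0.0, 1.0, 0.0, 0.200],
           [0.0, 0.0, 1.0, 0.500],
           [0.0, 0.0, 0.0, 1.0]]
pose_cpp = [[1.0, 0.0, 0.0, 0.124],
            [0.0, 1.0, 0.0, 0.236],
            [0.0, 0.0, 1.0, 0.508],
            [0.0, 0.0, 0.0, 1.0]]

print(matches_sign_bug(pose_cpp, pose_py, center))
```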