A PyBullet simulation of a Franka Emika Panda robot performing pick-and-place with Image-Based Visual Servoing (IBVS). The closed-loop controller maps wrist-camera pixel errors directly to Cartesian end-effector velocities, enabling accurate grasping under localization noise.
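The core IBVS idea can be sketched as follows. This is a minimal, illustrative control law for a single point feature (function names are assumptions, not the repo's actual API): the camera-frame twist is computed as `v = -λ · L⁺ · e`, where `L` is the standard 2×6 interaction matrix and `e` is the feature error.

```python
# Minimal IBVS control-law sketch for one point feature (illustrative only;
# function names are hypothetical, not taken from this repository).
import numpy as np

def interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2x6 image Jacobian for a point feature at normalized image
    coordinates (x, y) with estimated depth Z."""
    return np.array([
        [-1.0 / Z,  0.0,      x / Z,  x * y,       -(1 + x * x),  y],
        [ 0.0,     -1.0 / Z,  y / Z,  1 + y * y,   -x * y,       -x],
    ])

def ibvs_velocity(s: np.ndarray, s_star: np.ndarray, Z: float,
                  gain: float = 0.5) -> np.ndarray:
    """Camera-frame twist [vx, vy, vz, wx, wy, wz] that drives the
    observed feature s toward the desired feature s_star."""
    e = s - s_star                          # feature error
    L = interaction_matrix(s[0], s[1], Z)   # image Jacobian at s
    return -gain * np.linalg.pinv(L) @ e    # v = -lambda * L^+ * e

# Example: feature sits 0.1 (normalized units) right of the desired position
v = ibvs_velocity(np.array([0.1, 0.0]), np.zeros(2), Z=0.4)
```

With a full-row-rank `L`, this law makes the predicted feature velocity `L @ v` equal to `-gain * e`, i.e. the pixel error decays exponentially toward zero.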
- IBVS controller: Pixel-error to Cartesian velocity mapping with wrist-mounted camera
- Multi-object scenes: Spawn and sequentially pick multiple YCB objects
- Pluggable detectors:
  - `gt`: ground-truth from simulation state
  - `color`: HSV segmentation
  - `grounding_dino`: zero-shot open-vocabulary detection
  - `sam`: Segment Anything Model
- Adaptive grasp height: Queries object Z-coordinate to avoid collisions with varying object heights
- ROS 2 + RViz integration: Joint state and camera image publishing for real-time visualization
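The pluggable-detector design can be sketched as a small interface that every backend implements. This is a hedged illustration of the idea behind `perception/detector.py`; the class and field names below are assumptions, not the repo's actual API.

```python
# Sketch of a pluggable detector interface (names are illustrative,
# not the actual API of perception/detector.py).
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    u: float           # pixel x of the object center
    v: float           # pixel y of the object center
    confidence: float  # detection confidence in [0, 1]
    label: str         # object name, e.g. "banana"

class Detector(ABC):
    @abstractmethod
    def detect(self, rgb, target: str) -> Optional[Detection]:
        """Return the best detection of `target` in the image, or None."""

class DummyGTDetector(Detector):
    """Stand-in for a ground-truth detector: in the real system the pixel
    center would come from projecting the simulated object pose through
    the wrist-camera intrinsics."""
    def __init__(self, known: dict):
        self.known = known  # label -> (u, v) pixel centers

    def detect(self, rgb, target: str) -> Optional[Detection]:
        if target not in self.known:
            return None
        u, v = self.known[target]
        return Detection(u, v, confidence=1.0, label=target)

det = DummyGTDetector({"banana": (320.0, 260.0)})
hit = det.detect(None, "banana")
```

Because the servoing loop only consumes `Detection` objects, the `gt`, `color`, `grounding_dino`, and `sam` backends can be swapped without touching the controller.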
- Simulation: PyBullet
- Robot model: Franka Emika Panda (URDF)
- Perception: OpenCV, GroundingDINO, SAM (Segment Anything)
- ROS: ROS 2 Humble (optional)
- Python: 3.9+
visiontouch/
├── control/
│ ├── servoing.py # IBVS controller core
│ ├── controller.py # Joint-space controller
│ ├── kinematics.py # Forward/inverse kinematics
│ ├── pick_place.py # High-level pick-place orchestration
│ └── gripper.py
├── perception/
│ ├── detector.py # Detector interface
│ ├── gt_detector.py # Ground-truth via sim state
│ ├── color_detector.py # HSV-based segmentation
│ ├── grounding_detector.py # GroundingDINO zero-shot
│ ├── sam_detector.py # SAM-based detection
│ └── transforms.py # Image <-> world projections
├── simulation/
│ ├── environment.py # PyBullet scene setup
│ ├── camera.py # Wrist camera model
│ └── objects.py # YCB object loading
└── ros/
└── bridge.py # ROS 2 publisher bridge
scripts/
├── demo.py # Main demo entrypoint
├── evaluate.py # Quantitative eval
└── run_with_rviz.py # ROS 2 + RViz launch
git clone https://github.com/shrirag10/VisionTouch.git
cd VisionTouch
pip install -r requirements.txt

For GroundingDINO/SAM, download pretrained weights into `models/`.
# Multi-object pick-place (GT detector)
python3 scripts/demo.py --mode pick-place --detector gt
# Target a specific YCB object
python3 scripts/demo.py --mode pick-place --detector gt --object foam_brick
# Hover/track only (no grasp)
python3 scripts/demo.py --mode hover --detector gt --object banana
# With RViz (requires ROS 2 Humble)
source /opt/ros/humble/setup.bash
python3 scripts/run_with_rviz.py --mode pick-place --detector gt

- IBVS axis inversion: The downward-pointing wrist camera mirrors world-frame axes; velocity mappings must be corrected or the robot diverges from the target.
- Proximity constraint: The grasp constraint is applied only when the gripper is within 0.25 m of the object, preventing the object from teleporting into the gripper.
- False convergence filter: Detections below 0.01 confidence are rejected to avoid blind descents when the object leaves frame.
- Camera FOV: A 90-degree field of view is required; at 60 degrees, objects near the table edges fall into detection blind spots.
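The axis-inversion and false-convergence guards above can be sketched in a few lines. The constants and function names below are illustrative assumptions, not the repository's actual parameters:

```python
# Sketch of two servoing guards (illustrative names and gains):
# 1) flip the camera-frame pixel error into world axes for a
#    downward-pointing wrist camera, and
# 2) reject low-confidence detections before continuing the descent.
import numpy as np

CONF_MIN = 0.01                      # confidence floor from the lessons above
AXIS_FLIP = np.array([-1.0, -1.0])   # mirrored u/v axes for a top-down camera

def pixel_error_to_world_xy(err_px: np.ndarray,
                            gain: float = 1e-3) -> np.ndarray:
    """Map a (du, dv) pixel error to a world-frame XY velocity,
    correcting the mirroring introduced by the downward camera."""
    return gain * AXIS_FLIP * err_px

def should_descend(confidence: float) -> bool:
    """Continue the grasp descent only while the target is still tracked."""
    return confidence >= CONF_MIN
```

Without the axis flip, a positive pixel error commands motion away from the target and the loop diverges; without the confidence gate, the robot descends blindly once the object leaves the frame.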
MIT