
VisionTouch

PyBullet simulation of a Franka Emika Panda robot performing pick-and-place with Image-Based Visual Servoing (IBVS). The closed-loop controller maps wrist-camera pixel errors directly to Cartesian end-effector velocities, enabling accurate grasping under localization noise.
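The core idea of IBVS is a proportional law from pixel error to Cartesian velocity. A minimal sketch of that mapping follows; the function name, gains, and focal length are illustrative assumptions, not taken from the codebase:

```python
import numpy as np

def ibvs_velocity(target_px, center_px, depth, gain=0.5):
    """Minimal IBVS sketch: map a pixel error to a Cartesian velocity.

    target_px: detected object centre (u, v) in the wrist image
    center_px: image principal point (u0, v0)
    depth:     estimated camera-to-object distance in metres
    gain:      proportional servo gain (assumed value)
    """
    error = np.asarray(target_px, dtype=float) - np.asarray(center_px, dtype=float)
    focal_px = 600.0  # assumed focal length in pixels
    # Scale the pixel error by depth/focal so velocities come out in m/s.
    # A downward-looking wrist camera mirrors the world axes, hence the
    # negated signs (see Engineering Notes on axis inversion).
    vx = -gain * error[1] * depth / focal_px
    vy = -gain * error[0] * depth / focal_px
    # Descend only once the target is roughly centred (simplified gating).
    vz = -0.05 if np.linalg.norm(error) < 20.0 else 0.0
    return np.array([vx, vy, vz])
```

In practice the descent and gains would be tuned per scene; this only shows the shape of the pixel-error-to-velocity computation.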

Features

  • IBVS controller: Pixel-error to Cartesian velocity mapping with wrist-mounted camera
  • Multi-object scenes: Spawn and sequentially pick multiple YCB objects
  • Pluggable detectors:
    • gt — ground-truth from simulation state
    • color — HSV segmentation
    • grounding_dino — zero-shot open-vocabulary detection
    • sam — Segment Anything Model
  • Adaptive grasp height: Queries object Z-coordinate to avoid collisions with varying object heights
  • ROS 2 + RViz integration: Joint state and camera image publishing for real-time visualization
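Pluggable detectors imply a shared interface that the `gt`, `color`, `grounding_dino`, and `sam` backends all implement. A hypothetical sketch of what that contract could look like (class and field names are assumptions, not the repository's actual API):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class Detection:
    u: float           # pixel column of the object centre
    v: float           # pixel row of the object centre
    confidence: float  # detector confidence in [0, 1]

class Detector(ABC):
    """Common interface a pluggable-detector design could share."""
    @abstractmethod
    def detect(self, rgb_image: np.ndarray, label: str) -> Optional[Detection]:
        """Return the target's pixel location, or None if not found."""

class BrightestPixelDetector(Detector):
    # Toy stand-in for the HSV color detector: report the brightest pixel.
    def detect(self, rgb_image, label):
        intensity = np.asarray(rgb_image, dtype=float).sum(axis=-1)
        v, u = np.unravel_index(intensity.argmax(), intensity.shape)
        return Detection(u=float(u), v=float(v), confidence=1.0)
```

Returning `None` on a miss lets the servo loop pause rather than chase a stale target.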

Tech Stack

  • Simulation: PyBullet
  • Robot model: Franka Emika Panda (URDF)
  • Perception: OpenCV, GroundingDINO, SAM (Segment Anything)
  • ROS: ROS 2 Humble (optional)
  • Python: 3.9+

Architecture

visiontouch/
├── control/
│   ├── servoing.py         # IBVS controller core
│   ├── controller.py       # Joint-space controller
│   ├── kinematics.py       # Forward/inverse kinematics
│   ├── pick_place.py       # High-level pick-place orchestration
│   └── gripper.py
├── perception/
│   ├── detector.py         # Detector interface
│   ├── gt_detector.py      # Ground-truth via sim state
│   ├── color_detector.py   # HSV-based segmentation
│   ├── grounding_detector.py  # GroundingDINO zero-shot
│   ├── sam_detector.py     # SAM-based detection
│   └── transforms.py       # Image <-> world projections
├── simulation/
│   ├── environment.py      # PyBullet scene setup
│   ├── camera.py           # Wrist camera model
│   └── objects.py          # YCB object loading
└── ros/
    └── bridge.py           # ROS 2 publisher bridge
scripts/
├── demo.py                 # Main demo entrypoint
├── evaluate.py             # Quantitative eval
└── run_with_rviz.py        # ROS 2 + RViz launch
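The layout suggests `perception/transforms.py` handles image-to-world projections. A minimal pinhole back-projection sketch, assuming a 640x480 image and the 90-degree FOV mentioned in the Engineering Notes (the function itself is illustrative, not from the codebase):

```python
import numpy as np

def pixel_to_camera_ray(u, v, width=640, height=480, fov_deg=90.0):
    """Back-project a pixel to a unit ray in the camera frame (pinhole model)."""
    # Focal length in pixels from the horizontal field of view.
    f = (width / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    x = (u - width / 2.0) / f
    y = (v - height / 2.0) / f
    ray = np.array([x, y, 1.0])
    return ray / np.linalg.norm(ray)
```

Intersecting this ray with the known table plane would recover the object's world position from a single detection.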

Setup

git clone https://github.com/shrirag10/VisionTouch.git
cd VisionTouch
pip install -r requirements.txt

For the GroundingDINO and SAM detectors, download the pretrained weights into models/.

Usage

# Multi-object pick-place (GT detector)
python3 scripts/demo.py --mode pick-place --detector gt

# Target a specific YCB object
python3 scripts/demo.py --mode pick-place --detector gt --object foam_brick

# Hover/track only (no grasp)
python3 scripts/demo.py --mode hover --detector gt --object banana

# With RViz (requires ROS 2 Humble)
source /opt/ros/humble/setup.bash
python3 scripts/run_with_rviz.py --mode pick-place --detector gt

Engineering Notes

  • IBVS axis inversion: A downward-pointing wrist camera mirrors the world-frame axes, so the pixel-to-velocity mapping must flip the corresponding signs or the robot diverges from the target.
  • Proximity constraint: The grasp constraint is applied only when the gripper is within 0.25 m of the object, preventing "teleportation" artifacts from attaching a distant body.
  • False convergence filter: Detections with confidence below 0.01 are rejected, avoiding blind descents when the object leaves the frame.
  • Camera FOV: A 90-degree field of view is required; at 60 degrees, detection blind spots appear near the table edges.
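The proximity and confidence gates above can be combined into one predicate. A minimal sketch using the thresholds stated in the notes (the function name is hypothetical):

```python
def should_grasp(distance_m, confidence, max_dist=0.25, min_conf=0.01):
    """Gate the grasp: attach the object only when the gripper is close
    enough (proximity constraint) and the detection is trustworthy
    (false convergence filter). Thresholds from the Engineering Notes."""
    return distance_m <= max_dist and confidence >= min_conf
```

Checking both conditions in one place keeps the pick-place orchestration from descending on a stale or distant detection.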

License

MIT
