A PyBullet simulation of a Franka Emika Panda robot performing pick-and-place with Image-Based Visual Servoing (IBVS). The closed-loop controller maps wrist-camera pixel errors directly to Cartesian end-effector velocities, enabling accurate grasping under localization noise.
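The core IBVS idea can be sketched as follows. This is a minimal, illustrative control law for a single point feature (function names are assumptions, not the repo's actual API): the camera-frame twist is computed as `v = -λ · L⁺ · e`, where `L` is the standard 2×6 interaction matrix and `e` is the feature error.

```python
# Minimal IBVS control-law sketch for one point feature (illustrative only;
# function names are hypothetical, not taken from this repository).
import numpy as np

def interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2x6 image Jacobian for a point feature at normalized image
    coordinates (x, y) with estimated depth Z."""
    return np.array([
        [-1.0 / Z,  0.0,      x / Z,  x * y,       -(1 + x * x),  y],
        [ 0.0,     -1.0 / Z,  y / Z,  1 + y * y,   -x * y,       -x],
    ])

def ibvs_velocity(s: np.ndarray, s_star: np.ndarray, Z: float,
                  gain: float = 0.5) -> np.ndarray:
    """Camera-frame twist [vx, vy, vz, wx, wy, wz] that drives the
    observed feature s toward the desired feature s_star."""
    e = s - s_star                          # feature error
    L = interaction_matrix(s[0], s[1], Z)   # image Jacobian at s
    return -gain * np.linalg.pinv(L) @ e    # v = -lambda * L^+ * e

# Example: feature sits 0.1 (normalized units) right of the desired position
v = ibvs_velocity(np.array([0.1, 0.0]), np.zeros(2), Z=0.4)
```

With a full-row-rank `L`, this law makes the predicted feature velocity `L @ v` equal to `-gain * e`, i.e. the pixel error decays exponentially toward zero.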
- IBVS controller: Pixel-error to Cartesian velocity mapping with wrist-mounted camera
- Multi-object scenes: Spawn and sequentially pick multiple YCB objects
- Pluggable detectors:
  - `gt`: ground-truth from simulation state
  - `color`: HSV segmentation
  - `grounding_dino`: zero-shot open-vocabulary detection
  - `sam`: Segment Anything Model
- Adaptive grasp height: Queries object Z-coordinate to avoid collisions with varying object heights
- ROS 2 + RViz integration: Joint state and camera image publishing for real-time visualization
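The pluggable-detector design can be sketched as a small interface that every backend implements. This is a hedged illustration of the idea behind `perception/detector.py`; the class and field names below are assumptions, not the repo's actual API.

```python
# Sketch of a pluggable detector interface (names are illustrative,
# not the actual API of perception/detector.py).
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    u: float           # pixel x of the object center
    v: float           # pixel y of the object center
    confidence: float  # detection confidence in [0, 1]
    label: str         # object name, e.g. "banana"

class Detector(ABC):
    @abstractmethod
    def detect(self, rgb, target: str) -> Optional[Detection]:
        """Return the best detection of `target` in the image, or None."""

class DummyGTDetector(Detector):
    """Stand-in for a ground-truth detector: in the real system the pixel
    center would come from projecting the simulated object pose through
    the wrist-camera intrinsics."""
    def __init__(self, known: dict):
        self.known = known  # label -> (u, v) pixel centers

    def detect(self, rgb, target: str) -> Optional[Detection]:
        if target not in self.known:
            return None
        u, v = self.known[target]
        return Detection(u, v, confidence=1.0, label=target)

det = DummyGTDetector({"banana": (320.0, 260.0)})
hit = det.detect(None, "banana")
```

Because the servoing loop only consumes `Detection` objects, the `gt`, `color`, `grounding_dino`, and `sam` backends can be swapped without touching the controller.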
- Simulation: PyBullet
- Robot model: Franka Emika Panda (URDF)
- Perception: OpenCV, GroundingDINO, SAM (Segment Anything)
- ROS: ROS 2 Humble (optional)
- Python: 3.9+
visiontouch/
├── control/
│ ├── servoing.py # IBVS controller core
│ ├── controller.py # Joint-space controller
│ ├── kinematics.py # Forward/inverse kinematics
│ ├── pick_place.py # High-level pick-place orchestration
│ └── gripper.py
├── perception/
│ ├── detector.py # Detector interface
│ ├── gt_detector.py # Ground-truth via sim state
│ ├── color_detector.py # HSV-based segmentation
│ ├── grounding_detector.py # GroundingDINO zero-shot
│ ├── sam_detector.py # SAM-based detection
│ └── transforms.py # Image <-> world projections
├── simulation/
│ ├── environment.py # PyBullet scene setup
│ ├── camera.py # Wrist camera model
│ └── objects.py # YCB object loading
└── ros/
└── bridge.py # ROS 2 publisher bridge
scripts/
├── demo.py # Main demo entrypoint
├── evaluate.py # Quantitative eval
└── run_with_rviz.py # ROS 2 + RViz launch
git clone https://github.com/shrirag10/VisionTouch.git
cd VisionTouch
pip install -r requirements.txt

For GroundingDINO/SAM, download pretrained weights into `models/`.
# Multi-object pick-place (GT detector)
python3 scripts/demo.py --mode pick-place --detector gt
# Target a specific YCB object
python3 scripts/demo.py --mode pick-place --detector gt --object foam_brick
# Hover/track only (no grasp)
python3 scripts/demo.py --mode hover --detector gt --object banana
# With RViz (requires ROS 2 Humble)
source /opt/ros/humble/setup.bash
python3 scripts/run_with_rviz.py --mode pick-place --detector gt

- IBVS axis inversion: The downward-pointing wrist camera mirrors world-frame axes; velocity mappings must be corrected or the robot diverges from the target.
- Proximity constraint: The grasp constraint is applied only when the gripper is within 0.25 m of the object, preventing the object from teleporting into the gripper.
- False convergence filter: Detections below 0.01 confidence are rejected to avoid blind descents when the object leaves frame.
- Camera FOV: A 90-degree field of view is required; at 60 degrees, objects near the table edges fall into detection blind spots.
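The axis-inversion and false-convergence guards above can be sketched in a few lines. The constants and function names below are illustrative assumptions, not the repository's actual parameters:

```python
# Sketch of two servoing guards (illustrative names and gains):
# 1) flip the camera-frame pixel error into world axes for a
#    downward-pointing wrist camera, and
# 2) reject low-confidence detections before continuing the descent.
import numpy as np

CONF_MIN = 0.01                      # confidence floor from the lessons above
AXIS_FLIP = np.array([-1.0, -1.0])   # mirrored u/v axes for a top-down camera

def pixel_error_to_world_xy(err_px: np.ndarray,
                            gain: float = 1e-3) -> np.ndarray:
    """Map a (du, dv) pixel error to a world-frame XY velocity,
    correcting the mirroring introduced by the downward camera."""
    return gain * AXIS_FLIP * err_px

def should_descend(confidence: float) -> bool:
    """Continue the grasp descent only while the target is still tracked."""
    return confidence >= CONF_MIN
```

Without the axis flip, a positive pixel error commands motion away from the target and the loop diverges; without the confidence gate, the robot descends blindly once the object leaves the frame.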
MIT