Before submitting:
- Check if your issue is listed in known issues.
- vLLM Version: (from `pip show vllm` or git commit hash)
- Hardware Setup:
  - GPU(s): (Make, Model, and Count)
  - Driver Version: (`nvidia-smi` or `rocm-smi` output)
  - Memory: (Host and GPU memory)
- Execution Environment:
  - Docker Image: (Name + Tag)
  - CUDA/ROCm Version:
  - Python Version:
  - Kernel Version: (`uname -a`)
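One way to gather the details listed above on a ROCm machine is sketched below; substitute `nvidia-smi` for `rocm-smi` on NVIDIA hardware. This is only a convenience sketch, not part of the report template.

```bash
# Collect the requested environment details (ROCm example).
pip show vllm                      # vLLM version (or `git rev-parse HEAD` in a source checkout)
rocm-smi --showproductname         # GPU make/model (one entry per GPU)
rocm-smi --showdriverversion       # driver version
free -h                            # host memory
rocm-smi --showmeminfo vram        # GPU memory
python --version                   # Python version
uname -a                           # kernel version
```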
```bash
# Full command from benchmarks/ directory
# Include all parameters and quantization flags
# Example:
python benchmarks/benchmark_throughput.py \
    --model meta-llama/Llama-2-7b-hf \
    --tensor-parallel-size 2 \
    --dtype half \
    --num-prompts 64 \
    --input-len 1024 \
    --output-len 128
```

```bash
# Any non-default environment variables
# Example:
export VLLM_USE_TRITON_FLASH_ATTN=False
```

| Metric | Good Performance (Image: vllm:old) | Regressed Performance (Image: vllm:new) |
|---|---|---|
| Throughput (tokens/s) | 1250 | 840 |
| Memory Utilization | 78% | 92% |
| GPU Utilization | 95% | 68% |
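For context, the size of the regression can be quantified directly from the throughput numbers in the table above; the values below are just the example figures (1250 and 840 tokens/s), not measurements.

```bash
# Throughput regression from the example table values.
old=1250   # tokens/s with vllm:old
new=840    # tokens/s with vllm:new
awk -v old="$old" -v new="$new" \
    'BEGIN { printf "Throughput drop: %.1f%%\n", (old - new) / old * 100 }'
# Prints: Throughput drop: 32.8%
```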
- Original working Docker image: `docker pull rocm/vllm-dev:main`
- Regression Docker image: `docker pull rocm/vllm-dev:nightly`
- Performance difference persists across multiple runs (see the A/B run sketch below)
- Verified with different input sizes/batch sizes
```bash
# Minimal command that triggers the issue
# Include deployment commands if applicable
python benchmarks/benchmark_latency.py \
    --model meta-llama/Llama-2-7b-hf \
    --max-num-seqs 16 \
    --enforce-eager
```

<details>
<summary>Expand for full logs</summary>

```text
[Full plaintext log output]
```

</details>
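To capture a complete plaintext log for the report, one option is to raise vLLM's log level while re-running the minimal command and tee the output to a file; `VLLM_LOGGING_LEVEL` is vLLM's logging-level environment variable, and the rest of the command simply mirrors the example above.

```bash
# Re-run the minimal reproducer with verbose logging and save the output.
VLLM_LOGGING_LEVEL=DEBUG \
python benchmarks/benchmark_latency.py \
    --model meta-llama/Llama-2-7b-hf \
    --max-num-seqs 16 \
    --enforce-eager 2>&1 | tee full_run.log
```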
```bash
# Non-default configurations
export VLLM_USE_TRITON_FLASH_ATTN=false
```

- Issue reproduces with `--enforce-eager` mode
- Issue reproduces with different random seeds (see the sweep sketch below)
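A quick way to check both points is to sweep a few seeds with and without eager mode. This assumes `benchmark_latency.py` accepts a `--seed` flag on your vLLM version (verify with `--help`); otherwise set the seed however your setup exposes it.

```bash
# Sweep seeds with and without --enforce-eager (assumed --seed flag; check --help).
for seed in 0 1 2; do
    for eager in "" "--enforce-eager"; do
        python benchmarks/benchmark_latency.py \
            --model meta-llama/Llama-2-7b-hf \
            --max-num-seqs 16 \
            --seed "${seed}" ${eager} \
            2>&1 | tee "latency_seed${seed}${eager:+_eager}.log"
    done
done
```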
- First observed date:
- Frequency: (Always/Intermittent/Specific Conditions)
- Related components: (e.g., FP8 quantization, PagedAttention)
- Custom modifications: (List any code/configuration changes)