Is there a page that lists benchmark results evaluated with bigcode-evaluation-harness?

Can bigcode-evaluation-harness results match, or at least come close to, the results published for popular models such as Llama 3, Qwen2, etc.?