Benchmarking Pull Request Code Reviews
August 3, 2025
I built a lightweight benchmark to test five major AI models on real pull request decisions from large open-source projects such as Kubernetes and VS Code. Most models turned out to be "Yes Men": they approved 80-90% of PRs, including problematic ones.