BENCHY

Benchmarks you can feel

We all love benchmarks, but there's nothing like a hands-on vibe check. What if we could meet somewhere in the middle?

Enter BENCHY. A chill, live benchmark tool that lets you see the performance, price, and speed of LLMs in a side-by-side comparison for SPECIFIC use cases.

Watch the latest development video here

Benchy Micro Apps

Thought Bench
- Goal: Compare the thoughts of multiple reasoning models (Deepseek R1, Gemini 2.0 Flash Thinking, OpenAI o1, etc.) side by side, in parallel.
- Watch the walk through video here
- Front end: src/pages/ThoughtBench.vue
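
To make the "side by side, in parallel" idea concrete, here is a minimal sketch of fanning one prompt out to several models at once. It is not Benchy's actual implementation: `fake_model_a`, `fake_model_b`, and `compare_in_parallel` are hypothetical names, and the fake models stand in for real provider clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for real provider clients (assumption: each takes a
# prompt and returns the model's "thoughts" as plain text).
def fake_model_a(prompt: str) -> str:
    return f"[model-a thoughts] {prompt}"


def fake_model_b(prompt: str) -> str:
    return f"[model-b thoughts] {prompt.upper()}"


def compare_in_parallel(
    prompt: str, models: Dict[str, Callable[[str], str]]
) -> Dict[str, str]:
    """Send one prompt to every model concurrently and collect replies by name."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Submit all calls first so they run at the same time.
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        # Then wait for each result, keeping the original model order.
        return {name: future.result() for name, future in futures.items()}


results = compare_in_parallel(
    "why is the sky blue?",
    {"model-a": fake_model_a, "model-b": fake_model_b},
)
for name, thoughts in results.items():
    print(f"{name}: {thoughts}")
```

Threads suit this case because each call mostly waits on network I/O; swapping the fake functions for real API calls would not change the shape of the loop.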
BIG AI Coding Updates to Benchy
- Watch the walk through video here
Iso Speed Bench
- Goal: Create a unified, config-file-based, multi-LLM-provider benchmark with yes/no evaluations for high-quality insights and iteration.
- Watch o3-mini vibe check, comparison, and benchmark video here
- Watch the M4 Unboxing and benchmark video here
- Front end: src/pages/IsoSpeedBench.vue
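
As a rough illustration of a yes/no evaluation, the sketch below scores a model against a small list of cases. Everything here is an assumption for illustration: the `Case` shape, the `normalize` rule, and the `always_yes` model are hypothetical and do not reflect Benchy's real config format or scoring code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    prompt: str
    expected: str  # "yes" or "no"


def normalize(answer: str) -> str:
    """Reduce a free-form reply to 'yes', 'no', or 'invalid'."""
    words = answer.strip().lower().split()
    first = words[0].strip(".,!?:;\"'") if words else ""
    return first if first in ("yes", "no") else "invalid"


def run_benchmark(model: Callable[[str], str], cases: List[Case]) -> float:
    """Return the fraction of cases where the model's yes/no matches."""
    correct = sum(normalize(model(case.prompt)) == case.expected for case in cases)
    return correct / len(cases)


# Hypothetical cases; in practice these would be loaded from a config file.
cases = [
    Case("Is 7 a prime number?", "yes"),
    Case("Is 8 a prime number?", "no"),
    Case("Is 9 a prime number?", "no"),
]


def always_yes(prompt: str) -> str:
    # Hypothetical model that answers "Yes." to everything.
    return "Yes."


accuracy = run_benchmark(always_yes, cases)
print(f"accuracy: {accuracy:.2f}")
```

Restricting answers to yes/no keeps scoring unambiguous, which makes results comparable across providers without a judge model.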
Long Tool Calling
- Goal: Understand the best LLMs and techniques for LONG chains of tool calls / function calls (15+).
- Watch the walk through video here
- Front end: src/pages/AppMultiToolCall.vue
Multi Autocomplete
- Goal: Understand how Claude 3.5 Haiku & GPT-4o predictive outputs compare to existing models.
- Watch the walk through video here
- Front end: src/pages/AppMultiAutocomplete.vue

Important Files

.env - Environment variables for API keys (client)
server/.env - Environment variables for API keys (server)
package.json - Front end dependencies
server/pyproject.toml - Server dependencies
src/store/* - Stores all front end state and prompts
src/api/* - API layer for all requests
src/pages/* - Front end per app pages
src/components/* - Front end components
server/server.py - Server routes
server/modules/llm_models.py - All LLM models
server/modules/openai_llm.py - OpenAI LLM
server/modules/anthropic_llm.py - Anthropic LLM
server/modules/gemini_llm.py - Gemini LLM
server/modules/ollama_llm.py - Ollama LLM
server/modules/deepseek_llm.py - Deepseek LLM
server/benchmark_data/* - Benchmark data
server/reports/* - Benchmark results

Setup

Get API Keys & Models

Anthropic
Google Cloud
OpenAI
Deepseek

Ollama

After installing Ollama, pull the required models:

# Pull Llama 3.2 1B model
ollama pull llama3.2:1b

# Pull Llama 3.2 latest (3B) model
ollama pull llama3.2:latest

# Pull Qwen2.5 Coder 14B model
ollama pull qwen2.5-coder:14b

# Pull Deepseek R1 1.5B, 7B, 8B, 14B, 32B, and 70B models
ollama pull deepseek-r1:1.5b
ollama pull deepseek-r1:latest
ollama pull deepseek-r1:8b
ollama pull deepseek-r1:14b
ollama pull deepseek-r1:32b
ollama pull deepseek-r1:70b

# Pull mistral-small 3
ollama pull mistral-small:latest

Client Setup

# Install dependencies using bun (recommended)
bun install

# Or using npm
npm install

# Or using yarn
yarn install

# Start development server
bun dev  # or npm run dev / yarn dev

Server Setup

# Move into server directory
cd server

# Create the virtual environment and install dependencies using uv
uv sync

# Set up environment variables (paths are relative to the server directory)
cp ../.env.sample ../.env  # client
cp .env.sample .env        # server

# Set EVERY .env key with your API keys and settings
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GEMINI_API_KEY=
DEEPSEEK_API_KEY=
FIREWORKS_API_KEY=

# Start server
uv run python server.py

# Run tests
uv run pytest  # beware: this will hit APIs and cost money
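
Since every key above needs a value, a quick check before starting the server can save a failed run. The following is a minimal sketch, not part of the repository: `missing_keys` is a hypothetical helper, and it assumes the keys have already been exported into the process environment (it does not parse the .env file itself).

```python
import os
from typing import Iterable, List, Mapping

# Key names taken from the list above.
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "DEEPSEEK_API_KEY",
    "FIREWORKS_API_KEY",
]


def missing_keys(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in required if not env.get(key, "").strip()]


# Assumption: the variables were exported into this process's environment.
missing = missing_keys(os.environ)
if missing:
    print("Missing keys: " + ", ".join(missing))
else:
    print("All API keys are set.")
```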
