BENCHY

Benchmarks you can feel

We all love benchmarks, but there's nothing like a hands-on vibe check. What if we could meet somewhere in the middle?

Enter BENCHY: a chill, live benchmark tool that lets you see the performance, price, and speed of LLMs in a side-by-side comparison for SPECIFIC use cases.

Watch the latest development video here

  • deepseek-r1
  • o1-ai-coding-limit-testing
  • m4-mac-book-pro
  • parallel-function-calling
  • pick-two

Benchy Micro Apps

Important Files

  • .env - Environment variables for API keys (client)
  • server/.env - Environment variables for API keys (server)
  • package.json - Front-end dependencies
  • server/pyproject.toml - Server dependencies
  • src/store/* - All front-end state and prompts
  • src/api/* - API layer for all requests
  • src/pages/* - Front-end per-app pages
  • src/components/* - Front-end components
  • server/server.py - Server routes
  • server/modules/llm_models.py - All LLM models
  • server/modules/openai_llm.py - OpenAI LLM
  • server/modules/anthropic_llm.py - Anthropic LLM
  • server/modules/gemini_llm.py - Gemini LLM
  • server/modules/ollama_llm.py - Ollama LLM
  • server/modules/deepseek_llm.py - DeepSeek LLM
  • server/benchmark_data/* - Benchmark data
  • server/reports/* - Benchmark results
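To give a feel for how the per-provider modules above fit together, here is a minimal sketch of a model-to-provider registry. The prefix table and `provider_for` helper are illustrative assumptions, not benchy's actual API; they only mirror the idea that `server/modules/llm_models.py` routes each model name to one of the provider modules listed above.

```python
# Illustrative sketch only: the prefix map and helper below are assumptions,
# not code from server/modules/llm_models.py.
PROVIDER_PREFIXES = {
    "gpt-": "openai_llm",
    "o1": "openai_llm",
    "claude-": "anthropic_llm",
    "gemini-": "gemini_llm",
    "deepseek-": "deepseek_llm",
    "llama": "ollama_llm",
    "qwen": "ollama_llm",
}


def provider_for(model: str) -> str:
    """Return the provider module name responsible for a given model name."""
    for prefix, module in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return module
    raise ValueError(f"No provider registered for model: {model}")
```

With a table like this, adding a new provider is one new module plus one registry entry, which matches the one-file-per-provider layout above.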

Setup

Get API Keys & Models

  • Anthropic
  • Google Cloud
  • OpenAI
  • Deepseek
  • Ollama
    • After installing Ollama, pull the required models:
    # Pull Llama 3.2 1B model
    ollama pull llama3.2:1b
    
    # Pull Llama 3.2 latest (3B) model
    ollama pull llama3.2:latest
    
    # Pull Qwen2.5 Coder 14B model
    ollama pull qwen2.5-coder:14b
    
    # Pull DeepSeek R1 models (1.5b, latest, 8b, 14b, 32b, 70b)
    ollama pull deepseek-r1:1.5b
    ollama pull deepseek-r1:latest
    ollama pull deepseek-r1:8b
    ollama pull deepseek-r1:14b
    ollama pull deepseek-r1:32b
    ollama pull deepseek-r1:70b
    
    # Pull Mistral Small 3
    ollama pull mistral-small:latest
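After pulling, you can confirm nothing was missed by checking the output of `ollama list`. The helper below is a hypothetical convenience script, not part of benchy; it assumes the first whitespace-separated column of `ollama list` is the model tag (e.g. `llama3.2:1b`), which is how the CLI prints its table today.

```python
# Hypothetical helper (not part of benchy): compare the models the README
# asks for against the output of `ollama list`.
REQUIRED = [
    "llama3.2:1b", "llama3.2:latest", "qwen2.5-coder:14b",
    "deepseek-r1:1.5b", "deepseek-r1:latest", "deepseek-r1:8b",
    "deepseek-r1:14b", "deepseek-r1:32b", "deepseek-r1:70b",
    "mistral-small:latest",
]


def missing_models(ollama_list_output: str) -> list[str]:
    """Return required models absent from `ollama list` output."""
    installed = {
        line.split()[0]
        for line in ollama_list_output.splitlines()[1:]  # skip header row
        if line.strip()
    }
    return [m for m in REQUIRED if m not in installed]
```

Feed it the captured stdout of `ollama list` (e.g. via `subprocess.run`) and pull whatever it reports as missing.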

Client Setup

# Install dependencies using bun (recommended)
bun install

# Or using npm
npm install

# Or using yarn
yarn install

# Start development server
bun dev  # or npm run dev / yarn dev

Server Setup

# Move into server directory
cd server

# Create and activate virtual environment using uv
uv sync

# Set up environment variables (run from the repo root)
cp .env.sample .env                # client
cp server/.env.sample server/.env  # server

# Set EVERY .env key with your API keys and settings
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GEMINI_API_KEY=
DEEPSEEK_API_KEY=
FIREWORKS_API_KEY=

# Start server
uv run python server.py

# Run tests
uv run pytest  # beware: hits real APIs and costs money
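Since the server needs every key above set, a small pre-flight check can save a confusing startup failure. The snippet below is a hypothetical sketch, not part of benchy's server code; it only checks that each key listed in the README is non-empty in the environment.

```python
import os

# Hypothetical pre-flight check (not part of benchy): warn about API keys
# that are still unset before starting the server.
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY",
    "DEEPSEEK_API_KEY", "FIREWORKS_API_KEY",
]


def unset_keys(env: dict) -> list:
    """Return required keys that are missing or empty in the given env."""
    return [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]


missing = unset_keys(dict(os.environ))
if missing:
    print("Warning: unset API keys:", ", ".join(missing))
```

Run it in the `server` directory after loading `.env`; an empty warning means you are good to start the server.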

Resources
