cyclic/lsfbench

Files

cyclic bcb31d84e6 base benchmark

2025-09-04 23:00:01 -06:00

371 B

Raw Blame History

LSFBench

Minimal Luau/Lune benchmark to evaluate LLMs: one model answers questions, another model scores the answers against the reference key.

Quick Start

Prereqs

Install Lune (0.10.x)
Start Ollama at http://localhost:11434 and pull the models referenced in config.luau (e.g. qwen3:4b)

Notice

The evaluator model must support structured JSON outputs.