base benchmark

This commit is contained in:
2025-09-04 23:00:01 -06:00
parent 1114c02b7c
commit bcb31d84e6
10 changed files with 375 additions and 14 deletions

10
README.md Normal file
View File

@@ -0,0 +1,10 @@
# LSFBench
Minimal Luau/Lune benchmark to evaluate LLMs: one model answers questions, another model scores the answers against the reference key.
## Quick Start
Prereqs
- Install Lune (0.10.x)
- Start Ollama at `http://localhost:11434` and pull the models referenced in `config.luau` (e.g. `qwen3:4b`)
## Notice
The evaluator model must support structured JSON outputs.