base benchmark

2025-09-04 23:00:01 -06:00
parent 1114c02b7c
commit bcb31d84e6
10 changed files with 375 additions and 14 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,10 @@
+# LSFBench
+A minimal Luau/Lune benchmark for evaluating LLMs: one model answers the questions, and a second (evaluator) model scores each answer against the reference key.
+
+## Quick Start
+Prerequisites:
+- Install Lune (0.10.x)
+- Start Ollama at `http://localhost:11434` and pull the models referenced in `config.luau` (e.g., `qwen3:4b`)
+
+## Notice
+The evaluator model must support structured (JSON) outputs.