Signal Grader

An evaluation harness that scores model answers against a witnessed ground truth.

TypeScript, Evaluation, CI

A lightweight harness for pressure-testing models on the failure modes that matter, with every score traceable back to evidence.