Signal Grader
An evaluation harness that scores model answers against a witnessed ground truth.
TypeScript, Evaluation, CI
A lightweight harness for pressure-testing models on the failure modes that matter, with every score traceable back to evidence.