Eval Run Report: [run_id]

Date: YYYY-MM-DD
Model / Pipeline version: [version]
Case set: [all / capability name / list of case IDs]
Evaluator: [team or individual name]
Overall status: PASS / FAIL / PARTIAL

Summary

Capability-level pass/fail at a glance. A capability passes only if all three conditions hold: (a) its average case score is ≥ 0.85, (b) no Tier 1 field scores below 0.70, and (c) its hallucination rate is 0.0.

Capability	Cases run	Avg. case score	Provenance score	Hallucinations	Status
Loan Onboarding	4	—	—	—	PASS / FAIL
Covenant Monitoring	4	—	—	—	PASS / FAIL
BMT Validation	3	—	—	—	PASS / FAIL
Document Q&A	4	—	—	—	PASS / FAIL
Overall	15	—	—	—	PASS / FAIL
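
The pass rule stated at the top of this Summary can be sketched in code as follows. This is a minimal illustration, not the pipeline's actual implementation: the `CapabilityResult` container and its field names are hypothetical, and it assumes that a hallucination rate of 0.0 is equivalent to a hallucination count of zero and that a capability with no Tier 1 fields recorded passes criterion (b) trivially.

```python
from dataclasses import dataclass, field

SCORE_THRESHOLD = 0.85   # criterion (a): minimum average case score
TIER1_THRESHOLD = 0.70   # criterion (b): minimum score for every Tier 1 field


@dataclass
class CapabilityResult:
    """Hypothetical container for one row of the summary table."""
    name: str
    avg_case_score: float
    # Tier 1 field name -> average score across cases (may be empty).
    tier1_field_scores: dict = field(default_factory=dict)
    hallucination_count: int = 0


def capability_passes(result: CapabilityResult) -> bool:
    """Apply criteria (a), (b) and (c); all three must hold."""
    score_ok = result.avg_case_score >= SCORE_THRESHOLD
    tier1_ok = all(
        score >= TIER1_THRESHOLD
        for score in result.tier1_field_scores.values()
    )
    # Assumption: rate = 0.0 is the same as zero detected hallucinations.
    no_hallucinations = result.hallucination_count == 0
    return score_ok and tier1_ok and no_hallucinations


if __name__ == "__main__":
    example = CapabilityResult(
        name="Loan Onboarding",
        avg_case_score=0.90,
        tier1_field_scores={"Borrower": 0.95, "Currency": 0.80},
    )
    print("PASS" if capability_passes(example) else "FAIL")
```

Because the three criteria are combined with a logical AND, a single hallucination fails the capability regardless of how high its scores are.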

Detailed Results

Loan Onboarding

Case ID	Case name	Case score	Provenance score	Hallucination
LO-001	Standard APLMA	—	—	Y/N
LO-002	Amendment Notice	—	—	Y/N
LO-003	Multi-tranche	—	—	Y/N
LO-004	Edge — Sparse Doc	—	—	Y/N
Avg.		—	—

Tier 1 field scores (LO capability):

Field	Avg. score across cases	Pass (≥ 0.70)?
Borrower	—	—
Facility Amount	—	—
Currency	—	—
Maturity Date	—	—
Margin / Spread	—	—

Covenant Monitoring

Case ID	Case name	D1 Coverage	D2 Type	D3 Threshold	D4 Frequency	D5 Edge cases	Case score
CM-001	Financial Covenants	—	—	—	—	—	—
CM-002	Information Covenants	—	—	—	—	—	—
CM-003	Negative Covenants	—	—	—	—	—	—
CM-004	Waiver Scenario	—	—	—	—	—	—
Avg.		—	—	—	—	—	—

BMT Validation

Case ID	Case name	Case score	False positives	False negatives
BM-001	Market-Standard Deal	—	—	—
BM-002	Deviation — Pricing	—	—	—
BM-003	Novel Structure	—	—	—
Avg.		—	—	—

Document Q&A

Case ID	Case name	D1 Accuracy	D2 Citation	D3 Scope	D4 Uncertainty	D5 Refusal	Case score
QA-001	Factual Retrieval	—	—	—	—	—	—
QA-002	Cross-clause Reasoning	—	—	—	—	—	—
QA-003	Ambiguous Term	—	—	—	—	—	—
QA-004	Out-of-scope Question	—	—	—	—	—	—
Avg.		—	—	—	—	—	—

Grounded refusal accuracy: — / 1 (out-of-scope cases) = — (threshold: ≥ 0.95)
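
The grounded refusal figure above is a simple ratio of correct refusals to out-of-scope cases. A minimal sketch, assuming each out-of-scope case is recorded as a boolean (the function names and the input shape are hypothetical):

```python
REFUSAL_THRESHOLD = 0.95  # threshold quoted in the report line above


def grounded_refusal_accuracy(outcomes):
    """outcomes: one bool per out-of-scope case, True = correct grounded refusal.

    Returns None when no out-of-scope cases were run, so an empty set
    is not silently reported as a pass or a fail.
    """
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


def meets_refusal_threshold(outcomes):
    accuracy = grounded_refusal_accuracy(outcomes)
    return accuracy is not None and accuracy >= REFUSAL_THRESHOLD


if __name__ == "__main__":
    print(grounded_refusal_accuracy([True]))
```

With a single out-of-scope case (QA-004), the ratio can only be 0.0 or 1.0, so the ≥ 0.95 threshold is met only when that one case is refused correctly.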

Regressions

List any case whose score is lower than in the most recent previous run of the same case. If this is the first run, this section is N/A.

Case ID	Previous score	Current score	Delta	Description of regression
—	—	—	—	—

Previous run reference: [run_id of last comparable run, or "N/A — first run"]
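
The regression rule in this section (current score lower than the previous score for the same case) could be computed along these lines. This is a sketch only: it assumes per-case scores are available as dictionaries keyed by case ID, and `find_regressions` is a hypothetical helper name.

```python
def find_regressions(previous, current):
    """Return (case_id, previous, current, delta) for every regressed case.

    previous, current: dicts mapping case ID -> case score.
    Cases with no previous score are skipped, since they have
    nothing to regress from.
    """
    rows = []
    for case_id in sorted(current):
        prev_score = previous.get(case_id)
        cur_score = current[case_id]
        if prev_score is not None and cur_score < prev_score:
            # Rounded so the delta reads cleanly in the table.
            rows.append((case_id, prev_score, cur_score,
                         round(cur_score - prev_score, 4)))
    return rows


if __name__ == "__main__":
    previous_run = {"LO-001": 0.92, "QA-003": 0.80}
    current_run = {"LO-001": 0.88, "QA-003": 0.80}
    for row in find_regressions(previous_run, current_run):
        print("\t".join(str(cell) for cell in row))
```

On a first run, `previous` is empty and the function returns an empty list, which matches the N/A case.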

Hallucination Log

List every hallucination detected in this run. Each must have a root-cause note before the next run is permitted.

Case ID	Field / context	Hallucinated value	Root cause (preliminary)	Resolved?
—	—	—	—	—

If no hallucinations: None detected in this run.
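
The requirement that every logged hallucination has a root-cause note before the next run could be enforced with a small gate check. A minimal sketch, assuming each log row is a dictionary with hypothetical keys `case_id` and `root_cause`:

```python
def missing_root_causes(log):
    """Return the case IDs of log entries with no root-cause note."""
    missing = []
    for entry in log:
        note = (entry.get("root_cause") or "").strip()
        if not note:
            missing.append(entry.get("case_id", "unknown"))
    return missing


def next_run_permitted(log):
    """The next run is permitted only if every entry has a root-cause note."""
    return not missing_root_causes(log)


if __name__ == "__main__":
    log = [
        {"case_id": "LO-002", "root_cause": "value taken from the wrong clause"},
        {"case_id": "QA-001", "root_cause": ""},
    ]
    print(next_run_permitted(log), missing_root_causes(log))
```

An empty log (no hallucinations detected) passes the gate.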

Action Items

Actions required before the next eval run or before release, depending on severity.

#	Severity	Description	Owner	Due date
1	—	[Action description]	—	—

Severity levels:

Release-blocking: Must be resolved before any production release.
High: Must be resolved before the next eval run.
Medium: Address within current sprint.
Low: Log and address in next planning cycle.

Evaluator Notes

[Free-form observations from the evaluator: document quality issues, unexpected model behaviour, edge cases not covered by current eval cases, recommendations for new regression cases.]

Sign-off

Role	Name	Date	Sign-off
Product	—	—	—
QA	—	—	—
Engineering	—	—	—
