Extraction Accuracy Rubric
This rubric applies to all Loan Onboarding Extraction evaluations. It defines how individual field scores are calculated and how they are aggregated into per-case and per-capability scores.
This rubric is normative. Where it conflicts with the scoring criteria of an individual eval case, the case-level criteria prevail for case-specific situations; this rubric governs aggregation and cross-case consistency.
1. Field-Level Score Scale
Apply this scale to each extracted field independently.
| Score | Label | Definition |
|---|---|---|
| 1.0 | Exact match | Extracted value matches the source document exactly. Citation points to the correct page number and clause/section. Verbatim quote is present and correct. |
| 0.75 | Semantic match | Extracted value is informationally equivalent to the source but formatted differently (e.g., USD 500,000,000 vs. $500M; 15 January 2031 vs. 2031-01-15). Citation is correct and precise. Verbatim quote present. |
| 0.50 | Partial | One of the following applies: (a) value is partially correct (e.g., facility amount without currency; guarantor list missing one of three guarantors); OR (b) citation points to the correct section but wrong page; OR (c) value is correct but citation is absent. Any one condition is sufficient to trigger Partial; multiple conditions do not compound to a lower score. |
| 0.25 | Structural only | Field was identified and a value was returned, but the value is materially wrong or significantly incomplete (e.g., repayment schedule extracted as "amortising" without instalment amounts or dates). |
| 0.0 | Miss or Hallucination | One of the following applies: (a) required field not extracted when clearly present in source; OR (b) extracted value is factually wrong and not a formatting variant; OR (c) extracted value is not present anywhere in source document (fabricated); OR (d) citation references a location that does not exist in the document. |
Scale tie-breaker rules
- If value is correct and citation is partially wrong: score 0.50 (not 0.75), because correct provenance is required for full credit.
- If value is wrong and citation is correct: score 0.25 at most — showing where to look is partial credit but not a correct extraction.
- If both value and citation are partially correct: score 0.50.
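The scale and tie-breakers can be read as a small decision function. The following is an illustrative, non-normative sketch: the `Value` and `Citation` categories and the `field_score` name are hypothetical, the verbatim-quote requirement is not modelled, and the handling of a wrong value (0.25 only when the citation is correct, otherwise 0.0) is one assumed reading of the "0.25 at most" rule. Grader judgment still decides which category applies.

```python
from enum import Enum


class Value(Enum):
    EXACT = "exact"            # matches the source exactly
    SEMANTIC = "semantic"      # equivalent, formatted differently
    PARTIAL = "partial"        # partially correct
    WRONG = "wrong"            # materially wrong
    MISSING = "missing"        # required field not extracted
    FABRICATED = "fabricated"  # not present anywhere in the source


class Citation(Enum):
    CORRECT = "correct"
    PARTIAL = "partial"          # correct section, wrong page
    ABSENT = "absent"
    NONEXISTENT = "nonexistent"  # location does not exist in the document


def field_score(value: Value, citation: Citation) -> float:
    """Map a graded (value, citation) pair onto the field-level scale."""
    if value in (Value.MISSING, Value.FABRICATED) or citation is Citation.NONEXISTENT:
        return 0.0
    if value is Value.WRONG:
        # Assumption: a correct citation earns the 0.25 ceiling; otherwise 0.0.
        return 0.25 if citation is Citation.CORRECT else 0.0
    if value is Value.PARTIAL:
        return 0.50  # Partial conditions do not compound
    if citation is not Citation.CORRECT:
        return 0.50  # correct value, partially wrong or absent citation
    return 1.0 if value is Value.EXACT else 0.75


print(field_score(Value.EXACT, Citation.PARTIAL))  # 0.5
```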
2. Hallucination Override Rule
Any hallucinated value or fabricated citation in a case sets the entire case score to 0.0, regardless of scores on other fields.
A hallucination occurs when:
- The extracted value is not traceable to any text in the source document.
- The citation points to a page, clause, or section that does not exist in the source document.
- A value is "completed" from partial OCR text without basis (e.g., "Citigroup" inferred from "Citig...").
After applying the hallucination override, the case is marked HALLUCINATION in the evaluation log. It must be reviewed and root-cause-analysed before the capability can be re-run.
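As an illustration of the override, the sketch below zeroes a case score whenever at least one field has been flagged as hallucinated. The `CaseResult` type, the function name, and the `"SCORED"` status label are hypothetical; only the `HALLUCINATION` marker comes from this rubric.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    score: float
    status: str  # "HALLUCINATION" per this rubric; "SCORED" is a hypothetical label


def apply_hallucination_override(
    aggregated_score: float, hallucinated_fields: list[str]
) -> CaseResult:
    """Return the case result after applying the hallucination override."""
    if hallucinated_fields:
        # One hallucinated value or fabricated citation zeroes the whole case.
        return CaseResult(score=0.0, status="HALLUCINATION")
    return CaseResult(score=aggregated_score, status="SCORED")


print(apply_hallucination_override(0.93, ["Facility Agent"]))
# CaseResult(score=0.0, status='HALLUCINATION')
```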
3. Multi-Value Field Scoring
Fields that expect a list of values (e.g., Guarantors, Conditions Precedent) are scored as follows:
- Count the number of expected values from the ground truth (N).
- Score each extracted value independently using the field-level scale.
- Per-field score = arithmetic mean of individual value scores.
- If the extraction returns more values than the ground truth (false additions): each extra value scores 0.0 and is included in the denominator.
Example — Guarantors (expected: 3 entities):
| Guarantor | Extraction result | Score |
|---|---|---|
| Acme Holdings Pte. Ltd. | Correctly extracted, correct citation | 1.0 |
| Acme Asia Pacific Sdn. Bhd. | Correctly extracted, citation wrong section | 0.50 |
| [Third guarantor] | Not extracted | 0.0 |
Field score = (1.0 + 0.50 + 0.0) / 3 = 0.50
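The same calculation can be sketched in code. This is an illustrative sketch that assumes the denominator is N plus the number of false additions, which is how the rule on extra values reads; the function and parameter names are hypothetical.

```python
def multi_value_field_score(expected_scores: list[float], extra_count: int = 0) -> float:
    """Mean score over the N ground-truth values plus any false additions.

    expected_scores holds one score per ground-truth value (0.0 if not extracted).
    extra_count is the number of extracted values absent from the ground truth.
    """
    denominator = len(expected_scores) + extra_count
    if denominator == 0:
        raise ValueError("no values to score")
    # Each false addition scores 0.0, so it only enlarges the denominator.
    return sum(expected_scores) / denominator


print(multi_value_field_score([1.0, 0.50, 0.0]))                 # 0.5
print(multi_value_field_score([1.0, 0.50, 0.0], extra_count=1))  # 0.375
```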
4. Conditional Field Scoring
Some fields are optional in the sense that they may be absent from a document (e.g., Commitment Fee on a fully drawn facility, Guarantors on a borrower-only deal). Apply the following rules:
| Scenario | Score |
|---|---|
| Field is absent from document AND system correctly reports its absence with a supporting citation (e.g., "Commitment Fee: N/A — the Facility was fully drawn at signing per Clause 2.1") | 1.0 |
| Field is absent from document AND system returns null / blank / "not found" without explanation | 0.75 — correct absence, inadequate communication |
| Field is absent from document AND system fabricates a value | 0.0 — hallucination |
| Field is present in document AND system correctly extracts it | Apply scale normally |
| Field is present in document AND system reports it as absent | 0.0 — miss |
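The scenario table can be expressed as a short lookup. The sketch below is illustrative only; the function and parameter names are hypothetical, and `extracted_score` stands for the score already assigned on the field-level scale when the field is present.

```python
from typing import Optional


def conditional_field_score(
    present_in_document: bool,
    reported_absent: bool,
    absence_explained: bool = False,
    extracted_score: Optional[float] = None,
) -> float:
    """Score an optional field according to the scenario table."""
    if present_in_document:
        if reported_absent:
            return 0.0  # miss
        if extracted_score is None:
            raise ValueError("extracted_score is required when the field is present")
        return extracted_score  # apply the scale normally
    if not reported_absent:
        return 0.0  # fabricated value (hallucination)
    # Correct absence: full credit only with a supporting citation.
    return 1.0 if absence_explained else 0.75


print(conditional_field_score(False, True, absence_explained=True))  # 1.0
print(conditional_field_score(False, True))                          # 0.75
```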
5. Tier-Weighted Aggregation
Field scores are aggregated into a per-case score using tier weights defined in EVAL-STANDARDS.md.
| Tier | Fields | Weight |
|---|---|---|
| Tier 1 — Deal-critical | Borrower, Facility Amount, Currency, Maturity Date, Margin/Spread | 3.0× |
| Tier 2 — Important | Governing Law, Repayment Schedule, Reference Rate | 1.5× |
| Tier 3 — Supporting | Guarantors, Facility Agent, Facility Type, Tenor, Commitment Fee, Conditions Precedent, MAC clause, Negative Pledge | 1.0× |
Aggregation formula
$$\text{Case Score} = \frac{\sum_{i} (w_i \times s_i)}{\sum_{i} w_i}$$
Where:
- $s_i$ = field score for field $i$
- $w_i$ = tier weight for field $i$ (3.0, 1.5, or 1.0)
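A minimal sketch of the formula follows, using the tier weights from the table above. The `TIER_WEIGHTS` and `case_score` names are hypothetical, and the three-field input is a shortened illustration; a real case would include every field in the tier table.

```python
TIER_WEIGHTS = {1: 3.0, 2: 1.5, 3: 1.0}


def case_score(field_scores: dict[str, tuple[int, float]]) -> float:
    """Weighted mean of field scores; each entry maps field name -> (tier, score)."""
    weighted_sum = sum(TIER_WEIGHTS[tier] * score for tier, score in field_scores.values())
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in field_scores.values())
    return weighted_sum / total_weight


example = {
    "Borrower": (1, 1.0),
    "Repayment Schedule": (2, 0.50),
    "Guarantors": (3, 0.50),
}
# (3.0*1.0 + 1.5*0.50 + 1.0*0.50) / (3.0 + 1.5 + 1.0) = 4.25 / 5.5
print(round(case_score(example), 3))  # 0.773
```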
Worked example
| Field | Tier | Weight | Score | Weighted Score |
|---|---|---|---|---|
| Borrower | T1 | 3.0 | 1.0 | 3.0 |
| Facility Amount | T1 | 3.0 | 1.0 | 3.0 |
| Currency | T1 | 3.0 | 1.0 | 3.0 |
| Maturity Date | T1 | 3.0 | 0.75 | 2.25 |
| Margin/Spread | T1 | 3.0 | 1.0 | 3.0 |
| Governing Law | T2 | 1.5 | 1.0 | 1.5 |
| Repayment Schedule | T2 | 1.5 | 0.50 | 0.75 |
| Reference Rate | T2 | 1.5 | 1.0 | 1.5 |
| Guarantors | T3 | 1.0 | 0.50 | 0.50 |
| Facility Agent | T3 | 1.0 | 1.0 | 1.0 |
| Facility Type | T3 | 1.0 | 1.0 | 1.0 |
| Tenor | T3 | 1.0 | 1.0 | 1.0 |
| Commitment Fee | T3 | 1.0 | 1.0 | 1.0 |
| Conditions Precedent | T3 | 1.0 | 0.75 | 0.75 |
| MAC clause | T3 | 1.0 | 1.0 | 1.0 |
| Negative Pledge | T3 | 1.0 | 1.0 | 1.0 |
| Totals | | 27.5 | | 25.25 |
Case Score = 25.25 / 27.5 ≈ 0.918
6. Per-Capability Score
$$\text{Capability Score} = \frac{\sum_{c} \text{Case Score}_c}{N_\text{cases}}$$
Where $N_\text{cases}$ = total number of active cases in the capability set (4 for Loan Onboarding).
Cases scored 0.0 due to hallucination are included in the denominator.
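As a sketch, the capability score is a plain mean over all active cases, with zeroed cases left in the list. The function name and the sample scores are hypothetical.

```python
def capability_score(case_scores: list[float]) -> float:
    """Unweighted mean of case scores; hallucination-zeroed cases stay in the list."""
    if not case_scores:
        raise ValueError("capability set has no active cases")
    return sum(case_scores) / len(case_scores)


# Four cases, one of which was zeroed by the hallucination override.
print(capability_score([1.0, 0.75, 0.0, 0.5]))  # 0.5625
```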
7. Passing Thresholds
| Threshold | Value | Consequence of failure |
|---|---|---|
| Minimum per-capability score | 0.85 | Capability blocked from release |
| Minimum Tier 1 field score (per individual field, averaged across all cases) | 0.70 | Individual field flagged; if any Tier 1 field's average across all cases is below 0.70, release is blocked |
| Hallucination rate | 0.0 | Any hallucination triggers root-cause analysis before re-run |
Tier 1 field threshold calculation
For each Tier 1 field (e.g., Facility Amount), calculate its average score across all cases in the capability set. If any single Tier 1 field's average falls below 0.70, the capability does not pass — even if the overall capability score is ≥ 0.85.
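The two release-blocking thresholds can be checked together, as in the illustrative sketch below. The function name, argument shapes, and sample data are hypothetical; the hallucination threshold is left out because its stated consequence is root-cause analysis rather than a numeric gate.

```python
MIN_CAPABILITY_SCORE = 0.85
MIN_TIER1_FIELD_AVERAGE = 0.70


def release_blockers(
    case_scores: list[float], tier1_scores: dict[str, list[float]]
) -> list[str]:
    """Return the reasons release is blocked; an empty list means both gates pass."""
    blockers = []
    capability = sum(case_scores) / len(case_scores)
    if capability < MIN_CAPABILITY_SCORE:
        blockers.append(f"capability score {capability:.3f} < {MIN_CAPABILITY_SCORE}")
    for field, scores in tier1_scores.items():
        average = sum(scores) / len(scores)  # per-field average across all cases
        if average < MIN_TIER1_FIELD_AVERAGE:
            blockers.append(f"{field} average {average:.3f} < {MIN_TIER1_FIELD_AVERAGE}")
    return blockers


# The overall score passes, but one Tier 1 field averages 0.6875 and blocks release.
print(release_blockers(
    [0.9, 0.9, 0.9, 0.9],
    {"Facility Amount": [1.0, 0.5, 0.5, 0.75], "Borrower": [1.0, 1.0, 1.0, 1.0]},
))
```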
8. Amendment Document Scoring Rules
When the input is an amendment notice (case LO-002 type):
- Fields correctly classified as "unchanged" and explicitly flagged as such: 1.0
- Amended fields: the original value, new value, and effective date are each scored as a sub-field on the standard scale; the field score is the mean of the three sub-field scores.
- Fabricated value for a field not present in the amendment document: 0.0, and the hallucination override applies.
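For an amended field, the sub-field averaging can be sketched as follows. The function and parameter names are hypothetical; each argument is a score already assigned on the field-level scale.

```python
def amended_field_score(
    original_value: float, new_value: float, effective_date: float
) -> float:
    """Mean of the three sub-field scores for one amended field."""
    return (original_value + new_value + effective_date) / 3


# Original value exact, new value a semantic match, effective date partial.
print(amended_field_score(1.0, 0.75, 0.5))  # 0.75
```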
9. Multi-Tranche Scoring Rules
When the input is a multi-tranche document (case LO-003 type):
- Score each tranche independently using the standard field-level scale.
- Shared fields (Borrower, Governing Law, etc.) are scored once and applied to all tranches.
- Cross-tranche contamination (value from Tranche B assigned to Tranche A): 0.0 for the affected field, regardless of whether the value would be correct for the other tranche.
- Failure to identify a tranche: all fields for that tranche score 0.0.
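The tranche rules can be sketched as a function over tranche-specific fields. This is an illustrative sketch under stated assumptions: the function name and input shapes are hypothetical, shared fields are assumed to be scored separately and are not handled here, and a tranche-specific field missing from an identified tranche is treated as a 0.0 miss.

```python
def score_tranches(
    expected_tranches: list[str],
    tranche_fields: list[str],
    extracted: dict[str, dict[str, float]],
    contaminated: set[tuple[str, str]],
) -> dict[str, dict[str, float]]:
    """Per-tranche field scores after the contamination and missing-tranche rules.

    extracted maps tranche -> field -> score already assigned on the field-level scale.
    contaminated holds (tranche, field) pairs whose value came from another tranche.
    """
    result: dict[str, dict[str, float]] = {}
    for tranche in expected_tranches:
        if tranche not in extracted:
            # Tranche not identified: every field for it scores 0.0.
            result[tranche] = {field: 0.0 for field in tranche_fields}
            continue
        result[tranche] = {
            field: 0.0
            if (tranche, field) in contaminated
            else extracted[tranche].get(field, 0.0)
            for field in tranche_fields
        }
    return result


scores = score_tranches(
    expected_tranches=["A", "B", "C"],
    tranche_fields=["Facility Amount", "Margin/Spread"],
    extracted={
        "A": {"Facility Amount": 1.0, "Margin/Spread": 1.0},
        "B": {"Facility Amount": 1.0, "Margin/Spread": 0.75},
    },
    contaminated={("A", "Margin/Spread")},
)
print(scores["A"])  # {'Facility Amount': 1.0, 'Margin/Spread': 0.0}
print(scores["C"])  # {'Facility Amount': 0.0, 'Margin/Spread': 0.0}
```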