Extraction Accuracy Rubric

This rubric applies to all Loan Onboarding Extraction evaluations. It defines how individual field scores are calculated and how they are aggregated into per-case and per-capability scores.

This rubric is normative. Where it conflicts with the scoring criteria of an individual eval case, the case-level criteria prevail for situations specific to that case; this rubric governs aggregation and cross-case consistency.

1. Field-Level Score Scale

Apply this scale to each extracted field independently.

Score	Label	Definition
1.0	Exact match	Extracted value matches the source document exactly. Citation points to the correct page number and clause/section. Verbatim quote is present and correct.
0.75	Semantic match	Extracted value is informationally equivalent to the source but formatted differently (e.g., `USD 500,000,000` vs. `$500M`; `15 January 2031` vs. `2031-01-15`). Citation is correct and precise. Verbatim quote present.
0.50	Partial	One of the following applies: (a) value is partially correct (e.g., facility amount without currency; guarantor list missing one of three guarantors); OR (b) citation points to the correct section but wrong page; OR (c) value is correct but citation is absent. Any one condition triggers Partial on its own; conditions do not compound, so a field meeting more than one still scores 0.50.
0.25	Structural only	Field was identified and a value was returned, but the value is materially wrong or significantly incomplete (e.g., repayment schedule extracted as "amortising" without instalment amounts or dates).
0.0	Miss or Hallucination	One of the following applies: (a) required field not extracted when clearly present in source; OR (b) extracted value is factually wrong and not a formatting variant; OR (c) extracted value is not present anywhere in source document (fabricated); OR (d) citation references a location that does not exist in the document.

Scale tie-breaker rules

If value is correct and citation is partially wrong: score 0.50 (not 0.75), because correct provenance is required for any score above 0.50.
If value is wrong and citation is correct: score 0.25 at most — showing where to look is partial credit but not a correct extraction.
If both value and citation are partially correct: score 0.50.
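One reading of the scale that is consistent with all three tie-breakers treats the value and the citation as separate ceilings and gives the field the lower of the two, which is also why Partial conditions do not compound. The Python sketch below encodes that reading. The status labels and `max_field_score` are hypothetical names, a verbatim quote is assumed to be present, and a wrong value with an imperfect citation is assumed to fall to the 0.0 Miss row. It computes a ceiling only; a reviewer may still score lower.

```python
# Hypothetical per-status ceilings derived from the field-level scale.
VALUE_CAP = {
    "exact": 1.0,      # matches the source exactly
    "semantic": 0.75,  # informationally equivalent, formatted differently
    "partial": 0.50,   # partially correct or incomplete
    "wrong": 0.25,     # materially wrong; only reachable with a correct citation
    "missing": 0.0,    # clearly present in source but not extracted
}
CITATION_CAP = {
    "correct": 1.0,      # correct page and clause/section
    "partial": 0.50,     # e.g. correct section, wrong page
    "absent": 0.50,      # no citation given
    "nonexistent": 0.0,  # location does not exist: hallucination override applies
}


def max_field_score(value: str, citation: str) -> float:
    """Ceiling on a field score under the scale and tie-breaker rules.

    Assumes a verbatim quote is present whenever one is required.
    """
    if value == "wrong" and citation != "correct":
        # A wrong value earns 0.25 only when the citation shows where to look.
        return 0.0
    # Taking the minimum keeps Partial conditions from compounding.
    return min(VALUE_CAP[value], CITATION_CAP[citation])


print(max_field_score("exact", "partial"))    # tie-breaker 1 → 0.5
print(max_field_score("wrong", "correct"))    # tie-breaker 2 → 0.25
print(max_field_score("partial", "partial"))  # tie-breaker 3 → 0.5
```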

2. Hallucination Override Rule

Zero tolerance

Any hallucinated value or fabricated citation in a case sets the entire case score to 0.0, regardless of scores on other fields.

A hallucination occurs when:

The extracted value is not traceable to any text in the source document.
The citation points to a page, clause, or section that does not exist in the source document.
A value is "completed" from partial OCR text without basis (e.g., "Citigroup" inferred from "Citig...").

After applying the hallucination override, the case is marked HALLUCINATION in the evaluation log. It must be reviewed and root-cause-analysed before the capability can be re-run.
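Because the override acts on the whole case, it is simplest to apply it after aggregation. A minimal Python sketch, assuming a hypothetical `hallucinated` flag on each field result that a reviewer or automated check sets for any of the three conditions above; the case score of 0.95 in the usage line is illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldResult:
    name: str
    score: float
    # Set for a fabricated value, a fabricated citation, or an unfounded OCR completion.
    hallucinated: bool = False


def apply_hallucination_override(
    case_score: float, fields: List[FieldResult]
) -> Tuple[float, Optional[str]]:
    """Return the final case score and an evaluation-log status, if any."""
    if any(f.hallucinated for f in fields):
        # Zero tolerance: one hallucination zeroes the whole case.
        return 0.0, "HALLUCINATION"
    return case_score, None


fields = [
    FieldResult("Borrower", 1.0),
    FieldResult("Facility Agent", 0.0, hallucinated=True),  # e.g. "Citigroup" from "Citig..."
]
print(apply_hallucination_override(0.95, fields))  # → (0.0, 'HALLUCINATION')
```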

3. Multi-Value Field Scoring

Fields that expect a list of values (e.g., Guarantors, Conditions Precedent) are scored as follows:

Count the number of expected values in the ground truth (N).
Score each expected value independently using the field-level scale; an expected value that was not extracted scores 0.0.
If the extraction returns more values than the ground truth (false additions), each extra value scores 0.0 and is added to the denominator.
Per-field score = sum of all value scores / (N + number of false additions), i.e., the arithmetic mean over expected and extra values.

Example — Guarantors (expected: 3 entities):

Guarantor	Extraction result	Score
Acme Holdings Pte. Ltd.	Correctly extracted, correct citation	1.0
Acme Asia Pacific Sdn. Bhd.	Correctly extracted, citation wrong section	0.50
[Third guarantor]	Not extracted	0.0

Field score = (1.0 + 0.50 + 0.0) / 3 = 0.50
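Since missed and extra values both contribute 0.0, the per-field score reduces to the sum of the matched scores divided by N plus the number of false additions. A Python sketch with hypothetical names that reproduces the Guarantors example, plus a variant with one false addition:

```python
from typing import Sequence


def multi_value_score(
    matched_scores: Sequence[float], expected_count: int, false_additions: int = 0
) -> float:
    """Per-field score for a list-valued field.

    matched_scores: field-level scores for the expected values that were extracted.
    Expected values never extracted score 0.0; each false addition also scores 0.0
    and is added to the denominator, so only matched scores reach the numerator.
    """
    if len(matched_scores) > expected_count:
        raise ValueError("more matched values than expected; count extras as false_additions")
    denominator = expected_count + false_additions
    if denominator == 0:
        raise ValueError("no expected or extracted values; use the conditional field rules")
    return sum(matched_scores) / denominator


# Guarantors example: two of three extracted (1.0 and 0.50), third missed.
print(multi_value_score([1.0, 0.50], expected_count=3))                     # → 0.5
# Same extraction plus one guarantor that is not in the ground truth.
print(multi_value_score([1.0, 0.50], expected_count=3, false_additions=1))  # → 0.375
```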

4. Conditional Field Scoring

Some fields are optional in the sense that they may be absent from a document (e.g., Commitment Fee on a fully drawn facility, Guarantors on a borrower-only deal). Apply the following rules:

Scenario	Score
Field is absent from document AND system correctly reports its absence with a supporting citation (e.g., "Commitment Fee: N/A — the Facility was fully drawn at signing per Clause 2.1")	1.0
Field is absent from document AND system returns null / blank / "not found" without explanation	0.75 — correct absence, inadequate communication
Field is absent from document AND system fabricates a value	0.0 — hallucination
Field is present in document AND system correctly extracts it	Apply scale normally
Field is present in document AND system reports it as absent	0.0 — miss
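The table is a small decision function over two facts: whether the field is in the document and what the system returned. A Python sketch, where the outcome labels are hypothetical and `scale_score` is assumed to come from the field-level scale:

```python
def conditional_field_score(present_in_doc: bool, outcome: str, scale_score: float = 0.0) -> float:
    """Score an optional field.

    outcome is one of (hypothetical labels):
      "absent_with_citation"  - reported absent, with a supporting citation
      "absent_no_explanation" - null / blank / "not found"
      "value_returned"        - a value was extracted; scale_score holds its field-level score
    """
    valid = {"absent_with_citation", "absent_no_explanation", "value_returned"}
    if outcome not in valid:
        raise ValueError(f"unknown outcome: {outcome}")
    if not present_in_doc:
        if outcome == "absent_with_citation":
            return 1.0
        if outcome == "absent_no_explanation":
            return 0.75  # correct absence, inadequate communication
        return 0.0  # fabricated value: hallucination, case override also applies
    if outcome == "value_returned":
        return scale_score  # apply the scale normally
    return 0.0  # present in the document but reported absent: miss


print(conditional_field_score(False, "absent_no_explanation"))  # → 0.75
```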

5. Tier-Weighted Aggregation

Field scores are aggregated into a per-case score using tier weights defined in EVAL-STANDARDS.md.

Tier	Fields	Weight
Tier 1 — Deal-critical	Borrower, Facility Amount, Currency, Maturity Date, Margin/Spread	3.0×
Tier 2 — Important	Governing Law, Repayment Schedule, Reference Rate	1.5×
Tier 3 — Supporting	Guarantors, Facility Agent, Facility Type, Tenor, Commitment Fee, Conditions Precedent, MAC clause, Negative Pledge	1.0×

Aggregation formula

$$\text{Case Score} = \frac{\sum_{i} (w_i \times s_i)}{\sum_{i} w_i}$$

Where:

$s_i$ = field score for field $i$
$w_i$ = tier weight for field $i$ (3.0, 1.5, or 1.0)

Worked example

Field	Tier	Weight	Score	Weighted Score
Borrower	T1	3.0	1.0	3.0
Facility Amount	T1	3.0	1.0	3.0
Currency	T1	3.0	1.0	3.0
Maturity Date	T1	3.0	0.75	2.25
Margin/Spread	T1	3.0	1.0	3.0
Governing Law	T2	1.5	1.0	1.5
Repayment Schedule	T2	1.5	0.50	0.75
Reference Rate	T2	1.5	1.0	1.5
Guarantors	T3	1.0	0.50	0.50
Facility Agent	T3	1.0	1.0	1.0
Facility Type	T3	1.0	1.0	1.0
Tenor	T3	1.0	1.0	1.0
Commitment Fee	T3	1.0	1.0	1.0
Conditions Precedent	T3	1.0	0.75	0.75
MAC clause	T3	1.0	1.0	1.0
Negative Pledge	T3	1.0	1.0	1.0
Totals		27.5		25.25

Case Score = 25.25 / 27.5 ≈ 0.918
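The formula is a weighted mean, so the worked example can be checked mechanically. In the Python sketch below, `TIER_WEIGHTS` mirrors the tier table and `case_score` is a hypothetical name; summing the sixteen rows gives a total weight of 27.5 and a weighted total of 25.25.

```python
from typing import List, Tuple

# Weights from the tier table: Tier 1 = 3.0, Tier 2 = 1.5, Tier 3 = 1.0.
TIER_WEIGHTS = {1: 3.0, 2: 1.5, 3: 1.0}


def case_score(fields: List[Tuple[str, int, float]]) -> float:
    """Tier-weighted mean of (field name, tier, field score) triples."""
    weighted = sum(TIER_WEIGHTS[tier] * score for _name, tier, score in fields)
    total_weight = sum(TIER_WEIGHTS[tier] for _name, tier, _score in fields)
    return weighted / total_weight


worked_example = [
    ("Borrower", 1, 1.0), ("Facility Amount", 1, 1.0), ("Currency", 1, 1.0),
    ("Maturity Date", 1, 0.75), ("Margin/Spread", 1, 1.0),
    ("Governing Law", 2, 1.0), ("Repayment Schedule", 2, 0.50), ("Reference Rate", 2, 1.0),
    ("Guarantors", 3, 0.50), ("Facility Agent", 3, 1.0), ("Facility Type", 3, 1.0),
    ("Tenor", 3, 1.0), ("Commitment Fee", 3, 1.0), ("Conditions Precedent", 3, 0.75),
    ("MAC clause", 3, 1.0), ("Negative Pledge", 3, 1.0),
]
print(round(case_score(worked_example), 3))  # → 0.918 (25.25 / 27.5)
```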

6. Per-Capability Score

$$\text{Capability Score} = \frac{\sum_{c} \text{Case Score}_c}{N_\text{cases}}$$

Where $N_\text{cases}$ = total number of active cases in the capability set (4 for Loan Onboarding).

Cases scored 0.0 due to hallucination are included in the denominator.
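The capability score is an unweighted mean of case scores. The Python sketch below (hypothetical name, illustrative scores for four cases) shows why keeping hallucination cases in the denominator matters:

```python
from typing import Sequence


def capability_score(case_scores: Sequence[float]) -> float:
    """Unweighted mean over all active cases.

    Cases zeroed by the hallucination override stay in the list as 0.0,
    so they pull the mean down rather than being skipped.
    """
    if not case_scores:
        raise ValueError("capability set has no active cases")
    return sum(case_scores) / len(case_scores)


# Hypothetical scores for four cases; the last one was zeroed by a hallucination.
scores = [1.0, 0.75, 0.5, 0.0]
print(capability_score(scores))  # → 0.5625 (dropping the 0.0 would wrongly give 0.75)
```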

7. Passing Thresholds

Threshold	Value	Consequence of failure
Minimum per-capability score	0.85	Capability blocked from release
Minimum Tier 1 field score (per individual field)	0.70	Individual field flagged; if any Tier 1 field's average across all cases is < 0.70, release blocked
Maximum hallucination rate	0.0	Any hallucination triggers root-cause analysis before re-run

Tier 1 field threshold calculation

For each Tier 1 field (e.g., Facility Amount), calculate its average score across all cases in the capability set. If any single Tier 1 field's average falls below 0.70, the capability does not pass — even if the overall capability score is ≥ 0.85.
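The three thresholds are checked independently, so a capability can clear 0.85 overall and still be blocked by one Tier 1 field. A Python sketch of the gate; the function name, the input shapes, and the example scores are assumptions for illustration.

```python
from typing import Dict, List, Sequence

MIN_CAPABILITY_SCORE = 0.85
MIN_TIER1_FIELD_AVERAGE = 0.70


def release_blockers(
    case_scores: Sequence[float],
    tier1_scores: Dict[str, Sequence[float]],
    hallucination_count: int,
) -> List[str]:
    """Return the reasons the capability fails its thresholds (empty list = pass)."""
    reasons: List[str] = []
    capability = sum(case_scores) / len(case_scores)
    if capability < MIN_CAPABILITY_SCORE:
        reasons.append(f"capability score {capability:.3f} < {MIN_CAPABILITY_SCORE}")
    # Each Tier 1 field is averaged across all cases and checked on its own.
    for field, scores in tier1_scores.items():
        average = sum(scores) / len(scores)
        if average < MIN_TIER1_FIELD_AVERAGE:
            reasons.append(f"Tier 1 field '{field}' average {average:.3f} < {MIN_TIER1_FIELD_AVERAGE}")
    if hallucination_count > 0:
        reasons.append("hallucination: root-cause analysis required before re-run")
    return reasons


blockers = release_blockers(
    case_scores=[1.0, 1.0, 0.75, 1.0],
    tier1_scores={"Borrower": [1.0, 1.0, 1.0, 1.0], "Maturity Date": [0.5, 0.5, 0.75, 1.0]},
    hallucination_count=0,
)
print(blockers)
```

With these inputs the capability score is 0.9375, yet Maturity Date averages 0.6875, so the sketch reports a single Tier 1 blocker.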

8. Amendment Document Scoring Rules

When the input is an amendment notice (case LO-002 type):

Fields correctly classified as "unchanged" and explicitly flagged as such: 1.0
Amended fields: the original value, new value, and effective date each count as a sub-field scored on the standard scale; the field score is the arithmetic mean of the three sub-field scores.
Any field for which the system fabricates a value not present in the amendment document: 0.0, and the hallucination override applies.
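A Python sketch of the amendment rules; the `kind` labels are hypothetical, and the example assumes a field whose original value is exact, new value is a semantic match, and effective date is cited on the wrong page.

```python
from typing import Optional, Sequence


def amendment_field_score(kind: str, sub_scores: Optional[Sequence[float]] = None) -> float:
    """Score one field of an amendment notice.

    kind (hypothetical labels):
      "unchanged_flagged" - correctly classified as unchanged and flagged as such
      "amended"           - sub_scores = (original value, new value, effective date)
      "fabricated"        - value not present in the amendment document
    """
    if kind == "unchanged_flagged":
        return 1.0
    if kind == "fabricated":
        return 0.0  # the hallucination override then zeroes the whole case
    if kind == "amended":
        if sub_scores is None or len(sub_scores) != 3:
            raise ValueError("amended fields need three sub-field scores")
        return sum(sub_scores) / 3  # mean of the three sub-fields
    raise ValueError(f"unknown kind: {kind}")


print(amendment_field_score("amended", (1.0, 0.75, 0.5)))  # → 0.75
```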

9. Multi-Tranche Scoring Rules

When the input is a multi-tranche document (case LO-003 type):

Score each tranche independently using the standard field-level scale.
Shared fields (Borrower, Governing Law, etc.) are scored once, and that score applies to all tranches.
Cross-tranche contamination (a value from Tranche B assigned to Tranche A): 0.0 for the affected field, even if the value would be correct for the other tranche.
Failure to identify a tranche: all fields for that tranche score 0.0.
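A Python sketch of the per-tranche rules. It assumes, hypothetically, that each extracted tranche field records `source_tranche`, the tranche its cited text actually belongs to; shared fields are left out because they are scored once, and a field missing from an identified tranche is treated as a Miss per the field-level scale.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class TrancheField:
    name: str
    scale_score: float    # score on the standard field-level scale
    source_tranche: str   # tranche the extracted value actually comes from (hypothetical)


def score_tranches(
    expected_tranches: Sequence[str],
    tranche_field_names: Sequence[str],
    extracted: Dict[str, List[TrancheField]],
) -> Dict[str, Dict[str, float]]:
    """Per-tranche field scores; shared fields are scored once elsewhere."""
    results: Dict[str, Dict[str, float]] = {}
    for tranche in expected_tranches:
        fields = extracted.get(tranche)
        if fields is None:
            # Tranche not identified: every field for it scores 0.0.
            results[tranche] = {name: 0.0 for name in tranche_field_names}
            continue
        by_name = {f.name: f for f in fields}
        scores: Dict[str, float] = {}
        for name in tranche_field_names:
            field = by_name.get(name)
            if field is None:
                scores[name] = 0.0  # miss
            elif field.source_tranche != tranche:
                scores[name] = 0.0  # cross-tranche contamination
            else:
                scores[name] = field.scale_score
        results[tranche] = scores
    return results


result = score_tranches(
    expected_tranches=["A", "B"],
    tranche_field_names=["Facility Amount", "Maturity Date"],
    extracted={"A": [TrancheField("Facility Amount", 1.0, "A"),
                     TrancheField("Maturity Date", 1.0, "B")]},  # B's date assigned to A
)
print(result)
```

Here Tranche A keeps its Facility Amount but loses Maturity Date to contamination, and Tranche B, never identified, scores 0.0 on every field.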
