
Extraction Accuracy Rubric

This rubric applies to all Loan Onboarding Extraction evaluations. It defines how individual field scores are calculated and how they are aggregated into a per-case and per-capability score.

This rubric is normative. Where it conflicts with the scoring criteria of an individual eval case, the case-level criteria prevail for case-specific situations; this rubric governs aggregation and cross-case consistency.


1. Field-Level Score Scale

Apply this scale to each extracted field independently.

| Score | Label | Definition |
|---|---|---|
| 1.0 | Exact match | Extracted value matches the source document exactly. Citation points to the correct page number and clause/section. Verbatim quote is present and correct. |
| 0.75 | Semantic match | Extracted value is informationally equivalent to the source but formatted differently (e.g., USD 500,000,000 vs. $500M; 15 January 2031 vs. 2031-01-15). Citation is correct and precise. Verbatim quote is present. |
| 0.50 | Partial | One of the following applies: (a) value is partially correct (e.g., facility amount without currency; guarantor list missing one of three guarantors); OR (b) citation points to the correct section but wrong page; OR (c) value is correct but citation is absent. Any one of these conditions triggers Partial on its own; multiple conditions do not compound. |
| 0.25 | Structural only | Field was identified and a value was returned, but the value is materially wrong or significantly incomplete (e.g., repayment schedule extracted as "amortising" without instalment amounts or dates). |
| 0.0 | Miss or Hallucination | One of the following applies: (a) required field not extracted when clearly present in source; OR (b) extracted value is factually wrong and not a formatting variant; OR (c) extracted value is not present anywhere in source document (fabricated); OR (d) citation references a location that does not exist in the document. |

Scale tie-breaker rules

  • If value is correct and citation is partially wrong: score 0.50 (not 0.75), because correct provenance is required for full credit.
  • If value is wrong and citation is correct: score 0.25 at most — showing where to look is partial credit but not a correct extraction.
  • If both value and citation are partially correct: score 0.50.
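One way to read the scale and tie-breakers together is as a small decision function. The sketch below is illustrative only: the `Value` and `Citation` categories and the `field_score` function are hypothetical names, and the handling of a wrong value without a correct citation (scored 0.0 here) is an assumption, since the rubric only caps the wrong-value, correct-citation combination at 0.25. Fabricated values are handled separately by the hallucination override.

```python
from enum import Enum


class Value(Enum):
    EXACT = "exact"
    SEMANTIC = "semantic"   # informationally equivalent, different format
    PARTIAL = "partial"
    WRONG = "wrong"
    MISSING = "missing"     # required field not extracted


class Citation(Enum):
    CORRECT = "correct"
    PARTIAL = "partial"          # e.g. correct section, wrong page
    ABSENT = "absent"
    NONEXISTENT = "nonexistent"  # location does not exist in the document


def field_score(value: Value, citation: Citation) -> float:
    # Missed field, or a citation to a non-existent location: 0.0.
    if value is Value.MISSING or citation is Citation.NONEXISTENT:
        return 0.0
    if value is Value.WRONG:
        # Tie-breaker: wrong value with correct citation earns 0.25 at most.
        # Assumption: a wrong value without a correct citation earns 0.0.
        return 0.25 if citation is Citation.CORRECT else 0.0
    # Partial value, or a correct value with a partial or absent citation.
    # The Partial conditions do not compound, so all of these score 0.50.
    if value is Value.PARTIAL or citation is not Citation.CORRECT:
        return 0.50
    return 1.0 if value is Value.EXACT else 0.75


print(field_score(Value.SEMANTIC, Citation.CORRECT))  # 0.75
print(field_score(Value.EXACT, Citation.PARTIAL))     # 0.5
```

Encoding the two dimensions as separate enums keeps the tie-breaker order explicit: the value is checked first, then the citation.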

2. Hallucination Override Rule

Zero tolerance

Any hallucinated value or fabricated citation in a case sets the entire case score to 0.0, regardless of scores on other fields.

A hallucination occurs when:

  • The extracted value is not traceable to any text in the source document.
  • The citation points to a page, clause, or section that does not exist in the source document.
  • A value is "completed" from partial OCR text without basis (e.g., "Citigroup" inferred from "Citig...").

After applying the hallucination override, the case is marked HALLUCINATION in the evaluation log. It must be reviewed and root-cause-analysed before the capability can be re-run.
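The override can be sketched as a final step applied after aggregation. This is a minimal illustration, not a prescribed implementation: the `FieldResult` type, the `apply_hallucination_override` function, and the `"OK"` status label are hypothetical; only the `HALLUCINATION` marker comes from this rubric.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    score: float
    hallucinated: bool = False  # fabricated value or fabricated citation


def apply_hallucination_override(aggregated_score, fields):
    """Return (case score, log status) after applying the override."""
    # A single hallucinated field zeroes the whole case.
    if any(f.hallucinated for f in fields):
        return 0.0, "HALLUCINATION"
    return aggregated_score, "OK"  # "OK" is a placeholder status label


fields = [
    FieldResult("Borrower", 1.0),
    FieldResult("Facility Agent", 0.0, hallucinated=True),
]
print(apply_hallucination_override(0.92, fields))  # (0.0, 'HALLUCINATION')
```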


3. Multi-Value Field Scoring

Fields that expect a list of values (e.g., Guarantors, Conditions Precedent) are scored as follows:

  1. Count the number of expected values from the ground truth (N).
  2. Score each extracted value independently using the field-level scale.
  3. Per-field score = arithmetic mean of individual value scores.
  4. If the extraction returns more values than the ground truth (false additions): each extra value scores 0.0 and is included in the denominator.

Example — Guarantors (expected: 3 entities):

| Guarantor | Extraction result | Score |
|---|---|---|
| Acme Holdings Pte. Ltd. | Correctly extracted, correct citation | 1.0 |
| Acme Asia Pacific Sdn. Bhd. | Correctly extracted, citation wrong section | 0.50 |
| [Third guarantor] | Not extracted | 0.0 |

Field score = (1.0 + 0.50 + 0.0) / 3 = 0.50
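The four steps above can be sketched as a short function. The name `multi_value_score` and its parameters are hypothetical; the sketch assumes each ground-truth value has already been scored on the field-level scale, with 0.0 recorded for a value that was not extracted.

```python
def multi_value_score(expected_value_scores, extra_values=0):
    """Mean score over ground-truth values plus any false additions.

    expected_value_scores: one score per ground-truth value
                           (0.0 for a value that was not extracted).
    extra_values: number of extracted values absent from the ground truth.
    """
    denominator = len(expected_value_scores) + extra_values
    if denominator == 0:
        raise ValueError("no values to score")
    # Each false addition contributes 0.0 to the sum but 1 to the denominator.
    return sum(expected_value_scores) / denominator


# Guarantors example: three expected entities, one not extracted.
print(multi_value_score([1.0, 0.50, 0.0]))                  # 0.5
# Same extraction plus one guarantor that is not in the ground truth.
print(multi_value_score([1.0, 0.50, 0.0], extra_values=1))  # 0.375
```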


4. Conditional Field Scoring

Some fields are optional in the sense that they may be absent from a document (e.g., Commitment Fee on a fully drawn facility, Guarantors on a borrower-only deal). Apply the following rules:

| Scenario | Score |
|---|---|
| Field is absent from document AND system correctly reports its absence with a supporting citation (e.g., "Commitment Fee: N/A, the Facility was fully drawn at signing per Clause 2.1") | 1.0 |
| Field is absent from document AND system returns null / blank / "not found" without explanation | 0.75 (correct absence, inadequate communication) |
| Field is absent from document AND system fabricates a value | 0.0 (hallucination) |
| Field is present in document AND system correctly extracts it | Apply scale normally |
| Field is present in document AND system reports it as absent | 0.0 (miss) |
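The scenarios above reduce to a lookup on two facts: whether the field exists in the document, and what the system returned. The sketch below is one possible encoding; the function name, the outcome strings, and the returned hallucination flag are hypothetical conveniences, not part of the rubric.

```python
def conditional_field_score(present_in_document, outcome, scale_score=0.0):
    """Return (score, is_hallucination) for a conditional field.

    outcome is one of:
      "absent_with_citation"  system reports absence and cites support
      "absent_unexplained"    system returns null / blank / "not found"
      "value_returned"        system returns a value
    scale_score is the field-level scale score, used only when the field
    is present and a value was returned.
    """
    if outcome not in ("absent_with_citation", "absent_unexplained", "value_returned"):
        raise ValueError(f"unknown outcome: {outcome}")
    if not present_in_document:
        if outcome == "absent_with_citation":
            return 1.0, False
        if outcome == "absent_unexplained":
            return 0.75, False
        return 0.0, True            # fabricated value for an absent field
    if outcome == "value_returned":
        return scale_score, False   # apply the scale normally
    return 0.0, False               # present field reported as absent: miss


print(conditional_field_score(False, "absent_unexplained"))    # (0.75, False)
print(conditional_field_score(False, "value_returned"))        # (0.0, True)
print(conditional_field_score(True, "value_returned", 0.75))   # (0.75, False)
```

Returning the hallucination flag alongside the score lets the caller apply the case-level override without re-deriving it.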

5. Tier-Weighted Aggregation

Field scores are aggregated into a per-case score using tier weights defined in EVAL-STANDARDS.md.

| Tier | Fields | Weight |
|---|---|---|
| Tier 1 (Deal-critical) | Borrower, Facility Amount, Currency, Maturity Date, Margin/Spread | 3.0× |
| Tier 2 (Important) | Governing Law, Repayment Schedule, Reference Rate | 1.5× |
| Tier 3 (Supporting) | Guarantors, Facility Agent, Facility Type, Tenor, Commitment Fee, Conditions Precedent, MAC clause, Negative Pledge | 1.0× |

Aggregation formula

$$\text{Case Score} = \frac{\sum_{i} (w_i \times s_i)}{\sum_{i} w_i}$$

Where:

  • $s_i$ = field score for field $i$
  • $w_i$ = tier weight for field $i$ (3.0, 1.5, or 1.0)

Worked example

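| Field | Tier | Weight | Score | Weighted Score |
|---|---|---|---|---|
| Borrower | T1 | 3.0 | 1.0 | 3.0 |
| Facility Amount | T1 | 3.0 | 1.0 | 3.0 |
| Currency | T1 | 3.0 | 1.0 | 3.0 |
| Maturity Date | T1 | 3.0 | 0.75 | 2.25 |
| Margin/Spread | T1 | 3.0 | 1.0 | 3.0 |
| Governing Law | T2 | 1.5 | 1.0 | 1.5 |
| Repayment Schedule | T2 | 1.5 | 0.50 | 0.75 |
| Reference Rate | T2 | 1.5 | 1.0 | 1.5 |
| Guarantors | T3 | 1.0 | 0.50 | 0.50 |
| Facility Agent | T3 | 1.0 | 1.0 | 1.0 |
| Facility Type | T3 | 1.0 | 1.0 | 1.0 |
| Tenor | T3 | 1.0 | 1.0 | 1.0 |
| Commitment Fee | T3 | 1.0 | 1.0 | 1.0 |
| Conditions Precedent | T3 | 1.0 | 0.75 | 0.75 |
| MAC clause | T3 | 1.0 | 1.0 | 1.0 |
| Negative Pledge | T3 | 1.0 | 1.0 | 1.0 |
| Totals | | 27.5 | | 25.25 |

Case Score = 25.25 / 27.5 = 0.918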
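The aggregation formula above translates directly into a weighted mean. The following sketch uses the tier weights from this section; the `case_score` function, the `(tier, score)` input shape, and the three-field example are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
TIER_WEIGHTS = {"T1": 3.0, "T2": 1.5, "T3": 1.0}


def case_score(fields):
    """fields: list of (tier, field_score) pairs for one case."""
    weighted_sum = sum(TIER_WEIGHTS[tier] * score for tier, score in fields)
    weight_total = sum(TIER_WEIGHTS[tier] for tier, _ in fields)
    return weighted_sum / weight_total


# Hypothetical three-field case:
# (3.0 * 1.0 + 1.5 * 0.50 + 1.0 * 0.50) / (3.0 + 1.5 + 1.0) = 4.25 / 5.5
example = [("T1", 1.0), ("T2", 0.50), ("T3", 0.50)]
print(round(case_score(example), 3))  # 0.773
```

Because the denominator is the sum of weights rather than the field count, a low score on a Tier 1 field pulls the case score down three times as much as the same score on a Tier 3 field.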


6. Per-Capability Score

$$\text{Capability Score} = \frac{\sum_{c} \text{Case Score}_c}{N_\text{cases}}$$

Where $N_\text{cases}$ = total number of active cases in the capability set (4 for Loan Onboarding).

Cases scored 0.0 due to hallucination are included in the denominator.
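As a quick illustration of how a hallucinated case affects the mean, here is a minimal sketch. The function name and the four case scores are hypothetical; the only rule taken from this section is that zeroed cases stay in the denominator.

```python
def capability_score(case_scores):
    """Unweighted mean over all active cases, zeroed cases included."""
    if not case_scores:
        raise ValueError("capability set has no active cases")
    return sum(case_scores) / len(case_scores)


# Four hypothetical cases; the third was zeroed by the hallucination override.
scores = [0.90, 0.95, 0.0, 0.88]
print(round(capability_score(scores), 4))  # 0.6825
```

With four cases, a single hallucination caps the capability score at 0.75 even if every other case is perfect.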


7. Passing Thresholds

| Threshold | Value | Consequence of failure |
|---|---|---|
| Minimum per-capability score | 0.85 | Capability blocked from release |
| Minimum Tier 1 field score (per individual field) | 0.70 | Individual field flagged; if any Tier 1 field averages below 0.70 across all cases, release blocked |
| Hallucination rate | 0.0 | Any hallucination triggers root-cause analysis before re-run |

Tier 1 field threshold calculation

For each Tier 1 field (e.g., Facility Amount), calculate its average score across all cases in the capability set. If any single Tier 1 field's average falls below 0.70, the capability does not pass — even if the overall capability score is ≥ 0.85.
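The two release-blocking checks can be combined in one gate. This is a sketch under stated assumptions: the function name, the input shape (a mapping from Tier 1 field name to its per-case scores), and the returned dictionary keys are all hypothetical. Hallucinations are reported separately because the rubric ties them to root-cause analysis rather than directly to the score thresholds.

```python
MIN_CAPABILITY_SCORE = 0.85
MIN_TIER1_FIELD_AVERAGE = 0.70


def release_check(capability_score, tier1_scores_by_field, hallucination_count=0):
    """Evaluate the passing thresholds for one capability set."""
    averages = {
        field: sum(scores) / len(scores)
        for field, scores in tier1_scores_by_field.items()
    }
    failing_fields = sorted(
        field for field, avg in averages.items() if avg < MIN_TIER1_FIELD_AVERAGE
    )
    return {
        "passes": capability_score >= MIN_CAPABILITY_SCORE and not failing_fields,
        "failing_tier1_fields": failing_fields,
        "needs_root_cause_analysis": hallucination_count > 0,
    }


# Hypothetical set: overall score clears 0.85, but Facility Amount
# averages (1.0 + 0.5 + 0.5 + 0.75) / 4 = 0.6875 across the four cases.
result = release_check(
    0.90,
    {"Facility Amount": [1.0, 0.5, 0.5, 0.75], "Borrower": [1.0, 1.0, 1.0, 1.0]},
)
print(result["passes"], result["failing_tier1_fields"])  # False ['Facility Amount']
```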


8. Amendment Document Scoring Rules

When the input is an amendment notice (case LO-002 type):

  • Fields correctly classified as "unchanged" and explicitly flagged as such: 1.0
  • Amended fields: original value, new value, and effective date each count as a sub-field scored on the standard scale; the field score is the average of the three sub-field scores.
  • Any value the system fabricates for a field not present in the amendment document: 0.0, and the hallucination override applies.
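For an amended field, the sub-field averaging can be sketched as follows. The function and parameter names are hypothetical, and each argument is assumed to be a score already assigned on the field-level scale.

```python
def amended_field_score(original_value_score, new_value_score, effective_date_score):
    """Average the three sub-field scores of an amended field."""
    sub_scores = (original_value_score, new_value_score, effective_date_score)
    return sum(sub_scores) / len(sub_scores)


# Hypothetical amended Margin/Spread: original value exact (1.0),
# new value a semantic match (0.75), effective date partial (0.50).
print(amended_field_score(1.0, 0.75, 0.50))  # 0.75
```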

9. Multi-Tranche Scoring Rules

When the input is a multi-tranche document (case LO-003 type):

  • Score each tranche independently using standard field-level scale.
  • Shared fields (Borrower, Governing Law, etc.) scored once and applied to all tranches.
  • Cross-tranche contamination (value from Tranche B assigned to Tranche A): 0.0 for the affected field, regardless of whether the value would be correct for the other tranche.
  • Failure to identify a tranche: all fields for that tranche score 0.0.
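The per-tranche rules can be sketched as below. The function name and input shape are hypothetical, and the sketch assumes a field missing from an identified tranche's extraction scores 0.0 as a miss. It deliberately stops at per-tranche field scores, since how tranches combine into a case score is not specified in this section.

```python
def score_tranche(expected_fields, extracted):
    """Return {field: score} for one tranche.

    extracted: None if the tranche was not identified, otherwise a dict
    mapping field name to (scale_score, contaminated), where contaminated
    is True when the value belongs to a different tranche.
    """
    if extracted is None:
        # Unidentified tranche: every field scores 0.0.
        return {field: 0.0 for field in expected_fields}
    scores = {}
    for field in expected_fields:
        scale_score, contaminated = extracted.get(field, (0.0, False))
        # Cross-tranche contamination zeroes the field regardless of the value.
        scores[field] = 0.0 if contaminated else scale_score
    return scores


fields = ["Facility Amount", "Margin/Spread"]
tranche_a = {"Facility Amount": (1.0, False), "Margin/Spread": (1.0, True)}
print(score_tranche(fields, tranche_a))  # {'Facility Amount': 1.0, 'Margin/Spread': 0.0}
print(score_tranche(fields, None))       # {'Facility Amount': 0.0, 'Margin/Spread': 0.0}
```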