Provenance Quality Rubric
This rubric applies to all Smartflow AI capability outputs — Loan Onboarding Extraction, Covenant Monitoring, BMT Validation, and Document Q&A. Provenance is scored independently of the content score. Both scores are reported separately; a high content score does not compensate for low provenance quality.
The cross-cutting provenance standard is defined in EVAL-STANDARDS.md Section 5.
1. Provenance Components
Every output must include three provenance elements. Each is scored independently.
| Component | Label | Description |
|---|---|---|
| A | Source document name | Exact filename as submitted (e.g., acme-corp-facility-agreement.pdf) |
| B | Page / section reference | Page number AND clause/section identifier (e.g., Page 14, Clause 5.1(a)) |
| C | Verbatim quote | Exact text from the source document. Must be findable by a human reviewer searching the source PDF. |
2. Component Scoring Matrix
Score each provenance component independently using the table below.
Component A — Source Document Name
| Condition | Score |
|---|---|
| Exact filename present and matches submitted document | 1 |
| Document name present but abbreviated (e.g., "acme-corp" rather than the full filename) | 0.5 |
| Document name absent or refers to wrong document | 0 |
Component B — Page / Section Reference
| Condition | Score |
|---|---|
| Both page number AND clause/section reference are present and correct | 1 |
| Page number correct but clause/section reference absent | 0.75 |
| Clause reference correct but page number wrong (off by ≤ 2 pages) | 0.75 |
| Clause reference correct but page number wrong (off by > 2 pages) | 0.5 |
| Section reference only (no page number), section is correct | 0.5 |
| Page number present but wrong; clause reference absent | 0.25 |
| Both absent | 0 |
Component C — Verbatim Quote
| Condition | Score |
|---|---|
| Quote is exact verbatim text, findable in source PDF, sufficient to support the output | 1 |
| Quote is mostly verbatim with minor differences (punctuation, capitalisation, elided words with "..." notation) AND findable in source PDF | 0.75 |
| Quote is a close paraphrase (meaning preserved) but not verbatim; would require inference to trace back to source | 0.5 |
| Quote is a summary or restatement — not traceable to a specific sentence in the source | 0.25 |
| Quote absent entirely | 0 |
| Quote present but does not appear anywhere in source document | 0 (and triggers hallucination review) |
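A reviewer can automate the first two Component C bands (exact vs. minor-difference matches); the paraphrase and summary bands (0.5 / 0.25) still require human judgment. A minimal sketch, with hypothetical function names, that normalises punctuation and capitalisation per the 0.75 band:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/extra whitespace so minor
    differences (the 0.75 band) do not block a match."""
    text = re.sub(r"[^\w\s]", "", text.lower())  # drop punctuation
    return re.sub(r"\s+", " ", text).strip()

def quote_findable(quote: str, source_text: str) -> str:
    """Return a provisional Component C band: 'exact' (candidate C = 1),
    'minor' (candidate C = 0.75), or 'unverified' (C = 0 unless a
    human reviewer locates the text and judges paraphrase vs. summary)."""
    if quote in source_text:
        return "exact"
    if normalize(quote) in normalize(source_text):
        return "minor"
    return "unverified"
```

An "unverified" result should route the output to hallucination review rather than be scored 0 automatically, since elided-word quotes using "..." notation will also fail a substring check.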
3. Composite Provenance Score
$$\text{Provenance Score} = \frac{A + B + C}{3}$$
Where A, B, C are each scored 0–1 per the component tables above.
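The composite calculation can be sketched as a small helper (function name is illustrative, not part of the standard):

```python
def composite_provenance(a: float, b: float, c: float) -> float:
    """Composite = (A + B + C) / 3, where each component is scored
    0-1 per the Section 2 tables."""
    for score in (a, b, c):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"component score out of range: {score}")
    return (a + b + c) / 3
```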
Composite score interpretation
| Composite Score | Interpretation |
|---|---|
| 1.0 | Full provenance — all components present, correct, and exact |
| 0.75–0.99 | Good provenance — minor imprecision in one component; acceptable for production with note |
| 0.50–0.74 | Partial provenance — one component significantly imprecise or absent; requires review |
| 0.25–0.49 | Weak provenance — multiple components missing or imprecise; output should not be relied upon without full source verification |
| 0.0–0.24 | Provenance failure — output is unverifiable; must not be used |
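The interpretation bands map directly onto threshold checks; a sketch (band labels shortened for illustration):

```python
def provenance_band(score: float) -> str:
    """Map a composite provenance score (0-1) to its Section 3
    interpretation band."""
    if score == 1.0:
        return "full"
    if score >= 0.75:
        return "good"
    if score >= 0.50:
        return "partial"
    if score >= 0.25:
        return "weak"
    return "failure"
```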
4. Per-Output Provenance Scoring
- For extraction outputs: score provenance per field, then average across all fields.
- For covenant identification: score provenance per covenant.
- For BMT validation: score provenance per assessed term.
- For Q&A: score provenance per answer (an answer citing multiple clauses is scored on each citation, then averaged).
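The per-run average over output composites can be sketched as follows (function name is illustrative):

```python
from statistics import mean

def average_provenance(component_scores):
    """Mean composite provenance across a run's outputs (fields,
    covenants, assessed terms, or answers).
    `component_scores` is a list of (A, B, C) tuples, each 0-1."""
    return mean((a + b + c) / 3 for a, b, c in component_scores)
```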
5. Provenance Passing Thresholds
| Threshold | Value |
|---|---|
| Minimum average provenance score (per capability run) | 0.90 |
| Maximum proportion of outputs with provenance score = 0.0 | 0% |
| Quote found to be unverifiable in source PDF | Triggers hallucination review; output removed from scoring pool |
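The two numeric thresholds combine into a single pass/fail check per capability run; a sketch, assuming outputs removed for hallucination review are excluded from the list before it is called:

```python
def run_passes(per_output_scores: list[float]) -> bool:
    """Section 5 thresholds: average provenance >= 0.90 AND no
    output with a provenance score of exactly 0.0."""
    if any(score == 0.0 for score in per_output_scores):
        return False
    return sum(per_output_scores) / len(per_output_scores) >= 0.90
```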
6. Special Cases
Multi-citation answers (Q&A)
When a correct answer requires citing multiple clauses, score each citation independently and average:
| Citation | A score | B score | C score | Composite |
|---|---|---|---|---|
| Citation 1 (Clause 7.1) | 1 | 1 | 1 | 1.0 |
| Citation 2 (Clause 11.1) | 1 | 0.75 | 1 | 0.917 |
| Citation 3 (Clause 7.3) | 1 | 1 | 0.75 | 0.917 |
Answer provenance score = (1.0 + 0.917 + 0.917) / 3 ≈ 0.944 (averaging is done on unrounded composites; the per-citation values above are rounded for display)
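The worked example above can be reproduced directly from the citation table:

```python
# (A, B, C) component scores for Citations 1-3
citations = [(1, 1, 1), (1, 0.75, 1), (1, 1, 0.75)]

# per-citation composites, kept unrounded
composites = [(a + b + c) / 3 for a, b, c in citations]

# answer-level provenance score = mean of the citation composites
answer_score = sum(composites) / len(composites)
print(round(answer_score, 3))  # prints 0.944
```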
Missing page / illegible document (edge cases)
When source text is absent or illegible (e.g., degraded scan, missing pages):
- If the system correctly reports the field as unextractable and cites the missing/degraded page as evidence: Component A is scored per its normal table, Component B = 0.5 (location identified, but no specific clause can be cited), Component C = 0.5 (system acknowledged the limitation). Composite = average of the three.
- If the system fabricates a quote for an illegible section: Component C = 0 and hallucination review triggered.
Amendment documents
For fields correctly flagged as "unchanged": cite the amendment document and note "field not amended — refer to original [filename]." All three components scored on this citation.
7. Relationship to Content Score
Provenance score and content score are reported separately and both count toward overall capability assessment.
A correct extraction with absent provenance does not achieve the goal of evidence-linked outputs — Smartflow's core value proposition. The provenance threshold (0.90) must be met even if the content accuracy threshold (0.85) is met.
Conversely, a well-cited extraction of a wrong value is still a content failure — a correct citation does not validate an incorrect extracted value.