Provenance Quality Rubric

This rubric applies to all Smartflow AI capability outputs — Loan Onboarding Extraction, Covenant Monitoring, BMT Validation, and Document Q&A. Provenance is scored independently of the content score. Both scores are reported separately; a high content score does not compensate for low provenance quality.

The cross-cutting provenance standard is defined in EVAL-STANDARDS.md Section 5.

1. Provenance Components

Every output must include three provenance elements. Each is scored independently.

Component	Label	Description
A	Source document name	Exact filename as submitted (e.g., `acme-corp-facility-agreement.pdf`)
B	Page / section reference	Page number AND clause/section identifier (e.g., `Page 14, Clause 5.1(a)`)
C	Verbatim quote	Exact text from the source document. Must be findable by a human reviewer searching the source PDF.

2. Component Scoring Matrix

Score each provenance component independently using the tables below.

Component A — Source Document Name

Condition	Score
Exact filename present and matches submitted document	1
Document name present but abbreviated (e.g., "acme-corp" rather than full filename)	0.5
Document name absent or refers to wrong document	0
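
The Component A table can be sketched as a small function. This is a minimal illustration, not part of the standard: the function name `score_component_a` is hypothetical, and treating "abbreviated" as "a prefix of the submitted filename's stem" is an assumption extrapolated from the single example in the table.

```python
from pathlib import PurePath
from typing import Optional


def score_component_a(cited: Optional[str], submitted: str) -> float:
    """Score the cited document name against the submitted filename."""
    if not cited:
        return 0.0  # document name absent
    if cited == submitted:
        return 1.0  # exact filename match
    # Assumption: an abbreviated name is a leading part of the filename stem,
    # e.g. "acme-corp" for "acme-corp-facility-agreement.pdf".
    stem = PurePath(submitted).stem
    if stem.startswith(cited):
        return 0.5
    return 0.0  # refers to a different document


print(score_component_a("acme-corp", "acme-corp-facility-agreement.pdf"))
```

Any name that is neither exact nor a recognisable abbreviation scores 0, matching the "wrong document" row.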

Component B — Page / Section Reference

Condition	Score
Both page number AND clause/section reference are present and correct	1
Page number correct but clause/section reference absent	0.75
Clause reference correct but page number wrong (off by ≤ 2 pages)	0.75
Clause reference correct but page number wrong (off by > 2 pages)	0.5
Section reference only (no page number), section is correct	0.5
Page number present but wrong; clause reference absent	0.25
Both absent	0
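
The Component B rows can be expressed as a decision function. The sketch below is illustrative only: `score_component_b` and its parameters are hypothetical names, and it assumes the reviewer already knows the true page and clause. Combinations the table does not list (for example, a wrong clause reference) return `None` so that they are routed to reviewer judgement rather than given an invented score.

```python
from typing import Optional


def score_component_b(
    page_cited: Optional[int],
    page_actual: int,
    clause_cited: Optional[str],
    clause_actual: str,
) -> Optional[float]:
    """Map a page/clause citation onto the Component B table."""
    if page_cited is None and clause_cited is None:
        return 0.0  # both absent
    page_ok = page_cited == page_actual
    if clause_cited is None:
        # Page only: correct page scores 0.75, wrong page scores 0.25.
        return 0.75 if page_ok else 0.25
    if clause_cited != clause_actual:
        return None  # wrong clause: not covered by the table
    if page_cited is None:
        return 0.5  # correct section reference, no page number
    if page_ok:
        return 1.0  # both present and correct
    # Clause correct, page wrong: score depends on how far off the page is.
    return 0.75 if abs(page_cited - page_actual) <= 2 else 0.5


print(score_component_b(16, 14, "5.1(a)", "5.1(a)"))
```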

Component C — Verbatim Quote

Condition	Score
Quote is exact verbatim text, findable in source PDF, sufficient to support the output	1
Quote is mostly verbatim with minor differences (punctuation, capitalisation, elided words with "..." notation) AND findable in source PDF	0.75
Quote is a close paraphrase (meaning preserved) but not verbatim; would require inference to trace back to source	0.5
Quote is a summary or restatement — not traceable to a specific sentence in the source	0.25
Quote absent entirely	0
Quote present but does not appear anywhere in source document	0 (and triggers hallucination review)
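
Part of the Component C check can be pre-screened automatically before human review. The sketch below is an assumption-laden illustration: `prescreen_quote` is a hypothetical helper, `source_text` is assumed to be text already extracted from the source PDF, and the returned values are candidate scores only. It cannot judge whether a quote is sufficient to support the output, nor distinguish a paraphrase (0.5) from a summary (0.25) or a fabricated quote (0), so anything it cannot match is returned as `None` for a reviewer to decide.

```python
import re
from typing import Optional, Tuple


def _normalise(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def prescreen_quote(quote: Optional[str], source_text: str) -> Tuple[Optional[float], str]:
    """Return (candidate score, reason) for a quoted passage."""
    if not quote or not quote.strip():
        return 0.0, "absent"
    if quote in source_text:
        return 1.0, "exact"
    # Allow elisions marked with "..." by matching the fragments in order.
    fragments = [_normalise(p) for p in quote.replace("\u2026", "...").split("...")]
    fragments = [f for f in fragments if f]
    if not fragments:
        return None, "needs human review"
    norm_source = _normalise(source_text)
    position = 0
    for fragment in fragments:
        found = norm_source.find(fragment, position)
        if found < 0:
            return None, "needs human review"
        position = found + len(fragment)
    return 0.75, "near-verbatim"
```

A `None` result does not mean the quote is hallucinated; it only means the string checks failed and the remaining rows of the table need a human decision.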

3. Composite Provenance Score

$$\text{Provenance Score} = \frac{A + B + C}{3}$$

Where A, B, C are each scored 0–1 per the component tables above.
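
The formula translates directly into code. A minimal sketch, with `composite_provenance` as a hypothetical function name:

```python
def composite_provenance(a: float, b: float, c: float) -> float:
    """Unweighted mean of the three component scores, each in [0, 1]."""
    for name, value in (("A", a), ("B", b), ("C", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"Component {name} must be between 0 and 1, got {value}")
    return (a + b + c) / 3


# Exact filename, correct page but no clause, exact quote.
print(round(composite_provenance(1, 0.75, 1), 3))  # 0.917
```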

Composite score interpretation

Composite Score	Interpretation
1.0	Full provenance — all components present, correct, and exact
0.75–0.99	Good provenance — minor imprecision in one component; acceptable for production with note
0.50–0.74	Partial provenance — one component significantly imprecise or absent; requires review
0.25–0.49	Weak provenance — multiple components missing or imprecise; output should not be relied upon without full source verification
0.0–0.24	Provenance failure — output is unverifiable; must not be used
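
The interpretation bands can be applied with a simple lookup. In this sketch, `interpret_composite` is a hypothetical name, and lower bounds are used as cut-offs so that averaged scores falling between the printed ranges (for example 0.995 or 0.745) still land in a band; that handling of the gaps is an assumption, not something the table specifies.

```python
def interpret_composite(score: float) -> str:
    """Return the interpretation band for a composite provenance score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Composite score must be between 0 and 1, got {score}")
    if score >= 1.0:
        return "Full provenance"
    if score >= 0.75:
        return "Good provenance"
    if score >= 0.50:
        return "Partial provenance"
    if score >= 0.25:
        return "Weak provenance"
    return "Provenance failure"


print(interpret_composite(11 / 12))  # Good provenance
```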

4. Per-Output Provenance Scoring

For extraction outputs: score provenance per field, then calculate the average provenance score across all fields.

For covenant identification: score provenance per covenant.

For BMT validation: score provenance per assessed term.

For Q&A: score provenance per answer (an answer citing multiple clauses is scored on each citation, averaged).

5. Provenance Passing Thresholds

Threshold	Value
Minimum average provenance score (per capability run)	0.90
Maximum proportion of outputs with provenance score = 0.0	0%
Quote found to be unverifiable in source PDF	Triggers hallucination review; output removed from scoring pool
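
The three thresholds can be checked together for a capability run. The sketch below is illustrative: `check_run` and the `(score, quote_unverifiable)` input shape are hypothetical, and treating a run whose scoring pool ends up empty as a failure is an assumption the table does not address.

```python
from statistics import fmean
from typing import List, Tuple

MIN_AVERAGE = 0.90  # minimum average provenance score per capability run


def check_run(outputs: List[Tuple[float, bool]]) -> dict:
    """Evaluate a run given (composite score, quote_unverifiable) per output."""
    # Unverifiable quotes go to hallucination review and leave the scoring pool.
    review = [i for i, (_, unverifiable) in enumerate(outputs) if unverifiable]
    pool = [score for score, unverifiable in outputs if not unverifiable]
    average = fmean(pool) if pool else 0.0  # assumption: an empty pool fails
    zero_count = sum(1 for score in pool if score == 0.0)
    return {
        "average": average,
        "zero_count": zero_count,
        "hallucination_review": review,
        "passed": bool(pool) and average >= MIN_AVERAGE and zero_count == 0,
    }


result = check_run([(1.0, False), (1.0, False), (11 / 12, False)])
print(result["passed"])  # True
```

Note that a single output scoring 0.0 fails the run even when the average is above 0.90, because the permitted proportion of zero scores is 0%.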

6. Special Cases

Multi-citation answers (Q&A)

When a correct answer requires citing multiple clauses, score each citation independently and average:

Citation	A score	B score	C score	Composite
Citation 1 (Clause 7.1)	1	1	1	1.0
Citation 2 (Clause 11.1)	1	0.75	1	0.917
Citation 3 (Clause 7.3)	1	1	0.75	0.917

Answer provenance score = (1.0 + 0.917 + 0.917) / 3 = 0.944
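
The worked example above can be reproduced in a few lines. The names `composite` and `answer_provenance` are hypothetical; the component scores are the ones in the table.

```python
from typing import List, Tuple


def composite(a: float, b: float, c: float) -> float:
    """Composite provenance score for a single citation."""
    return (a + b + c) / 3


def answer_provenance(citations: List[Tuple[float, float, float]]) -> float:
    """Average the per-citation composites for a multi-citation answer."""
    return sum(composite(*scores) for scores in citations) / len(citations)


citations = [
    (1, 1, 1),     # Citation 1 (Clause 7.1)
    (1, 0.75, 1),  # Citation 2 (Clause 11.1)
    (1, 1, 0.75),  # Citation 3 (Clause 7.3)
]
print(round(answer_provenance(citations), 3))  # 0.944
```

Averaging the unrounded composites gives 0.9444; averaging the rounded 0.917 values by hand gives 0.9447, so keeping full precision until the final step avoids a last-digit discrepancy.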

Missing page / illegible document (edge cases)

When source text is absent or illegible (e.g., degraded scan, missing pages):

If the system correctly reports the field as unextractable and cites the missing/degraded page as evidence: Component B = 0.5 (location identified, but no specific clause can be provided) and Component C = 0.5 (system acknowledged the limitation). Component A is scored as usual, and the composite is the average of A, B, and C.
If the system fabricates a quote for an illegible section: Component C = 0 and a hallucination review is triggered.
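
As a worked illustration of the first case, assuming the source document name is cited exactly (A = 1 is an assumption here; the rule above fixes only B and C):

$$\text{Provenance Score} = \frac{1 + 0.5 + 0.5}{3} \approx 0.667$$

Under that assumption, a correctly reported unextractable field falls in the 0.50–0.74 "Partial provenance" band.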

Amendment documents

For fields correctly flagged as "unchanged": cite the amendment document and note "field not amended — refer to original [filename]." All three components scored on this citation.

7. Relationship to Content Score

Provenance score and content score are reported separately and both count toward overall capability assessment.

A correct extraction with absent provenance does not achieve the goal of evidence-linked outputs — Smartflow's core value proposition. The provenance threshold (0.90) must be met even if the content accuracy threshold (0.85) is met.

Conversely, a well-cited extraction of a wrong value is still a content failure — a correct citation does not validate an incorrect extracted value.
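
The two-threshold rule can be sketched as follows. The function name `capability_assessment` is hypothetical, and reading "both count toward overall capability assessment" as "both thresholds must be met, each compared with `>=`" is an assumption about how the scores are combined.

```python
PROVENANCE_THRESHOLD = 0.90  # from the provenance passing thresholds
CONTENT_THRESHOLD = 0.85     # content accuracy threshold cited in this section


def capability_assessment(content_score: float, provenance_score: float) -> dict:
    """Report both results separately; neither score offsets the other."""
    content_pass = content_score >= CONTENT_THRESHOLD
    provenance_pass = provenance_score >= PROVENANCE_THRESHOLD
    return {
        "content_pass": content_pass,
        "provenance_pass": provenance_pass,
        "overall_pass": content_pass and provenance_pass,
    }


# High content accuracy does not compensate for weak provenance.
print(capability_assessment(0.97, 0.80)["overall_pass"])  # False
```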
