
Provenance Quality Rubric

This rubric applies to all Smartflow AI capability outputs — Loan Onboarding Extraction, Covenant Monitoring, BMT Validation, and Document Q&A. Provenance is scored independently of the content score. Both scores are reported separately; a high content score does not compensate for low provenance quality.

The cross-cutting provenance standard is defined in EVAL-STANDARDS.md Section 5.


1. Provenance Components

Every output must include three provenance elements. Each is scored independently.

| Component | Label | Description |
|---|---|---|
| A | Source document name | Exact filename as submitted (e.g., acme-corp-facility-agreement.pdf) |
| B | Page / section reference | Page number AND clause/section identifier (e.g., Page 14, Clause 5.1(a)) |
| C | Verbatim quote | Exact text from the source document. Must be findable by a human reviewer searching the source PDF. |

2. Component Scoring Matrix

Score each provenance component independently using the tables below.

Component A — Source Document Name

| Condition | Score |
|---|---|
| Exact filename present and matches submitted document | 1 |
| Document name present but abbreviated (e.g., "acme-corp" rather than full filename) | 0.5 |
| Document name absent or refers to wrong document | 0 |

Component B — Page / Section Reference

| Condition | Score |
|---|---|
| Both page number AND clause/section reference are present and correct | 1 |
| Page number correct but clause/section reference absent | 0.75 |
| Clause reference correct but page number wrong (off by ≤ 2 pages) | 0.75 |
| Clause reference correct but page number wrong (off by > 2 pages) | 0.5 |
| Section reference only (no page number), section is correct | 0.5 |
| Page number present but wrong; clause reference absent | 0.25 |
| Both absent | 0 |
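
The Component B conditions can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name and its four parameters are hypothetical, correctness is modelled as simple equality against a known true page and clause, and any combination the table does not list (for example, a clause reference that is present but wrong) returns `None` instead of a guessed score.

```python
from typing import Optional


def score_component_b(
    cited_page: Optional[int],
    true_page: int,
    cited_clause: Optional[str],
    true_clause: str,
) -> Optional[float]:
    """Score Component B per the table; None means 'not covered by the table'."""
    page_present = cited_page is not None
    clause_present = cited_clause is not None
    page_ok = page_present and cited_page == true_page
    clause_ok = clause_present and cited_clause == true_clause

    if not page_present and not clause_present:
        return 0.0  # both absent
    if page_ok and clause_ok:
        return 1.0  # both present and correct
    if page_ok and not clause_present:
        return 0.75  # page correct, clause absent
    if clause_ok and page_present:
        # Clause correct, page wrong: the score depends on the page offset.
        return 0.75 if abs(cited_page - true_page) <= 2 else 0.5
    if clause_ok and not page_present:
        return 0.5  # section reference only, and it is correct
    if page_present and not clause_present:
        return 0.25  # page present but wrong, clause absent
    return None  # e.g. clause present but wrong: the rubric lists no score
```

Returning `None` for unlisted combinations is a design choice for this sketch: it surfaces gaps for a human reviewer to resolve rather than silently assigning a score.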

Component C — Verbatim Quote

| Condition | Score |
|---|---|
| Quote is exact verbatim text, findable in source PDF, sufficient to support the output | 1 |
| Quote is mostly verbatim with minor differences (punctuation, capitalisation, elided words with "..." notation) AND findable in source PDF | 0.75 |
| Quote is a close paraphrase (meaning preserved) but not verbatim; would require inference to trace back to source | 0.5 |
| Quote is a summary or restatement, not traceable to a specific sentence in the source | 0.25 |
| Quote absent entirely | 0 |
| Quote present but does not appear anywhere in source document | 0 (and triggers hallucination review) |

3. Composite Provenance Score

$$\text{Provenance Score} = \frac{A + B + C}{3}$$

Where A, B, C are each scored 0–1 per the component tables above.
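
As a minimal sketch of the formula above, the composite can be computed as an unweighted mean of the three component scores. The function name `composite_provenance` is hypothetical, and the range check is an added safeguard, not something the rubric prescribes.

```python
def composite_provenance(a: float, b: float, c: float) -> float:
    """Return (A + B + C) / 3 for component scores in the range 0 to 1."""
    for name, value in (("A", a), ("B", b), ("C", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"Component {name} must be between 0 and 1, got {value}")
    return (a + b + c) / 3


# Example: exact filename, page correct but clause absent, exact quote.
print(round(composite_provenance(1.0, 0.75, 1.0), 3))
```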

Composite score interpretation

| Composite Score | Interpretation |
|---|---|
| 1.0 | Full provenance: all components present, correct, and exact |
| 0.75–0.99 | Good provenance: minor imprecision in one component; acceptable for production with note |
| 0.50–0.74 | Partial provenance: one component significantly imprecise or absent; requires review |
| 0.25–0.49 | Weak provenance: multiple components missing or imprecise; output should not be relied upon without full source verification |
| 0.0–0.24 | Provenance failure: output is unverifiable; must not be used |
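
The interpretation bands could be applied with a simple lookup like the one below. It is a sketch only: the function name is hypothetical, and it assumes each band is closed at its lower bound, so a value that falls between two printed ranges (such as 0.745) is assigned to the lower band. The rubric itself does not state how such values are handled.

```python
def interpret_composite(score: float) -> str:
    """Map a composite provenance score to its interpretation label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Composite score must be between 0 and 1, got {score}")
    # Assumption: bands are closed at the lower bound, checked from the top down.
    bands = [
        (1.0, "Full provenance"),
        (0.75, "Good provenance"),
        (0.50, "Partial provenance"),
        (0.25, "Weak provenance"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Provenance failure"
```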

4. Per-Output Provenance Scoring

For extraction outputs: score provenance per field. Calculate average provenance score across all fields.

For covenant identification: score provenance per covenant.

For BMT validation: score provenance per assessed term.

For Q&A: score provenance per answer (an answer citing multiple clauses is scored on each citation, averaged).


5. Provenance Passing Thresholds

| Threshold | Value |
|---|---|
| Minimum average provenance score (per capability run) | 0.90 |
| Maximum proportion of outputs with provenance score = 0.0 | 0% |
| Quote found to be unverifiable in source PDF | Triggers hallucination review; output removed from scoring pool |
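
A pass/fail check for one capability run might look like the sketch below. The names `run_passes`, `MIN_AVERAGE`, and `MAX_ZERO_PROPORTION` are hypothetical, and the function assumes that outputs removed from the scoring pool for hallucination review have already been excluded from the list it receives.

```python
from statistics import mean
from typing import Sequence

MIN_AVERAGE = 0.90         # minimum average provenance score per run
MAX_ZERO_PROPORTION = 0.0  # no output may have a provenance score of 0.0


def run_passes(scores: Sequence[float]) -> bool:
    """Check composite scores from one run against both thresholds."""
    if not scores:
        raise ValueError("At least one scored output is required")
    zero_share = sum(1 for s in scores if s == 0.0) / len(scores)
    return mean(scores) >= MIN_AVERAGE and zero_share <= MAX_ZERO_PROPORTION
```

Checking both conditions matters because a single zero-scored output fails the run even when the average remains above 0.90.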

6. Special Cases

Multi-citation answers (Q&A)

When a correct answer requires citing multiple clauses, score each citation independently and average:

| Citation | A score | B score | C score | Composite |
|---|---|---|---|---|
| Citation 1 (Clause 7.1) | 1 | 1 | 1 | 1.0 |
| Citation 2 (Clause 11.1) | 1 | 0.75 | 1 | 0.917 |
| Citation 3 (Clause 7.3) | 1 | 1 | 0.75 | 0.917 |

Answer provenance score = (1.0 + 0.917 + 0.917) / 3 = 0.944
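
The worked example above can be reproduced with a short script. The component scores are taken from the table; the helper names are hypothetical. Unrounded per-citation composites are averaged, which is why the result rounds to 0.944.

```python
from typing import Dict, Tuple

# (A, B, C) scores for each citation, as in the table above.
citations: Dict[str, Tuple[float, float, float]] = {
    "Clause 7.1": (1.0, 1.0, 1.0),
    "Clause 11.1": (1.0, 0.75, 1.0),
    "Clause 7.3": (1.0, 1.0, 0.75),
}


def citation_composite(a: float, b: float, c: float) -> float:
    """Composite for one citation: the mean of its three component scores."""
    return (a + b + c) / 3


def answer_provenance(cites: Dict[str, Tuple[float, float, float]]) -> float:
    """Average the per-citation composites to get the answer-level score."""
    composites = [citation_composite(*abc) for abc in cites.values()]
    return sum(composites) / len(composites)


print(round(answer_provenance(citations), 3))  # 0.944
```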

Missing page / illegible document (edge cases)

When source text is absent or illegible (e.g., degraded scan, missing pages):

  • If the system correctly reports the field as unextractable and cites the missing/degraded page as evidence: Component B = 0.5 (location identified, but a specific clause cannot be provided), Component C = 0.5 (system acknowledged the limitation). The composite is then the average of A, B, and C, as in Section 3.
  • If the system fabricates a quote for an illegible section: Component C = 0 and hallucination review triggered.
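
To make the first case concrete, the sketch below computes the composite for a correctly reported unextractable field. The function name is hypothetical, and Component A is passed in because the rule above fixes only B and C. With a full score for A, the composite is about 0.667.

```python
UNEXTRACTABLE_B = 0.5  # location identified, but no specific clause available
UNEXTRACTABLE_C = 0.5  # system acknowledged the limitation


def unextractable_composite(a: float) -> float:
    """Composite for a correctly reported unextractable field, given Component A."""
    if not 0.0 <= a <= 1.0:
        raise ValueError(f"Component A must be between 0 and 1, got {a}")
    return (a + UNEXTRACTABLE_B + UNEXTRACTABLE_C) / 3


print(round(unextractable_composite(1.0), 3))  # 0.667
```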

Amendment documents

For fields correctly flagged as "unchanged": cite the amendment document and note "field not amended — refer to original [filename]." All three components scored on this citation.


7. Relationship to Content Score

Provenance score and content score are reported separately, and both count toward the overall capability assessment.

A correct extraction with absent provenance does not achieve the goal of evidence-linked outputs, which is Smartflow's core value proposition. The provenance threshold (0.90) applies independently: meeting the content accuracy threshold (0.85) does not waive it.

Conversely, a well-cited extraction of a wrong value is still a content failure — a correct citation does not validate an incorrect extracted value.