AI Evaluations — Overview
This folder defines what "correct" means for every Smartflow AI capability. It is the shared contract between Product, QA, and Engineering for AI output quality.
This rubric applies to all Covenant Monitoring evaluations. It scores covenant identification across five independent dimensions. All dimensions must be scored for a complete evaluation; the composite is a weighted average.
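As a rough illustration of the composite described above, the sketch below computes a weighted average and refuses to produce a score when any dimension is unscored. The dimension names (`dim_a` through `dim_e`) and the weights are placeholders invented for this example; the actual dimensions and weights are defined in the rubric itself, not here.

```python
from typing import Mapping


def composite_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted average of dimension scores.

    Every dimension that carries a weight must have a score; otherwise the
    evaluation is treated as incomplete and no composite is returned.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"incomplete evaluation, unscored dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result correct even if the
    # weights are not normalised to 1.0.
    return sum(scores[dim] * weight for dim, weight in weights.items()) / total_weight


# Hypothetical dimension names and weights, for illustration only.
EXAMPLE_WEIGHTS = {"dim_a": 0.30, "dim_b": 0.25, "dim_c": 0.20, "dim_d": 0.15, "dim_e": 0.10}
example_scores = {"dim_a": 1.0, "dim_b": 0.5, "dim_c": 1.0, "dim_d": 0.0, "dim_e": 1.0}

example_composite = composite_score(example_scores, EXAMPLE_WEIGHTS)
```

Raising an error on a missing dimension, rather than silently treating it as zero, is one way to enforce the rule that all dimensions must be scored for a complete evaluation.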
This rubric applies to all Loan Onboarding Extraction evaluations. It defines how individual field scores are calculated and how they are aggregated into a per-case and per-capability score.
Use this weighted model to assess pilot readiness consistently.
This rubric applies to all Smartflow AI capability outputs — Loan Onboarding Extraction, Covenant Monitoring, BMT Validation, and Document Q&A. Provenance is scored independently of the content score. Both scores are reported separately; a high content score does not compensate for low provenance quality.
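One way to keep the two scores from being blended is to carry them as separate fields and check each against its own threshold. The sketch below assumes this shape; the class name `EvaluationResult`, the `passes` method, and the threshold values are hypothetical and do not come from the rubrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationResult:
    """Content and provenance scores for one evaluated output, kept separate."""

    capability: str
    content_score: float
    provenance_score: float

    def passes(self, content_threshold: float, provenance_threshold: float) -> bool:
        # Each score is compared with its own threshold, so a high content
        # score cannot offset a low provenance score.
        return (
            self.content_score >= content_threshold
            and self.provenance_score >= provenance_threshold
        )


# Hypothetical scores and thresholds, for illustration only.
qa_result = EvaluationResult(
    capability="Document Q&A", content_score=0.95, provenance_score=0.40
)
qa_passes = qa_result.passes(content_threshold=0.80, provenance_threshold=0.80)
```

Here `qa_passes` is `False` even though the content score is high, because the provenance score falls below its threshold.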
This rubric applies to all Document Q&A evaluations. It scores each answer across five independent dimensions. The composite score is a weighted average. Provenance is scored separately using the Provenance Quality Rubric.