AI Agents for BSA/AML: SAR Narratives, Transaction Monitoring Tuning, and the New Examiner Bar

Ramkumar Venkataraman

Why BSA/AML Is the First Place AI Agents Earn Their Keep — And the First Place They Get Banks in Trouble

The economics are obvious. A typical mid-size bank's BSA team spends 60 to 75 percent of its time on alert triage, narrative drafting, and case documentation. Most alerts close without a SAR. Most SARs read like they were assembled from templates because they were. An AI agent that can read a case file, draft a narrative, and surface the comparables saves real money on day one.

The risk is also obvious. FinCEN's 2024 guidance on innovative technology repeated what the agencies had been saying since the joint statement in 2018: a bank that delegates the SAR decision to a model is not running a compliant BSA program. The model can support the analyst. The analyst owns the file. Examiners will ask to see exactly where that line is drawn.

We have built BSA/AML agents at banks and credit unions, and the design pattern below is what kept those programs defensible while still capturing the productivity gain.

What the Examiner Is Actually Looking For

The FFIEC BSA/AML Examination Manual gives the test. For any technology used in the program, the bank has to be able to explain:

  • The intended use and scope of the technology
  • The risks the technology introduces and how they are managed
  • The validation evidence that the technology works as intended
  • The ongoing monitoring and recalibration cadence
  • The human oversight and decision rights

Apply this to an AI agent doing alert triage or narrative drafting and the audit posture writes itself. Skip any of these and the next exam will find it.

The Four Use Cases That Pay Back Fast

We focus AI agents on the parts of the BSA program where the work is high-volume, the inputs are structured, and the decision authority can stay clearly with the human analyst.

Alert triage and disposition support

The transaction monitoring system produces alerts. The agent pulls the customer file, the prior 13 months of activity, the relevant peer comparison, and the open and closed prior cases. It writes a structured pre-investigation memo that the analyst reads in 2 minutes instead of 20. The memo flags the typology the activity matches, the customer-risk-rating implications, and the questions the analyst needs to answer to close the alert. The disposition decision stays with the human.

SAR narrative drafting

The agent drafts the SAR narrative from the case file. The five Ws — who, what, when, where, why — are filled from the structured data. The typology fields use the FinCEN list. The narrative is in plain English, tied to specific transactions by date and amount. The analyst reviews and edits before signing. We see analyst time per SAR drop from 90 to 30 minutes with no quality loss on QA review.

CTR exception handling

Most CTRs are mechanical. The exceptions — aggregation across multiple branches, structuring patterns, the CTR-exempt-person rule — are where errors happen. The agent surfaces the exception, applies the rule, and presents the recommended treatment with citations to the bank's policy and the regulation. The analyst confirms.
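
The multi-branch aggregation exception can be illustrated with a short sketch. It assumes a simplified cash record (`CashTxn` and its field names are hypothetical), the $10,000 same-business-day reporting threshold, and separate totals for cash in and cash out. Exempt-person handling and structuring detection are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

CTR_THRESHOLD = Decimal("10000")  # assumed: report when the same-day total exceeds this


@dataclass(frozen=True)
class CashTxn:
    """Hypothetical cash transaction record; field names are illustrative."""
    customer_id: str
    branch_id: str
    business_day: date
    amount: Decimal
    direction: str  # "in" (deposit) or "out" (withdrawal)


def ctr_aggregation_candidates(txns):
    """Total cash per customer, business day, and direction across all branches."""
    totals = defaultdict(lambda: {"total": Decimal("0"), "branches": set()})
    for t in txns:
        key = (t.customer_id, t.business_day, t.direction)
        totals[key]["total"] += t.amount
        totals[key]["branches"].add(t.branch_id)
    # Keep only the groups whose aggregate exceeds the threshold.
    return [
        {"customer_id": c, "business_day": d, "direction": direction,
         "total": v["total"], "branches": sorted(v["branches"])}
        for (c, d, direction), v in sorted(totals.items())
        if v["total"] > CTR_THRESHOLD
    ]
```

In this sketch the function only surfaces candidates. The recommended treatment and the policy citations still go to the analyst for confirmation.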

Transaction monitoring tuning

The hardest BSA work at most banks is tuning the rules engine. The agent runs a quarterly analysis of alert volumes, productive-to-non-productive ratios, suppressed alerts, and SAR yield by rule. It proposes parameter changes with the simulated impact on alert volume and SAR coverage. The second line and a model risk team review and approve the changes. The change goes through change control.
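
As a rough illustration of "simulated impact," the sketch below replays historical alerts for one rule under a proposed higher threshold. The record shape (`HistoricalAlert`) and the single-amount threshold are assumptions for the example. Replaying past alerts can only evaluate tightening a rule, because alerts that never fired are not in the history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalAlert:
    """Hypothetical closed-alert record used for back-testing."""
    rule_id: str
    trigger_amount: float  # the value that was compared to the rule threshold
    led_to_sar: bool


def simulate_threshold(alerts, rule_id, new_threshold):
    """Replay one rule's history under a proposed (higher) threshold."""
    current = [a for a in alerts if a.rule_id == rule_id]
    kept = [a for a in current if a.trigger_amount >= new_threshold]
    sars_before = sum(a.led_to_sar for a in current)
    sars_after = sum(a.led_to_sar for a in kept)
    return {
        "alerts_before": len(current),
        "alerts_after": len(kept),
        "alert_reduction": 1 - len(kept) / len(current) if current else 0.0,
        "sars_before": sars_before,
        "sars_lost": sars_before - sars_after,  # coverage given up by the change
    }
```

A proposal with a non-zero `sars_lost` is the kind of result the second line and model risk reviewers would want to see explained before approval.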

The SAR Narrative Architecture We Deploy

A SAR narrative is a regulatory artifact, not a chat response. The architecture has to enforce that.

The agent pulls only from approved sources: the case file, the customer profile, the activity log, the prior SARs on the customer, and the typology dictionary. It does not pull from the open web or from the bank's policy library by default, because the narrative is about the customer's activity, not the bank's policy.

Every factual claim in the draft is tied to a source record by an internal citation. The QA layer checks that every transaction date, amount, and counterparty in the narrative appears in the underlying activity log. A mismatch fails the draft.
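
A minimal sketch of that QA check, assuming the narrative writes dates as ISO `YYYY-MM-DD` and amounts with a leading `$`, and that the activity log is a list of dicts with `date` and `amount` keys (all of these conventions are assumptions for illustration). It checks membership only. A production check would also pair each date with its amount and counterparty.

```python
import re
from datetime import date
from decimal import Decimal

# Amounts like $9,500.00 or $9500; dates like 2025-01-14 (assumed formats).
AMOUNT_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def unbound_claims(narrative, activity_log):
    """Return the amounts and dates in the narrative that are absent from the log."""
    log_amounts = {Decimal(str(r["amount"])) for r in activity_log}
    log_dates = {r["date"] for r in activity_log}
    missing = []
    for m in AMOUNT_RE.finditer(narrative):
        value = Decimal(m.group(1).replace(",", ""))
        if value not in log_amounts:
            missing.append(("amount", m.group(0)))
    for m in DATE_RE.finditer(narrative):
        try:
            parsed = date.fromisoformat(m.group(0))
        except ValueError:
            parsed = None  # an impossible date can never be bound to a record
        if parsed not in log_dates:
            missing.append(("date", m.group(0)))
    return missing
```

An empty result lets the draft proceed. Any entry in the list fails it.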

The narrative follows the FinCEN narrative structure — introduction, suspicious activity, conclusion, supporting documentation — and uses the agency's preferred verbs and tense. We have a style sheet derived from the FinCEN advisories, and the agent is graded against it.

The analyst sees the draft alongside the source records on a single screen. The analyst can accept, edit, or reject. Every edit is captured as training data for the next quarter's tuning. The signed SAR carries a metadata tag for the model version and the prompt version. When an examiner or auditor asks for it years later, the lineage is complete.
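
One way to picture the lineage tag is a small immutable record stored with the signed SAR. The sketch below is an assumption about what such a record could hold. Only the model version and prompt version come from the description above. The hashes, source record IDs, and all field names are illustrative additions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NarrativeLineage:
    """Hypothetical lineage record attached to a signed SAR narrative."""
    case_id: str
    model_version: str
    prompt_version: str
    source_record_ids: tuple
    draft_sha256: str   # hash of the agent's draft
    final_sha256: str   # hash of the narrative the analyst signed
    analyst_id: str


def sha256_text(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_lineage(case_id, model_version, prompt_version,
                  source_record_ids, draft, final, analyst_id):
    return NarrativeLineage(
        case_id=case_id,
        model_version=model_version,
        prompt_version=prompt_version,
        source_record_ids=tuple(sorted(source_record_ids)),
        draft_sha256=sha256_text(draft),
        final_sha256=sha256_text(final),
        analyst_id=analyst_id,
    )


def to_json(lineage):
    """Serialize with stable key order so the stored tag is reproducible."""
    return json.dumps(asdict(lineage), sort_keys=True)
```

Storing both hashes makes it possible to show later whether the analyst changed the draft before signing.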

The Hallucination Problem in SARs

A hallucinated SAR is not a soft error. It is a false statement to a federal agency. The architecture has to make hallucination impossible at the level of facts in the narrative.

The defenses we run, in order of strictness:

  • Hard source binding. Every named amount, date, counterparty, and account number in the draft must appear verbatim in the source data. If it does not, the draft fails.
  • Confidence gating on typology selection. The agent's typology pick is checked against a separate classifier trained on FinCEN advisories. Disagreement triggers analyst review.
  • No model judgment on the suspicion conclusion. The agent writes "the activity is consistent with [typology] for the following reasons." It does not write "the bank concludes this activity is suspicious." That sentence is the analyst's.
  • No model judgment on customer characterization. The agent does not call a customer "high risk" unless the customer's risk rating is in the source record. It does not infer risk from transactional patterns and apply a label.

Each of these is enforced with deterministic checks, not LLM judgment. They fail closed.
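
The last three defenses can be sketched as plain pattern and equality checks. The phrase list, the risk-label pattern, and the function name below are hypothetical, and a real deployment would maintain a much longer list. The point is that nothing in the gate asks a model for its opinion, and any unexpected error blocks the draft.

```python
import re

# Hypothetical conclusion phrases reserved for the analyst.
FORBIDDEN_PATTERNS = [
    re.compile(r"\bthe bank concludes\b", re.IGNORECASE),
    re.compile(r"\bis suspicious\b", re.IGNORECASE),
]
RISK_LABEL_RE = re.compile(r"\b(high|medium|low)[- ]risk\b", re.IGNORECASE)


def gate_draft(draft, agent_typology, classifier_typology, source_risk_rating):
    """Return (passed, reasons). Any failed check or error blocks the draft."""
    reasons = []
    try:
        for pattern in FORBIDDEN_PATTERNS:
            if pattern.search(draft):
                reasons.append(f"conclusion language: {pattern.pattern}")
        for m in RISK_LABEL_RE.finditer(draft):
            # A risk label is allowed only if it matches the rating on file.
            if (source_risk_rating is None
                    or m.group(1).lower() != source_risk_rating.lower()):
                reasons.append(f"unsupported risk label: {m.group(0)}")
        if agent_typology != classifier_typology:
            reasons.append("typology disagreement: route to analyst review")
    except Exception as exc:  # fail closed on anything unexpected
        reasons.append(f"check error: {exc!r}")
    return (not reasons, reasons)
```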

What "Tuning the Rules" Should Mean

A common consulting pitch is that an AI agent will replace the rules engine. It will not, and it should not. The rules engine produces auditable, deterministic alerts that examiners understand. The AI's job is to make the rules engine work better.

The tuning work we do:

  • Reduce false positives on the highest-volume rules by adding context the rules engine cannot see — customer occupation, peer comparisons, seasonality
  • Identify under-performing rules that produce few SARs and recommend either parameter tightening or retirement
  • Identify typologies surfaced through SAR filings that are not currently covered by any rule and recommend new rule design
  • Surface customer segments where the alert-to-SAR ratio is anomalous, which often points to a rule that does not fit the segment

Every recommendation goes to the BSA officer and the model risk function for review. Nothing changes in production without the change-control trail.
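
As an illustration of the under-performing-rule analysis, the sketch below computes SAR yield per rule from closed alerts and lists rules that fire often but rarely lead to a filing. The input shape and both cutoffs (`min_alerts`, `max_yield`) are hypothetical values chosen for the example, not recommended settings.

```python
from collections import Counter


def sar_yield_by_rule(dispositions):
    """dispositions: iterable of (rule_id, led_to_sar) pairs from closed alerts."""
    alerts, sars = Counter(), Counter()
    for rule_id, led_to_sar in dispositions:
        alerts[rule_id] += 1
        sars[rule_id] += int(led_to_sar)
    return {rule: sars[rule] / alerts[rule] for rule in alerts}


def review_candidates(dispositions, min_alerts=200, max_yield=0.01):
    """Rules with enough volume to judge and a SAR yield at or below the cutoff."""
    dispositions = list(dispositions)
    counts = Counter(rule for rule, _ in dispositions)
    yields = sar_yield_by_rule(dispositions)
    return sorted(
        rule for rule, y in yields.items()
        if counts[rule] >= min_alerts and y <= max_yield
    )
```

The output is a list of rules for human review, not a list of rules to retire. The volume floor keeps low-traffic rules from being flagged on thin evidence.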

Customer Risk Ratings and the Limits of AI

A bank's customer risk rating (CRR) drives the enhanced due diligence (EDD) cycle, the transaction monitoring sensitivity, and the SAR threshold. It is a model under SR 11-7 even when it is built in spreadsheets, and an AI-driven CRR is a model that gets validated.

We tell banks: do not let the AI agent overwrite the CRR. The agent can recommend a rating change with supporting evidence. The recommendation goes through the existing CRR governance — second line approval for upgrades, BSA officer approval for downgrades. The audit trail shows the human decision. The agent's recommendation becomes part of the file.
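
The routing rule can be sketched in a few lines. This assumes "upgrade" means a move to a higher risk rating and "downgrade" a move to a lower one. The three-level scale, role names, and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class CrrRecommendation:
    """Hypothetical agent recommendation; it never writes the rating itself."""
    customer_id: str
    current: Rating
    proposed: Rating
    evidence_ids: tuple


def required_approver(rec):
    if rec.proposed > rec.current:
        return "second_line"   # upgrade (assumed: higher risk)
    if rec.proposed < rec.current:
        return "bsa_officer"   # downgrade (assumed: lower risk)
    return None                # no change requested


def apply_rating(rec, approved_by):
    """Return the rating to store, or raise if the wrong role approved."""
    needed = required_approver(rec)
    if needed is None:
        return rec.current
    if approved_by != needed:
        raise PermissionError(f"{needed} approval required, got {approved_by!r}")
    return rec.proposed
```

Because the agent only produces a `CrrRecommendation`, the stored rating changes only through `apply_rating`, where the human approval is checked and can be logged.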

OFAC and Sanctions Screening

OFAC screening is a deterministic problem. AI agents can help with the disposition of fuzzy-match alerts — reading the customer file, the counterparty data, the open-source corroborating data, and the prior screening decisions on the customer — and proposing a clearance or escalation. The agent does not clear sanctions hits. The screening analyst clears sanctions hits.

The pattern that works is the agent producing a structured disposition memo, the analyst reading the memo and the source records in parallel, and the analyst making the call. We measure analyst time per fuzzy-match disposition and see a 60 to 70 percent reduction with no false-clear regression in QA.

The Validation Pack the Second Line Needs

For each AI agent component in the BSA program, the second-line model risk management (MRM) team needs a model card that covers the same ground SR 11-7 requires for any model. The specifics for BSA:

  • The labeled test set: at least 1,000 historical alerts with the human disposition as the label, refreshed quarterly
  • The narrative QA set: at least 500 historical SARs scored against a rubric of completeness, factual accuracy, typology accuracy, and clarity
  • The drift monitor: weekly tracking of disposition recommendation accuracy and narrative quality scores
  • The challenger comparison: a rules-only or older-model baseline that runs in shadow
  • The change log: every prompt, retrieval source, and configuration change with the validator's sign-off

When the examiner asks for the AI piece of the BSA program, this is the file you produce.
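
A minimal sketch of the weekly drift monitor for disposition recommendations, assuming each reviewed alert is logged as a tuple of ISO week label, agent recommendation, and analyst disposition. The `baseline` and `tolerance` values in the usage are placeholders, not suggested thresholds.

```python
def weekly_agreement(records):
    """records: iterable of (iso_week, agent_disposition, analyst_disposition)."""
    by_week = {}
    for week, agent, analyst in records:
        by_week.setdefault(week, []).append(agent == analyst)
    # Share of alerts each week where the agent matched the analyst.
    return {week: sum(hits) / len(hits) for week, hits in sorted(by_week.items())}


def drift_alerts(weekly, baseline, tolerance=0.03):
    """Weeks where agreement fell more than `tolerance` below the validated baseline."""
    return [week for week, rate in weekly.items() if rate < baseline - tolerance]


# Usage with placeholder data: week 2 drops to 80 percent agreement.
records = (
    [("2025-W01", "close", "close")] * 10
    + [("2025-W02", "close", "close")] * 8
    + [("2025-W02", "close", "escalate")] * 2
)
flagged = drift_alerts(weekly_agreement(records), baseline=0.92)
```

A flagged week is a prompt for the validator to investigate, and the result belongs in the change log alongside any recalibration that follows.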

What "OK to Deploy" Looks Like

A reasonable internal bar for promoting an AI agent into the BSA workflow:

  • The narrative quality score on the held-out SAR set is within 10 percent of the human baseline
  • The disposition recommendation matches the human disposition on at least 92 percent of the test set, with the misses skewing toward over-recommending review (analyst-safe direction)
  • The hallucination rate on factual claims in the narrative is zero on the test set under the source-binding check
  • The change-control plan has been signed by the BSA officer, the model risk officer, and the audit liaison
  • The first 90 days of production run under 100 percent analyst review, with a 1 percent independent QA sample

Banks that meet this bar ship. Banks that compress it produce the next public enforcement action.
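
The measurable parts of that bar can be written as a single gate function. The sketch below makes two interpretive assumptions: "within 10 percent of the human baseline" is read as at least 90 percent of the baseline score, and "skewing toward over-recommending review" is read as more over-escalation misses than under-escalation misses. The metric and role names are hypothetical, and the 90-day review condition is operational, so it is not modeled here.

```python
REQUIRED_SIGNOFFS = {"bsa_officer", "model_risk_officer", "audit_liaison"}


def promotion_blockers(metrics, signoffs):
    """Return the list of unmet criteria; an empty list means the gate passes."""
    blockers = []
    if metrics["narrative_score"] < 0.90 * metrics["human_baseline_score"]:
        blockers.append("narrative quality more than 10 percent below human baseline")
    if metrics["disposition_match_rate"] < 0.92:
        blockers.append("disposition match rate below 92 percent")
    over = metrics["over_escalation_misses"]
    under = metrics["under_escalation_misses"]
    if (over + under) > 0 and over <= under:
        blockers.append("misses do not skew toward over-recommending review")
    if metrics["unbound_fact_count"] != 0:
        blockers.append("source-binding check found unsupported facts")
    missing = REQUIRED_SIGNOFFS - set(signoffs)
    if missing:
        blockers.append("missing sign-offs: " + ", ".join(sorted(missing)))
    return blockers
```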

The Comp Pitch That Matters

The case to the audit committee is not productivity. It is consistency. A human-driven BSA program at scale has narrative variance, typology variance, and disposition variance across analysts. The AI-driven program with human-final-decision shrinks the variance. The SARs read like one bank wrote them. The dispositions reflect one policy. That is the BSA officer's job and it is what an examiner notices.

The productivity gain is real — 30 to 45 percent on analyst time in the deployments we have shipped — but it is the second thing on the slide. The consistency is the first.

Ramkumar Venkataraman


CTO & Co-Founder

© 2026 Sei Software Technologies Inc. All rights reserved.