Primer AI: RAG-V

Granular Verification for Trustworthy AI Summaries

Role

Staff Product Designer

Responsibilities

UX Research, Usability Testing, Wireframing, Prototyping, Visual Design

Duration

June 2024 - August 2024

Overview

Designing RAG for claim-by-claim trust

RAG-V is a reference pattern that integrates retrieval-augmented generation (RAG) with a verification layer. In collaboration with researchers, I led the design of a workflow that verifies each claim in an AI-generated summary against source material. The goal was to ensure accuracy, reduce hallucinations, and build user trust, especially for national security users who have very low tolerance for errors.

The Problem

High-stakes users require zero hallucinations

Traditional summary workflows offered only sentence-level transparency, showing whether a sentence was grounded in evidence but not which specific claims were supported. Even with RAG, AI-generated summaries still carried a 5% to 10% error rate. For users in high-stakes environments, such as intelligence analysts, even minor hallucinations could have critical consequences.

Note: Data is illustrative, based on trends from Ke et al. (2025).
Ke, Y. H., Jin, L., Elangovan, K., Abdullah, H. R., Liu, N., Sia, A. T. H., Soh, C. R., Tung, J. Y. M., Ong, J. C. L., Kuo, C.-F., Wu, S.-C., Kovacheva, V. P., & Ting, D. S. W. (2025). Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness. npj Digital Medicine. https://www.nature.com/articles/s41746-025-01519-z

Research Insights

Learning from existing citation and verification behaviors

Understanding how different groups handle evidence and attribution helped shape a design pattern that feels both familiar and trustworthy to users.

Preliminary Research Insight

Academic citation patterns

Footnotes and inline citations ground claims in sources, establishing a trusted pattern for linking evidence.

Citation example from Citation Machine

Preliminary Research Insight

Intel analyst workflows

Analysts rely on similar citations but place greater weight on excerpts and metadata, such as organization, author, and timestamp, to validate and share information.

Preliminary Research Insight

LLM product conventions

ChatGPT, Gemini, Claude, and Perplexity use hoverable chips for citations, with Gemini also offering on-demand verification. These approaches cite at the sentence level, which can mask hallucinated claims and leave it unclear which specific claim a source supports; the data-model sketch below illustrates the difference.
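To make that contrast concrete, the sketch below models claim-level evidence in Python. All names (Evidence, VerifiedClaim, and their fields) are illustrative assumptions for this write-up, not Primer's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical claim-level evidence model; illustrative only.
@dataclass
class Evidence:
    source_id: str   # document the excerpt came from
    excerpt: str     # supporting passage surfaced to the analyst
    author: str
    timestamp: str

@dataclass
class VerifiedClaim:
    text: str                                               # one atomic claim
    evidence: list[Evidence] = field(default_factory=list)  # empty list = unverified

# One sentence, two claims: per-claim evidence lets the UI flag exactly
# which part of the sentence lacks support, instead of attaching a single
# citation chip to the whole sentence.
claims = [
    VerifiedClaim(
        "Company X raised $50M in 2023",
        [Evidence("doc-12", "X closed a $50M round in June 2023...", "Example Newswire", "2023-06-01")],
    ),
    VerifiedClaim("Company X doubled headcount"),  # no evidence found: flagged for review
]
```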

Proposed Solutions

Technical proposal

1. Breaks generated text down into individual claims.
2. Cross-checks each claim against retrieved sources.
3. Iteratively regenerates the summary until all claims are verified (see the sketch below).

Visit Primer's Research page for more details.

RAG-V pipeline illustration
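To make the three steps concrete, here is a minimal Python sketch of the verification loop. The generate, split_claims, and is_supported callables are hypothetical stand-ins for an LLM call, a claim-decomposition model, and an evidence-entailment check; this is a sketch of the pattern, not Primer's implementation.

```python
from typing import Callable

def rag_v_summarize(
    generate: Callable[[str, list[str]], str],       # LLM call: (prompt, sources) -> summary
    split_claims: Callable[[str], list[str]],        # step 1: decompose text into atomic claims
    is_supported: Callable[[str, list[str]], bool],  # step 2: check one claim against sources
    prompt: str,
    sources: list[str],
    max_rounds: int = 3,
) -> str:
    summary = generate(prompt, sources)
    for _ in range(max_rounds):
        unsupported = [claim for claim in split_claims(summary)
                       if not is_supported(claim, sources)]
        if not unsupported:
            return summary  # every claim verified against the retrieved sources
        # Step 3: regenerate, steering the model away from the failed claims.
        feedback = prompt + "\nDo not include these unverified claims:\n" + "\n".join(unsupported)
        summary = generate(feedback, sources)
    return summary  # best effort after max_rounds; remaining claims stay flagged
```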

Design Principles

Designing for scannability, transparency, and control

Embedding verification directly into the analyst's workflow reduced hallucination rates and improved review efficiency.

Final Results

RAG-V in action

The final RAG-V workflow distilled three core design principles: efficiency, granularity, and transparency. Through iteration, these principles shaped claim-level verification, clear evidence views, and reduced latency. The result was a system that minimized hallucinations while fitting seamlessly into analysts' review processes, driving adoption and trust.

Trust became the new baseline

RAG-V achieved a significant reduction in unverified content and increased analyst confidence in generative features. Customers requested its extension beyond search summaries, and it became a baseline explainability pattern across Primer products.

"This approach gives me much greater confidence in the accuracy of summaries! When can I expect to see RAG-V applied to other generative text beyond search summaries?"
— One customer after a RAG-V demo

Retrospective

Verification builds trust, but speed builds value

RAG-V demonstrated the value of granular claim-level verification in building trust with generative AI. The tradeoff was speed and content richness, two factors that limited the perceived value of summaries. Future iterations will focus on balancing trustworthiness with efficiency and utility, ensuring analysts gain both confidence and insight from AI outputs.