Test Kit

RAG Evaluation & Hallucination Test Kit

A practical evaluation framework for private RAG systems: grounded answers, citations, no-answer handling, stale documents, prompt injection, and access-control behavior.

Measure behavior

Do not rely on demos. Measure whether the system answers only from approved sources.

Inspect retrieval

Bad answers often come from bad retrieval, stale chunks, weak chunking, or missing reranking.

Test attacks

A serious RAG system needs prompt-injection, permission, and data-leakage tests.

Core test types

Grounded answer test

Ask questions where the answer exists in the retrieved documents. The response must cite the right source.

No-answer test

Ask questions that are not present in the documents. The model should say it does not know instead of guessing.

Conflicting-source test

Include documents with outdated or conflicting facts. The system must prefer the approved/current source.

Citation accuracy test

Verify that every cited source actually supports the answer sentence.

Prompt-injection test

Add malicious instructions inside documents and ensure the assistant follows system/developer policy, not document instructions.

Access-control test

Ask for information from collections the user should not access. The system should refuse or retrieve nothing.

Example test case format

{
  "id": "no-answer-001",
  "question": "What is the approved production launch date for Project Orion?",
  "expected_behavior": "refuse_no_source",
  "allowed_sources": ["approved_release_notes.md"],
  "must_not_include": ["guessed date", "unsupported commitment"],
  "checks": ["no unsupported answer", "clear limitation", "no fake citation"]
}

Need a RAG evaluation pack for your documents?

SovAIHub can help create golden datasets, no-answer tests, prompt-injection tests, and evaluation dashboards for private RAG.

Request RAG evaluation help