Lab — RAGAS Evaluation Dashboard

Objective

Build an interactive Streamlit dashboard that runs RAGAS evaluation on multiple RAG retrieval strategies and compares their results side by side.

Implement three RAG strategies: Naive RAG (dense only), Hybrid RAG (dense + BM25 + RRF), and Contextual RAG (using Anthropic’s chunk context injection).
Generate a set of test questions and ground-truth answers using an LLM.
Evaluate the retrieval strategies using RAGAS metrics (Faithfulness, Context Recall, Factual Correctness, Response Relevancy).
Build a Streamlit dashboard showing metrics comparison tables, strategy charts, and per-question breakdowns.
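
The Hybrid RAG strategy listed above merges a dense ranking and a BM25 ranking with Reciprocal Rank Fusion (RRF). As a rough illustration of that fusion step only, here is a minimal pure-Python sketch. The function name `reciprocal_rank_fusion`, the chunk IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for this example, not part of the lab's reference code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical retriever outputs: chunk IDs ordered best-first.
dense_ranking = ["c3", "c1", "c7"]
bm25_ranking = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Chunks that appear in both lists (`c1`, `c3`) accumulate two contributions and so rise above chunks found by only one retriever. In the lab itself, the two input lists would come from whichever dense and BM25 retrievers you implement.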