Ragscore

Provides QA dataset generation and evaluation for RAG systems using local or cloud-backed LLMs.

GitHub Stars: 28

Language: Python

First Indexed: 2 months ago

Catalog Refreshed: 3 weeks ago

Installation

Add the following to your MCP client configuration file.

Configuration

This MCP server generates QA datasets from your documents and tests how well your retrieval-augmented generation (RAG) setup answers the resulting questions. When paired with a local LLM, the whole workflow runs on your machine, so no data is exposed to external services. It emphasizes privacy, speed, and compatibility with local or cloud-backed LLMs.

How to use

The RAGScore MCP server offers two core actions, available through an MCP client or the CLI: generating QA pairs from your document collection, and evaluating your RAG system against generated or supplied gold question-answer pairs.

How to install

Prerequisites: you need Python and pip available on your system. You will also need an MCP client to communicate with the server, or you can use the provided CLI if supported by your setup.

Install the core package and optional features with Python’s package manager.

Configuration and usage notes

The MCP server evaluates a RAG system over HTTP. Point the evaluation at your RAG system's query endpoint, for example http://localhost:8000/query. You can generate QA pairs from the documents in your docs directory and then run the evaluation against that endpoint.
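
To make the HTTP side concrete, here is a minimal sketch of sending one question to the query endpoint at http://localhost:8000/query using only the Python standard library. The JSON payload shape ({"question": ...}), the "answer" response field, and the helper names build_query_request and ask are assumptions for illustration, not the documented RAGScore protocol; adapt them to whatever your RAG endpoint actually accepts.

```python
import json
import urllib.request

RAG_ENDPOINT = "http://localhost:8000/query"  # URL taken from the notes above


def build_query_request(question: str, url: str = RAG_ENDPOINT) -> urllib.request.Request:
    """Build a JSON POST request carrying a single question.

    The {"question": ...} payload is an assumed shape, not a documented one.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = RAG_ENDPOINT, timeout: float = 30.0) -> str:
    """Send the question and return the answer text.

    Assumes the endpoint replies with JSON that contains an "answer" field.
    """
    request = build_query_request(question, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["answer"]
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a running endpoint.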

If you plan to use the OpenAI provider, set your API key in the OPENAI_API_KEY environment variable, for example: export OPENAI_API_KEY="sk-...". The variable is optional for other providers.
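
If you drive the workflow from a script or notebook, it can help to fail early when the key is missing. The helper below is a hypothetical convenience, not part of RAGScore; it only reads the OPENAI_API_KEY variable named above.

```python
import os


def require_openai_key(env=os.environ) -> str:
    """Return the OpenAI API key from the environment, or raise if it is unset.

    `env` defaults to the process environment; passing a plain dict makes
    the check easy to exercise without touching real credentials.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before using the OpenAI provider."
        )
    return key
```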

Examples and quick start

Generate QA pairs from documents and evaluate them against your RAG endpoint.

Troubleshooting

If you encounter connection issues, verify that the MCP server is running and reachable at the configured URL. Ensure your documents are accessible and that your RAG endpoint at the query URL is responding.
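
A quick way to rule out basic connectivity problems is to check whether anything is listening at the configured URL. This is a generic TCP-level sketch using the standard library; the function name is_reachable is an illustrative choice, and a successful connection only shows that the port is open, not that the endpoint answers queries correctly.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False  # not a usable URL
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, is_reachable("http://localhost:8000/query") checks the RAG endpoint used elsewhere in these notes.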

Available tools

generate_qas

Generate question-answer (QA) pairs from your documents to build the dataset used for evaluation.
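
The on-disk format of the generated dataset is not described here. As a sketch of what a QA dataset could look like, the example below stores one pair per line as JSON (JSONL). The QAPair fields (question, answer, source) and both helper functions are assumptions for illustration, not the tool's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    source: str  # assumed field: the document the pair was derived from


def dumps_jsonl(pairs) -> str:
    """Serialize QA pairs as one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


def loads_jsonl(text: str) -> list:
    """Parse JSONL text back into QAPair objects, skipping blank lines."""
    return [QAPair(**json.loads(line)) for line in text.splitlines() if line.strip()]
```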

evaluate_rag

Evaluate your RAG system by comparing its answers to gold QA pairs, reporting accuracy and failure details.
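
How the tool judges an answer is not specified here. To illustrate the shape of an accuracy-plus-failures report, the sketch below uses a deliberately simple stand-in: an answer counts as correct when the normalized gold answer appears inside the normalized model answer. The function names and the report layout are assumptions, and a real evaluation may well use a different scoring method.

```python
def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def score(gold, answers) -> dict:
    """Compare model answers to gold answers.

    gold: list of (question, gold_answer) tuples.
    answers: dict mapping each question to the model's answer.
    Returns the accuracy and the list of failed items.
    """
    failures = []
    for question, expected in gold:
        got = answers.get(question, "")
        # Stand-in correctness rule: gold answer contained in the model answer.
        if normalize(expected) not in normalize(got):
            failures.append({"question": question, "expected": expected, "got": got})
    total = len(gold)
    accuracy = (total - len(failures)) / total if total else 0.0
    return {"accuracy": accuracy, "failures": failures}
```

Returning the failed items alongside the score keeps the report useful for inspection, since each failure records the question, the expected answer, and what the system actually said.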

quick_test

A notebook-friendly Python API to audit RAG performance, visualize results, and inspect failures.