Great Expectations

Exposes data-quality checks, dataset loading, and validation via MCP for LLM agents.

Language: Python · GitHub stars: 2 · First indexed: 4 months ago · Catalog refreshed: 3 weeks ago

This MCP server exposes Great Expectations data-quality checks as MCP tools that LLM agents can call. Through it you can load datasets, define validation rules on the fly, run checks, and interpret the results, all via standard MCP tool calls.

How to use

You connect to the server from an MCP client (such as Claude CLI or a custom application) and run data-quality checks against datasets loaded through the server. Use HTTP mode for web-based or remote clients, which talk to a network endpoint. Use STDIO mode for local, embedded AI clients, which launch the server as a local process and exchange messages with it over standard input/output.

How to install

Prerequisites: Python (plus uv, which the source-based commands in this guide use) to run the server locally, or Docker to run the pre-built image. You also need an MCP-capable client, such as Claude CLI or a custom MCP client. The commands below run the server in either STDIO (local) or HTTP (remote) mode.

# Option 1: Run the pre-built image in STDIO mode (default)
docker run --rm -i davidf9999/gx-mcp-server:latest

# Option 2: Run the pre-built image in HTTP mode
# Expose port 8000 for HTTP access
docker run -d -p 8000:8000 --name gx-mcp-server -e MCP_MODE=http davidf9999/gx-mcp-server:latest

# Then connect with an MCP client to http://localhost:8000/mcp/

Configuration and server modes

The server supports multiple transport modes. STDIO is used for AI clients that communicate through standard input/output. HTTP is used for web-based clients that call the MCP endpoint. You can enable authentication and rate limiting as needed.

The commands below start the server in each mode using uv.

# HTTP mode is selected on the command line when you start the server.
# Example: HTTP mode with Basic authentication (username:password)
uv run python -m gx_mcp_server --http --basic-auth myuser:mypassword

# STDIO mode (the default) exchanges MCP messages over the process's standard input/output
uv run python -m gx_mcp_server
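To check that STDIO mode works without a full client, you can pipe a few JSON-RPC messages into the server by hand. This is a minimal sketch, assuming the server follows the standard MCP STDIO transport (one JSON-RPC message per line) and accepts protocol version `2025-03-26`; the helper name `stdio_probe_messages` and the client name `shell-probe` are hypothetical.

```shell
# Hypothetical helper: print an MCP handshake followed by a tools/list request,
# one JSON-RPC message per line, as the STDIO transport expects.
stdio_probe_messages() {
  # 1. initialize request (the protocol version is an assumption; adjust for your server)
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"shell-probe","version":"0.1"}}}'
  # 2. notification that the client has finished initializing (no id, so no response)
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  # 3. ask the server which tools it exposes
  printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}
```

Usage: `{ stdio_probe_messages; sleep 5; } | uv run python -m gx_mcp_server`. The `sleep` keeps standard input open for a few seconds so the server has time to answer before it sees end-of-input; the JSON-RPC responses, including the tool list, are printed to standard output.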

MCP client configuration

Configure your MCP client to connect to the server. For remote clients, use HTTP and point the client at the server's /mcp/ URL. For local, embedded clients, use STDIO and have the client launch the server process itself.

If you run the server with Docker in HTTP mode, point your client at http://localhost:8000/mcp/ and send the appropriate authentication header if authentication is enabled.

Security and access control

Enable authentication in production to protect access to data-quality checks. Two schemes are supported: Basic authentication, which uses a username and password, and Bearer tokens, which are JWTs issued by your identity provider.

When using Bearer tokens, provide the public key or JWKS URL used to verify token signatures, and set the expected issuer and audience so they match the claims your identity provider puts in its tokens.
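When Bearer tokens are rejected, a frequent cause is an issuer or audience mismatch. As a debugging aid (not a verification step, since it does not check the signature), you can decode a token's payload locally and compare its `iss` and `aud` claims with the values you configured. The helper `jwt_payload` below is a hypothetical sketch and assumes a `base64` that supports `-d` (GNU coreutils or recent macOS).

```shell
# Hypothetical helper: print the (unverified) JSON payload of a JWT.
# The payload is the second dot-separated segment, base64url-encoded without padding.
jwt_payload() {
  # Translate the base64url alphabet back to standard base64 ('-' -> '+', '_' -> '/').
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

Usage: `jwt_payload "$TOKEN"`, then check that the printed `iss` and `aud` values match the issuer and audience configured on the server.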

Docker

The easiest way to run the MCP server is via the official Docker image. By default, the container runs in STDIO mode. Switch to HTTP mode by setting MCP_MODE to http.

# HTTP mode with authentication
docker run -d -p 8000:8000 --name gx-mcp-server \
  -e MCP_MODE=http \
  -e MCP_SERVER_USER=myuser \
  -e MCP_SERVER_PASSWORD=mypass \
  davidf9999/gx-mcp-server:latest
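Because a detached container takes a moment to start, scripts that launch it often poll the health endpoint (the `/mcp/health` URL shown under Troubleshooting) before connecting a client. A minimal sketch; the function name `wait_for_health`, its defaults, and the one-second interval are arbitrary choices, not part of the server.

```shell
# Hypothetical helper: poll an HTTP health URL until it answers with a 2xx status.
# Arguments: URL (default http://localhost:8000/mcp/health), max attempts (default 30),
# then any extra curl options, e.g. -u myuser:mypass when authentication is enabled.
wait_for_health() {
  url="${1:-http://localhost:8000/mcp/health}"
  tries="${2:-30}"
  # Keep only the extra curl options in "$@".
  if [ "$#" -ge 2 ]; then shift 2; else set --; fi
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors such as 401 or 503; -sS stays quiet unless it fails.
    if curl -fsS "$@" "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Usage with the authenticated container above: `wait_for_health http://localhost:8000/mcp/health 30 -u myuser:mypass && echo ready`.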

MCP server endpoints and commands

In HTTP mode, the server exposes its MCP endpoint at the /mcp/ path (for example, http://localhost:8000/mcp/); point your MCP client there to issue data-quality checks. For local development, run the server in STDIO mode and let your MCP client launch it and talk to it over standard input/output.
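To confirm the HTTP endpoint responds before wiring up a full client, you can send the MCP `initialize` request with curl. This is a sketch under assumptions: it assumes the server uses MCP's Streamable HTTP transport at `/mcp/` (JSON-RPC sent via POST, with replies returned as JSON or as a server-sent event stream) and accepts protocol version `2025-03-26`. The helper names and the `MCP_URL` variable are hypothetical.

```shell
# Hypothetical: URL of the MCP endpoint (override by exporting MCP_URL).
MCP_URL="${MCP_URL:-http://localhost:8000/mcp/}"

# Print the JSON-RPC initialize request body (the protocol version is an assumption).
initialize_request() {
  printf '%s' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-probe","version":"0.1"}}}'
}

# POST the request; Streamable HTTP clients must accept both JSON and SSE replies.
# Extra curl options (for example -u myuser:mypassword or -i) are passed through.
mcp_initialize() {
  initialize_request | curl -sS -X POST "$MCP_URL" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    "$@" --data-binary @-
}
```

Usage: `mcp_initialize -i -u myuser:mypassword`. A successful reply describes the server and its capabilities; if the response carries an `Mcp-Session-Id` header (visible with `-i`), later requests in the same session must send it back.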

Tools and capabilities

You can load datasets from files, URLs, or inline data, and you can load from data warehouses such as Snowflake or BigQuery. You can define and modify ExpectationSuites and run validations to fetch detailed results. The server supports in-memory or SQLite storage for datasets and results, and it provides Prometheus metrics and optional OpenTelemetry tracing.

Troubleshooting

If you run into issues, check that the server is running and accessible at the expected URL or STDIO channel. Verify authentication settings if enabled and review network access rules. Use verbose logging if needed to diagnose problems.

# Health check in HTTP mode
curl http://localhost:8000/mcp/health

# Health check with basic auth (if enabled)
curl -u user:pass http://localhost:8000/mcp/health

# Run with verbose logging (example)
uv run python -m gx_mcp_server --log-level DEBUG
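Some clients and proxies need the `Authorization` header value spelled out instead of a username/password pair. A Basic header is the string `user:password` encoded in base64, which you can compute in the shell; `basic_auth_header` is a hypothetical helper name.

```shell
# Hypothetical helper: print an HTTP Basic Authorization header for USER and PASSWORD.
basic_auth_header() {
  # printf avoids the trailing newline echo would add; tr strips base64's line wrapping.
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

Usage: `curl -H "$(basic_auth_header myuser mypassword)" http://localhost:8000/mcp/health`. For `myuser`/`mypassword` the helper prints `Authorization: Basic bXl1c2VyOm15cGFzc3dvcmQ=`.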

Development

If you are developing against the server, install dependencies, run a local server, and try examples to validate behavior. You can run tests and lint as part of your development workflow.

Available tools

load_dataset

Load datasets from files, URLs, or inline data, including support for Snowflake and BigQuery via URI prefixes.

define_expectations

Create and modify ExpectationSuites to define data quality checks on the fly.

validate_data

Run validation checks against loaded datasets and fetch detailed results (synchronous or asynchronous).
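An MCP client invokes these tools with JSON-RPC `tools/call` requests that carry the tool name and an arguments object. The helper below builds such a request in the shell so you can send it by hand after the initialize handshake, over STDIO or HTTP. It is a sketch: the helper name is hypothetical, and each tool's real argument names come from the input schema the server reports via `tools/list`.

```shell
# Hypothetical helper: print a JSON-RPC tools/call request.
# Arguments: request id, tool name, and the tool's arguments as a JSON object string.
tool_call_request() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}' "$1" "$2" "$3"
}
```

Usage: `tool_call_request 3 load_dataset '{"source":"data.csv"}'`, where `source` is a placeholder argument name, not the tool's documented schema.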