Google Research

Provides web search, content extraction, and document parsing capabilities for AI assistants with caching and secure transport.

GitHub Stars: 10

Language: TypeScript

First Indexed: 4 months ago

Catalog Refreshed: 2 months ago

Documentation & install

Readme and setup notes from the catalogue, plus a client-ready config you can copy for your MCP host.

Installation

Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "zoharbabin-google-research-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "google-researcher-mcp"
      ],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE",
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}

You can run a production-ready MCP server that lets AI assistants search the web, read pages, extract YouTube transcripts, and parse documents. It provides fast, cached results, quality scoring, and secure transport options to fit local or web-based integrations.

How to use

This MCP server exposes a set of tools your AI assistant can call to perform web searches, read content from pages, and extract information from documents. You will run the server locally and connect to it using an MCP client. The recommended setup is to start via STDIO for local integrations and switch to HTTP+SSE only if you need a shared web-facing service.

How to install

Before installing, you need Node.js 20.0.0 or higher, Google Custom Search API credentials (an API key and a search engine ID), and Chromium for JavaScript rendering (Playwright can install Chromium for you, as in step 2 of the commands that follow). If you plan to expose the server over HTTP, you also need an OAuth 2.1 provider for token-based authentication.
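
As a rough illustration of the Node.js requirement, the sketch below compares a version string against the 20.0.0 minimum. The function names are hypothetical and are not part of the server; this is only one way to run a pre-flight check.

```typescript
// Minimum Node.js version taken from the prerequisites above.
const MIN_NODE_VERSION = "20.0.0";

// Parse "v20.11.1" or "20.11.1" into [major, minor, patch].
// Missing or non-numeric parts count as 0.
function parseVersion(version: string): number[] {
  const nums = version
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
  return [nums[0] ?? 0, nums[1] ?? 0, nums[2] ?? 0];
}

// True when `actual` is at least `minimum`, comparing major, then minor, then patch.
function meetsMinimum(actual: string, minimum: string = MIN_NODE_VERSION): boolean {
  const a = parseVersion(actual);
  const m = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (m[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // all three parts are equal
}
```

In a Node.js script you could pass `process.version` as the first argument and exit early when the check returns false.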

Step-by-step commands you will run:

# 1. Clone the repository
git clone https://github.com/zoharbabin/google-research-mcp.git
cd google-research-mcp

# 2. Install dependencies
npm install
npx playwright install chromium

# 3. Configure environment variables
cp .env.example .env
# Then edit .env and set GOOGLE_CUSTOM_SEARCH_API_KEY and GOOGLE_CUSTOM_SEARCH_ID

Configuration and setup notes

This guide configures only the local STDIO transport. The server reads the Google Custom Search credentials from environment variables and can be started in development or production mode. If you run the HTTP transport instead, you must also supply an OAuth 2.1 configuration and expose the server over a network.
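
The following is a minimal sketch of validating the two credential variables before launching the server. It is not the server's own loader: `readCredentials`, `SearchCredentials`, and the placeholder check are hypothetical helpers, and only the two variable names come from the sample configuration.

```typescript
type Env = Record<string, string | undefined>;

interface SearchCredentials {
  apiKey: string;
  searchEngineId: string;
}

// Placeholder values copied from the sample client configuration.
const PLACEHOLDERS = new Set(["YOUR_API_KEY_HERE", "YOUR_SEARCH_ID_HERE"]);

// Returns the credentials, or throws an error naming every variable that is
// unset, empty, or still a placeholder.
function readCredentials(env: Env): SearchCredentials {
  const apiKey = (env["GOOGLE_CUSTOM_SEARCH_API_KEY"] ?? "").trim();
  const searchEngineId = (env["GOOGLE_CUSTOM_SEARCH_ID"] ?? "").trim();

  const missing: string[] = [];
  if (apiKey === "" || PLACEHOLDERS.has(apiKey)) {
    missing.push("GOOGLE_CUSTOM_SEARCH_API_KEY");
  }
  if (searchEngineId === "" || PLACEHOLDERS.has(searchEngineId)) {
    missing.push("GOOGLE_CUSTOM_SEARCH_ID");
  }
  if (missing.length > 0) {
    throw new Error(`Missing or placeholder values: ${missing.join(", ")}`);
  }
  return { apiKey, searchEngineId };
}
```

In a Node.js launcher you would typically pass `process.env` to this function, so a misconfiguration is reported before any search request is attempted.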

Operational basics and maintenance

Run the server in development mode with live reload while you are making changes, or build and run it for production. The server supports caching, quality scoring, and graceful degradation. If issues arise, check that the environment variables are set, the dependencies are installed, and Chromium rendering is available.
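
To make the idea of caching combined with graceful degradation concrete, here is a small sketch of one common pattern: a time-to-live cache that falls back to a stale entry when the upstream call fails. This is an illustration only and not necessarily how this server implements either feature. The class name is hypothetical, and the loader is synchronous to keep the sketch short, whereas a real fetch would be asynchronous.

```typescript
interface CacheEntry<T> {
  value: T;
  storedAt: number; // timestamp in milliseconds
}

class StaleFallbackCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, loadFresh: () => T): T {
    const cached = this.entries.get(key);
    if (cached !== undefined && this.now() - cached.storedAt < this.ttlMs) {
      return cached.value; // fresh hit: skip the upstream call
    }
    try {
      const value = loadFresh();
      this.entries.set(key, { value, storedAt: this.now() });
      return value;
    } catch (error) {
      if (cached !== undefined) {
        return cached.value; // degrade to stale data instead of failing
      }
      throw error; // nothing cached, so surface the failure
    }
  }
}
```

The trade-off in this pattern is that callers may receive outdated results during an outage, which is usually preferable to an error for read-heavy search workloads.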

Security and access control

For HTTP transport, enforce access with OAuth 2.1 tokens. Validate tokens using your issuer’s JWKS endpoint and ensure required scopes are granted for each tool. STDIO transport does not require OAuth.
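
The per-tool scope check described above might look roughly like the sketch below. It assumes the token's signature has already been verified against the issuer's JWKS endpoint (that step is not shown) and that granted scopes arrive as a space-delimited string, which is the usual OAuth format. The scope names and the tool-to-scope mapping are invented for illustration; consult the server's documentation for the real ones.

```typescript
// Hypothetical mapping from tool name to the scopes it requires.
const REQUIRED_SCOPES: ReadonlyMap<string, readonly string[]> = new Map([
  ["google_search", ["example:search"]],
  ["scrape_page", ["example:scrape"]],
  ["search_and_scrape", ["example:search", "example:scrape"]],
]);

// Split an OAuth space-delimited scope string into a set of scope names.
function parseScopeClaim(scope: string | undefined): Set<string> {
  return new Set((scope ?? "").split(/\s+/).filter((s) => s.length > 0));
}

// True only when every scope required by `tool` is present in the claim.
function isToolAllowed(
  tool: string,
  scopeClaim: string | undefined,
  required: ReadonlyMap<string, readonly string[]> = REQUIRED_SCOPES,
): boolean {
  const needed = required.get(tool);
  if (needed === undefined) {
    return false; // unknown tool: deny by default
  }
  const granted = parseScopeClaim(scopeClaim);
  return needed.every((s) => granted.has(s));
}
```

Denying unknown tools by default means a newly added tool stays inaccessible until a scope is assigned to it explicitly.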

Troubleshooting tips

If you encounter startup errors, ensure all dependencies are installed and environment variables are set. If search results are empty, clear the persistent cache directories and restart. For rendering issues, re-install Playwright dependencies. If the HTTP health check fails in STDIO mode, remember the health endpoint is only active for HTTP transport.

Example client interaction

After configuring the MCP server, you can connect a local client using STDIO transport and call tools such as search_and_scrape to retrieve and surface content from top web results. For remote apps, use HTTP transport with a Bearer token and invoke the same tools through the web API.
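
For a sense of what such a call looks like on the wire, the sketch below builds the JSON-RPC 2.0 `tools/call` request that MCP clients send. The helper names are hypothetical, and the `query` argument is an assumption about the input of `search_and_scrape`; check the tool's published input schema for the actual parameter names. A real session must also complete the MCP `initialize` handshake first, which most client SDKs handle for you.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for the named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Over STDIO, MCP messages are newline-delimited JSON written to the
// server's standard input.
function frameForStdio(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

// Over HTTP, the same JSON body is sent with a Bearer token attached.
function bearerHeaders(token: string): Record<string, string> {
  return { "Content-Type": "application/json", Authorization: `Bearer ${token}` };
}

// Example: ask the server to search and extract content in one call.
// The `query` argument name is assumed, not confirmed by this page.
const exampleRequest = buildToolCall(1, "search_and_scrape", {
  query: "model context protocol",
});
const exampleFrame = frameForStdio(exampleRequest);
```

In practice an MCP client library would construct and send these messages for you; the sketch only shows the shape of the request.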

Notes on transport choices

STDIO is optimal for local AI assistant integrations. HTTP+SSE is suited for web apps, multi-client setups, or remote access where you need a shared service.

Available tools

google_search

Return ranked URLs from Google for a query; suitable when you want to process the pages yourself.

google_image_search

Search images with size, type, and color filters to find visual content.

google_news_search

Fetch recent news results with freshness controls.

scrape_page

Extract text from web pages and documents, including YouTube transcripts and PDFs.

search_and_scrape

Combine search and content extraction in a single call with deduplicated results and source attribution.

sequential_search

Track multi-step investigations, maintaining state across several searches.

academic_search

Find academic papers from arXiv, PubMed, IEEE, and more with citations.

patent_search

Search patents for prior art and landscape insights.