ai-tools_skill

This skill enables multimodal AI analysis and research by integrating Gemini API, Gemini CLI, and NotebookLM for transcription, extraction, and querying.

Language: Python
GitHub Stars: 12
Bundled Files: 3
Catalog Refreshed: 2 months ago
First Indexed: 4 months ago

Installation

The previewed and copied command uses veilstrat where the catalogue uses aiagentskills.

npx veilstrat add skill samhvw8/dotfiles --skill ai-tools

dot_env.example (3.7 KB)
executable_notebooklm-requirements.txt (325 B)
SKILL.md (7.0 KB)

Overview

This skill integrates three Google AI tools (Gemini API, Gemini CLI, and NotebookLM) into a single workflow for multimodal processing, web-grounded research, and document Q&A. It processes audio, images, video, and PDFs, generates images, and supports Google Search grounded queries and second opinions. Use it to transcribe, extract, analyze, generate, or query content with source grounding and current web context.

How this skill works

The Gemini API performs the multimodal tasks: transcription, OCR, video analysis, PDF extraction, and image generation, using models such as gemini-2.5-flash and gemini-2.5-pro. The Gemini CLI handles command-line code review, Google Search grounding, parallel tasks, and second opinions. NotebookLM provides source-grounded Q&A over uploaded notebooks and documents, returning citations and follow-up context.
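
The division of labor described above can be sketched as a small routing table. This is only an illustration and is not taken from the skill itself: the task labels, the tool identifiers, and the `route_task` helper are hypothetical names chosen for this example.

```python
# Hypothetical routing table: the task labels and tool identifiers are
# illustrative names for this sketch, not something the skill defines.
TOOL_FOR_TASK = {
    "transcription": "gemini-api",
    "ocr": "gemini-api",
    "video-analysis": "gemini-api",
    "pdf-extraction": "gemini-api",
    "image-generation": "gemini-api",
    "code-review": "gemini-cli",
    "search-grounding": "gemini-cli",
    "second-opinion": "gemini-cli",
    "document-qa": "notebooklm",
}


def route_task(task: str) -> str:
    """Return the tool that the description above assigns to a task label."""
    try:
        return TOOL_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


if __name__ == "__main__":
    for label in ("ocr", "second-opinion", "document-qa"):
        print(label, "->", route_task(label))
```

A lookup table keeps the mapping easy to audit and extend; anything more dynamic would go beyond what the description states.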

When to use it

Transcribing long audio or extracting captions and timestamps from video
Extracting structured data or tables from PDFs and documents
Analyzing images for descriptions, captions, or OCR
Generating images from prompts and iterating on visuals
Getting a second AI opinion on code, architecture, or research via Google Search grounding
Asking source-grounded questions over uploaded notebooks or corpora

Best practices

Prefer gemini-2.5-flash for cost-sensitive tasks and gemini-2.5-pro when quality matters most
Optimize and trim media before upload; use the File API for files larger than 20 MB
Process segments instead of entire long videos to save tokens and cost
Use the CLI for web-grounded queries and second opinions; use NotebookLM for document-sourced answers
Ask follow-up questions to refine NotebookLM answers and verify sources
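
The first two practices can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's own code: it assumes the `google-genai` SDK is installed and an API key is available in the environment, it reads "20MB" as 20 MiB, and the helper names (`choose_model`, `needs_file_api`, `analyze_media`) are hypothetical.

```python
from pathlib import Path

# Assumption: the 20 MB threshold is interpreted as 20 MiB.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def choose_model(high_quality: bool = False) -> str:
    """Default to the cheaper model; opt in to the higher-quality one."""
    return "gemini-2.5-pro" if high_quality else "gemini-2.5-flash"


def needs_file_api(size_bytes: int) -> bool:
    """True when a file is too large to send inline with the request."""
    return size_bytes > INLINE_LIMIT_BYTES


def analyze_media(path: str, prompt: str, mime_type: str,
                  high_quality: bool = False) -> str:
    """Send one media file plus a prompt to Gemini and return the text reply."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumes an API key is set in the environment
    media = Path(path)
    if needs_file_api(media.stat().st_size):
        # Large file: upload through the File API and reference the upload.
        part = client.files.upload(file=str(media))
    else:
        # Small file: send the raw bytes inline.
        part = types.Part.from_bytes(data=media.read_bytes(), mime_type=mime_type)
    response = client.models.generate_content(
        model=choose_model(high_quality),
        contents=[prompt, part],
    )
    return response.text or ""
```

Large video uploads may need time to finish processing before they can be referenced; that waiting step is left out of this sketch.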

Example use cases

Batch-transcribe a set of interviews, then extract speaker timestamps and summaries
Extract tables from a 200-page PDF as JSON for downstream analysis
Run an automated code review and request a second opinion with Google Search context
Generate concept art iterations with gemini-2.5-flash-image for creative briefs
Upload internal documentation to NotebookLM and run source-grounded Q&A for onboarding

FAQ

Which model should I use?

Use gemini-2.5-flash for most routine multimodal tasks; switch to gemini-2.5-pro when you need maximal quality.

How do I handle large media files?

Optimize and compress media, process long recordings in segments, and use the File API for uploads larger than 20 MB.
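
Segment processing can be planned before any media is cut or uploaded. The helper below is a hypothetical sketch (the name `plan_segments` and the whole-second granularity are assumptions) that splits a recording into consecutive time windows; how the clips are then cut is outside its scope.

```python
def plan_segments(duration_s: int, segment_s: int) -> list[tuple[int, int]]:
    """Split [0, duration_s) into consecutive (start, end) windows in seconds.

    The last window is shorter when the duration is not a multiple of
    segment_s.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    segments = []
    start = 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 250-second clip in 100-second windows.
    print(plan_segments(250, 100))  # [(0, 100), (100, 200), (200, 250)]
```

Each window can then be processed as its own request, so a failure or a retry only affects one segment.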

Can I get up-to-date web info?

Yes. Use the Gemini CLI with Google Search grounding to retrieve current information or get a second opinion.
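
A non-interactive CLI call can be assembled from Python as sketched below. This is an assumption-heavy sketch: it assumes a `gemini` executable on the PATH that accepts `-p` for a one-shot prompt and `-m` for the model, which should be confirmed with `gemini --help` for the installed version. The `build_gemini_command` helper is hypothetical, and whether the CLI actually consults Google Search depends on its configuration and the prompt.

```python
import shlex
from typing import Optional


def build_gemini_command(prompt: str, model: Optional[str] = None) -> list[str]:
    """Build an argv list for a one-shot Gemini CLI call.

    Assumption: the installed CLI accepts -p (prompt) and -m (model).
    """
    cmd = ["gemini"]
    if model:
        cmd += ["-m", model]
    cmd += ["-p", prompt]
    return cmd


if __name__ == "__main__":
    cmd = build_gemini_command(
        "Search the web and summarize current guidance on this topic.",
        model="gemini-2.5-flash",
    )
    # Print the shell-quoted command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building an argv list instead of a shell string avoids quoting problems when the prompt contains spaces or quotes.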