pdf_skill

This skill helps you work with PDF files using Python tools: reading, merging, splitting, OCR, watermarking, form filling, and encryption.

Python
Official

GitHub Stars: 78k
Bundled Files: 4
Catalog Refreshed: 2 months ago
First Indexed: 4 months ago

Installation

The install command below uses veilstrat where the catalogue uses aiagentskills.

npx veilstrat add skill anthropics/skills --skill pdf

forms.md (11.6 KB)
LICENSE.txt (1.4 KB)
reference.md (16.3 KB)
SKILL.md (7.9 KB)

Overview

This skill provides a comprehensive toolkit for working with PDF files programmatically and from the command line. It covers common operations like reading, extracting text and tables, merging and splitting, rotating pages, watermarking, OCR for scanned documents, form filling, image extraction, and encryption. Use it whenever a user mentions a .pdf file or asks to produce or manipulate one.

How this skill works

The skill uses Python libraries (pypdf, pdfplumber, reportlab, pytesseract, pdf2image) and command-line tools (qpdf, pdftotext, pdfimages, pdftk). Use pypdf for reading and low-level manipulation, pdfplumber for robust text and table extraction, and reportlab for PDF generation; for OCR, convert pages to images with pdf2image, then run pytesseract on them. The command-line utilities offer fast alternatives for extraction, merging, splitting, rotation, and image export.

When to use it

Extract text or structured tables from PDFs, including saving to Excel or CSV.
Merge multiple PDFs into a single document or split a PDF into pages or page ranges.
Rotate pages, add watermarks, stamp content, or apply page-level edits.
Create new PDFs, multi-page reports, or render styled paragraphs with reportlab.
Perform OCR on scanned PDFs to produce searchable text output.
Encrypt or decrypt PDFs and extract embedded images or metadata.

Best practices

Prefer pdfplumber for table extraction and layout-aware text, and pypdf for page-level edits and merging.
Convert scanned PDFs to images before OCR (pdf2image + pytesseract) for reliable results.
When creating complex formatted text, use reportlab Platypus Paragraphs and XML tags for subscripts/superscripts instead of Unicode glyphs.
Always work on copies of the original PDFs when an operation overwrites its input, such as rotating pages or merging in place.
Use command-line tools (qpdf, pdftotext, pdfimages) for large batches or performance-sensitive workflows.

Example use cases

Combine quarterly reports (doc1.pdf, doc2.pdf) into a single merged.pdf using pypdf or qpdf.
Extract all tables from an invoice PDF into an Excel file using pdfplumber + pandas.
Convert a scanned contract to searchable text by converting pages to images and running pytesseract.
Add a corporate watermark to every page of a report before distribution using pypdf.
Create a multi-page analytics report programmatically with reportlab and export as report.pdf.

FAQ

How do I extract tables from a PDF with a complex layout?

Use pdfplumber for layout-aware table extraction, then clean and normalize the result with pandas. For very complex layouts, try adjusting the extraction parameters or detecting tables manually.
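
A rough sketch of the pdfplumber plus pandas route follows. The helper names are hypothetical, and treating the first row of each table as its header is an assumption that will not hold for every document.

```python
import pandas as pd


def table_to_frame(rows):
    """Turn one extracted table (a list of rows) into a DataFrame.

    Assumes the first row is the header; adjust if your PDFs differ.
    """
    header, *body = rows
    frame = pd.DataFrame(body, columns=header)
    # Extraction can leave rows where every cell is empty; drop those.
    return frame.dropna(how="all").reset_index(drop=True)


def extract_tables(path):
    """Return one DataFrame per table found in the PDF at `path`."""
    # Imported here so table_to_frame stays usable without pdfplumber.
    import pdfplumber

    frames = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for rows in page.extract_tables():
                if rows:
                    frames.append(table_to_frame(rows))
    return frames
```

Each frame can then be written out with to_csv, or with to_excel if an Excel writer such as openpyxl is installed.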

How do I OCR a scanned PDF to get searchable text?

Convert each PDF page to an image with pdf2image, then run pytesseract on each image to extract text. Combine results and optionally reassemble into a searchable PDF.
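
The OCR steps above can be sketched as below. This is a minimal example under stated assumptions: ocr_pdf and join_page_texts are illustrative names, the 300 dpi default is a common choice rather than a requirement, and pdf2image and pytesseract need the Poppler and Tesseract binaries installed on the system.

```python
def join_page_texts(texts):
    """Combine per-page OCR output, labelling each page."""
    return "\n\n".join(
        f"--- Page {number} ---\n{text.strip()}"
        for number, text in enumerate(texts, start=1)
    )


def ocr_pdf(path, dpi=300, lang="eng"):
    """Rasterize each page of the PDF and run Tesseract on it."""
    # Imported here because both wrap external binaries
    # (Poppler for pdf2image, Tesseract for pytesseract).
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path, dpi=dpi)
    texts = [pytesseract.image_to_string(image, lang=lang) for image in images]
    return join_page_texts(texts)
```

Higher dpi values generally help with small print at the cost of memory and time, so the value is worth tuning per document.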