AI Document Processing

Intelligent document understanding

What is AI Document Processing?

Go beyond simple text extraction with Gemini 3's native vision capabilities for comprehensive document understanding.

Native Vision

Understands text, images, diagrams, charts, and tables together

Large Documents

Process PDFs up to 1000 pages or 50MB in a single request

Structured Output

Extract data into JSON format for downstream applications

Multi-Document

Compare and analyze multiple PDFs simultaneously

Upload & Analyze Document

Upload one to five PDFs to analyze with AI-powered document understanding.

Upload 1-5 PDF files (up to 50MB each, max 1000 pages per file)
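
The upload limits above can be checked on the client before anything is sent. The following is a minimal sketch, not the application's actual validation: `PdfInfo` and `validate_batch` are hypothetical names, "50MB" is assumed to mean 50 MiB, and the page count is assumed to come from a PDF library of your choice.

```python
from dataclasses import dataclass

MAX_FILES = 5
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" is read as 50 MiB
MAX_PAGES = 1000


@dataclass
class PdfInfo:
    """Hypothetical description of one file selected for upload."""
    name: str
    size_bytes: int
    page_count: int  # assumed to be read elsewhere, e.g. with a PDF library


def validate_batch(files: list[PdfInfo]) -> list[str]:
    """Return human-readable problems; an empty list means the batch fits the limits."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES:
        problems.append(f"expected 1-{MAX_FILES} files, got {len(files)}")
    for f in files:
        if not f.name.lower().endswith(".pdf"):
            problems.append(f"{f.name}: not a PDF")
        if f.size_bytes > MAX_BYTES:
            problems.append(f"{f.name}: larger than 50MB")
        if f.page_count > MAX_PAGES:
            problems.append(f"{f.name}: more than {MAX_PAGES} pages")
    return problems
```

Returning a list of problems rather than raising on the first one lets a UI show every issue with the selection at once.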

Get responses in guaranteed JSON format for data extraction and programmatic use.
The AI will analyze your document and respond to your specific question.
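
Even when a response arrives as JSON, downstream code usually still checks that the fields it needs are present. A minimal sketch of that check follows; `parse_extraction` and the example keys are hypothetical and do not reflect a schema defined by this tool.

```python
import json


def parse_extraction(response_text: str, required_keys: set[str]) -> dict:
    """Parse a JSON response and confirm the expected top-level keys exist."""
    data = json.loads(response_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical invoice-style response used only for illustration.
invoice = parse_extraction(
    '{"invoice_number": "A-1", "total": 12.5}',
    {"invoice_number", "total"},
)
```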

Technical Specifications

Understanding the capabilities and limits of document processing.

What It Can Do
  • File Size: Up to 50MB per PDF
  • Page Count: Up to 1000 pages per document
  • Resolution: Pages are scaled down to at most 3072×3072 (preserving aspect ratio)
  • Multi-file: Process multiple PDFs simultaneously
  • Formats: PDF (best); TXT, MD, and HTML are also accepted but are processed as text only
  • Vision: Understands charts, diagrams, tables, images, layouts
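
As a worked example of the resolution limit in the list above, the arithmetic for fitting a page inside 3072×3072 while preserving aspect ratio can be sketched as follows. `scaled_size` is a hypothetical helper, and it assumes pages already within the limit are left unchanged; the service may treat small pages differently.

```python
def scaled_size(width: int, height: int, max_side: int = 3072) -> tuple[int, int]:
    """Shrink (width, height) to fit within max_side x max_side, keeping aspect ratio."""
    # One scale factor for both dimensions; never larger than 1 (no upscaling assumed).
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_size(6144, 4096))  # (3072, 2048)
print(scaled_size(2000, 1000))  # (2000, 1000), already within the limit
```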
Best Practices
  • Rotate pages to correct orientation before uploading
  • Avoid blurry or low-quality scans
  • Use the Files API for documents larger than 10MB
  • Place text prompts after the document in requests
  • PDFs work best; other formats lose visual context
  • Native text embedded in a PDF is extracted and not charged separately
  • Set media_resolution (low/medium/high) per document
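
Several of the practices above (the 10MB Files API threshold, document-before-prompt ordering, and a per-document `media_resolution`) can be combined when assembling a request. The sketch below only illustrates that planning logic: `Document`, `plan_request`, and the dictionary shape are hypothetical, not the actual API payload, and "10MB" is assumed to mean 10 MiB.

```python
from dataclasses import dataclass

FILES_API_THRESHOLD = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB
VALID_RESOLUTIONS = {"low", "medium", "high"}


@dataclass
class Document:
    """Hypothetical description of one document to include in a request."""
    name: str
    size_bytes: int
    media_resolution: str = "medium"


def plan_request(docs: list[Document], prompt: str) -> list[dict]:
    """Return ordered request parts: every document first, the text prompt last."""
    parts = []
    for doc in docs:
        if doc.media_resolution not in VALID_RESOLUTIONS:
            raise ValueError(f"{doc.name}: unknown media_resolution {doc.media_resolution!r}")
        # Larger documents go through the Files API; smaller ones are sent inline.
        transport = "files_api" if doc.size_bytes > FILES_API_THRESHOLD else "inline"
        parts.append({
            "kind": "document",
            "name": doc.name,
            "transport": transport,
            "media_resolution": doc.media_resolution,
        })
    parts.append({"kind": "text", "text": prompt})  # prompt goes after the documents
    return parts


parts = plan_request(
    [Document("summary.pdf", 1_000_000, "low"), Document("scan.pdf", 20 * 1024 * 1024)],
    "Compare the totals reported in both documents.",
)
```

Keeping the plan as plain data makes it easy to inspect or log before translating it into whatever client calls the real API requires.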