Transcribing Bank Statements To Markdown Using Vision AI

Integrations
Edit Image
HTTP Request
Google Drive
Compression
Sticky Note
Code
Manual Trigger
Basic LLM Chain
Aggregate
Sort
Google Gemini Chat Model
Information Extractor

This rantir workflow parses bank statement PDFs with a multimodal LLM. Compared with traditional OCR, this approach can extract data more accurately, especially from tables and complex layouts.

Advantages of Multimodal Parsing over Traditional OCR:

  • Reduces complexity and overhead by avoiding text pre-processing before sending to the LLM.
  • Handles non-standard PDF formats that may produce errors with traditional OCR.
  • Is significantly more cost-effective than premium OCR models, which often require further cleanup.

How it Works

  • The bank statement PDF is imported from Google Drive. This example uses a mock statement with complex 5-column tables that OCR struggles with.
  • Since many multimodal LLMs do not accept PDFs directly, the PDF is converted to images using Stirling PDF. This tool is self-hostable, which keeps sensitive data private.
  • Stirling PDF returns the PDF as a series of JPGs (one per page) in a zip file. The rantir workflow decompresses the archive and sorts the images into page order.
  • Each image is resized with the Edit Image node to balance resolution against processing speed.
  • The resized images are passed to the Basic LLM Chain node, which uses a multimodal LLM (e.g., Gemini 1.5 Pro). Each image is attached as a "user message" of binary type.
  • The prompt instructs the LLM to transcribe each page to markdown. Alternatively, you can prompt for specific data points directly.
  • The markdown version of each page can then be analyzed by another LLM node to extract data, such as deposit line items.
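
The decompress, sort, and resize steps above can be sketched outside the workflow in Python. This is a minimal illustration, not the template's actual Code node: the file names (`statement_1.jpg`, `statement_10.jpg`), the helper names, and the 1024-pixel target are all assumptions. It shows why a natural sort is needed (a plain string sort would place page 10 before page 2) and how to compute a resize that keeps the aspect ratio.

```python
import io
import re
import zipfile


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so 'page_10' sorts after 'page_2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def extract_pages_in_order(zip_bytes: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, image bytes) pairs for the JPGs in the zip, in page order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist()
                 if n.lower().endswith((".jpg", ".jpeg"))]
        return [(n, archive.read(n)) for n in sorted(names, key=natural_key)]


def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale dimensions down so the longer side is at most max_side (assumed value)."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)
```

Images that are already smaller than `max_side` are left at their original size, since upscaling adds processing cost without adding detail.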

Requirements

  • Google Gemini API for multimodal LLM processing.
  • Google Drive for document storage.
  • Stirling PDF for PDF-to-image conversion.

Customizing the Workflow

  • Gemini 1.5 Pro works well for parsing text documents, but other multimodal LLMs such as OpenAI GPT or Anthropic Claude can also be used.
  • For faster results, skip the markdown transcription step and ask the LLM to extract the data directly.
  • This template can be adapted for invoices, inventory lists, contracts, legal documents, and more.
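
The choice between markdown transcription and direct extraction comes down to which prompt is sent with each page image. The sketch below is only an illustration: the prompt wording, the field names, and the `build_prompt` helper are hypothetical, since the template's actual prompts are not shown here.

```python
# Hypothetical prompt texts; the template's real prompts are not given in this description.
TRANSCRIBE_PROMPT = (
    "Transcribe this bank statement page to markdown. "
    "Preserve every table row and column."
)
EXTRACT_PROMPT = (
    "From this bank statement page, return only the deposit line items "
    "as a JSON array of objects with the keys: {fields}."
)


def build_prompt(direct_extraction: bool,
                 fields: tuple[str, ...] = ("date", "description", "amount")) -> str:
    """Pick the prompt for one page: full transcription or direct data extraction."""
    if direct_extraction:
        return EXTRACT_PROMPT.format(fields=", ".join(fields))
    return TRANSCRIBE_PROMPT
```

Direct extraction saves the second LLM call, while the markdown route keeps a full transcript of each page that can be queried again later.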

