Transcribing Bank Statements To Markdown Using Vision AI

Integrations
Edit Image
HTTP Request
Google Drive
Compression
Sticky Note
Code
Manual Trigger
Basic LLM Chain
Aggregate
Sort
Google Gemini Chat Model
Information Extractor

This rantir workflow parses bank statement PDFs with a multimodal LLM. Compared with traditional OCR, this approach can extract data more accurately, especially from tables and complex layouts.

Advantages of Multimodal Parsing over Traditional OCR:

  • Reduces complexity and overhead, because no text pre-processing is needed before a page is sent to the LLM.
  • Handles non-standard PDF formats that may produce errors with traditional OCR.
  • Can be significantly more cost-effective than premium OCR models, whose output often requires further cleanup.

How it Works

  • The bank statement PDF is imported from Google Drive. This example uses a mock statement with complex 5-column tables, which OCR struggles with.
  • Since many multimodal LLMs do not accept PDFs directly, the PDF is converted to images using Stirling PDF. This tool is self-hostable, which helps keep sensitive data private.
  • Stirling PDF returns the PDF as a series of JPGs (one per page) in a zip file. The rantir workflow decompresses these images and sorts them into page order.
  • Each image is resized with the Edit Image node to balance resolution against processing speed.
  • The resized images are passed to the Basic LLM Chain node, which uses a multimodal LLM (e.g., Gemini 1.5 Pro). Each image is supplied to the model as a "user message" of binary type.
  • The prompt instructs the LLM to transcribe each page to markdown. Alternatively, you can prompt for specific data points directly.
  • The markdown version of each page can then be analyzed by another LLM node to extract data, such as deposit line items.
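The decompress, sort, and resize steps above can be sketched outside the workflow in a few lines of Python. This is only an illustration, not how the Compression, Sort, or Edit Image nodes are implemented. The function names are hypothetical, the sketch assumes each JPG filename ends in its page number (the actual naming used by Stirling PDF is not stated here), and the 1024-pixel limit in the usage note is an arbitrary example value.

```python
import io
import re
import zipfile


def page_number(name: str) -> int:
    """Return the last run of digits in a filename, or -1 if there is none."""
    digits = re.findall(r"\d+", name)
    return int(digits[-1]) if digits else -1


def extract_pages_in_order(zip_bytes: bytes) -> list[tuple[str, bytes]]:
    """Unpack a zip of page images and return (name, data) pairs in page order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Keep only JPG entries; anything else in the archive is ignored.
        names = [n for n in archive.namelist()
                 if n.lower().endswith((".jpg", ".jpeg"))]
        # Sort numerically so that page 10 follows page 9, not page 1.
        names.sort(key=page_number)
        return [(n, archive.read(n)) for n in names]


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The numeric sort matters because a plain alphabetical sort would place a page named "10" before a page named "2". As an example of the resize step, `fit_within(2480, 3508, 1024)` maps an A4 page rendered at 300 DPI to 724 x 1024 pixels while preserving its aspect ratio.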

Requirements

  • Google Gemini API for multimodal LLM processing.
  • Google Drive for document storage.
  • Stirling PDF for PDF-to-image conversion.

Customizing the Workflow

  • Gemini 1.5 Pro is a strong choice for parsing text documents, but other multimodal LLMs such as OpenAI GPT or Anthropic Claude can also be used.
  • For faster results, skip the markdown transcription step and ask the LLM to extract the data directly.
  • This template is versatile and can be adapted for invoices, inventory lists, contracts, legal documents, and more.
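The choice between transcribing to markdown and requesting data directly comes down to which prompt is sent with each page image. The sketch below shows one way to switch between the two. The prompt wording, the `build_prompt` helper, and the field names in the usage note are all hypothetical and are not taken from the template itself.

```python
from typing import Optional, Sequence

# Hypothetical wording for the default transcription mode.
TRANSCRIBE_PROMPT = (
    "Transcribe this document page to markdown. "
    "Preserve every table, keeping rows and columns exactly as shown."
)


def build_prompt(fields: Optional[Sequence[str]] = None) -> str:
    """Return the transcription prompt, or a direct-extraction prompt
    when a list of wanted fields is given."""
    if not fields:
        return TRANSCRIBE_PROMPT
    wanted = ", ".join(fields)
    return (
        "From this document page, extract only the following fields "
        f"and return them as a JSON array of objects: {wanted}."
    )
```

Calling `build_prompt()` gives the transcription prompt, while `build_prompt(["date", "description", "amount"])` asks for those fields only. Adapting the workflow to invoices or contracts would then mostly be a matter of changing the field list.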
