Manual mail processing doesn't scale. Our customer, a leading virtual mailbox provider, knew this all too well.
Every day, thousands of envelopes and letters arrived as images. Human operators had to open each item, understand what it was, type key data into downstream systems, and resolve exceptions. It was slow, expensive, and hard to predict.
Together with AWS, we helped them turn this into a fully automated, Generative AI–powered pipeline that processes mail in minutes instead of hours – with lower cost and higher accuracy.
The business challenge
The customer's goals were clear and ambitious:
- Reduce cost per mail item by at least 50% compared to the manual workflow.
- Increase throughput significantly without sacrificing quality.
- Cut the ingest-to-post cycle from hours to under an hour.
- Improve data quality so fewer items needed manual correction.
- Gain a predictable, auditable process with clear SLAs, dashboards, and runbooks.
Earlier attempts with traditional OCR and rule-based parsing had hit a wall:
- Hand-crafted rules broke whenever layouts or image quality changed.
- Scaling meant hiring more people rather than simply adding more machines.
- Operations teams had little visibility into where and why items were delayed.
They needed something different: a cloud-native, AI-driven solution where AWS services and Generative AI sit at the heart of the process.
Why Generative AI on AWS?
Before committing, we evaluated multiple options:
- Enhanced manual process using a traditional document management system – rejected as too expensive and too slow.
- Classic OCR + regex rules – couldn't reach target accuracy for names, dates, and totals across highly variable documents.
- Custom models on Amazon SageMaker – powerful, but would require the customer to operate and maintain a full model-hosting platform.
In the end, we chose Generative AI on AWS because it offered:
- Managed foundation models through Amazon Bedrock, so we could focus on prompts, evaluation, and business rules instead of MLOps.
- Seamless integration with Amazon Textract for OCR and layout understanding.
- A fully serverless data plane built from AWS Lambda, AWS Step Functions, Amazon S3, Amazon DynamoDB, and Amazon API Gateway, which can scale up and down with demand.
- Native observability, security, and governance with Amazon CloudWatch, AWS Identity and Access Management (IAM), AWS Secrets Manager, AWS CloudTrail, and AWS Config.
This combination gave us the best balance of speed, accuracy, cost, and operational simplicity.
Solution overview – from image to structured data
We designed a production-grade mail processing pipeline where Generative AI and managed AWS services do the heavy lifting.
At a high level, each mail item goes through five stages:
1. Ingest
Images of envelopes, letters, and attachments enter the system via a secure API or scheduled batch jobs. Amazon API Gateway and Amazon EventBridge provide controlled entry points into AWS.
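To make the ingest stage concrete, here is a minimal sketch (Python, boto3) of what an API Gateway-backed ingest Lambda could look like. The environment variables, bucket layout, and payload fields are illustrative assumptions, not the customer's actual interface.

```python
# Hypothetical ingest Lambda behind Amazon API Gateway.
# MAIL_IMAGE_BUCKET and PIPELINE_ARN are assumed environment variables.
import base64
import json
import os
import uuid

import boto3

s3 = boto3.client("s3")
sfn = boto3.client("stepfunctions")

BUCKET = os.environ["MAIL_IMAGE_BUCKET"]        # raw mail-image bucket (assumed name)
STATE_MACHINE_ARN = os.environ["PIPELINE_ARN"]  # mail-processing state machine (assumed)


def handler(event, context):
    """Store an incoming mail image in S3 and start the processing workflow."""
    item_id = str(uuid.uuid4())
    key = f"incoming/{item_id}.jpg"

    # API Gateway delivers binary payloads base64-encoded.
    body = event["body"]
    image_bytes = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode()
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes)

    # Hand the item off to Step Functions; the rest of the pipeline runs asynchronously.
    # Items are wrapped in a list so single uploads and scheduled batches can
    # share the same state machine.
    sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=item_id,
        input=json.dumps({"items": [{"itemId": item_id, "bucket": BUCKET, "key": key}]}),
    )
    return {"statusCode": 202, "body": json.dumps({"itemId": item_id})}
```

Scheduled batch ingests follow the same pattern, with Amazon EventBridge invoking the function instead of API Gateway.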
2. Preprocessing & OCR on AWS
AWS Lambda functions normalize and enhance the images, then call Amazon Textract to extract text and layout information. This prepares high-quality input for the Generative AI step.
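As a rough sketch of that OCR step, the Lambda below calls Amazon Textract's text detection API on an image already staged in S3. It assumes plain text detection is sufficient; the event fields come from the hypothetical ingest handler above.

```python
# Sketch of the OCR Lambda: run Amazon Textract on one mail image in S3
# and pass the detected lines (with layout geometry) downstream.
import boto3

textract = boto3.client("textract")


def handler(event, context):
    """Extract text lines and bounding boxes from a single mail image."""
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": event["bucket"], "Name": event["key"]}}
    )

    # Keep only LINE blocks; they carry the text plus bounding-box geometry,
    # which gives the downstream prompt some layout context.
    lines = [
        {"text": block["Text"], "box": block["Geometry"]["BoundingBox"]}
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]
    return {**event, "ocrLines": lines}
```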
3. Generative AI extraction on Amazon Bedrock
Using Amazon Bedrock with a leading foundation model, we apply schema-guided prompts and examples to transform semi-structured mail images into structured JSON: senders, recipients, dates, amounts, reference numbers, and more.
Confidence scores and validation rules (implemented in Lambda) decide whether an item can flow straight through or needs human review.
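A simplified sketch of how the extraction and routing could be implemented with the Bedrock Converse API is shown below. The model ID, prompt, JSON schema, field names, and confidence threshold are placeholders; the post does not disclose the actual prompts or model.

```python
# Sketch of the Bedrock extraction step plus straight-through / review routing.
# MODEL_ID, the schema, and CONFIDENCE_THRESHOLD are illustrative assumptions.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder model ID
CONFIDENCE_THRESHOLD = 0.85                              # assumed routing threshold

SYSTEM_PROMPT = (
    "You extract structured data from scanned mail. Respond with JSON only, "
    'using the schema {"sender": string, "recipient": string, "date": string, '
    '"amount": string, "reference": string, "confidence": number between 0 and 1}.'
)


def handler(event, context):
    """Turn OCR output into structured JSON and decide whether human review is needed."""
    ocr_text = "\n".join(line["text"] for line in event["ocrLines"])

    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": SYSTEM_PROMPT}],
        messages=[{"role": "user", "content": [{"text": ocr_text}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )
    # A production implementation would guard against non-JSON model output here.
    extracted = json.loads(response["output"]["message"]["content"][0]["text"])

    # Simple routing rule: low model confidence or a missing mandatory field
    # sends the item to human review instead of straight-through posting.
    needs_review = (
        extracted.get("confidence", 0.0) < CONFIDENCE_THRESHOLD
        or not extracted.get("recipient")
    )
    return {**event, "extracted": extracted, "needsReview": needs_review}
```

Running the model at temperature 0 and validating the parsed JSON in plain Lambda code keeps the straight-through decision deterministic and auditable.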
4. Validation, posting, and notifications
Validated records are written to Amazon DynamoDB and securely sent to downstream systems via API. Amazon Simple Notification Service (SNS) powers status updates and operational alerts.
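Under the same assumptions, the posting step could look like this minimal sketch; the table name, topic ARN, and status values are invented for illustration.

```python
# Sketch of the posting Lambda: persist the validated record in DynamoDB and
# publish a status notification via SNS. Names and attributes are assumptions.
import json
import os
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

table = dynamodb.Table(os.environ["MAIL_ITEMS_TABLE"])
TOPIC_ARN = os.environ["STATUS_TOPIC_ARN"]


def handler(event, context):
    """Write the validated mail record and notify subscribers of its status."""
    record = {
        "itemId": event["itemId"],
        "status": "REVIEW_REQUIRED" if event["needsReview"] else "POSTED",
        # DynamoDB's resource API rejects Python floats, so convert them to Decimal.
        **{
            key: Decimal(str(value)) if isinstance(value, float) else value
            for key, value in event["extracted"].items()
        },
    }
    table.put_item(Item=record)

    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Mail item processed",
        Message=json.dumps({"itemId": record["itemId"], "status": record["status"]}),
    )
    return record
```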
5. Operations, security, and governance
The entire process is orchestrated by AWS Step Functions, which handles retries, parallelization, and audit trails (a sketch of this orchestration follows the list below).
- Amazon CloudWatch provides dashboards and alarms for latency, error rates, and throughput.
- AWS Secrets Manager stores credentials for external systems.
- All data at rest is encrypted in Amazon S3 and DynamoDB using AWS Key Management Service (AWS KMS).
- Infrastructure is defined with AWS Cloud Development Kit (AWS CDK) and deployed via CI/CD, ensuring repeatability and compliance.
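To give a feel for how the Step Functions orchestration and the CDK-based infrastructure fit together, here is a rough AWS CDK (Python) sketch that wires three hypothetical Lambda functions into a state machine with retries and a parallel Map fan-out. Construct names, concurrency, and retry settings are assumptions, and the snippet targets a recent aws-cdk-lib release.

```python
# Rough CDK sketch of the orchestration: OCR -> extraction -> posting per item,
# fanned out over a batch with retries on transient failures. All names are illustrative.
from aws_cdk import Duration
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks


def build_state_machine(scope, ocr_fn, extract_fn, post_fn):
    ocr = tasks.LambdaInvoke(scope, "RunTextract", lambda_function=ocr_fn, output_path="$.Payload")
    extract = tasks.LambdaInvoke(scope, "ExtractWithBedrock", lambda_function=extract_fn, output_path="$.Payload")
    post = tasks.LambdaInvoke(scope, "PostResults", lambda_function=post_fn, output_path="$.Payload")

    # Retry transient OCR / model throttling with exponential backoff.
    for task in (ocr, extract, post):
        task.add_retry(
            errors=["States.TaskFailed"],
            interval=Duration.seconds(5),
            max_attempts=3,
            backoff_rate=2.0,
        )

    per_item = ocr.next(extract).next(post)

    # Fan out over a batch of mail items in parallel.
    fan_out = sfn.Map(scope, "PerMailItem", items_path="$.items", max_concurrency=25)
    fan_out.item_processor(per_item)

    return sfn.StateMachine(
        scope,
        "MailPipeline",
        definition_body=sfn.DefinitionBody.from_chainable(fan_out),
    )
```

Every state transition is recorded in the execution history, which is part of what gives the operations team the audit trail mentioned above.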
We deliberately keep the implementation modular so that new document types and business rules can be introduced without redesigning the entire system.
Key AWS services used
While we don't publish detailed architecture diagrams, the solution is built entirely on AWS managed services:
- Amazon Bedrock – Generative AI foundation models for interpreting mail content and extracting entities.
- Amazon Textract – OCR and layout detection for scanned documents.
- AWS Lambda – Event-driven compute for ingestion, preprocessing, validation, and integration.
- AWS Step Functions – Orchestration, error handling, and parallel fan-out of mail items.
- Amazon S3 – Durable storage for raw and processed images and artifacts.
- Amazon DynamoDB – Serverless metadata and workflow state store.
- Amazon API Gateway & Amazon EventBridge – API-triggered and scheduled executions.
- Amazon SNS – Notifications and alerts.
- AWS Secrets Manager, AWS KMS, IAM, AWS CloudTrail, AWS Config – Security, encryption, and governance.
- Amazon CloudWatch – Metrics, logs, dashboards, and alarms for the full workload.
AWS is not just the hosting platform – it is the core engine that makes the solution scalable, secure, and observable.
Measurable outcomes
The project delivered clear, quantifiable improvements.
Faster processing
- Before: Hours to days to process a batch of mail, with significant delays during peak periods.
- After:
  - Processing time reduced to under an hour for most items
  - Median time reduced by over 90%
This allows the customer to confidently offer same-day SLAs even at much higher volumes.
Lower cost per item
- Before: High per-item costs driven by manual labor and legacy tooling.
- After: Cost per item reduced by roughly 50–60%.
The reduction holds even after factoring in AWS consumption and residual human-in-the-loop review.
Better data quality
By combining Textract, Bedrock, and validation logic on AWS:
- Manual correction workload has dropped by around 20%.
- Production metrics show ≥98% "business success rate" on the fields that matter most to downstream systems.
Proven production-scale AWS usage
This is a steady-state production workload, not a lab experiment:
- Tens of thousands of documents processed per month
- High-volume token processing through Amazon Bedrock
- Millions of pages processed annually through Amazon Textract
- Significant AWS consumption driven primarily by Generative AI services (Bedrock + Textract)
Thinking about modernizing your document workflows?
If you're dealing with thousands of scanned documents, PDFs, or images every day and wondering how to reduce cost and turnaround time without compromising security or compliance, AWS provides all the building blocks you need.
Our team specializes in:
- Designing production-grade Generative AI workloads on AWS
- Migrating from manual or rules-based processes to Bedrock- and Textract-powered pipelines
- Implementing end-to-end observability, security, and governance using AWS best practices
Reach out to us to discuss how a similar solution on AWS could work for your organization, or how we can tailor this approach to your specific document and mail-processing needs.