← Back to Blog

Transforming College Catalog Data with AWS Generative AI

Document processing workflow with AI, AWS, and data analytics

How Horus Technologies helps education data providers move from PDFs to trusted course data

About the project

Horus Technologies partnered with an education data provider that works with thousands of accredited institutions and large academic data platforms. Their business depends on one thing: accurate, up-to-date course data that can be compared across schools and programs.

Every year they process hundreds of college catalogs. Each catalog contains hundreds or even thousands of courses, all formatted differently, using different credit systems and prerequisite styles. This information powers downstream products such as search, analytics, and transfer-credit evaluation — where errors directly affect students' time, money, and graduation plans.

The challenge

Our customer came to Horus Technologies with several connected problems:

  • Catalogs were slow to process. Turning a single catalog PDF into structured data could take several days of manual work and scattered scripts.
  • Formats were wildly inconsistent. Course numbers, credits, and prerequisites were written in many different ways across institutions and catalog editions.
  • Accuracy was critical. Small mistakes in course fields could lead to incorrect transfer-credit decisions and frustrated students.
  • Demand was growing. As the number of partner institutions increased, the existing approach could not keep up without adding more staff.

They needed a way to dramatically accelerate catalog processing while keeping or improving expert-level quality.

Why AWS

To meet these requirements, Horus Technologies and the customer chose to build the new solution on Amazon Web Services (AWS).

We use:

  • AWS generative AI services to understand and structure complex catalog content.
  • AWS managed data services to store, secure, and serve course information to downstream applications.
  • Serverless and fully managed components to scale automatically with workload and keep operational overhead low.

By relying on AWS as the core platform, the customer gets a solution that is secure, compliant, and ready to grow with their business rather than a one-off custom script that is hard to maintain.

What Horus Technologies delivered

Working together with the customer, Horus Technologies designed and implemented a production-grade course data platform on AWS that:

  • Ingests large volumes of catalog files from higher-education institutions.
  • Uses generative AI on AWS to extract and normalize key fields such as subject, course number, title, credits, and prerequisites.
  • Enforces strict validation rules so that only clean, consistent records move forward.
  • Publishes structured course data through APIs, allowing internal tools and partner systems to integrate it quickly.

All of this is backed by AWS services that provide encryption, identity and access management, monitoring, and auditability as standard capabilities.

AWS services used in this project

The solution is built entirely on AWS and leverages several managed services:

  • Amazon Bedrock – for applying generative AI to complex catalog documents and producing structured course information.
  • AWS Lambda – for serverless processing tasks that react to new catalogs and orchestrate data preparation steps.
  • AWS Step Functions – for coordinating multi-step workflows and keeping long-running processes reliable and traceable.
  • Amazon S3 – for durable storage of source catalogs and derived data artifacts, with encryption and lifecycle policies.
  • Amazon DynamoDB – for fast, scalable access to course records and other frequently queried data.
  • Amazon RDS / Amazon Aurora (MySQL-compatible) – for relational data that requires strong consistency and reporting.
  • Amazon API Gateway – for exposing secure, versioned APIs that allow internal tools and partners to consume course data.
  • AWS Identity and Access Management (IAM) – for enforcing least-privilege access across all components.
  • AWS Secrets Manager – for securely storing and rotating credentials and configuration secrets.
  • Amazon CloudWatch – for centralized logging, metrics, and alerts that keep the platform observable and support operations.

These managed services give the customer a modern, cloud-native platform without having to build and operate low-level infrastructure themselves.

Business impact

After going live on AWS, the customer saw a step change in how they work with course data:

  • From days to hours. End-to-end processing time per catalog dropped from multiple days of manual effort to hours, with same-day turnaround now achievable for most catalogs.
  • Consistent, high-quality data. Course fields now meet strict internal accuracy targets, reducing rework and issues discovered later in the process.
  • Lower manual workload. Experts focus on quality assurance and edge cases instead of repetitive data entry, without needing to grow the team at the same pace as catalog volume.
  • Better visibility and control. Built-in monitoring and metrics on AWS make it easy to track processing performance, spot problems early, and continuously improve.

For the customer, this means faster delivery of catalog updates to their own clients, more reliable data products, and a platform that can scale as they add more institutions and services.

How Horus Technologies and AWS can help you

This project shows how Horus Technologies combines domain expertise in complex document workflows with AWS generative AI and cloud services to solve real business problems:

  • Large volumes of heterogeneous documents
  • Strict accuracy and compliance requirements
  • Need for scalable, API-driven access to structured data

If your organization is dealing with similar challenges — whether in education, healthcare, finance, or another data-heavy domain — Horus Technologies can help you design and implement a solution built on AWS and generative AI that fits your specific needs.