Skip to main content

Architecture

RAISE is a fully serverless, event-driven system. Every component is managed by AWS — there are no servers to provision, patch, or scale manually. The architecture is split into four concerns: ingestion, processing, output, and failure handling.

Resume Ingestion Flow

A user uploads a resume to S3. S3 publishes an event to an SQS ingestion queue. Lambda polls the queue one message at a time, invokes Bedrock to extract structured data, then routes the output to DynamoDB, SQS, or both — based on configuration.

Step-by-Step

  1. User uploads resume — PDF, DOCX, or DOC — to the S3 bucket (CVExtractorBucket).
  2. S3 publishes an event notification to the SQS ingestion queue (CVIngestionQueue).
  3. Lambda polls the queue (one message at a time) and retrieves the document from S3.
  4. Lambda invokes AWS Bedrock (Claude Sonnet 4.6 via cross-region inference profile). Bedrock extracts structured candidate data using tool calling with a predefined JSON schema.
  5. Lambda routes output based on the OUTPUT_TARGET environment variable:
    • DYNAMODB — store in DynamoDB only
    • SQS — publish to the output queue only
    • BOTH — store in DynamoDB and publish to SQS (default)
  6. On failure — if Lambda throws for any reason (e.g. Bedrock access denied, NoSuchKey), SQS moves the message directly to the Dead Letter Queue (CVIngestionDLQ). No retries are attempted because these failures are non-recoverable until the root cause is fixed.

Ingestion Reliability

Routing S3 events through SQS before Lambda provides two things: automatic retry via SQS visibility timeout, and a Dead Letter Queue for fast failure isolation.

The DLQ is configured with maxReceiveCount: 1 — messages move to the DLQ on the first failure, not after multiple retries. This is intentional: Lambda failures in RAISE are caused by configuration errors (wrong IAM policy, missing Bedrock model access) that will keep failing until fixed. Retrying wastes time and obscures the root cause.

Once the root cause is fixed, recovery is one click in the SQS Console:

  1. Select CVIngestionDLQ
  2. "Dead-letter queue actions" → "Start DLQ redrive"
  3. Redrive destination: CVIngestionQueue
  4. Click "Redrive"

Lambda picks up all failed messages automatically and reprocesses them.

Monitoring & Observability

RAISE uses a multi-signal monitoring strategy — not just a DLQ alarm, but signals at every layer of the pipeline.

Alarm Reference

AlarmMetricThresholdWhat it means
CVIngestionDLQAlarmDLQ messages visible≥ 1Resume processing failed — investigate and redrive
ProcessCVErrorAlarmLambda errors (sum)≥ 1 / 5 minLambda threw an exception — check CloudWatch Logs
ProcessCVDurationAlarmLambda duration (p95)≥ 4 minProcessing approaching 5-min timeout — check Bedrock latency
CVIngestionQueueAgeAlarmQueue message age≥ 10 minQueue backing up — Lambda stalled or throttled

All alarms route through CVIngestionAlertTopic (SNS). The alert email is configured at deploy time via --context alertEmail=an.other@example.com.

The CloudWatch Dashboard (RAISE-CV-Ingestion) provides a single-pane view of Lambda invocations and errors, Lambda duration (p50/p95), queue depth and message age, and DLQ message count.