Troubleshooting

Common Issues

AccessDeniedException on Bedrock

Symptom: Lambda throws AccessDeniedException and the resume lands in the DLQ. CloudWatch Logs shows:

AccessDeniedException: User is not authorized to perform bedrock:InvokeModel

Causes and fixes:

Cause | Fix
Bedrock model access not enabled | Open the Bedrock console → Model access → enable Claude Sonnet 4.6 in the deployment region
Stack deployed to a region where global.anthropic.claude-sonnet-4-6 is not supported | Redeploy to a supported region (e.g. us-east-1, eu-west-1)
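
To check the second cause before redeploying, one option is to list the inference profiles Bedrock exposes in the target region. The following is a minimal sketch, not part of the pipeline. It assumes boto3 is installed and that a supported region lists global.anthropic.claude-sonnet-4-6 in its ListInferenceProfiles output. The helper name profile_available is hypothetical. It does not tell you whether model access has been enabled in the console.

```python
def profile_available(bedrock_client, profile_id):
    """Return True if profile_id is among the inference profiles this client can list.

    bedrock_client is assumed to be boto3.client("bedrock", region_name=...).
    """
    kwargs = {}
    while True:
        page = bedrock_client.list_inference_profiles(**kwargs)
        for summary in page.get("inferenceProfileSummaries", []):
            if summary.get("inferenceProfileId") == profile_id:
                return True
        token = page.get("nextToken")
        if not token:
            return False  # all pages read, profile not listed
        kwargs = {"nextToken": token}

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("bedrock", region_name="us-east-1")
#   print(profile_available(client, "global.anthropic.claude-sonnet-4-6"))
```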

Resume Lands in the Dead Letter Queue

Symptom: CVIngestionDLQAlarm fires. The CVIngestionDLQDepth metric is ≥ 1.

Steps:

  1. Open CloudWatch → Log groups → /aws/lambda/ProcessCVFunction
  2. Find the log stream for the failed invocation and identify the root cause
  3. Fix the root cause (see other entries in this guide for specific errors)
  4. Redrive the failed message:
    • Open the SQS Console
    • Select CVIngestionDLQ
    • Dead-letter queue actions → Start DLQ redrive
    • Redrive destination: CVIngestionQueue
    • Click Redrive

Lambda picks up the message automatically and reprocesses the resume.
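
The console redrive in step 4 can also be started from a script. The sketch below uses the SQS StartMessageMoveTask API through boto3. It assumes the physical queue names are exactly CVIngestionDLQ and CVIngestionQueue, which may not hold if the stack generates names. The helper names queue_arn and start_redrive are hypothetical.

```python
def queue_arn(sqs_client, queue_name):
    """Look up a queue's ARN from its name."""
    url = sqs_client.get_queue_url(QueueName=queue_name)["QueueUrl"]
    attrs = sqs_client.get_queue_attributes(QueueUrl=url, AttributeNames=["QueueArn"])
    return attrs["Attributes"]["QueueArn"]

def start_redrive(sqs_client, dlq_name="CVIngestionDLQ", target_name="CVIngestionQueue"):
    """Start moving every message from the DLQ back to the main queue.

    Returns the task handle, which identifies the move task in later API calls.
    """
    task = sqs_client.start_message_move_task(
        SourceArn=queue_arn(sqs_client, dlq_name),
        DestinationArn=queue_arn(sqs_client, target_name),
    )
    return task["TaskHandle"]

# Usage (requires AWS credentials):
#   import boto3
#   print(start_redrive(boto3.client("sqs")))
```

As with the console flow, fix the root cause first. Otherwise the redriven messages are likely to fail again and return to the DLQ.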


NoSuchKey — Resume Not Found in S3

Symptom: Lambda logs:

Error retrieving document from S3 [bucket=my-bucket, key=resumes/john-doe.pdf]: NoSuchKey

Cause: The resume was deleted from S3 between the upload event and Lambda processing. This is uncommon but can happen if a lifecycle policy or manual deletion removes the object before the queue is drained.

Fix: Re-upload the resume. The pipeline will process it automatically.
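
When several resumes are affected, it can help to extract the missing bucket and key pairs from exported log lines. This is a small sketch that assumes every line follows the exact format shown in the symptom above. The helper name missing_objects is hypothetical, and keys containing a `]` character would not parse correctly.

```python
import re

# Matches: "... [bucket=<bucket>, key=<key>]: NoSuchKey"
_NO_SUCH_KEY = re.compile(r"\[bucket=(?P<bucket>[^,\]]+), key=(?P<key>[^\]]+)\]: NoSuchKey")

def missing_objects(log_lines):
    """Return (bucket, key) tuples for every NoSuchKey line, in input order."""
    found = []
    for line in log_lines:
        match = _NO_SUCH_KEY.search(line)
        if match:
            found.append((match["bucket"], match["key"]))
    return found

lines = [
    "Error retrieving document from S3 [bucket=my-bucket, key=resumes/john-doe.pdf]: NoSuchKey",
    "INFO unrelated line",
]
print(missing_objects(lines))
```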


Lambda Duration Alarm Fires

Symptom: ProcessResumeDurationAlarm fires. Lambda duration p95 ≥ 4 minutes.

Cause: A Bedrock latency spike. Claude Sonnet 4.6 inference is typically fast, but latency can spike occasionally during periods of high AWS load.

Fix: No immediate action is required. If the Lambda times out, the message becomes visible in the queue again once its visibility timeout expires, and SQS redelivers it. Check the AWS Service Health Dashboard to confirm Bedrock is operating normally.
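
To see whether the spike is isolated or sustained, the p95 duration can be pulled from CloudWatch. This sketch assumes the function name is ProcessCVFunction (taken from the log group name used in this guide) and uses the standard AWS/Lambda Duration metric, which is reported in milliseconds. The helper names recent_p95_ms and over_threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def recent_p95_ms(cw_client, function_name="ProcessCVFunction", hours=3):
    """Return [(timestamp, p95 duration in ms)] in 5-minute buckets, oldest first."""
    end = datetime.now(timezone.utc)
    resp = cw_client.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Duration",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        ExtendedStatistics=["p95"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["ExtendedStatistics"]["p95"]) for p in points]

def over_threshold(points, limit_ms=240_000):
    """Keep only buckets at or above the 4-minute alarm threshold."""
    return [(ts, value) for ts, value in points if value >= limit_ms]

# Usage (requires AWS credentials):
#   import boto3
#   print(over_threshold(recent_p95_ms(boto3.client("cloudwatch"))))
```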


Queue Age Alarm Fires

Symptom: CVIngestionQueueAgeAlarm fires. Messages are waiting in CVIngestionQueue for longer than 10 minutes.

Causes and fixes:

Cause | Fix
Lambda throttled (concurrency limit reached) | Check Lambda concurrency metrics in CloudWatch; request a quota increase if needed
Lambda in a persistent error loop | Check CloudWatch Logs for repeated errors; fix root cause
Burst of uploads exceeds processing capacity | Expected. SQS will drain automatically as Lambda catches up

SNS Subscription Not Confirmed — No Alert Emails

Symptom: CloudWatch alarms fire but no email is received.

Cause: The SNS subscription requires email confirmation. On first deploy, AWS sends a confirmation email to the alertEmail address. Until that subscription is confirmed, alerts are not delivered.

Fix:

  1. Open the SNS console → Topics → CVIngestionAlertTopic
  2. Check Subscriptions. If the status is PendingConfirmation, select the subscription and choose Request confirmation to resend the email
  3. Check your spam folder for the original confirmation email from no-reply@sns.amazonaws.com
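
The same check can be scripted. The sketch below lists the email endpoints on a topic whose subscription is still pending. It relies on SNS reporting the literal string PendingConfirmation as the SubscriptionArn of an unconfirmed subscription. The topic ARN must be supplied by the caller, and the helper name pending_endpoints is hypothetical.

```python
def pending_endpoints(sns_client, topic_arn):
    """Return endpoints on the topic whose subscription is not yet confirmed."""
    pending = []
    kwargs = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for sub in page.get("Subscriptions", []):
            # Unconfirmed subscriptions carry this placeholder instead of a real ARN.
            if sub.get("SubscriptionArn") == "PendingConfirmation":
                pending.append(sub["Endpoint"])
        token = page.get("NextToken")
        if not token:
            return pending
        kwargs["NextToken"] = token

# Usage (requires AWS credentials; the ARN below is a placeholder):
#   import boto3
#   sns = boto3.client("sns")
#   print(pending_endpoints(sns, "arn:aws:sns:REGION:ACCOUNT_ID:CVIngestionAlertTopic"))
```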

Unsupported Document Format

Symptom: Lambda logs a format error or Bedrock returns an unexpected response for a specific file.

Supported formats: PDF, DOCX, DOC. The format is detected from the file extension.

Fix: Ensure the uploaded file has the correct extension (.pdf, .docx, or .doc) and that the file is not corrupted. Re-upload with the correct extension if needed.
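
Extension-based detection can be illustrated with a short pre-upload check. This is a sketch of the rule described above, not the pipeline's actual code. The helper name detect_format is hypothetical, and the case-insensitive comparison is an assumption: the pipeline may treat .PDF differently from .pdf.

```python
from pathlib import PurePosixPath

# Extensions listed as supported in this guide.
SUPPORTED = {".pdf": "pdf", ".docx": "docx", ".doc": "doc"}

def detect_format(key):
    """Return the format name for an S3 key, or None if the extension is unsupported."""
    suffix = PurePosixPath(key).suffix.lower()  # assumption: case-insensitive match
    return SUPPORTED.get(suffix)

for key in ["resumes/john-doe.pdf", "resumes/jane.DOCX", "resumes/notes.txt"]:
    print(key, "->", detect_format(key))
```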


Useful CloudWatch Logs Insights Queries

Open CloudWatch → Logs Insights, select the log group /aws/lambda/ProcessCVFunction, and run these queries.

Find all errors in the last hour (set the Logs Insights time range to 1 hour; the query itself does not restrict the time window):

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

Trace a specific resume by filename:

fields @timestamp, @message
| filter @message like /your-resume\.pdf/
| sort @timestamp asc

Find slow invocations (duration > 3 minutes):

fields @timestamp, @duration, @requestId
| filter @duration > 180000
| sort @duration desc
| limit 20
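
These queries can also be run outside the console. The sketch below uses the CloudWatch Logs StartQuery and GetQueryResults APIs through boto3 and polls until the query finishes. The helper name run_insights_query and the polling defaults are assumptions for illustration.

```python
import time

ERRORS_QUERY = """fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50"""

def run_insights_query(logs_client, query, log_group="/aws/lambda/ProcessCVFunction",
                       minutes=60, poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query over the last `minutes` and return rows as dicts."""
    now = int(time.time())
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=now - minutes * 60,  # epoch seconds
        endTime=now,
        queryString=query,
    )["queryId"]
    for _ in range(max_polls):
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("query did not finish within the polling budget")
    if resp["status"] != "Complete":
        raise RuntimeError("query ended with status " + resp["status"])
    # Each result row is a list of {"field": ..., "value": ...} pairs.
    return [{cell["field"]: cell["value"] for cell in row} for row in resp["results"]]

# Usage (requires AWS credentials):
#   import boto3
#   for row in run_insights_query(boto3.client("logs"), ERRORS_QUERY):
#       print(row["@timestamp"], row["@message"])
```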