The 30-Second Diagnosis: What Is Happening Right Now
Amazon Connect API Throttling – Your Amazon Connect integration is returning HTTP 429 responses. Your logs show one of two error signatures, and they are not the same problem:
- ThrottlingException — You have exceeded the request rate or burst limit for a specific API call within a defined time window. This is transient. The service is enforcing its token bucket limits, and you can recover by backing off and retrying.
- ServiceQuotaExceededException — You have exhausted a hard quota tied to your account, such as maximum concurrent calls, maximum instances, or maximum phone numbers. This does not resolve on its own. You must file a quota increase request.
The distinction is critical. Engineers waste hours retrying a ServiceQuotaExceededException with exponential backoff — it will never succeed until the quota is raised. Conversely, engineers waste hours filing support tickets for a ThrottlingException that a three-line retry wrapper would have silently handled.
Triage rule: If the error is time-correlated with a traffic spike → ThrottlingException. If the error started after a provisioning action (new instances, new phone number assignments, scaling a dialer campaign) → ServiceQuotaExceededException.
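This triage rule is easy to encode at the top of an error handler. A minimal sketch (the function and the action labels are illustrative, not part of any AWS SDK):

```python
# Retryable throttling signals vs. quota exhaustion (action labels are illustrative)
RETRYABLE = {"ThrottlingException", "TooManyRequestsException", "RequestLimitExceeded"}
QUOTA_BOUND = {"ServiceQuotaExceededException"}

def triage(error_code: str) -> str:
    """Map a Connect API error code to the recommended next step."""
    if error_code in RETRYABLE:
        return "retry-with-backoff"   # transient: back off and retry
    if error_code in QUOTA_BOUND:
        return "file-quota-increase"  # hard limit: retrying will never succeed
    return "investigate"              # neither signature: look deeper
```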
The Root Cause Analysis: Token Bucket, Per-API Limits, and Per-Account Limits
How the Token Bucket Algorithm Works in AWS
AWS enforces throttling using the Token Bucket Algorithm. Conceptually:
- A “bucket” holds a finite number of tokens. Each API request consumes one token.
- The bucket refills at a fixed steady-state rate (requests per second, or RPS).
- The bucket has a maximum capacity, which defines the burst limit — the maximum number of requests that can be processed simultaneously before the refill rate becomes the ceiling.
For example, if AWS API Gateway has a default account-level limit of 10,000 RPS with a burst capacity of 5,000 tokens, and your Lambda functions fire 12,000 requests at once, the first 5,000 drain the burst bucket instantly; subsequent requests are admitted only as fast as tokens refill, and anything arriving beyond that pace is rejected with 429 Too Many Requests. This mechanism applies identically to Amazon Connect’s control-plane APIs.
The important operational insight from this model: burst traffic triggers throttling even if your average rate is within limits. A Lambda function that fans out 500 Connect API calls simultaneously will exhaust the burst budget in milliseconds, even if the hourly average is well within quota.
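A toy simulation makes this concrete (the numbers are illustrative, not actual Connect limits): a burst that exceeds the bucket capacity is throttled immediately, even though the same volume spread across more ticks would pass.

```python
def simulate_token_bucket(requests_per_tick, capacity, refill_rate):
    """Return a (served, throttled) pair for each tick of arriving requests."""
    tokens = capacity  # the bucket starts full
    results = []
    for n in requests_per_tick:
        tokens = min(capacity, tokens + refill_rate)  # refill, capped at capacity
        served = min(n, int(tokens))                  # each request consumes one token
        tokens -= served
        results.append((served, n - served))          # the remainder is throttled (429)
    return results

# A burst of 800 against capacity 500: 300 requests throttled in the first tick
print(simulate_token_bucket([800, 200], capacity=500, refill_rate=100))
# → [(500, 300), (100, 100)]
```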
Per-API Limits vs. Per-Account Limits
There are two distinct throttle dimensions you must understand:
Per-Account, Per-Region Limits apply across all API calls within your AWS account in a given region. These are soft limits, adjustable via AWS Service Quotas. For Amazon Connect, examples include the number of concurrent active calls, the number of instances, and the number of phone numbers. If any single AWS service (including your own Lambda functions making Connect API calls) exhausts the account-level RPS ceiling, every API call from that account in that region begins receiving 429 errors — not just Connect calls.
Per-API (Per-Method) Limits apply to specific Amazon Connect API operations. For example:
- StartOutboundVoiceContact has its own RPS limit, separate from GetCurrentMetricData.
- PutDialRequestBatch (used in high-volume outbound dialer campaigns) has a burst limit that is independent of general account throttles.
- ListContactFlows and other list/describe operations draw from a read-quota pool separate from write operations.
The practical implication: you can be throttled on StartOutboundVoiceContact while GetCurrentMetricData continues to function normally. Always check which API is throwing the ThrottlingException before diagnosing the root cause.
AWS applies throttling limits in this priority order:
- Per-client or per-method limits (usage plan level, if API Gateway is in the path)
- Per-method stage-level limits
- Per-account, per-region limits
- AWS-wide regional throttling (cannot be changed by customers)
For Amazon Connect specifically, the most common throttling scenarios in production are:
- Lambda functions triggered by contact flows making synchronous Connect API calls at high concurrency
- Outbound dialer campaigns using PutDialRequestBatch without proper pacing
- VDI/Citrix environments where many agent sessions initialize simultaneously, triggering bursts of GetCurrentMetricData and ListUsers calls
- Monitoring tools polling GetCurrentMetricData or GetMetricData at sub-second intervals
The Immediate Fix: Production-Ready SDK Code with Adaptive Retry
Python (boto3) — Exponential Backoff with Jitter
The following script implements the AWS-recommended retry pattern for Connect API calls: exponential backoff with full jitter — the most effective strategy for avoiding retry storms — implemented as a reusable wrapper layered on top of boto3’s built-in adaptive retry mode.
```python
import logging
import random
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Configure logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Initialize the Connect client with SDK-level retry configuration.
# This provides a baseline retry layer beneath our custom logic.
connect_client = boto3.client(
    'connect',
    region_name='us-east-1',
    config=Config(
        retries={
            'mode': 'adaptive',   # Adaptive mode adjusts the retry rate dynamically
            'max_attempts': 10    # SDK-level max attempts before raising
        }
    )
)


def exponential_backoff_with_jitter(attempt: int, base_delay: float = 1.0, max_delay: float = 32.0) -> float:
    """
    Calculate delay using the full-jitter strategy.
    Full jitter: sleep = random(0, min(cap, base * 2^attempt))
    This is AWS's recommended strategy to prevent a thundering herd.
    """
    delay = min(max_delay, base_delay * (2 ** attempt))
    return random.uniform(0, delay)


def call_connect_api_with_retry(api_func, max_retries: int = 8, **kwargs):
    """
    Wrapper for any Amazon Connect API call with exponential backoff + jitter.
    Differentiates between retryable ThrottlingException and
    non-retryable ServiceQuotaExceededException.

    Usage:
        result = call_connect_api_with_retry(
            connect_client.start_outbound_voice_contact,
            InstanceId='your-instance-id',
            ContactFlowId='your-flow-id',
            DestinationPhoneNumber='+15551234567',
            QueueId='your-queue-id'
        )
    """
    RETRYABLE_ERROR_CODES = {
        'ThrottlingException',
        'RequestLimitExceeded',
        'TooManyRequestsException',
        'ServiceUnavailableException',
        'InternalServiceException',
    }
    NON_RETRYABLE_ERROR_CODES = {
        'ServiceQuotaExceededException',  # Quota exhausted — retry won't help
        'AccessDeniedException',
        'InvalidParameterException',
        'ResourceNotFoundException',
    }

    for attempt in range(max_retries):
        try:
            response = api_func(**kwargs)
            logger.info(f"API call succeeded on attempt {attempt + 1}")
            return response
        except ClientError as e:
            error_code = e.response['Error']['Code']

            # Immediately raise non-retryable errors — do not waste time retrying
            if error_code in NON_RETRYABLE_ERROR_CODES:
                logger.error(
                    f"Non-retryable error '{error_code}': {e.response['Error']['Message']}. "
                    f"If ServiceQuotaExceededException, file a quota increase request."
                )
                raise

            # For throttling errors, back off and retry
            if error_code in RETRYABLE_ERROR_CODES:
                if attempt == max_retries - 1:
                    logger.error(f"Max retries ({max_retries}) exhausted for error '{error_code}'.")
                    raise
                delay = exponential_backoff_with_jitter(attempt)
                logger.warning(
                    f"Retryable error '{error_code}' on attempt {attempt + 1}. "
                    f"Retrying in {delay:.2f}s..."
                )
                time.sleep(delay)
            else:
                # Unknown error — raise immediately
                raise


# Example: start an outbound call with full retry protection
def start_outbound_call_safely(instance_id: str, contact_flow_id: str,
                               destination_number: str, queue_id: str):
    return call_connect_api_with_retry(
        connect_client.start_outbound_voice_contact,
        InstanceId=instance_id,
        ContactFlowId=contact_flow_id,
        DestinationPhoneNumber=destination_number,
        QueueId=queue_id
    )
```
Node.js (AWS SDK v3) — Configured Retry Strategy with Full Jitter
AWS SDK v3 for JavaScript ships a dedicated AdaptiveRetryStrategy that dynamically adjusts retry rates based on observed throttling — the most sophisticated client-side retry mechanism available. The example below uses the simpler ConfiguredRetryStrategy so the full-jitter backoff is explicit and auditable; AdaptiveRetryStrategy can be substituted once the pattern is proven.
```javascript
import { ConnectClient, StartOutboundVoiceContactCommand } from "@aws-sdk/client-connect";
import { ConfiguredRetryStrategy } from "@aws-sdk/util-retry";

/**
 * Production-ready Connect client with an explicit full-jitter retry strategy.
 * ConfiguredRetryStrategy retries retryable errors (including ThrottlingException)
 * using the backoff function supplied below, reducing retry collisions.
 */
const connectClient = new ConnectClient({
  region: "us-east-1",
  retryStrategy: new ConfiguredRetryStrategy(
    8, // maxAttempts — total attempts including the first call
    (attempt) => {
      // Full-jitter exponential backoff: random(0, min(32000ms, 1000ms * 2^attempt))
      const baseDelay = 1000;
      const maxDelay = 32000;
      const exponentialDelay = Math.min(maxDelay, baseDelay * Math.pow(2, attempt));
      const jitteredDelay = Math.random() * exponentialDelay;
      console.log(`Retry attempt ${attempt + 1}, waiting ${jitteredDelay.toFixed(0)}ms`);
      return jitteredDelay;
    }
  ),
});

// Non-retryable error codes — fail fast on these
const NON_RETRYABLE_CODES = new Set([
  "ServiceQuotaExceededException",
  "AccessDeniedException",
  "InvalidParameterException",
]);

/**
 * Wraps any Connect SDK command with intelligent retry logic.
 * Distinguishes ThrottlingException (retryable) from
 * ServiceQuotaExceededException (requires a quota increase).
 */
async function executeConnectCommand(command) {
  try {
    return await connectClient.send(command);
  } catch (error) {
    if (NON_RETRYABLE_CODES.has(error.name)) {
      console.error(
        `[FATAL] ${error.name}: Retry will not help. ` +
        `If ServiceQuotaExceededException, visit the AWS Service Quotas console.`
      );
      throw error;
    }
    // All other errors (including ThrottlingException) have already been
    // retried by the SDK's ConfiguredRetryStrategy above
    throw error;
  }
}

// Example usage: start an outbound call
async function startOutboundCall({ instanceId, contactFlowId, destination, queueId }) {
  const command = new StartOutboundVoiceContactCommand({
    InstanceId: instanceId,
    ContactFlowId: contactFlowId,
    DestinationPhoneNumber: destination,
    QueueId: queueId,
  });
  return executeConnectCommand(command);
}
```
The Architectural Cure: SQS and EventBridge Buffering
Code-level retry is reactive. For high-volume workloads — outbound dialers, bulk contact imports, real-time metric polling — you need architectural rate control that absorbs bursts before they hit the Connect API.
Pattern 1: SQS as a Rate-Limiting Buffer
Place an Amazon SQS Standard Queue between your traffic source and the Connect API. A Lambda function consumes messages from the queue with reserved concurrency set to match your Connect API’s RPS limit.
[Traffic Source] → [SQS Queue] → [Lambda Consumer (concurrency=N)] → [Connect API]
Key configuration decisions:
- Set Lambda reserved concurrency to match the target API’s RPS limit. If StartOutboundVoiceContact allows 2 RPS per account, set concurrency to 2.
- Configure a Dead Letter Queue (DLQ) on the SQS queue to capture messages that fail after all retries — prevents data loss from persistent throttling or ServiceQuota failures.
- Set VisibilityTimeout on the queue to at least 6× the Lambda timeout to prevent duplicate processing during slow retries.
- For PutDialRequestBatch use cases, batch SQS messages so each Lambda invocation submits a full batch, maximizing efficiency per API call.
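A minimal sketch of the Lambda consumer for this pattern, assuming a simple JSON message schema (the field names are my own, and the partial-batch failure response requires ReportBatchItemFailures to be enabled on the event source mapping):

```python
import json

REQUIRED_KEYS = {"instance_id", "contact_flow_id", "destination", "queue_id"}

def parse_dial_request(body: str) -> dict:
    """Validate and decode one queued dial request (assumed message schema)."""
    req = json.loads(body)
    missing = REQUIRED_KEYS - req.keys()
    if missing:
        raise ValueError(f"dial request missing keys: {sorted(missing)}")
    return req

def handler(event, context):
    """SQS-triggered consumer. Reserved concurrency on this function, set to match
    the StartOutboundVoiceContact rate limit, is the actual throttle valve."""
    import boto3  # imported lazily so the module loads without the SDK installed
    from botocore.exceptions import ClientError
    connect = boto3.client("connect", region_name="us-east-1")
    failures = []
    for record in event["Records"]:
        req = parse_dial_request(record["body"])
        try:
            connect.start_outbound_voice_contact(
                InstanceId=req["instance_id"],
                ContactFlowId=req["contact_flow_id"],
                DestinationPhoneNumber=req["destination"],
                QueueId=req["queue_id"],
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ThrottlingException":
                # Report the message as failed; SQS redelivers it after VisibilityTimeout
                failures.append({"itemIdentifier": record["messageId"]})
            else:
                raise
    return {"batchItemFailures": failures}
```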
Pattern 2: EventBridge Scheduled Rules for Polling APIs
If you’re polling GetCurrentMetricData or GetMetricData for real-time dashboards, replace sub-second polling loops with EventBridge Scheduled Rules at the minimum polling interval your use case requires (typically 5–60 seconds). Store results in ElastiCache or DynamoDB and serve dashboard reads from cache — not directly from the Connect API.
[EventBridge Rule (every 30s)] → [Lambda Poller] → [Connect GetCurrentMetricData]
↓
[ElastiCache/DynamoDB]
↓
[Dashboard reads from cache, not Connect]
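A sketch of the poller and the cache-freshness check under this pattern (the table name, metric selection, and TTL are assumptions for illustration):

```python
import time

CACHE_TTL_SECONDS = 30  # assumed to match the EventBridge schedule interval

def is_fresh(cached_at: float, now: float, ttl: int = CACHE_TTL_SECONDS) -> bool:
    """Serve dashboard reads from cache while the snapshot is younger than the poll interval."""
    return (now - cached_at) < ttl

def poll_and_cache(instance_id: str, queue_ids: list, table_name: str = "connect-metrics-cache"):
    """EventBridge-invoked poller: one GetCurrentMetricData call per interval,
    with results written to DynamoDB for the dashboard to read."""
    import boto3  # imported lazily so is_fresh stays testable without the SDK
    connect = boto3.client("connect", region_name="us-east-1")
    table = boto3.resource("dynamodb", region_name="us-east-1").Table(table_name)
    resp = connect.get_current_metric_data(
        InstanceId=instance_id,
        Filters={"Queues": queue_ids, "Channels": ["VOICE"]},
        CurrentMetrics=[{"Name": "AGENTS_AVAILABLE", "Unit": "COUNT"}],
        Groupings=["QUEUE"],
    )
    table.put_item(Item={
        "pk": instance_id,
        "cached_at": int(time.time()),
        "metrics": str(resp["MetricResults"]),
    })
```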
This pattern alone eliminates the most common source of ThrottlingException errors in monitoring integrations.
The Paperwork: Requesting an Amazon Connect Quota Increase
When architectural fixes and retry logic are insufficient because you’ve genuinely outgrown your service quota, file a formal increase request.
Step-by-step via AWS Console:
- Open the AWS Management Console and navigate to Service Quotas (search in the top bar).
- In the left panel, choose AWS Services and search for “Amazon Connect”.
- You will see a list of adjustable quotas. The most commonly requested for throttling relief include:
- “Amazon Connect instances per account” — if you’re hitting instance creation limits
- “Concurrent active calls per instance” — the most common production bottleneck for voice contact centers
- “Outbound calls per second per instance” — critical for dialer campaigns using StartOutboundVoiceContact or PutDialRequestBatch
- “API requests per second” — the core throttle for control-plane API calls
- Select the quota you need to increase and choose “Request quota increase”.
- Enter your desired value and a business justification. Be specific: cite your concurrent agent count, expected calls per hour, and the specific API causing throttling. AWS support responds faster to detailed, data-backed requests.
- AWS will review the request and may reach out for additional information. Most account-level throttle increases are approved within 1–3 business days.
Important: Not all quotas are adjustable. AWS-wide regional throttling limits (the top layer of the hierarchy) cannot be modified by any customer. Only account-level and per-API-stage quotas are eligible for increase requests.
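Quota checks and increase requests can also be scripted through the Service Quotas API. A sketch (quota codes vary per quota and must be looked up with list_service_quotas, so the code is deliberately a parameter here):

```python
def quota_utilization(current_usage: float, quota_value: float) -> float:
    """Percent of a quota consumed; useful for alerting at the 70-80% mark."""
    return 100.0 * current_usage / quota_value

def request_increase(quota_code: str, desired_value: float, region: str = "us-east-1"):
    """File an Amazon Connect quota increase programmatically.
    Find the quota code first, e.g.:
        sq.list_service_quotas(ServiceCode="connect")  # then match on QuotaName
    """
    import boto3  # imported lazily so quota_utilization stays testable without the SDK
    sq = boto3.client("service-quotas", region_name=region)
    return sq.request_service_quota_increase(
        ServiceCode="connect",
        QuotaCode=quota_code,
        DesiredValue=desired_value,
    )
```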
Common Edge Cases: VDI/Citrix and Lambda-Triggered Failures
Throttling During VDI/Citrix Integration
VDI and Citrix environments present a unique throttling pattern: agent session bursts. When a Citrix farm restarts or a shift change brings 200 agents online simultaneously, each agent’s CCP (Contact Control Panel) initializes and makes several Connect API calls — GetCurrentUser, ListUsers, DescribeInstance — within seconds of each other. This synchronized burst exhausts the account-level burst budget almost instantly.
Fix: Introduce a randomized startup delay in the CCP initialization script. Adding Math.random() * 30 seconds of jitter to CCP load time staggers API calls across a 30-second window, sharply reducing peak burst intensity.
Lambda-Triggered Connect API Failures
Lambda functions triggered by S3 events, Kinesis streams, or SNS fan-outs often create retry storms. When a Lambda hits a ThrottlingException, the event source mapping automatically retries — potentially with hundreds of concurrent Lambda invocations all retrying the same throttled API simultaneously.
Fix: Set a maximum concurrency limit on the Lambda event source mapping. For SQS triggers specifically, use the mapping’s ScalingConfig (the MaximumConcurrency setting, configurable via aws lambda update-event-source-mapping --scaling-config) to cap simultaneous Connect API consumers. Combine with the SQS buffering pattern above for full protection. Additionally, configure a dead-letter queue on the Lambda’s async invocation configuration to catch events discarded after maximum retry attempts, ensuring no data loss during sustained throttling events.
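The fix above can be sketched with boto3 (the mapping UUID is a placeholder; SQS event source mappings accept MaximumConcurrency values from 2 to 1000):

```python
def build_scaling_config(max_concurrency: int) -> dict:
    """ScalingConfig payload for an SQS event source mapping."""
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("MaximumConcurrency must be between 2 and 1000")
    return {"MaximumConcurrency": max_concurrency}

def cap_consumer_concurrency(mapping_uuid: str, max_concurrency: int):
    """Cap how many concurrent Lambda invocations the SQS trigger can spawn."""
    import boto3  # lazy import keeps build_scaling_config testable without the SDK
    lam = boto3.client("lambda", region_name="us-east-1")
    return lam.update_event_source_mapping(
        UUID=mapping_uuid,
        ScalingConfig=build_scaling_config(max_concurrency),
    )
```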
Troubleshooting Checklist for Amazon Connect API Throttling
Use this checklist during a production throttling incident:
Immediate Triage
- Identify the exact error code: ThrottlingException (retryable) vs. ServiceQuotaExceededException (quota increase required)
- Identify which specific Amazon Connect API is being throttled (check CloudWatch Logs, not just the error message)
- Confirm the error is time-correlated with a traffic spike or a provisioning event
Monitoring & Diagnosis
- Enable Amazon CloudWatch Logs for your Connect instance and Lambda functions
- Check the AWS/Usage CloudWatch namespace for API call counts vs. service quota limits
- Set a CloudWatch Alarm when any API call count exceeds 70–80% of its quota (proactive alerting)
- Use CloudWatch Contributor Insights on CloudTrail logs to identify the top callers of throttled APIs
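The 70–80% alarm from this checklist can be built on the AWS/Usage namespace with CloudWatch’s SERVICE_QUOTA metric-math function. A sketch (the dimension values follow the AWS/Usage convention, but verify the exact Resource name for your API in your account):

```python
def usage_alarm_metrics(api_name: str) -> list:
    """Metric-math payload: m1 = Connect CallCount from AWS/Usage,
    e1 = percent of the applied service quota consumed."""
    return [
        {"Id": "m1", "ReturnData": False,
         "MetricStat": {
             "Metric": {
                 "Namespace": "AWS/Usage",
                 "MetricName": "CallCount",
                 "Dimensions": [
                     {"Name": "Service", "Value": "Connect"},
                     {"Name": "Type", "Value": "API"},
                     {"Name": "Resource", "Value": api_name},
                     {"Name": "Class", "Value": "None"},
                 ],
             },
             "Period": 60, "Stat": "Sum",
         }},
        {"Id": "e1", "Expression": "(m1/SERVICE_QUOTA(m1))*100",
         "Label": f"{api_name} quota utilization %", "ReturnData": True},
    ]

def put_usage_alarm(api_name: str, threshold: float = 80.0):
    """Alarm when the API consumes more than `threshold` percent of its quota."""
    import boto3  # lazy import keeps usage_alarm_metrics testable without the SDK
    cw = boto3.client("cloudwatch", region_name="us-east-1")
    cw.put_metric_alarm(
        AlarmName=f"connect-{api_name}-quota-utilization",
        Metrics=usage_alarm_metrics(api_name),
        ComparisonOperator="GreaterThanThreshold",
        Threshold=threshold,
        EvaluationPeriods=1,
    )
```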
Code-Level Fixes
- Implement exponential backoff with full jitter on all Connect API calls
- Use AWS SDK v3 ConfiguredRetryStrategy (Node.js) or mode: 'adaptive' (boto3/Python)
- Differentiate non-retryable errors (ServiceQuotaExceededException) and fail fast
- Set a maximum retry cap (8–10 attempts) to prevent infinite loops
- Add randomized jitter to VDI/Citrix CCP initialization to prevent agent session bursts
Architectural Fixes
- Buffer high-volume outbound API calls through SQS with Lambda concurrency limits
- Replace polling loops with EventBridge scheduled rules + ElastiCache caching
- Set MaximumConcurrency on Lambda event source mappings to prevent retry storms
- Configure Dead Letter Queues on all SQS queues and Lambda async invocations
- Consider a multi-account strategy to obtain independent quota pools per workload
Quota Management
- Open AWS Service Quotas console and check current usage vs. limits for Amazon Connect
- Request a quota increase for “Concurrent active calls per instance” and “Outbound calls per second” if needed
- Include business justification, agent count, and expected call volume in the quota increase request
- Track quota increase request status; escalate via AWS Support if not resolved within 3 business days