
Optimize Rate Limits for Application Workloads

Manage API throttling and optimize SP-API usage within your application.

When you design your Selling Partner API (SP-API) application, you must consider per-API resource rate limits. The Selling Partner API maintains a per-API resource quota for each selling partner to maintain availability and prevent overloading individual APIs.

If you exceed these rate limits, SP-API returns a 429 Too Many Requests error and throttles the call. Excessive API throttling can result in job failure, delays, and operational inefficiencies that ultimately cost your organization time and money. If you receive these error responses, you can resubmit the failed requests in a way that complies with rate limits.

The following sections outline strategies to help you effectively manage API throttling and optimize the performance and reliability of your SP-API applications.

For comprehensive guidance on best practices across various aspects of SP-API integration, refer to the SP-API Well-Architected Guidance playlist on the Amazon SP-API Developer University channel.

Check and adhere to rate limits

Review the following guidance on how to check and adhere to rate limits.

Check rate limits

Review the usage plan for each SP-API operation in the documentation. To learn how to find the usage plan, refer to How to find your usage plan.

Compare the documented limits against the rate limit header (x-amzn-RateLimit-Limit) in API responses. The header is returned for HTTP status codes 20x, 400, and 404. To avoid throttling, design your application to stay within these limits.
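
For example, you can read the header with any HTTP client. The following is a minimal sketch using the Python requests library; the endpoint, operation, parameters, and token are placeholders for illustration only:

```python
import requests

# Placeholder endpoint, operation, and token for illustration only.
ENDPOINT = "https://sellingpartnerapi-na.amazon.com"
headers = {"x-amz-access-token": "<your LWA access token>"}

response = requests.get(
    f"{ENDPOINT}/orders/v0/orders",
    headers=headers,
    params={"MarketplaceIds": "ATVPDKIKX0DER", "CreatedAfter": "2024-01-01T00:00:00Z"},
)

# The rate limit header is present on 20x, 400, and 404 responses.
limit = response.headers.get("x-amzn-RateLimit-Limit")
if limit is not None:
    print(f"Current rate limit for this operation: {limit} requests per second")
```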

To learn more about usage plans and how the SP-API rate limiting algorithm works, refer to Usage Plans and Rate Limits.

Set up an error monitoring and alerting system

To adhere to API rate limits, it’s crucial to set up an effective system to monitor and alert when errors occur. This process typically involves the following steps:

  1. Log API responses: Capture and store the complete API response data, including status codes, headers, and error messages, to enable analysis and categorization of errors.
  2. Categorize errors: Organize the logged errors into relevant buckets based on HTTP status codes. For example, you can categorize 400-level client errors into the following buckets: 400 invalid input, 403 authentication issues, 404 resource not found, 429 rate limit breaches, and so on.
  3. Create an error dashboard: Visualize the error rates for each API operation and error type on a centralized dashboard to quickly identify problematic areas.
  4. Set alerting thresholds: Define appropriate thresholds for each error type and set up alerts to proactively notify you when error rates exceed those thresholds.

If you use AWS services, you can implement this best practice by using Amazon CloudWatch:

  • CloudWatch logs: Capture and store the detailed API response data.
  • CloudWatch metrics filters: Create custom metrics to count the different error types based on status codes.
  • CloudWatch alarms: Monitor the error metrics and trigger notifications (for example, Amazon Simple Notification Service) when thresholds are breached.
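
If you take the CloudWatch approach, a minimal boto3 sketch might look like the following. It assumes your application writes API responses to a log group as JSON events; the log group name, namespace, threshold, and SNS topic ARN are placeholder assumptions:

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Placeholders: substitute your log group, namespace, and SNS topic ARN.
LOG_GROUP = "/my-app/sp-api-responses"
NAMESPACE = "SpApiErrors"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:sp-api-alerts"

# Count 429 responses from structured (JSON) log events.
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="throttled-requests",
    filterPattern="{ $.statusCode = 429 }",
    metricTransformations=[{
        "metricName": "Throttled429",
        "metricNamespace": NAMESPACE,
        "metricValue": "1",
    }],
)

# Alarm when more than 50 throttled calls occur within 5 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="sp-api-throttling",
    Namespace=NAMESPACE,
    MetricName="Throttled429",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[SNS_TOPIC_ARN],
)
```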

Avoid spiky traffic

Distribute API requests uniformly across time to avoid concentrated bursts of calls to specific operations followed by periods of minimal activity. These uneven spikes cause additional 429 errors, which you can avoid by spreading out the traffic over time.

You can implement a rate limiter to manage a high volume of traffic and allow N requests per second based on per-API resource limits. The rate limiter ensures a consistent calling pattern over time, which mitigates traffic peaks and promotes uniform API usage. Use the per-API rate limit as the guideline for each API in the rate limiter.

For a step-by-step code example that uses the Selling Partner API Authentication/Authorization Library to implement a rate limiter, refer to the following sample code.
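
Independent of any library, the core mechanism is a token bucket: tokens refill at the documented rate, and each request consumes one token. The following is a minimal thread-safe sketch; call_sp_api is a hypothetical stand-in for your client call, and the rate and burst values should come from the usage plan of the operation you call:

```python
import threading
import time

class TokenBucketRateLimiter:
    """Allows up to `rate` requests per second with a burst of `burst` tokens."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill tokens based on elapsed time, capped at capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)

# Example values only; check the usage plan for the actual rate and burst
# of each operation you call.
limiter = TokenBucketRateLimiter(rate=0.0167, burst=20)

def call_with_limit(request):
    limiter.acquire()            # blocks until the call complies with the limit
    return call_sp_api(request)  # hypothetical SP-API client call
```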

Implement retry and back-off techniques

Proactively implement the following techniques to avoid impact on your workloads and increase the reliability of your application:

  • Retry: Implement automatic retry logic. Configure the retry settings with a small delay between attempts, and queue requests rather than failing them immediately.
  • Exponential back-off: Use an exponential back-off algorithm for better flow control, with progressively longer waits between retries for consecutive error responses. Because exponential functions grow quickly, exponential back-off can lead to very long waits, so implement a maximum delay interval and a maximum number of retries, which you can adjust based on the operation and other local factors.
  • Jitter: Retries are ineffective if all clients retry at the same time. To avoid this problem, use jitter: a random delay before making or retrying a request that spreads out the arrival rate and helps prevent large bursts. Most exponential back-off algorithms use jitter to prevent successive collisions. For more information, refer to Exponential Backoff and Jitter. A combined sketch follows this list.
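
The following minimal sketch combines all three techniques with "full jitter" (a random delay drawn between zero and the capped exponential back-off). Here, send_request is a hypothetical callable that returns a response object with a status_code attribute:

```python
import random
import time

def call_with_backoff(send_request, max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0):
    """Retry a throttled call with exponential back-off and full jitter."""
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Exponential back-off capped at max_delay, with full jitter:
        # sleep a random duration in [0, min(max_delay, base * 2^attempt)].
        delay = random.uniform(0, min(max_delay, base_delay * (2 ** attempt)))
        time.sleep(delay)
    raise RuntimeError("Request still throttled after maximum retries")
```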

Reduce the number of API requests

The following sections describe how you can use event-based workloads, batch operations, and bulk operations to reduce the number of API requests.

Event-based workloads

Monitor notifications by using the Selling Partner API for Notifications and perform actions based on specific conditions. With the Selling Partner API for Notifications, you can create a destination to receive notifications, subscribe to notifications, delete notification subscriptions, and so on. Instead of polling for information, your application can receive information directly from Amazon when an event invokes a notification to which you subscribe.

There are many notification types available for your application to leverage. For more information, refer to the Notifications API v1 Use Case Guide.
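
For example, if you create an Amazon SQS destination for your subscriptions, a worker can long-poll your queue and react to events instead of polling SP-API. The following is a minimal boto3 sketch; the queue URL is a placeholder:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/sp-api-notifications"  # placeholder

while True:
    messages = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling reduces empty receives
    )
    for message in messages.get("Messages", []):
        notification = json.loads(message["Body"])
        print("Received:", notification.get("notificationType"))
        # ... process the event, then delete it from the queue.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```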

Batch operations

Get data for a batch of items in a single request. The SP-API supports a set of batch operations that perform the same action as one-by-one calls, but for a batch of requests at a time. You can send the applicable number of requests (typically up to 20) in a single API call instead of making the calls one by one.

The SP-API currently supports batch operations for the following use cases:

  • Searching products using the Catalog API
  • Fetching offer or pricing information
  • Getting fee estimate for products
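
Whichever operation you use, the client-side pattern is the same: group identifiers into chunks of the maximum batch size and make one call per chunk. The following is a minimal sketch, where get_items_batch is a hypothetical stand-in for the batch operation you call:

```python
def chunked(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 100 identifiers become 5 batch calls instead of 100 individual calls.
asins = [f"B0EXAMPLE{i:03d}" for i in range(100)]

for batch in chunked(asins, 20):
    # get_items_batch is a hypothetical stand-in for a batch operation.
    results = get_items_batch(batch)
```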

Bulk operations

You can upload and download bulk data in a single API request.

To upload data in bulk, you can use the Feeds API. There are feeds for a wide variety of use cases, such as creating listings, managing inventory and prices, acknowledging orders, and so on. For a list of available feed types, refer to Feed Type Values.
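
As a sketch of the upload flow, the Feeds API works in three steps: create a feed document to get an upload URL, upload the feed content to that URL, then create the feed that references the document. The paths below reflect the 2021-06-30 version of the Feeds API; signed_headers and feed_xml are hypothetical placeholders for your authentication helper and feed content:

```python
import requests

ENDPOINT = "https://sellingpartnerapi-na.amazon.com"

# 1. Create a feed document to get a presigned upload URL.
doc = requests.post(
    f"{ENDPOINT}/feeds/2021-06-30/documents",
    headers=signed_headers(),  # hypothetical helper that returns auth headers
    json={"contentType": "text/xml; charset=UTF-8"},
).json()

# 2. Upload the feed content (many records in one request).
requests.put(doc["url"], data=feed_xml,
             headers={"Content-Type": "text/xml; charset=UTF-8"})

# 3. Create the feed that references the uploaded document.
feed = requests.post(
    f"{ENDPOINT}/feeds/2021-06-30/feeds",
    headers=signed_headers(),
    json={
        "feedType": "POST_PRODUCT_PRICING_DATA",  # one entry from Feed Type Values
        "marketplaceIds": ["ATVPDKIKX0DER"],
        "inputFeedDocumentId": doc["feedDocumentId"],
    },
).json()
print("Feed submitted:", feed["feedId"])
```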

To download data in bulk, you can use the Reports API or the Data Kiosk API. The Reports API provides reports for a variety of use cases, including monitoring inventory, tracking orders for fulfillment, getting tax information, tracking returns and seller performance, managing a selling business with Fulfillment by Amazon, and so on. For details about Reports API operations and associated data types and schemas, refer to the Reports API reference. For available report types, refer to Report Type Values.

The Data Kiosk API supports GraphQL query operations for dynamic report capabilities. GraphQL is a query language for APIs that enables you to request and receive the data that you need in a single request. Data Kiosk's GraphQL-based dynamic reporting suite helps you generate custom GraphQL queries to access bulk data from Amazon datasets. For details, refer to Data Kiosk Schema Explorer User Guide.
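
For illustration, a Data Kiosk query is an ordinary GraphQL document that you submit through the createQuery operation. The schema and field names below are hypothetical placeholders; use the Data Kiosk Schema Explorer to discover the real ones:

```python
# Illustrative only: build a GraphQL query string for the Data Kiosk
# createQuery operation. Schema and field names here are hypothetical.
query = """
{
  analytics_example_schema {
    salesByDate(startDate: "2024-01-01", endDate: "2024-01-31") {
      date
      orderedUnits
      orderedProductSales { amount currencyCode }
    }
  }
}
"""

payload = {"query": query}  # request body for the createQuery operation
```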

Other best practices

Keep in mind the following other best practices:

  • Monitor your usage and scale accordingly as your application grows.
  • Optimize your code to eliminate unnecessary API calls.
  • Cache frequently used data to reduce the need for repeated API requests. You can cache data on your servers using object storage such as Amazon S3. You can also save relatively static information in a database or serialize it in a file. A small in-memory example follows this list.
  • Stagger SP-API requests in a queue and do other processing tasks while waiting for the next queued job to run.
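
For the caching point above, even a small in-memory layer helps when data changes infrequently. The following is a minimal sketch; fetch_catalog_item is a hypothetical stand-in for the underlying SP-API call:

```python
import time

class TTLCache:
    """Tiny in-memory cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

# Cache relatively static data (for example, catalog details) for an hour
# so repeated lookups don't consume the API quota.
cache = TTLCache(ttl_seconds=3600)

def get_catalog_item(asin):
    item = cache.get(asin)
    if item is None:
        item = fetch_catalog_item(asin)  # hypothetical SP-API call
        cache.put(asin, item)
    return item
```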