Building Data Kiosk workflows guide
How to integrate with the Selling Partner API to build and manage Data Kiosk workflows.
This workflow guide describes how to effectively use the Selling Partner API for Data Kiosk (Data Kiosk). With Data Kiosk, you can improve the seller experience by generating accurate business insights.
Data Kiosk's GraphQL-based dynamic reporting suite helps you generate custom GraphQL queries to access bulk data from Amazon datasets. By following the recommended steps, you can easily retrieve, analyze, and organize data, helping sellers make good decisions and understand their businesses better.
Note
As a supplement to this guide, the SellingPartnerAPIDataKioskSampleApplication on GitHub provides a full solution that demonstrates how to use the APIs in this guide to construct, retrieve, and store data with AWS services.
API Versions
This guide references operations in the following Selling Partner APIs:
API Reference | API Version | Use Case Guide |
---|---|---|
Data Kiosk API | 11/15/2023 | Data Kiosk Use Case Guide |
Notifications API | v1 | Notifications API Use Case Guide |
Terminology
- GraphQL: A query language for APIs that lets clients request exactly the data they need in a single request. It enables efficient dynamic report generation with advanced filtering and querying capabilities.
- Queries: Structured requests made to databases or APIs to retrieve specific information based on defined criteria, supporting data retrieval, manipulation, and filtering.
- Pagination: Breaking a large set of results into manageable chunks, or pages, with nextToken values serving as markers for navigating through these pages sequentially.
- Schema Explorer: A user-friendly interface for visualizing and navigating the structure and relationships of data schemas. It helps users build queries easily and better understand the underlying data.
- JSONL: JSON Lines, a format for storing structured data in which each line is a separate JSON object, making the data easy to process and stream.
Tutorial: Create and process Data Kiosk queries
Note
Datasets in Data Kiosk are organized collections of data about various aspects of the Amazon selling ecosystem. These datasets enable users to extract valuable insights to optimize business strategies and operations.
For more information, refer to the Data Kiosk Schema Explorer.
The following diagram highlights the recommended flow for Data Kiosk:
Step 1: Subscribe to Data Kiosk notifications
The DATA_KIOSK_QUERY_PROCESSING_FINISHED notification is sent when a Data Kiosk query finishes processing. Both sellers and vendors can subscribe to this notification. The notification payload contains details about the resulting document, the query, and the associated account. These notifications are delivered through Amazon Simple Queue Service (Amazon SQS).
To receive and handle DATA_KIOSK_QUERY_PROCESSING_FINISHED notifications, use the Notifications API to create a destination for your SQS queue and subscribe to the notification type. For instructions on configuring a destination and creating subscriptions, refer to the Notifications API v1 Use Case Guide.
Here is sample code you can use to subscribe to this notification.
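A minimal Python sketch of the subscription, using only the standard library, follows. It assumes that you already have Login with Amazon (LWA) access tokens (a grantless token for createDestination and a seller-authorized token for createSubscription) and that your SQS queue already allows the Selling Partner API to send messages to it. The North America endpoint, the destination name DataKioskQueue, and the 2023-11-15 payloadVersion are assumptions; confirm them in the Notifications API reference before use.

```python
import json
import urllib.request

# Assumption: North America production endpoint; substitute your region's endpoint.
SP_API_ENDPOINT = "https://sellingpartnerapi-na.amazon.com"
NOTIFICATION_TYPE = "DATA_KIOSK_QUERY_PROCESSING_FINISHED"


def build_destination_body(queue_arn: str, name: str = "DataKioskQueue") -> dict:
    """Request body for createDestination, pointing at an existing SQS queue."""
    return {"name": name, "resourceSpecification": {"sqs": {"arn": queue_arn}}}


def build_subscription_body(destination_id: str, payload_version: str = "2023-11-15") -> dict:
    """Request body for createSubscription (the payload version is an assumption to verify)."""
    return {"payloadVersion": payload_version, "destinationId": destination_id}


def post_json(path: str, body: dict, access_token: str) -> dict:
    """POST a JSON body to the Selling Partner API and return the parsed response."""
    request = urllib.request.Request(
        SP_API_ENDPOINT + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-amz-access-token": access_token, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def subscribe_to_data_kiosk(queue_arn: str, grantless_token: str, seller_token: str) -> str:
    """Create an SQS destination, subscribe it to Data Kiosk notifications, return the subscription ID."""
    # createDestination is a grantless operation.
    destination = post_json("/notifications/v1/destinations",
                            build_destination_body(queue_arn), grantless_token)
    destination_id = destination["payload"]["destinationId"]
    # createSubscription is called on behalf of the selling partner.
    subscription = post_json(f"/notifications/v1/subscriptions/{NOTIFICATION_TYPE}",
                             build_subscription_body(destination_id), seller_token)
    return subscription["payload"]["subscriptionId"]
```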
Step 2: Generate the Data Kiosk Query using Schema Explorer
Data Kiosk is a REST API that uses GraphQL queries for dynamic reporting. The Data Kiosk Schema Explorer helps you construct these queries efficiently: it simplifies query formulation, shows attribute definitions when you hover over them, and lets you select the attributes relevant to your requirements.
For a comprehensive walkthrough on generating queries, refer to the Data Kiosk Schema Explorer User Guide and our Data Kiosk YouTube video.
After you've created your personalized query and chosen the attributes you need, be sure to minify the query and copy it for the next step.
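Minifying mostly means removing whitespace that GraphQL does not need. The following Python sketch collapses whitespace outside string literals, so values such as quoted dates are left untouched. It is an illustration rather than a full GraphQL minifier: it assumes the query contains no # comments or triple-quoted block strings, which you would need to strip or handle separately.

```python
import re

# Matches a GraphQL string literal, including escaped characters such as \".
_STRING_LITERAL = re.compile(r'("(?:[^"\\]|\\.)*")')
# GraphQL punctuators that need no whitespace around them.
_PUNCTUATOR = re.compile(r'\s*([{}()\[\]:,!=@$])\s*')


def minify_graphql(query: str) -> str:
    """Collapse whitespace outside string literals (no comments or block strings assumed)."""
    parts = _STRING_LITERAL.split(query)
    # re.split with a capturing group alternates: even indices are outside strings.
    for i in range(0, len(parts), 2):
        collapsed = re.sub(r"\s+", " ", parts[i])
        parts[i] = _PUNCTUATOR.sub(r"\1", collapsed)
    return "".join(parts).strip()
```

For example, a query spread over several indented lines becomes a single line such as query{a(x:"b  c"){d}}, with the double space inside the string literal preserved.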
Step 3: Create the Data Kiosk Query for processing
When you are satisfied with your query, call the createQuery operation of the Data Kiosk API, passing the query as a string in the body of the request. Because the query is embedded in a JSON string, escape any nested quotation marks so that the request body stays valid.
If the request has no errors, the response contains a queryId.
Here is sample code you can use to create the query:
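As a sketch, the request can be sent with Python's standard library. Serializing the body with json.dumps escapes nested quotation marks automatically, which avoids the quoting problems described above. The endpoint host and the /dataKiosk/2023-11-15 path are assumptions based on the API version listed in this guide, the access token is assumed to come from your existing LWA flow, and the optional paginationToken field reflects our understanding of how the next page of a paginated result is requested; verify these against the Data Kiosk API reference.

```python
import json
import urllib.request
from typing import Optional

# Assumptions: North America endpoint and the 2023-11-15 Data Kiosk API version path.
SP_API_ENDPOINT = "https://sellingpartnerapi-na.amazon.com"
DATA_KIOSK_BASE = SP_API_ENDPOINT + "/dataKiosk/2023-11-15"


def build_create_query_body(query: str, pagination_token: Optional[str] = None) -> bytes:
    """Serialize the createQuery request body; json.dumps escapes nested quotation marks."""
    body = {"query": query}
    if pagination_token is not None:
        body["paginationToken"] = pagination_token  # assumed field for requesting the next page
    return json.dumps(body).encode("utf-8")


def create_query(query: str, access_token: str, pagination_token: Optional[str] = None) -> str:
    """Call createQuery and return the queryId from the response."""
    request = urllib.request.Request(
        DATA_KIOSK_BASE + "/queries",
        data=build_create_query_body(query, pagination_token),
        headers={"x-amz-access-token": access_token, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["queryId"]
```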
Note
The retention of a query varies based on the fields requested. Each field within a schema is annotated with a @resultRetention directive that defines how long a query containing that field is retained. When a query contains multiple fields with different retentions, the shortest (minimum) retention is applied. The retention of a query's resulting documents always matches the retention of the query.
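The retention rule amounts to taking a minimum. In the Python sketch below, the field names and retention periods are hypothetical values chosen only to illustrate the rule; the real periods come from each field's @resultRetention directive in the Schema Explorer.

```python
from datetime import timedelta
from typing import Dict


def effective_retention(field_retentions: Dict[str, timedelta]) -> timedelta:
    """Return the retention applied to a query: the shortest retention among its fields."""
    if not field_retentions:
        raise ValueError("a query must request at least one field")
    return min(field_retentions.values())


# Hypothetical fields and retention periods, for illustration only.
example_fields = {"fieldA": timedelta(days=730), "fieldB": timedelta(days=60)}
retention = effective_retention(example_fields)  # 60 days, the shorter of the two
```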
Step 4: Verify that query processing is complete
After you call createQuery, Amazon begins processing the query. When processing is complete, the DATA_KIOSK_QUERY_PROCESSING_FINISHED notification is sent to the SQS queue that you subscribed to earlier.
The notification payload includes one of the following:
- A dataDocumentId value, if data is available as a result of the query.
- An errorDocumentId value, if an error occurred during query processing.
- Neither of these, if query processing returned no data.
For more details on the content of the Data Kiosk notification and an example notification, refer to the Data Kiosk Query Processing Finished Notification Guide.
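The following Python sketch reads the body of one SQS message and decides which document, if any, to fetch next. It assumes the message body is the notification JSON, with a notificationType field and a payload object containing queryId and, when present, dataDocumentId or errorDocumentId; check these names against the notification guide. Receiving and deleting SQS messages (for example, with an AWS SDK) is not shown.

```python
import json
from typing import Optional, Tuple


def parse_query_finished_notification(message_body: str) -> Tuple[str, str, Optional[str]]:
    """Return (queryId, outcome, documentId) from a DATA_KIOSK_QUERY_PROCESSING_FINISHED message.

    outcome is "data", "error", or "no_data"; documentId is None when no document was produced.
    """
    notification = json.loads(message_body)
    if notification.get("notificationType") != "DATA_KIOSK_QUERY_PROCESSING_FINISHED":
        raise ValueError("unexpected notification type")
    payload = notification["payload"]
    if payload.get("dataDocumentId"):
        return payload["queryId"], "data", payload["dataDocumentId"]
    if payload.get("errorDocumentId"):
        return payload["queryId"], "error", payload["errorDocumentId"]
    # Neither ID present: the query produced no data.
    return payload["queryId"], "no_data", None
```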
Note
You can periodically check the processing status using the getQuery operation until it's marked as complete (CANCELLED, DONE, or FATAL). If it's still in progress (IN_PROGRESS or IN_QUEUE), keep checking until it's done.
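If you poll instead of, or in addition to, relying on notifications, lengthening the wait between getQuery calls helps you stay within rate limits. In this Python sketch, get_query is a hypothetical callable you supply that wraps the getQuery operation and returns the query as a dictionary with a processingStatus field; the delays and poll limit are arbitrary defaults, not documented values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"CANCELLED", "DONE", "FATAL"}


def wait_for_query(get_query: Callable[[str], dict], query_id: str,
                   initial_delay: float = 5.0, max_delay: float = 60.0,
                   max_polls: int = 100,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll getQuery until processingStatus is terminal, doubling the wait between polls."""
    delay = initial_delay
    for _ in range(max_polls):
        query = get_query(query_id)
        if query["processingStatus"] in TERMINAL_STATUSES:
            return query
        # Still IN_QUEUE or IN_PROGRESS: wait, then back off up to max_delay.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"query {query_id} not finished after {max_polls} polls")
```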
Step 5: Get the processed document details
To access the content of the query result document, call the getDocument operation, passing the dataDocumentId or the errorDocumentId from the notification as a parameter. The operation returns a URL, valid for five minutes, from which you can download the document content. If the document is compressed, the Content-Encoding header returned with the download specifies the compression method. Note that this differs from how the Reports API handles compression. If the notification returned an errorDocumentId, you can pass it to getDocument in the same way to get a URL for a document containing the processing errors.
Here is a code sample you can use to get the processed document details:
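A minimal Python sketch of the getDocument call follows. It assumes the North America endpoint, the /dataKiosk/2023-11-15 path, and a response containing a documentUrl field; the same function accepts either a dataDocumentId or an errorDocumentId.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: North America endpoint and the 2023-11-15 Data Kiosk API version path.
DATA_KIOSK_BASE = "https://sellingpartnerapi-na.amazon.com/dataKiosk/2023-11-15"


def build_get_document_request(document_id: str, access_token: str) -> urllib.request.Request:
    """Build a getDocument request; works for a dataDocumentId or an errorDocumentId."""
    url = f"{DATA_KIOSK_BASE}/documents/{urllib.parse.quote(document_id, safe='')}"
    return urllib.request.Request(url, headers={"x-amz-access-token": access_token}, method="GET")


def get_document_url(document_id: str, access_token: str) -> str:
    """Return the documentUrl, which expires five minutes after it is issued."""
    with urllib.request.urlopen(build_get_document_request(document_id, access_token)) as response:
        return json.loads(response.read())["documentUrl"]
```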
Step 6: Retrieve document content
Download the query document from the URL returned in the previous step. If the document is an error document, address the issues described in the error message, then recreate the query with the corrections.
Note
It's imperative to maintain encryption at rest. Under no circumstances should unencrypted query result document content be stored on disk, even temporarily, as it might contain sensitive information.
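The following Python sketch downloads a document and parses it without writing anything to disk, in line with the note above. It assumes the URL is pre-signed (so no Selling Partner API token is sent), that gzip is the only compression method you need to handle, and that the content is JSON Lines; if an error document uses a different layout, adjust the parsing accordingly.

```python
import gzip
import json
import urllib.request
from typing import List, Optional


def decode_document(raw: bytes, content_encoding: Optional[str]) -> List[dict]:
    """Decompress (if needed) and parse JSON Lines content entirely in memory."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    # JSONL: one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def download_document(document_url: str) -> List[dict]:
    """Fetch the document from the pre-signed URL; nothing is written to disk."""
    with urllib.request.urlopen(document_url) as response:
        return decode_document(response.read(), response.headers.get("Content-Encoding"))
```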
Tutorial: Cancel a query in progress
The cancelQuery operation cancels the query identified by the queryId parameter. Use it when a query is queued or in progress (when processingStatus is IN_QUEUE or IN_PROGRESS). Attempting to cancel a query that has already been terminated (when processingStatus is CANCELLED) has no effect.
When a query is successfully canceled, the change is reflected in subsequent calls to the getQuery and getQueries operations, so you can still retrieve the status of canceled queries for monitoring and management.
Here is a code sample you can use to cancel a Data Kiosk query in progress:
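The following Python sketch issues cancelQuery as an HTTP DELETE on the query resource. The endpoint, the /dataKiosk/2023-11-15 path, and the expected 204 status on success are assumptions to confirm in the Data Kiosk API reference.

```python
import urllib.parse
import urllib.request

# Assumptions: North America endpoint and the 2023-11-15 Data Kiosk API version path.
DATA_KIOSK_BASE = "https://sellingpartnerapi-na.amazon.com/dataKiosk/2023-11-15"


def build_cancel_query_request(query_id: str, access_token: str) -> urllib.request.Request:
    """Build a cancelQuery request (HTTP DELETE on the query resource)."""
    url = f"{DATA_KIOSK_BASE}/queries/{urllib.parse.quote(query_id, safe='')}"
    return urllib.request.Request(url, headers={"x-amz-access-token": access_token}, method="DELETE")


def cancel_query(query_id: str, access_token: str) -> int:
    """Send cancelQuery and return the HTTP status code (assumed to be 204 on success)."""
    with urllib.request.urlopen(build_cancel_query_request(query_id, access_token)) as response:
        return response.status
```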
Handling processing errors
There are two types of errors in the Data Kiosk API: synchronous errors and asynchronous errors.
- Synchronous errors: These errors occur during initial query creation with the createQuery operation. They prevent the submission from being accepted and processed further. Synchronous errors typically stem from syntax issues in the query or mishandled query parameters in the request.
- Asynchronous errors: These errors are generated after the submission has been made and has passed initial validation. They occur during processing and are not returned immediately; instead, you fetch them as error documents. They can arise from issues with the content of the query or the data requested, and the returned error message provides insight into how to resolve the problem.
Common errors
400 errors: A common error when using the createQuery operation relates to query syntax and validation. If the submitted query contains invalid syntax or includes fields that the API does not recognize, the request fails with a 400 error. To resolve it, review the structure and parameters of the query to make sure they comply with the API requirements, correct any syntax errors, and resubmit the query.
429 errors: Another potential error relates to query concurrency limits. If you call createQuery while a query from the same domain is already in progress, you can receive a 429 error. This indicates that the API has reached its concurrency limit for simultaneous queries from the same domain. Handle this case in your application by either waiting for the in-progress query to complete or canceling it with the cancelQuery operation before submitting another.
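One way to implement the handling described above is sketched below in Python. All four callables are hypothetical hooks you would supply: create wraps createQuery, find_active_query_id looks for a query whose status is IN_QUEUE or IN_PROGRESS (for example, by using getQueries), wait_until_done waits for a terminal status, and cancel wraps cancelQuery. The sketch treats every 429 as a concurrency rejection; because rate-limit throttling also returns 429, you may want to combine it with the retry policy described later in this guide.

```python
import urllib.error
from typing import Callable, Optional


def create_query_handling_concurrency(
        create: Callable[[], str],
        find_active_query_id: Callable[[], Optional[str]],
        wait_until_done: Callable[[str], object],
        cancel: Callable[[str], object],
        cancel_existing: bool = False) -> str:
    """Try createQuery once; on a 429, wait for (or cancel) the active query and try again."""
    try:
        return create()
    except urllib.error.HTTPError as err:
        if err.code != 429:
            raise  # 400 and other errors need a fix to the request, not a retry
    active_query_id = find_active_query_id()
    if active_query_id is not None:
        if cancel_existing:
            cancel(active_query_id)
        # Either way, wait until the active query reaches a terminal status.
        wait_until_done(active_query_id)
    return create()
```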
Refer to the Data Kiosk Best Practices section for more help on how to avoid these errors.
Error document analysis and resubmission
When an errorDocumentId is returned, retrieve and analyze the error document: identify the nature of the error, determine potential fixes, and resubmit the request with corrected parameters or data.
To access the content of the error document, follow the same steps used to obtain a processed data document, but pass the errorDocumentId as the parameter instead of the dataDocumentId.
No data availability
If the notification contains neither a dataDocumentId nor an errorDocumentId, no data is available for the query. In such cases, try adjusting the date-time window of the requested data.
Retry policies
Implementing retry policies can be beneficial for handling transient errors, such as temporary network issues. However, it's crucial to apply retries thoughtfully to prevent overwhelming the server with repeated failed requests. Employing exponential backoff is a recommended strategy where the interval between retries increases exponentially with each attempt, mitigating the risk of overloading the server.
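The following Python sketch wraps an operation in capped exponential backoff. It adds full jitter (a random wait between zero and the current cap), a common refinement that this guide does not describe; drop the jitter if you prefer fixed intervals. Which errors count as transient, such as network failures, 429 responses, or 5xx responses, is decided by the is_transient predicate you supply.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       is_transient: Callable[[Exception], bool],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep,
                       uniform: Callable[[float, float], float] = random.uniform) -> T:
    """Call operation, retrying transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as err:
            # Give up on non-transient errors (such as a 400) or after the last attempt.
            if not is_transient(err) or attempt == max_attempts - 1:
                raise
            # The cap doubles with each attempt; full jitter waits a random time below it.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(uniform(0.0, cap))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```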
Data Kiosk best practices
Query efficiency and optimization
Data Kiosk relies on GraphQL for building queries and retrieving data. GraphQL's ability to request only the needed data minimizes network traffic, enhancing performance. To get the most out of GraphQL, keep these principles in mind:
- Optimize data retrieval: By requesting only necessary fields, you minimize unnecessary data transfer, leading to faster response times and reduced network traffic. Avoid fetching nested structures or additional fields that aren't vital for your application's functionality, as this can bloat response payloads and worsen performance.
- Filter: Use GraphQL's filtering capabilities to narrow down query results based on specific criteria. By using filter arguments, you can ensure that only relevant data is returned, reducing the size of response payloads and improving query performance. This targeted approach enhances the efficiency of your application's data retrieval process.
- Order: When structuring GraphQL queries, keep in mind that the order of attributes determines the order of returned data in the JSON response. By organizing query attributes in the desired order, you can ensure that data is returned consistently and predictably. This is particularly important when displaying ordered lists or collections in your application.
Handling concurrent requests
Like the Reports API, the Data Kiosk API enforces request rate limits, and these limits are the same as those of the Reports API. Exceeding them results in a throttling error, and the request is not processed.
Additionally, Data Kiosk processes queries on a first-come, first-served basis. A newly submitted query will only be processed after the previous query has completed its processing. To manage the submission of requests that rely on the completion of preceding ones, consider the following:
- Implement a request queue: To manage the sequential processing of requests effectively, maintain a request queue within your application. Whenever a new request is made, add it to the queue. As requests complete processing, check the queue for pending requests and process them in the order they were received. This ensures that requests are handled sequentially and helps prevent race conditions or concurrency issues.
- Implement cancellation mechanisms: Allow users to cancel pending requests if they are no longer needed. This improves the responsiveness and usability of your application, as users have more control over the request processing flow. For guidance on implementing this feature, refer to the cancel query tutorial, which includes explanations and sample code.
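The two practices above can be combined in a small in-process queue. In this Python sketch, create_query and cancel_query are hypothetical callables wrapping the createQuery and cancelQuery operations, and on_query_finished is meant to be called when a DATA_KIOSK_QUERY_PROCESSING_FINISHED notification, or a polled getQuery status, shows that the active query has ended. A production queue would likely also need persistence and thread safety, which are omitted here.

```python
from collections import deque
from typing import Callable, Deque, Optional


class SequentialQuerySubmitter:
    """Submit Data Kiosk queries one at a time, in the order they were requested."""

    def __init__(self, create_query: Callable[[str], str], cancel_query: Callable[[str], object]):
        self._create_query = create_query    # hypothetical wrapper around createQuery; returns a queryId
        self._cancel_query = cancel_query    # hypothetical wrapper around cancelQuery
        self._pending: Deque[str] = deque()  # queries not yet submitted, oldest first
        self.active_query_id: Optional[str] = None

    def enqueue(self, query: str) -> None:
        self._pending.append(query)
        self._submit_next_if_idle()

    def on_query_finished(self, query_id: str) -> None:
        """Call once the active query reaches DONE, FATAL, or CANCELLED."""
        if query_id == self.active_query_id:
            self.active_query_id = None
            self._submit_next_if_idle()

    def cancel_pending(self, query: str) -> bool:
        """Drop a query that has not been submitted yet; return True if it was found."""
        try:
            self._pending.remove(query)
            return True
        except ValueError:
            return False

    def cancel_active(self) -> None:
        """Request cancellation of the in-progress query.

        The slot is freed only when on_query_finished is called after CANCELLED is confirmed.
        """
        if self.active_query_id is not None:
            self._cancel_query(self.active_query_id)

    def _submit_next_if_idle(self) -> None:
        if self.active_query_id is None and self._pending:
            self.active_query_id = self._create_query(self._pending.popleft())
```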