Best Practices with Amazon QuickSight SPICE Dataset and Direct Query Mode
Introductions
Amazon QuickSight is a scalable, serverless business intelligence (BI) service powered by machine learning (ML). It seamlessly integrates data sources from Amazon Web Services (AWS), other cloud providers, and on-premises environments. Commonly utilized data sources within QuickSight include Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon Redshift.
Once connected to a data source, QuickSight allows users to create datasets that can operate in either SPICE (Super-fast, Parallel, In-memory Calculation Engine) or direct query mode. These datasets serve as the foundation for building interactive dashboards, paginated reports, and Q topics, enabling intuitive natural language querying capabilities over the underlying data.
Benefits of SPICE Memory
- SPICE, Amazon QuickSight’s in-memory calculation engine, accelerates dashboard performance by ingesting data from sources based on scheduled, on-demand, or API-driven refreshes.
- It utilizes columnar storage, in-memory technologies, and machine code generation for rapid interactive query responses on large datasets. During dataset refreshes, dashboards display the previous snapshot to ensure uninterrupted user interaction.
- SPICE reduces data source workload by querying it only when necessary for data ingestion. Charges for SPICE usage are based solely on purchased storage capacity in gigabytes (GB), allowing data consumption across multiple dashboards and users without additional costs.
- For security, data stored in SPICE is encrypted at rest, with AWS managing the encryption key by default or allowing customer-managed keys through AWS Key Management Service (AWS KMS). This setup supports auditing access and compliance with regulatory requirements, with immediate access lockdown by revoking AWS KMS key access.
- Data transfers between SPICE and users are encrypted in transit using Transport Layer Security (TLS), ensuring secure data transmission throughout.
When to use SPICE Dataset Memory?
A common use case is to use the APIs to invoke a SPICE data refresh once extract, transform, and load (ETL) processes have been completed and data is stored in the data lake or data warehouse. This on-demand refresh process involves initiating a data ingestion either through the QuickSight UI or using the QuickSight API call CreateIngestion. A point to note here is that you cannot initiate an on-demand full refresh or incremental refresh after the limit has been reached. However, this doesn’t impact schedules that are configured; they will continue to run as planned.
Another important consideration is the total SPICE capacity, which is allocated at an account level. If this available capacity is consumed by either a single dataset or multiple datasets, this will lead to data ingestion failures. To prevent such failures, you can manually purchase additional SPICE capacity from the QuickSight UI as needed. Alternatively, you can enable the Auto-purchase capacity option, which allows SPICE to automatically acquire the necessary capacity and so avoid data ingestion failures.
Benefits of using Direct Query Mode
Direct query mode enables near real-time querying directly from the data source. When a user opens or refreshes a dashboard in QuickSight, queries are sent directly to the data source. The data source processes these queries and returns results that are immediately visualized in the dashboard. Utilizing direct query mode can provide the benefits of accessing the freshest data from regularly updated sources and maintaining high-performance dashboards, especially when the data source responds quickly.
Unlike SPICE, direct query mode does not impose limits on the number of rows or dataset size, making it suitable for scenarios where data exceeds SPICE’s capacity constraints. To optimize query performance for slower queries, dataset parameters can be used to filter data and reduce the volume queried. User interactions within the dashboard, such as selections in controls, filters, and visuals, can propagate to the data source through live, customized, and parameterized SQL queries.
When to use Direct Query Mode?
- Performance in direct query mode depends on the runtime performance of the data source. Longer query execution times can lead to longer wait times for dashboard visuals compared to SPICE datasets. For a good user experience, it’s recommended to use a data source with fast response times when using direct query mode.
- QuickSight imposes a 2-minute timeout for generating dashboard visuals, applicable to both direct query mode and SPICE. Not all database drivers may respond to this timeout by cancelling queries; manual or programmatic cancellation may be required.
- Direct query mode supports up to 2,000 columns per dataset. Data source-specific timeouts also apply and vary for each type of data source.
- Scaling up the capacity of the data source may be necessary as data volume grows, which can incur additional costs. High concurrency in direct query mode, especially with frequent dashboard users, can overwhelm the data source’s ability to handle the query load effectively.
Different Scenarios with their Outcomes
Dashboard Data Updates Daily
Dashboard data often requires daily updates to reflect the previous day’s information. Typically, this involves extracting data from source systems, transforming it, and loading it into a data lake or data warehouse through batch ETL processes, usually scheduled outside of regular business hours. The recommended approach for using SPICE depends on how data updates occur in the source system.
- If individual records are updated in the source system, SPICE with a daily full refresh is suitable.
- For scenarios where data is primarily appended, SPICE with a daily incremental refresh is preferable to manage new additions.
- Align SPICE data refresh with the completion of ETL processes by scheduling it at a fixed time each day.
- If ETL processing times vary, invoke SPICE refresh programmatically using the QuickSight API after ETL completion.
Dashboard Data Updates every 20 minutes
If you require more real-time dashboard data updates, especially when new data is appended with a creation timestamp, consider scheduling a SPICE incremental refresh instead of a full refresh. This refresh can occur as frequently as every 20 minutes.
To achieve even more frequent updates, such as every 10 minutes, while leveraging SPICE for enhanced dashboard performance, you can combine different data refresh methods — full refresh, incremental refresh, and API refresh — on the same dataset.
This strategy optimizes data freshness and performance, adapting to both real-time data needs and operational efficiency throughout the day.
Dashboard Data has Batch Data Updates and Near Real-Time Data Updates
Dashboards often display diverse data types with varying dimensions and temporal granularity. For instance, a dashboard’s landing page typically showcases high-level key performance indicators (KPIs), monthly trends, and aggregated data across different dimensions. A secondary sheet within the dashboard delves deeper, offering near real-time insights at a transactional level.
Amazon QuickSight supports the integration of multiple datasets within a single dashboard. In this setup, the landing page utilizes a SPICE dataset scheduled for daily updates, ensuring fast performance with aggregated data. Meanwhile, the secondary sheet requires continuously fresh data directly from the source, utilizing direct query mode for this dataset.
Dashboard Data refreshes in Real-Time
When dashboards require immediate access to business-critical operational data, retrieving information in near real-time directly from the data source is crucial. Direct query mode stands out as the optimal solution for displaying data promptly on visuals.
In this mode, QuickSight queries data directly from the source for each visual upon dashboard loading, ensuring users always have access to the latest data available. This approach minimizes delays and supports timely decision-making and actions based on current information.
Conclusion
Thus mastering the best practices of Amazon QuickSight, particularly with SPICE datasets and Direct Query mode, is pivotal for optimizing data visualization and analysis workflows. SPICE’s in-memory calculation engine empowers users with faster query responses and offline access capabilities, making it ideal for interactive dashboards and reports. On the other hand, Direct Query mode ensures real-time data freshness directly from the source, essential for scenarios requiring up-to-the-minute insights.
By strategically choosing between SPICE and Direct Query based on data requirements and usage patterns, organizations can enhance dashboard performance, manage costs efficiently, and empower decision-makers with timely, accurate information. Implementing these best practices not only improves operational efficiency but also maximizes the value derived from Amazon QuickSight, driving continuous improvement in business intelligence strategies.