Amazon Redshift Performance Tuning - Statistics

Associate Scrum Master • Strategy

This article contains scripts that can be run to obtain statistics about tables and data blocks being stored, to provide insight for query planning.

Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query. Run one of those scripts to determine which tables are missing statistics, or contain outdated statistics:


SELECT database, schema || '.' || "table" AS "table", stats_off 
FROM info
WHERE stats_off > 5 
ORDER BY 2;


SELECT substring(trim(plannode),1,100) AS plannode,COUNT(*)
FROM dwk
WHERE plannode LIKE '%missing statistics%'
GROUP BY plannode
ORDER BY 2 DESC; 
Whenever you insert, delete, or update a significant number of rows, you should run a VACUUM command and then an ANALYZE command. Gather table statistics: 
vacuum t1;
analyze t1;

The entire database should be analyzed weekly. Also, you should plan for a daily stats collection job for most critical tables/volatile tables. To avoid a resource intensive VACUUM operation, you can load the data in sort key order, or design your table maintain data for a rolling time period, using time series tables.

Comment

0 comments

Details

Example

Published:

November 7, 2017

Last Updated:

November 7, 2017