Consider the following tips when tuning Amazon Redshift performance:
- Amazon Redshift takes automatic snapshot when there are incremental data loads. There may be impact on the performance of the cluster when the snapshot is taken. Automatic snapshot can be disabled in AWS console by changing snapshot retention days to 0. Then create manual snapshot of the cluster when user interaction is minimal.
- Do not apply any function on the joining columns in join operation when the joining columns defined as distribution keys.
- For the performance base line, clean up entire cluster. Once clean-up is done, you will have an idea about cluster size and how much space is really available.
- Amazon Redshift is designed for analytics queries, rather than transaction processing, i.e. the cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue. If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.
To aid with performance tuning, consider the following administrative queries:
select query, trim(querytxt) as sqlquery, database, datediff(seconds, starttime, endtime), starttime,endtime from stl_query where database = ‘bi_dwh’ order by starttime desc
select sum(used)::float / sum(capacity) as pct_full from stv_partitions
select query, step, rows, workmem, label, is_diskbased
from svl_query_summary
where is_diskbased = 't'
select slice, query, trim(label) querylabel, pid,
starttime, substring(text,1,20) querytext
from stv_inflight;
000038859