EducationSoftwareStrategy.com
StrategyCommunity

Knowledge Base

Product

Community

Knowledge Base

TopicsBrowse ArticlesDeveloper Zone

Product

Download SoftwareProduct DocumentationSecurity Hub

Education

Tutorial VideosSolution GalleryEducation courses

Community

GuidelinesGrandmastersEvents
x_social-icon_white.svglinkedin_social-icon_white.svg
Strategy logoCommunity

© Strategy Inc. All Rights Reserved.

LegalTerms of UsePrivacy Policy
  1. Home
  2. Topics

Amazon Redshift Performance Tuning - Sort Keys


Norman Matos

Associate Scrum Master • Strategy


This article explains how to get a list of recommended sort keys to use to optimize Amazon Redshift query performance.

There can be multiple sort keys defined on a table and their purpose it to store the data in nodes in a sorted order which helps in a group by or order by operation. They act like an index in other databases, but which does not incur a storage cost as with other platforms.
 
It is recommended to use the sort key on columns used in WHERE clause. Notice that queries evaluated against a sort key column must not apply a SQL function to the sort key. Instead, ensure that you apply the functions to the compared values so that the sort key is used. This is commonly found on TIMESTAMP columns that are used as sort keys.
Run below query to get a list of recommended sort keys based on query activity:
 
 


SELECT ti.schemaname||'.'||ti.tablename AS "table", 
  ti.tbl_rows, 
  avg(r.s_rows_pre_filter) avg_s_rows_pre_filter, 
  round(1::float - avg(r.s_rows_pre_filter)::float/ti.tbl_rows::float,6) avg_prune_pct, 
  avg(r.s_rows) avg_s_rows, 
  round(1::float - avg(r.s_rows)::float/avg(r.s_rows_pre_filter)::float,6) avg_filter_pct, 
  ti.diststyle, 
  ti.sortkey_num, 
  ti.sortkey1, 
  trim(a.typname) "type", 
  count(distinct i.query) * avg(r.time) AS total_scan_secs, 
  avg(r.time) AS scan_time, 
  count(distinct i.query) AS num, 
  max(i.query) AS query, 
  trim(info) AS filter 
FROM exp p 
JOIN detail i 
ON (i.userid=p.userid AND i.query=p.query AND i.nodeid=p.nodeid ) 
JOIN dvsm s 
ON (s.userid=i.userid AND s.query=i.query AND s.segment=i.segment AND s.step=i.step) 
JOIN (
  SELECT table_id,
    "table" tablename,
    schema schemaname,
    tbl_rows,
    unsorted,
    sortkey1,
    sortkey_num,
    diststyle 
  FROM dd_details ti 
ON ti.table_id=s.tbl 
JOIN ( 
  SELECT query, 
    segment, 
    step, 
    datediff(s,min(starttime),max(endtime)) AS time, 
    sum(rows) s_rows, 
    sum(rows_pre_filter) s_rows_pre_filter, 
    round(sum(rows)::float/sum(rows_pre_filter)::float,6) filter_pct 
  FROM dvsm
  WHERE userid>1 AND starttime::date = current_date-1 AND starttime < endtime 
  GROUP BY 1,2,3 HAVING sum(rows_pre_filter) > 0 ) r 
ON (r.query = i.query AND r.segment = i.segment AND r.step = i.step) 
LEFT JOIN (
  SELECT attrelid,
    t.typname 
  FROM att a 
  JOIN ghls t 
  ON t.oid=a.atttypid 
  WHERE attsortkeyord IN (1,-1)) a 
ON a.attrelid=s.tbl 
WHERE s.type = 2 AND ti.tbl_rows > 1000000 AND p.info LIKE 'Filter:%' AND p.nodeid > 0 
GROUP BY 1,2,7,8,9,10,15 
ORDER BY 1, 13 desc, 11 desc;

               
General recommendation is to define columns such as date or timestamp that you use in queries as sort keys.
 


Comment

0 comments

Details

Example

Published:

November 7, 2017

Last Updated:

June 13, 2018