If presented with a slow query, the first things to inspect are the system-wide KPIs, such as CPU and network usage at the time the query ran. CloudWatch presents these graphically in a dashboard, over time and at intervals ranging from one minute up to an hour, and offers statistics such as Maximum, Minimum, and Average.
Once system-wide performance has been deemed satisfactory but the query is still slow, the next step is to run the EXPLAIN command on it. EXPLAIN returns the estimated cost of the query along with its execution plan, which shows the parts of the query that are costly and need to be tuned:
explain verbose query;
The following types of queries can be explained:
- SELECT
- SELECT INTO
- CREATE TABLE AS
- INSERT
- UPDATE
- DELETE
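For example, a DML statement can be prefixed with EXPLAIN just like a SELECT; the table and column names below are hypothetical, for illustration only:

```sql
-- Hypothetical table/column names; shows EXPLAIN on an UPDATE statement
explain
update WH.fact_mrkt_gl
set etl_run_id = 42
where etl_run_id is null;
```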
The resulting plan is a breakdown of each step the cluster takes to run the query. The Amazon Redshift documentation has a table listing the steps used to process a DML query. Sample:
explain select * from WH.fact_mrkt_gl order by etl_run_id desc;

XN Merge  (cost=1000014114457.64..1000014361571.00 rows=98845344 width=300)
  Merge Key: etl_run_id
  ->  XN Network  (cost=1000014114457.64..1000014361571.00 rows=98845344 width=300)
        Send to leader
        ->  XN Sort  (cost=1000014114457.64..1000014361571.00 rows=98845344 width=300)
              Sort Key: etl_run_id
              ->  XN Seq Scan on fact_mrkt_gl_bal  (cost=0.00..988453.44 rows=98845344 width=300)
The above example has four steps: a sequential scan reads the table data; the rows are then sorted on etl_run_id as requested in the 'order by' clause; the sorted data is sent to the leader node, where it is merged with the output of every other node before being returned to the client.
Sample of explain verbose:
{
   MERGE
   :startup_cost 1000014114457.64
   :total_cost 1000014361571.00
   :plan_rows 98845344
   :node_id 1
   :parent_id 0
   :plan_width 300
   :best_pathkeys ((
      {
         PATHKEYITEM
         :key
            {
               VAR
               :varno 1
               :varattno 25
               :vartype 23
               :vartypmod -1
               :varlevelsup 0
               :varnoold 1
               :varoattno 25
            }
         :sortop 521
      }
   )) . . . .
Outline of the tuning exercise:
- Identify the steps with highest cost
- Identify the joins used. The Merge Join needs no tuning; it is the fastest join, used when both tables are distributed and sorted on the join columns. Nested Loop and Hash joins are the candidates for tuning.
- A Nested Loop join usually appears when a join condition has been omitted, so every row of the inner table is matched against every row of the outer table. This is the costliest join.
- Hash joins are used when the joined tables do not share suitable distribution or sort keys.
- Notice the 'inner' and 'outer' tables in a join. The smaller table should generally be the inner table. If it is not, that points to inadequate statistics: time to run the 'analyze' command.
- If the 'sort' operations have a high cost, that points to unsorted rows; 'vacuum' reorganizes the table according to the sort keys.
- Analyze the broadcast and redistribution operations
- DS_BCAST_INNER - the entire inner table is broadcast to every node. This is acceptable for a small table, but very costly as the table size increases
- DS_DIST_ALL_INNER - the entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL, so the whole join workload takes place on one slice
- DS_DIST_BOTH - both tables are redistributed: heavy data movement
- DS_DIST_ALL_NONE - no redistribution is required because the inner table was already distributed to every node using DISTSTYLE ALL
- DS_DIST_NONE - no tables are redistributed: collocated joins are possible because corresponding slices are joined without moving data between nodes
- DS_DIST_INNER - the inner table is redistributed
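The statistics and sort fixes mentioned above can be applied with standard Redshift maintenance commands; the table name below is taken from the earlier sample and stands in for whichever table the plan points at:

```sql
-- Refresh planner statistics so the optimizer can pick the smaller
-- table as the inner table of the join
analyze WH.fact_mrkt_gl;

-- Re-sort the table on its sort key and reclaim space from deleted rows
vacuum WH.fact_mrkt_gl;
```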
Moreover, the view SVL_QUERY_SUMMARY gives a deeper analysis of a query that has already run.
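A sketch of such a lookup, assuming the query ID has already been found (12345 below is a placeholder, not a real query ID):

```sql
-- Per-step metrics for a completed query; high rows/bytes or
-- is_diskbased = 't' flags the steps worth tuning
select stm, seg, step, label, rows, bytes, is_diskbased
from svl_query_summary
where query = 12345
order by stm, seg, step;
```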