PostgreSQL'de Indexler

Haziran 17, 2020

PostgreSQL'de Indexler

PostgreSQL'de temelde 5 adet index tipi vardır:

1. B-Tree index: Varsayılan olarak kullanışan index tipidir. Diğerlerine göre kullanımı daha yaygındır. Sorgunun WHERE kısmında '=,>,<, LIKE, ILIKE, IN, NOT IN, BETWEEN' gibi ifadeler geçiyorsa B-Tree index sorgunun bu kısmını hızlandıracaktır. 'CREATE INDEX' komutunda index tipi belirtilmezse index B-Tree tipinde oluşur.

2. GIN Index: Genellikle tam metin aramalarında kullanılır. Dizi değerlerinin bulunduğu kolonlarda GIN Index B-Tree indexe göre daha performanslıdır. Tam metin aramalarında genellikle tsvector veri tipi kullanıldığı için ilgili kolona GIN Index eklenmesi önerilir.

3. GIST Index: Genellikle geometrik veri tipli kolonlarda, zaman tipli kolonlarda ve tam metin aramaları için kullanılır. GIN Indexten farklı olarak PostGIS'te kullanımı yaygındır.

4. Hash Index: Sadece eşitlik (=) operatörü için kullanılır. PostgreSQL 10 itibari ile WAL'a yazılmaya başlamıştır. Bu nedenle PostgreSQL 10 öncesinde kullanılması önerilmez.

5. BRIN Index: Büyük veri setlerinde diğer index tiplerine göre daha yararlıdır. İndexin boyutu B-Tree indexe göre oldukça küçük oluşur. Bu nedenle tarih veya şehirlerin zip kodları gibi kolonlarda kullanılması önerilir.

'EXPLAIN' ile sorgu planı inceleme

EXPLAIN ifadesi bir sorgunun çalışma planını bize sunar. Sorgu aslında gerçekte çalışmaz, yani INSERT/UPDATE/DELETE sorgularına da EXPLAIN ile planına bakabilirsiniz. EXPLAIN'i aşağıdaki seçeneklerle birlikte kullanabilirsiniz:

ANALYZE [ boolean ]: Sorguyu gerçekten çalıştırır ve tüm kısımlarla ilgili ayrıntılı bilgi verir. INSERT/UPDATE/DELETE sorguları için;
     BEGIN;
     EXPLAIN ANALYZE sorgu;
     ROLLBACK;
ifadesi içinde kullanılması önerilir.
VERBOSE [ boolean ]: Varsayılan değeri FALSE olan bu değer sorgu ile ilgili ayrıntılı bilgi verir.
COSTS [ boolean ]: Varsayılan olarak TRUE gelen bu değer sorgunun toplam maliyeti ve her bir parçasının tahmini satır değeri gibi ayrıntıları verir.
BUFFERS [ boolean ]: Varsayılan olarak FALSE gelen ve yalnızca ANALYZE ile kullanılabilen bu değer sorgunun buffer kullanımı ile ilgili bilgi verir.
TIMING [ boolean ]: Varsayılan olarak TRUE gelen ve yalnızca ANALYZE ile kullanılabilen bu değer sorgunun her bir parçasının süresi ile ilgili ayrıntılı bilgi verir.
SUMMARY [ boolean ]: ANALYZE ile birlikte varsayılan olarak kullanılan bu değer sorgu planının en sonunda plan ve çalışma süreleri ile ilgili süre bilgisi verir.
FORMAT { TEXT | XML | JSON | YAML }: Varsayılan olarak TEXT gelen bu değer sorgu planının tipini belirlemek için kullanılır.

EXPLAIN ile sorgu planına bakalım:

postgres=# EXPLAIN SELECT c2 FROM t1 WHERE c1 < 20;
                              QUERY PLAN
-----------------------------------------------------------------------
Gather (cost=1000.00..20504.33 rows=100 width=33)
   Workers Planned: 2
   -> Parallel Seq Scan on t1 (cost=0.00..19494.33 rows=42 width=33)
         Filter: (c1 < 20)
(4 rows)

ANALYZE ifadesi eklendiğinde sorgu gerçekten çalıştı ve sorgu planında satır satıyı ve çalışma süresi ile ilgili ayrıntılı bilgi verdi:

postgres=# EXPLAIN ANALYZE SELECT c2 FROM t1 WHERE c1 < 20;
                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..20504.33 rows=100 width=33) (actual time=0.463..66.092 rows=19 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on t1 (cost=0.00..19494.33 rows=42 width=33) (actual time=36.005..56.564 rows=6 loops=3)
         Filter: (c1 < 20)
         Rows Removed by Filter: 333327
Planning Time: 0.096 ms
Execution Time: 66.126 ms
(8 rows

COST OFF ifadesini eklediğimizde maliyet bilgisi planda yer almadı:

postgres=# EXPLAIN (ANALYZE, COSTS OFF) SELECT c2 FROM t1 WHERE c1 < 20;
                                QUERY PLAN
---------------------------------------------------------------------------
Gather (actual time=0.594..67.692 rows=19 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on t1 (actual time=36.834..57.909 rows=6 loops=3)
         Filter: (c1 < 20)
         Rows Removed by Filter: 333327
Planning Time: 0.117 ms
Execution Time: 67.729 ms
(8 rows)

BUFFERS ON ifadesi sorgu için ne kadar buffer'ın shared buffers'tan geldiğini gösterir:

postgres=# EXPLAIN (ANALYZE, BUFFERS ON) SELECT c2 FROM t1 WHERE c1 < 20;
                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..20504.33 rows=100 width=33) (actual time=0.662..62.511 rows=19 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=14286
   -> Parallel Seq Scan on t1 (cost=0.00..19494.33 rows=42 width=33) (actual time=33.660..52.948 rows=6 loops=3)
         Filter: (c1 < 20)
         Rows Removed by Filter: 333327
         Buffers: shared hit=14286
Planning Time: 0.144 ms
Execution Time: 62.559 ms
(10 rows)

İndex oluşturma işlemi

'CREATE INDEX' komutu ile oluşturulan indexler tabloda işlem esnasında lock oluşmasıne neden olur. İndex oluşturulurken kolona yalnızca 'SELECT' sorguları gelebilir. Onun dışında 'INSERT/UPDATE/DELETE' sorguları işlem bitimine kadar bekletilir. Büyük kolonlarda bu işlem uzun süreceği için canlı veritabanlarında 'CREATE INDEX' komutu ile index eklerken dikkat edilmesi gerekir.

'CREATE INDEX CONCURRENTLY' komutu index oluşturulan kolonun lock almasına sebep olmaz. Bu nedenle canlı sunucularda da kullanılabilir. İndex oluşturulan kolona sorgu geldiğinde index işlemi sorguya öncelik verir. Bu nedenle indexin oluşması daha uzun sürecektir fakat süreçleri engellemediği için daha kullanışlıdır.

Doğru index oluşturma

İndexler temelde sorguları hızlandırmak için kullanılır. Fakat index yanlış oluşturulursa yarardan çok zararı olacaktır. Çünkü eklenen index temelde kolonun kendisi kadar yer kaplar, bu nedenle gereksiz bir index oluşturulursa diskte gereksiz yer kaplamış olur. Bununla birlikte, ihtiyaç dışı indexler sorguların gereksiz index taramasına neden olursa sorgunun var olan süresi index ile birlikte daha da uzun olacaktır.

Öncelikle random verilerden oluşan bir tablo oluşturalım:

postgres=# CREATE TABLE t1 (c1 serial, c2 text, c3 text, c4 int, c5 int, c6 int);
CREATE TABLE

postgres=# INSERT INTO t1(c2,c3,c4,c5,c6) SELECT md5(random()::text),md5(random()::text), random()*1000+25,random()*5+25,random()*100+25 FROM generate_series(1,1000000);
INSERT 0 1000000

postgres=# SELECT * FROM t1 LIMIT 2;
-[ RECORD 1 ]------------------------
c1 | 1
c2 | cc8d904e161a8a6ae1c1a639d4f1388a
c3 | b32ecc22ea654d3fc31c342052f18876
c4 | 756
c5 | 30
c6 | 47
-[ RECORD 2 ]------------------------
c1 | 2
c2 | 9d3c0b4757614c6a2a778decdd75398e
c3 | 0062e5cb8acf75c43cc183625a9ebd00
c4 | 386
c5 | 28
c6 | 118

Sorgu planını ilk olarak EXPLAIN ile inceleyelim:

Sorguda t1 tablosunun c1 kolonunda sequential scan yapıldığını görüyoruz. Sequential scan yapılmasının nedeni tablonun küçük olması, kolonun büyük bir kısmının WHERE koşulunda filtrelenmesi, kolonda index olmaması olabilir.
Bunun için c1 kolonuna index eklemeyi deneyelim:

postgres=# CREATE INDEX CONCURRENTLY ON t1 (c1);
CREATE INDEX
postgres=# EXPLAIN (ANALYZE, BUFFERS ON) SELECT c2 FROM t1 WHERE c1 < 20;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using t1_c1_idx on t1 (cost=0.42..8.76 rows=19 width=33) (actual time=0.016..0.039 rows=19 loops=1)
   Index Cond: (c1 < 20)
   Buffers: shared hit=4
Planning Time: 0.390 ms
Execution Time: 0.081 ms

c1 kolonunda index olmadığı için sorguda sequential scan yapılıyormuş, index eklediğimizde index scan yapıldı ve sorgunun süresi kısaldı.

B-Tree index:

İndex oluştururken tipini belirtmediğimizde varsayılan olarak B-Tree index oluşur. Aşağıdaki gibi index oluşturup sorgumuzu hızlandırabiliriz:

postgres=# CREATE TABLE table1 (column1 int);
CREATE TABLE

postgres=# INSERT INTO table1 SELECT generate_series(1,100000);
INSERT 0 100000

postgres=# EXPLAIN ANALYZE SELECT column1 FROM table1 WHERE column1 < 100;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on table1 (cost=0.00..1564.50 rows=8939 width=4) (actual time=0.324..29.828 rows=99 loops=1)
   Filter: (column1 < 100)
   Rows Removed by Filter: 99901
Planning Time: 0.093 ms
Execution Time: 29.869 ms
(5 rows)

postgres=# CREATE INDEX CONCURRENTLY ON table1 (column1);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT column1 FROM table1 WHERE column1 < 100;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using table1_column1_idx on table1 (cost=0.29..10.03 rows=99 width=4) (actual time=0.034..0.123 rows=99 loops=1)
   Index Cond: (column1 < 100)
   Heap Fetches: 99
Planning Time: 0.469 ms
Execution Time: 0.182 ms
(5 rows)

GIN Index:

GIN index, veri tipi tsvector olan kolona direkt oluşturulabilir. Veri tipi text, integer ve benzeri olan kolonlarda oluşturulurken şöyle hata verir: " data type integer has no default operator class for access method "gin"". Bu nedenle pg_trgm eklentisini kullanarak index ekleyebiliriz:

postgres=# CREATE TABLE table2(column1 tsvector, column2 text);
CREATE TABLE

postgres=# INSERT INTO table2(column2) SELECT md5(random()::text) FROM generate_series(1,1000000);
INSERT 0 1000000

postgres=# UPDATE table2 SET column1 = tsvector(column2);
UPDATE 1000000

postgres=# EXPLAIN ANALYZE SELECT * FROM table2 WHERE column1 = '43b292219faa56bdac15f283f601d2eb' OR column2 = '2b45ef01affacb2dffaaf429395efa63';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..28917.20 rows=2 width=74) (actual time=0.686..86.759 rows=2 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on table2 (cost=0.00..27917.00 rows=1 width=74) (actual time=50.085..77.421 rows=1 loops=3)
         Filter: ((column1 = '''43b292219faa56bdac15f283f601d2eb'''::tsvector) OR (column2 = '2b45ef01affacb2dffaaf429395efa63'::text))
         Rows Removed by Filter: 333333
Planning Time: 0.126 ms
Execution Time: 86.799 ms
(8 rows)

postgres=# CREATE INDEX CONCURRENTLY ON table2 USING gin (column1);
CREATE INDEX

postgres=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION

postgres=# CREATE INDEX CONCURRENTLY ON table2 USING gin (column2 gin_trgm_ops);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT * FROM table2 WHERE column2 ilike '%b2dffaaf4%';
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on table2 (cost=100.78..481.64 rows=100 width=74) (actual time=5.160..5.161 rows=1 loops=1)
   Recheck Cond: (column2 ~~* '%b2dffaaf4%'::text)
   Heap Blocks: exact=1
   -> Bitmap Index Scan on table2_column2_idx (cost=0.00..100.75 rows=100 width=0) (actual time=5.147..5.147 rows=1 loops=1)
         Index Cond: (column2 ~~* '%b2dffaaf4%'::text)
Planning Time: 0.506 ms
Execution Time: 5.206 ms
(7 rows)

Gin index, tsvector, pg_trgm ile ilgili ayrıntılı bilgi için PostgreSQL'de Tam Metin Araması Temelleri blog yazımı okuyabilirsiniz.

GIST Index:

GIST indexlerin geometrik verilerde sıklıkla kullanıldığından bahsetmiştik. Circle veri tipli kolona gist index eklendiğinde sorgu hızının index sayesindeki değişimini görebilirsiniz:

postgres=# CREATE TABLE table3 (column1 circle);
CREATE TABLE

postgres=# insert into table3 values ('<(1,1),5>'),('<(2,2),6>'),('<(3,3),7>'),('<(4,4),8>'),('<(5,5),9>'),('<(6,6),10>'),('<(7,7),9>'),('<(11,10),10>'),('<(1,10),10>'),('<(9,9),9>'),('<(12,12),12>');
INSERT 0 11

postgres=# explain analyze select * from table3 where column1 @> circle '((12,12),12)';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on table3 (cost=10000000000.00..10000004741.77 rows=251 width=24) (actual time=0.035..34.261 rows=22842 loops=1)
   Filter: (column1 @> '<(12,12),12>'::circle)
   Rows Removed by Filter: 228420
Planning Time: 0.192 ms
Execution Time: 35.345 ms
(5 rows)

postgres=# CREATE INDEX CONCURRENTLY ON table3 USING gist(column1);

postgres=# explain analyze select * from table3 where column1 @> circle '((12,12),12)';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on table3 (cost=14.23..682.70 rows=251 width=24) (actual time=3.222..12.294 rows=22842 loops=1)
   Recheck Cond: (column1 @> '<(12,12),12>'::circle)
   Heap Blocks: exact=1601
   -> Bitmap Index Scan on table3_column1_idx (cost=0.00..14.17 rows=251 width=0) (actual time=2.944..2.944 rows=22842 loops=1)
         Index Cond: (column1 @> '<(12,12),12>'::circle)
Planning Time: 0.053 ms
Execution Time: 13.389 ms
(7 rows)

NOT: Küçük tablolarda index olmasına rağmen sequential scan yapılabilir. Bu nedenle tabloyu büyütmek amacıyla yukarıdaki INSERT ifadesindeki verilerden yaklaşık 250.000 tane veri tabloya ekledim. Test etmek isterseniz bu şekilde test edebilirsiniz.

Hash Index:

Hash indexi aşağıdaki gibi oluşturup kullanabiliriz WHERE koşuşunda eşitlik olduğunda sorguyu hızlandırabiliriz:

postgres=# CREATE TABLE table4 (column1 integer);
CREATE TABLE

postgres=# INSERT INTO table4 SELECT generate_series(1,100000);
INSERT 0 100000

postgres=# EXPLAIN ANALYZE SELECT * FROM table4 WHERE column1 = 100;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on table4 (cost=0.00..1693.00 rows=1 width=4) (actual time=0.058..12.672 rows=1 loops=1)
   Filter: (column1 = 100)
   Rows Removed by Filter: 99999
Planning Time: 0.292 ms
Execution Time: 12.699 ms
(5 rows)

postgres=# CREATE INDEX CONCURRENTLY ON table4 USING HASH(column1);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT * FROM table4 WHERE column1 = 100;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using table4_column1_idx on table4 (cost=0.00..8.02 rows=1 width=4) (actual time=0.023..0.025 rows=1 loops=1)
   Index Cond: (column1 = 100)
Planning Time: 0.271 ms
Execution Time: 0.055 ms
(4 rows)

BRIN Index:

Brin index örneği için bir tarih kolonuna index ekleyelim ve örnek bir sorgunun performansını görelim. Brin indexin boyutunun daha küçük olduğundan bahsetmiştik. Aşağıdaki örnekte olduğu gibi B-Tree indexin boyutu 214 MB iken BRIN indexin boyutu 48 kB:

postgres=# CREATE TABLE table5(column1 date);
CREATE TABLE

postgres=# INSERT INTO table5 SELECT CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval FROM generate_series(1,10000000) as g;
INSERT 0 10000000

postgres=# EXPLAIN ANALYZE SELECT * FROM table5 WHERE column1 between '2020-06-17 12:10:31.363419' AND ' 2020-06-17 12:12:31.363419';
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..107748.10 rows=1 width=8) (actual time=0.581..352.801 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on table5 (cost=0.00..106748.00 rows=1 width=8) (actual time=184.537..301.495 rows=1 loops=3)
         Filter: ((column1 >= '2020-06-17 12:10:31.363419'::timestamp without time zone) AND (column1 <= '2020-06-17 12:12:31.363419'::timestamp without time zone))
         Rows Removed by Filter: 3333332
Planning Time: 0.224 ms
Execution Time: 352.834 ms
(8 rows)

postgres=# CREATE INDEX CONCURRENTLY ON table5 USING brin (column1);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT * FROM table5 WHERE column1 between '2020-06-17 12:10:31.363419' AND ' 2020-06-17 12:12:31.363419';
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on table5 (cost=16.03..41733.70 rows=1 width=8) (actual time=0.349..7.394 rows=3 loops=1)
   Recheck Cond: ((column1 >= '2020-06-17 12:10:31.363419'::timestamp without time zone) AND (column1 <= '2020-06-17 12:12:31.363419'::timestamp without time zone))
   Rows Removed by Index Recheck: 28925
   Heap Blocks: lossy=128
   -> Bitmap Index Scan on table5_column1_idx (cost=0.00..16.03 rows=28902 width=0) (actual time=0.313..0.313 rows=1280 loops=1)
         Index Cond: ((column1 >= '2020-06-17 12:10:31.363419'::timestamp without time zone) AND (column1 <= '2020-06-17 12:12:31.363419'::timestamp without time zone))
Planning Time: 0.337 ms
Execution Time: 7.440 ms
(8 rows)

postgres=# \di+ table5_column1_idx
                               List of relations
Schema |        Name        | Type | Owner   | Table | Size | Description
--------+--------------------+-------+----------+--------+-------+-------------
public | table5_column1_idx | index | postgres | table5 | 48 kB |
(1 row)

postgres=# DROP INDEX CONCURRENTLY table5_column1_idx;
DROP INDEX

postgres=# CREATE INDEX CONCURRENTLY ON table5(column1);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT * FROM table5 WHERE column1 between '2020-06-17 12:10:31.363419' AND ' 2020-06-17 12:12:31.363419';
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using table5_column1_idx on table5 (cost=0.43..8.50 rows=3 width=8) (actual time=0.018..0.022 rows=3 loops=1)
   Index Cond: ((column1 >= '2020-06-17 12:10:31.363419'::timestamp without time zone) AND (column1 <= '2020-06-17 12:12:31.363419'::timestamp without time zone))
   Heap Fetches: 3
Planning Time: 0.706 ms
Execution Time: 0.061 ms
(5 rows)

postgres=# \di+ table5_column1_idx
                               List of relations
Schema |        Name        | Type | Owner   | Table | Size | Description
--------+--------------------+-------+----------+--------+--------+-------------
public | table5_column1_idx | index | postgres | table5 | 214 MB |
(1 row)

İndex oluşturma örnekleri

Yukarıdaki örneklerde gördüğümüz gibi, index oluştururken index adı tanımlamadığımız için varsayılan olarak index adı tabloadı_kolonadı_idx olarak tanımlandı. 'Doğru index oluşturma' bölümünde oluşturduğumuz t1 tablomuzda index olştururken tipini belirtmediğimiz için ise varsayılan olarak btree index olarak tanımlandı:

WHERE koşulunda birden fazla kolon varsa o kolonları içeren tek bir index oluşturmak ayrı ayrı index oluşturmaktan daha yararlı olur. Örneğin WHERE koşulunda c4, c5 ve c6 olan bir sorgu yazalım ve planına bakalım:

postgres=# EXPLAIN (ANALYZE, BUFFERS ON) SELECT * FROM t1 WHERE c4 > 800 AND c5 > 25 AND c6 > 100;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..27601.87 rows=50242 width=82) (actual time=0.561..59.345 rows=49555 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=14286
   -> Parallel Seq Scan on t1 (cost=0.00..21577.67 rows=20934 width=82) (actual time=0.021..47.360 rows=16518 loops=3)
         Filter: ((c4 > 800) AND (c5 > 25) AND (c6 > 100))
         Rows Removed by Filter: 316815
         Buffers: shared hit=14286
Planning Time: 0.249 ms
Execution Time: 61.526 ms
(10 rows)

Kolonlarda index yokken parallel sequential scan yapıldı ve sorgu 61 ms sürdü. Tüm c4, c5 ve c6'ya ayrı ayrı index ekleyerek sorgu planını inceleyelim:

postgres=# CREATE INDEX CONCURRENTLY ON t1 (c4);
CREATE INDEX
postgres=# CREATE INDEX CONCURRENTLY ON t1 (c5);
CREATE INDEX
postgres=# CREATE INDEX CONCURRENTLY ON t1 (c6);
CREATE INDEX
postgres=# EXPLAIN (ANALYZE, BUFFERS ON) SELECT * FROM t1 WHERE c4 > 800 AND c5 > 25 AND c6 > 100;
postgres=# EXPLAIN (ANALYZE, BUFFERS ON) SELECT * FROM t1 WHERE c4 > 800 AND c5 > 25 AND c6 > 100;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4217.71..22481.41 rows=50242 width=82) (actual time=28.031..99.018 rows=49555 loops=1)
   Recheck Cond: (c4 > 800)
   Filter: ((c5 > 25) AND (c6 > 100))
   Rows Removed by Filter: 174673
   Heap Blocks: exact=14286
   Buffers: shared hit=12829 read=2073 written=1
   -> Bitmap Index Scan on t1_c4_idx (cost=0.00..4205.15 rows=227297 width=0) (actual time=24.680..24.680 rows=224228 loops=1)
         Index Cond: (c4 > 800)
         Buffers: shared read=616
Planning Time: 0.703 ms
Execution Time: 101.372 ms
(11 rows)

Parallel sequential scan indexler sayesinde yerini bitmap index scan'e bıraktı. Fakat 3 farklı index tarandığı için sorgu eskisinden daha uzun sürdü. Var olan indexleri kaldırıp ortak tek bir index oluşturalım ve sorgu planını inceleyelim:

postgres=# DROP INDEX CONCURRENTLY t1_c4_idx ;
DROP INDEX
postgres=# DROP INDEX CONCURRENTLY t1_c5_idx ;
DROP INDEX
postgres=# DROP INDEX CONCURRENTLY t1_c6_idx ;
DROP INDEX
postgres=# CREATE INDEX CONCURRENTLY ON t1 (c4,c5,c6);
CREATE INDEX
postgres=# EXPLAIN (ANALYZE, BUFFERS ON) SELECT * FROM t1 WHERE c4 > 800 AND c5 > 25 AND c6 > 100;

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=6362.20..21527.43 rows=50242 width=82) (actual time=18.528..38.192 rows=49555 loops=1)
   Recheck Cond: ((c4 > 800) AND (c5 > 25) AND (c6 > 100))
   Heap Blocks: exact=13875
   Buffers: shared hit=14737
   -> Bitmap Index Scan on t1_c4_c5_c6_idx (cost=0.00..6349.64 rows=50242 width=0) (actual time=16.179..16.179 rows=49555 loops=1)
         Index Cond: ((c4 > 800) AND (c5 > 25) AND (c6 > 100))
         Buffers: shared hit=862
Planning Time: 0.201 ms
Execution Time: 40.114 ms
(9 rows)

Ortak bir index oluşturduğumuzda sorgu süresi 40ms'ye düştü. Fakat bunu yaparken veritabanına gelen diğer sorgulara dikkat etmemiz gerekir. Mesela WHERE koşulunca sadece c4 olan bir sorgu varsa c4'teki indexi kaldırdığımızda bahsi geçen sorgu yavaşlayacaktır. Bu nedenle sorgu optimizasyonu yaparken veritabanına sık sık gelen tüm sorguları göz önünde bulundurmalıyız.

JOIN'li yapılarda indexler:

JOIN ifadesi ile iki farklı tablodan veri çekeceksek sorgu planına göre her iki tabloya da index eklememiz gerekebilir. Bir örneğini görmek için t2 tablosu oluşturalım ve random veri ekleyelim:

postgres=# CREATE TABLE t2 (a1 serial, a2 text, a3 text, a4 int, a5 int, a6 int);
CREATE TABLE
postgres=# INSERT INTO t2(a2,a3,a4,a5,a6) SELECT md5(random()::text),md5(random()::text), random()*150+225,random()*100+15,random()*10000+25 FROM generate_series(1,1000000);
INSERT 0 1000000

Örnek bir SELECT sorgusunun planına bakalım. Sorgu hiç index olmadan 42 ms sürüyor:

postgres=# EXPLAIN ANALYZE SELECT t1.c1, t1.c2, t2.a1, t2.a2 FROM t1 inner join t2 on t1.c6=t2.a6;

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=45576.00..5824771553.47 rows=452702625000 width=73) (actual time=24852.877..39598.092 rows=101254118 loops=1)
   Hash Cond: (t2.a6 = t1.c6)
   -> Seq Scan on t2 (cost=0.00..2112612.25 rows=90540525 width=40) (actual time=24430.768..27743.081 rows=1000000 loops=1)
   -> Hash (cost=24286.00..24286.00 rows=1000000 width=41) (actual time=418.440..418.441 rows=1000000 loops=1)
         Buckets: 65536 (originally 65536) Batches: 128 (originally 32) Memory Usage: 3585kB
         -> Seq Scan on t1 (cost=0.00..24286.00 rows=1000000 width=41) (actual time=0.439..265.654 rows=1000000 loops=1)
Planning Time: 10.554 ms
Execution Time: 42631.034 ms
(8 rows)

Hash condition'da da görüldüğü gibi t2.a6 = t1.c6 eşitliği için her iki tabloda da sequential scan yapılıyor. İlk olarak t1 tablosunun c6 kolonuna index ekleyelim ve sorgu planını inceleyelim:

postgres=# CREATE INDEX CONCURRENTLY ON t1(c6);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT t1.c1, t1.c2, t2.a1, t2.a2 FROM t1 inner join t2 on t1.c6=t2.a6 where t1.c6 > 100;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=21557119.92..1688916439.64 rows=111142116058 width=73) (actual time=29041.381..32252.122 rows=24775892 loops=1)
   Merge Cond: (t1.c6 = t2.a6)
   -> Sort (cost=51490.57..52104.34 rows=245508 width=41) (actual time=213.171..295.186 rows=245057 loops=1)
         Sort Key: t1.c6
         Sort Method: external merge Disk: 12976kB
         -> Bitmap Heap Scan on t1 (cost=4603.11..21957.96 rows=245508 width=41) (actual time=46.588..154.341 rows=245057 loops=1)
               Recheck Cond: (c6 > 100)
               Heap Blocks: exact=14286
               -> Bitmap Index Scan on t1_c6_idx (cost=0.00..4541.73 rows=245508 width=0) (actual time=43.561..43.561 rows=245057 loops=1)
                     Index Cond: (c6 > 100)
   -> Materialize (cost=21505629.35..21958331.97 rows=90540525 width=40) (actual time=28800.931..29637.441 rows=24783563 loops=1)
         -> Sort (cost=21505629.35..21731980.66 rows=90540525 width=40) (actual time=28800.927..28826.787 rows=10192 loops=1)
               Sort Key: t2.a6
               Sort Method: external merge Disk: 52920kB
               -> Seq Scan on t2 (cost=0.00..2112612.25 rows=90540525 width=40) (actual time=15666.184..28271.552 rows=1000000 loops=1)
Planning Time: 7.195 ms
Execution Time: 32824.392 ms

t1 tablosundaki c6 kolonuna index eklediğimizde sorgu planının değiştiğini gördük. T1 tablosunda sequential scan yerine bitmap heap scan yapılıyor ve sorgumuz biraz hızlanıyor. t2 tablosundaki a6 kolonuna da index ekleyelim ve tekrar sorgu planını inceleyelim:

postgres=# CREATE INDEX CONCURRENTLY ON t2(a6);
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT t1.c1, t1.c2, t2.a1, t2.a2 FROM t1 inner join t2 on t1.c6=t2.a6 where t1.c6 > 100
postgres-# ;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=51490.99..425421.08 rows=22932793 width=74) (actual time=2876.567..5881.574 rows=24775892 loops=1)
   Merge Cond: (t2.a6 = t1.c6)
   -> Index Scan using t2_a6_idx on t2 (cost=0.42..2727203.84 rows=935183 width=41) (actual time=0.016..33.264 rows=10192 loops=1)
   -> Materialize (cost=51490.57..52718.11 rows=245508 width=41) (actual time=2848.950..3667.857 rows=24775806 loops=1)
         -> Sort (cost=51490.57..52104.34 rows=245508 width=41) (actual time=2848.934..2872.529 rows=245057 loops=1)
               Sort Key: t1.c6
               Sort Method: external merge Disk: 12976kB
               -> Bitmap Heap Scan on t1 (cost=4603.11..21957.96 rows=245508 width=41) (actual time=46.770..2685.804 rows=245057 loops=1)
                     Recheck Cond: (c6 > 100)
                     Heap Blocks: exact=14286
                     -> Bitmap Index Scan on t1_c6_idx (cost=0.00..4541.73 rows=245508 width=0) (actual time=43.936..43.936 rows=245057 loops=1)
                           Index Cond: (c6 > 100)
Planning Time: 9.665 ms
Execution Time: 6422.857 ms
(14 rows)

t2 tablosunun a6 kolonunda sequential scan yerine index scan yapılıyor ve sorgu süresi 6 saniyeye düşüyor. Her iki indexten sonra sorgu 42 saniyeden 6 saniyeye düştü.

Sorgu planında 'Sort Method: external merge Disk: 12976kB' satırına dikkat edelim. t1 tablosunun c6 kolonunda tarama yapılırken work_mem yetmediği için sorgu için diske gidiliyor ve bu da sorgu süresini uzatıyor. Sadece bu sorguya özel work_mem'i aşağıdaki gibi arttırabiliriz:

postgres=# SET work_mem TO '100MB';
SET
postgres=# EXPLAIN ANALYZE SELECT t1.c1, t1.c2, t2.a1, t2.a2 FROM t1 inner join t2 on t1.c6=t2.a6 where t1.c6 > 100
;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=43937.99..417254.31 rows=22932793 width=74) (actual time=149.501..3862.228 rows=24775892 loops=1)
   Merge Cond: (t2.a6 = t1.c6)
   -> Index Scan using t2_a6_idx on t2 (cost=0.42..2727203.84 rows=935183 width=41) (actual time=0.036..28.600 rows=10192 loops=1)
   -> Sort (cost=43937.57..44551.34 rows=245508 width=41) (actual time=130.609..1415.817 rows=24775806 loops=1)
         Sort Key: t1.c6
         Sort Method: quicksort Memory: 25290kB
         -> Bitmap Heap Scan on t1 (cost=4603.11..21957.96 rows=245508 width=41) (actual time=16.531..90.814 rows=245057 loops=1)
               Recheck Cond: (c6 > 100)
               Heap Blocks: exact=14286
               -> Bitmap Index Scan on t1_c6_idx (cost=0.00..4541.73 rows=245508 width=0) (actual time=14.797..14.797 rows=245057 loops=1)
                     Index Cond: (c6 > 100)
Planning Time: 71.776 ms
Execution Time: 4418.580 ms
(13 rows)

Sort metodunun yeni hali quicksort oldu, yani work_mem'i bu sorguya özel 100 MB'a yükseltmemiz yeterli oldu. Sorgu en sonunda 4 saniyeye düştü.

Indexe doğru şekilde WHERE koşulu ekleme:

Veritabanına sık sık gelen sorguların WHERE koşulundaki filtrelemeye göre indexe de WHERE koşulu ekleyebiliriz. Örneğin aşağıdaki sorguda a4 kolonunun 243 olduğu satırdaki a1 değerini çekiyoruz:

postgres=# EXPLAIN ANALYZE SELECT a1 FROM t2 WHERE a4 = 243;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on t2 (cost=0.00..1218896.79 rows=6140 width=4) (actual time=0.038..32635.873 rows=6629 loops=1)
   Filter: (a4 = 243)
   Rows Removed by Filter: 993371
Planning Time: 0.145 ms
Execution Time: 32636.238 ms
(5 rows)

Uygulama tarafından gelen sorgularda a4 kolonunun genellikle 250'den küçük verileri sorguladığını varsayalım. Bu durumda indexi a4<250 koşulu ile oluşturursak indexin boyutu daha küçük olur, index scan süresi kısalmış olur. Fakat benzer bir sorgu a4 kolonundaki 250'den büyük verileri sorgulamak için gelirse bu index kullanılmaz. Bu nedenle veritabanına gelen sorguları iyi analiz etmemiz gerekir. Bizim örneğimizde index sorguyu 32 saniyeden 38 ms'ye indirdi:

postgres=# CREATE INDEX CONCURRENTLY ON t2(a4) where a4< 250;
CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT a1 FROM t2 WHERE a4 = 243;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t2 (cost=127.31..24958.30 rows=6566 width=4) (actual time=4.930..37.938 rows=6629 loops=1)
   Recheck Cond: (a4 = 243)
   Heap Blocks: exact=5287
   -> Bitmap Index Scan on t2_a4_idx2 (cost=0.00..125.67 rows=6566 width=0) (actual time=2.318..2.318 rows=6629 loops=1)
         Index Cond: (a4 = 243)
Planning Time: 0.459 ms
Execution Time: 38.530 ms
(7 rows)

PostgreSQL fonksiyonları ile index oluşturma:

İndexlere WHERE koşulu eklerken, sorguda postgres fonksiyonları kullanılmışsa indexi de bunlarla oluşturabiliriz. Örneğin:

postgres=# EXPLAIN ANALYZE SELECT * FROM t2 WHERE coalesce(a4,0) < 100;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on t2 (cost=0.00..1219707.00 rows=333333 width=82) (actual time=19206.053..19206.053 rows=0 loops=1)
   Filter: (COALESCE(a4, 0) < 100)
   Rows Removed by Filter: 1000000
Planning Time: 0.101 ms
Execution Time: 19206.085 ms
(5 rows)

WHERE koşulund COALESCE fonksiyonu olduğu için aşağıdaki şekilde index eklediğimide bu index kullanılmaz ve sorguda yine sequential scan yapılır:

postgres=# CREATE INDEX CONCURRENTLY ON t2 (a4);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM t2 WHERE coalesce(a4,0) < 100;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on t2 (cost=0.00..1219707.00 rows=333333 width=82) (actual time=23540.760..23540.760 rows=0 loops=1)
   Filter: (COALESCE(a4, 0) < 100)
   Rows Removed by Filter: 1000000
Planning Time: 0.325 ms
Execution Time: 23540.790 ms
(5 rows)

Bu nedenle indexi aşağıdaki gibi eklemeliyiz:

postgres=# CREATE INDEX CONCURRENTLY ON t2 (COALESCE(a4,0));
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM t2 WHERE coalesce(a4,0) < 100;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t2 (cost=6251.76..756590.21 rows=333333 width=82) (actual time=0.016..0.016 rows=0 loops=1)
   Recheck Cond: (COALESCE(a4, 0) < 100)
   -> Bitmap Index Scan on t2_coalesce_idx (cost=0.00..6168.42 rows=333333 width=0) (actual time=0.011..0.012 rows=0 loops=1)
         Index Cond: (COALESCE(a4, 0) < 100)
Planning Time: 0.467 ms
Execution Time: 0.065 ms
(6 rows)

Sorgu süresi bu index sayesinde 23 saniyeden 0.065 ms'ye düştü. İndexi daha da özelleştirmek istersek aşağıdaki gibi WHERE koşulu ile index oluşturabiliriz:

postgres=# CREATE INDEX CONCURRENTLY ON t2 (COALESCE(a4,0)) WHERE (COALESCE(a4,0)<100);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM t2 WHERE coalesce(a4,0) < 100;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t2 (cost=87.46..750425.92 rows=333333 width=82) (actual time=0.007..0.008 rows=0 loops=1)
   Recheck Cond: (COALESCE(a4, 0) < 100)
   -> Bitmap Index Scan on t2_coalesce_idx1 (cost=0.00..4.13 rows=333333 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Planning Time: 0.402 ms
Execution Time: 0.047 ms
(5 rows)

Bu indexle sorgu süresi 0.065 ms'den 0.047 ms'ye ye düştü. Bizim örneğimizde bu çok küçük bir değişim olsa da milyonlarca satırı olan tablolarda daha büyük performans kazancınız olacaktır.

Aşağıdaki sorguda WHERE koşulunda UPPER() fonksiyonunun olduğunu gözlemliyoruz. Sorgu index olmadan 21 saniye sürüyor.

postgres=# EXPLAIN ANALYZE SELECT * from t2 WHERE UPPER(a2) = '7B8F7BDF7370F2D8A2958C3C029F5770';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on t2 (cost=0.00..1222207.00 rows=5000 width=82) (actual time=0.062..21726.194 rows=1 loops=1)
   Filter: (upper(a2) = '7B8F7BDF7370F2D8A2958C3C029F5770'::text)
   Rows Removed by Filter: 999999
Planning Time: 0.162 ms
Execution Time: 21726.235 ms
(5 rows)

İlgili kolona indexi UPPER() fonksiyonu ile eklediğimizde sorgu 21 saniyeden 0.184 ms'ye düştü.

postgres=# create index on t2 (upper(a2));
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * from t2 WHERE UPPER(a2) = '7B8F7BDF7370F2D8A2958C3C029F5770';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t2 (cost=187.18..19259.72 rows=5000 width=82) (actual time=0.138..0.140 rows=1 loops=1)
   Recheck Cond: (upper(a2) = '7B8F7BDF7370F2D8A2958C3C029F5770'::text)
   Heap Blocks: exact=1
   -> Bitmap Index Scan on t2_upper_idx (cost=0.00..185.93 rows=5000 width=0) (actual time=0.125..0.125 rows=1 loops=1)
         Index Cond: (upper(a2) = '7B8F7BDF7370F2D8A2958C3C029F5770'::text)
Planning Time: 0.545 ms
Execution Time: 0.184 ms
(7 rows)

Aşağıdaki örnekteki gibi, sorgularınızda WHERE koşulunda pipe işareti kullanıyorsanız ilgili kolona index eklerken de pipe kullanabilirsiniz:

postgres=# EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a1 || ' ' || a2) = '83504443 7b8f7bdf7370f2d8a2958c3c029f5770';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on t2 (cost=0.00..1229707.00 rows=5000 width=82) (actual time=0.042..22753.522 rows=1 loops=1)
   Filter: ((((a1)::text || ' '::text) || a2) = '83504443 7b8f7bdf7370f2d8a2958c3c029f5770'::text)
   Rows Removed by Filter: 999999
Planning Time: 0.202 ms
Execution Time: 22753.556 ms
(5 rows)

postgres=# CREATE INDEX CONCURRENTLY ON t2 ((a1 || ' ' || a2));
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a1 || ' ' || a2) = '83504443 7b8f7bdf7370f2d8a2958c3c029f5770';
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t2 (cost=207.18..19317.22 rows=5000 width=82) (actual time=0.056..0.057 rows=1 loops=1)
   Recheck Cond: ((((a1)::text || ' '::text) || a2) = '83504443 7b8f7bdf7370f2d8a2958c3c029f5770'::text)
   Heap Blocks: exact=1
   -> Bitmap Index Scan on t2_expr_idx (cost=0.00..205.93 rows=5000 width=0) (actual time=0.044..0.045 rows=1 loops=1)
         Index Cond: ((((a1)::text || ' '::text) || a2) = '83504443 7b8f7bdf7370f2d8a2958c3c029f5770'::text)
Planning Time: 0.724 ms
Execution Time: 0.113 ms
(7 rows)

Yorumlar

husam11 Ocak 2022 11:48
selamlar, bir queryda sorun alıyorum yardımcı olabilirmisiniz?
YanıtlaSil
Yanıtlar

Yorum ekle

Bu Blogda Ara

Seda Halacli

PostgreSQL'de Indexler

'EXPLAIN' ile sorgu planı inceleme

İndex oluşturma işlemi

Doğru index oluşturma

B-Tree index:

GIN Index:

GIST Index:

Hash Index:

BRIN Index:

İndex oluşturma örnekleri

JOIN'li yapılarda indexler:

Indexe doğru şekilde WHERE koşulu ekleme:

PostgreSQL fonksiyonları ile index oluşturma:

Yorumlar

Yorum Gönder