Engineering Core
ISB Vietnam's skilled software engineers deliver high-quality applications, leveraging their extensive experience in developing financial tools, business management systems, medical technology, and mobile/web platforms.

Query performance is the heartbeat of any high-traffic application. Even with a well-designed architecture, a single inefficient query can lock up resources, spike CPU usage, and degrade the user experience.

In this guide, we go beyond the basics to explore how PostgreSQL executes queries, why bottlenecks occur, and actionable techniques to ensure your database scales efficiently.

How PostgreSQL Executes a Query

Understanding the lifecycle of a query helps you "think" like the database.

  1. Parsing: PostgreSQL checks the SQL syntax.

  2. Rewriting: It applies rules (e.g., converting views into base tables).

  3. Planning / Optimization: The Planner evaluates multiple paths (Scan types, Join methods) and calculates a Cost for each. It selects the path with the lowest estimated cost.

  4. Execution: The Executor runs the plan and returns results.

Key Scan Types to Watch:

  • Seq Scan (Sequential Scan): Reads the entire table. Good for small tables, bad for large ones.

  • Index Scan: Looks up specific rows using an index.

  • Index Only Scan: Retrieves data directly from the index without visiting the heap (table storage). (Fastest)

  • Bitmap Heap Scan: A middle ground, combining index lookups with batch table reads.
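To see which scan the planner actually chooses, you can compare plans side by side. A quick sketch (the orders query is illustrative; enable_seqscan is a real planner setting that is safe to toggle per session):

SQL
-- See which scan the planner picks
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- Temporarily discourage sequential scans (session-only) and compare
SET enable_seqscan = off;
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
RESET enable_seqscan;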

Master EXPLAIN (ANALYZE, BUFFERS)

Don't just guess: measure. EXPLAIN ANALYZE gives you actual execution timings, and adding BUFFERS reveals the true cost in terms of memory and disk I/O.

Command:

SQL
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders WHERE customer_id = 123;

What to look for:

  • Execution Time: Actual time taken.

  • Buffers: shared hit: How many data blocks were found in the shared buffer cache (RAM). High is good.

  • Buffers: shared read: How many blocks had to be read from disk (slow).

  • Rows (Estimated vs. Actual): If these numbers differ significantly (e.g., est 1 row, actual 10,000), your table statistics are likely outdated.
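When estimates drift from reality, refreshing statistics is the first fix. A sketch (the table name is illustrative; pg_stat_user_tables is a standard statistics view):

SQL
-- Refresh planner statistics for a single table
ANALYZE orders;

-- Check when your tables were last analyzed
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;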

Proper Indexing Strategy

Indexes are powerful, but they are not free—they consume disk space and slow down INSERT/UPDATE operations.

Choose the Right Index Type

  • B-Tree (Default): Best for =, <, >, BETWEEN, and sorting (ORDER BY).

  • GIN: Essential for JSONB and Full-Text Search.

  • GiST: Optimized for geospatial data (PostGIS) and complex geometric types.

  • BRIN: Ideal for massive, time-series tables (e.g., billions of rows of logs) physically ordered by date.
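The non-default types are declared with USING. A minimal sketch (table and column names are assumptions):

SQL
-- GIN for JSONB containment queries on a jsonb column
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- BRIN for an append-only, time-ordered log table (tiny index, fast range scans)
CREATE INDEX idx_logs_created_brin ON logs USING BRIN (created_at);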

Pro Tip: Use Covering Indexes (INCLUDE)

If you only need a few columns, you can include them in the index payload to achieve an Index Only Scan.

Scenario: You frequently query status by customer_id.

SQL
-- Standard Index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Covering Index (Better)
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) INCLUDE (status);

Now, SELECT status FROM orders WHERE customer_id = 123 never touches the main table, reducing I/O drastically.

Optimizing JOIN Operations

Joins are expensive. Help the planner by keeping them simple.

  • Index Foreign Keys: Ensure columns used in ON clauses are indexed on both sides.

  • Avoid Casting:

    • ❌ JOIN ... ON o.order_code::text = p.product_code (the cast prevents index use)

    • ✅ Ensure data types match in the schema design.

  • Filter Before Joining: Reduce the dataset size before the join happens.

Example:

SQL
SELECT c.name, o.total 
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01';
-- Ensure an index exists on orders(created_at) so PG filters orders first!
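Putting both rules together for the query above (index names are illustrative):

SQL
-- Join key on the many side of the relationship
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Filter column, so the date predicate shrinks the dataset before the join
CREATE INDEX idx_orders_created_at ON orders(created_at);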

Fetch Only What You Need

Transferring data over the network is often the unseen bottleneck.

Avoid SELECT *

Using SELECT * prevents "Index Only Scans" and adds network latency. Always list specific columns.

Use LIMIT and Paging

Never return thousands of rows to a frontend application.

SQL
SELECT id, status FROM logs ORDER BY created_at DESC LIMIT 50;
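For deep paging, OFFSET forces PostgreSQL to scan and discard every skipped row. Keyset (cursor) pagination stays fast regardless of depth; a sketch, assuming the client remembers the last created_at value it received:

SQL
SELECT id, status, created_at
FROM logs
WHERE created_at < '2024-06-01 12:34:56'  -- last value from the previous page
ORDER BY created_at DESC
LIMIT 50;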

Materialized Views for Heavy Analytics

For complex aggregations (Sums, Averages) over large datasets that don't need real-time accuracy, use Materialized Views.

SQL
CREATE MATERIALIZED VIEW monthly_sales_report AS
SELECT 
    DATE_TRUNC('month', created_at) as month, 
    SUM(total_amount) as revenue
FROM orders
GROUP BY 1;

-- Create an index on the view for fast lookup
CREATE INDEX idx_monthly_sales ON monthly_sales_report(month);

To update data without locking the table (Crucial for production):

SQL
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_report;
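Note that CONCURRENTLY only works if the materialized view has at least one unique index:

SQL
-- Required before REFRESH ... CONCURRENTLY will run
CREATE UNIQUE INDEX idx_monthly_sales_unique ON monthly_sales_report(month);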

Partitioning for Scale

When a table exceeds ~50GB or 100 million rows, B-Tree indexes become deep and slow. Partitioning breaks the table into smaller, manageable chunks.

Example: Range Partitioning

SQL
-- 1. Create Parent Table
CREATE TABLE logs (
    id SERIAL, 
    created_at TIMESTAMP NOT NULL, 
    message TEXT
) PARTITION BY RANGE (created_at);

-- 2. Create Partitions (Child Tables)
CREATE TABLE logs_2024_01 PARTITION OF logs 
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE logs_2024_02 PARTITION OF logs 
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

Benefit: Queries with WHERE created_at = '2024-01-15' will only scan logs_2024_01 and ignore others (Partition Pruning).
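You can verify pruning with EXPLAIN, and optionally add a catch-all partition (available since PostgreSQL 11):

SQL
-- A DEFAULT partition catches rows that fall outside every defined range
CREATE TABLE logs_default PARTITION OF logs DEFAULT;

-- The plan should list only logs_2024_01
EXPLAIN SELECT * FROM logs WHERE created_at = '2024-01-15';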

Optimizing String Search

The standard LIKE '%term%' starts with a wildcard, rendering B-Tree indexes useless.

Solution: the pg_trgm extension. Trigram indexes break strings into 3-character chunks, allowing fast wildcard searches.

SQL
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_users_email_trgm ON users USING GIN (email gin_trgm_ops);

-- Now this query is indexed:
SELECT * FROM users WHERE email LIKE '%gmail%';
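The same index also supports fuzzy matching via pg_trgm's % similarity operator (the email value is illustrative):

SQL
-- Find near-matches despite typos
SELECT email FROM users WHERE email % 'jhon.doe@gmail.com';

-- Raise or lower the match threshold (default 0.3)
SET pg_trgm.similarity_threshold = 0.4;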

Configuration Tuning

Don't run PostgreSQL with default settings (which are often tuned for compatibility, not performance). Key parameters in postgresql.conf:

  • shared_buffers: Set to ~25% of total RAM. Dedicated memory for caching data.

  • work_mem: Memory per operation (sorts/hash joins). Caution: this is allocated per operation, per connection, so a complex query on a busy server can consume many multiples of it. Too high = Out of Memory. Start with 16MB-64MB.

  • random_page_cost: Default is 4.0 (for HDDs). If using SSDs, set to 1.1. This encourages the planner to use indexes more often.

  • effective_cache_size: Set to ~75% of total RAM. Helps the planner estimate OS caching capability.
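These can be applied with ALTER SYSTEM instead of editing the file by hand. A sketch for a hypothetical 16 GB RAM, SSD-backed server (adjust the values to your hardware):

SQL
ALTER SYSTEM SET shared_buffers = '4GB';        -- takes effect after a restart
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();  -- reloads settings that don't require a restart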

Conclusion

Optimization is an iterative process. Start by identifying the bottleneck using EXPLAIN (ANALYZE, BUFFERS), apply the appropriate index or schema change, and measure again.

Quick Checklist:

  1. Did you avoid SELECT *?

  2. Are your Foreign Keys indexed?

  3. Are you using the correct Index type (B-Tree vs GIN)?

  4. Did you run VACUUM ANALYZE recently to update statistics?

  5. Is your random_page_cost configured for SSDs?

