
Nothing impacts Postgres insert performance more than index updates during insert.

If it is possible to take your table offline for the load, you'll often get a 10-100x speedup by dropping all the indexes first and recreating them afterwards.

You should also make sure the relevant Postgres session variables give the job enough memory and parallel workers.

WAL writes during inserts have a massive impact on insert performance too, so look at settings like synchronous_commit and checkpoint_timeout. Creating the table as UNLOGGED will also disable WAL writes for that table, giving a massive speedup.
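For the UNLOGGED route, a minimal sketch (table, column, and file names here are placeholders, not from the original comment) might look like this. Note that UNLOGGED tables are truncated after a crash and are not replicated, so they only suit reloadable data:

    -- Placeholder names. UNLOGGED skips WAL entirely during the load.
    CREATE UNLOGGED TABLE staging_events (
        id      bigint,
        payload text
    );

    COPY staging_events FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);

    -- Re-enable WAL once the load is done. This rewrites the whole
    -- table into the WAL, so it is itself an expensive operation.
    ALTER TABLE staging_events SET LOGGED;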

Also, recreating your indexes sequentially can be faster than doing them in parallel. And watch out for CONCURRENTLY in index creation: its upside is that it allows DB operations to continue during the build, but its downsides are that it is much slower and can fail, leaving an invalid index behind.
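The trade-off looks like this in practice (index and table names are placeholders for illustration):

    -- Plain build: fastest, but takes a lock that blocks writes to the table
    CREATE INDEX idx_events_created_at ON staging_events (created_at);

    -- Concurrent build: writes continue, but the build is slower and a
    -- failure leaves an INVALID index that must be dropped manually
    CREATE INDEX CONCURRENTLY idx_events_payload ON staging_events (payload);

    -- Check for leftover invalid indexes after a failed concurrent build
    SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;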

Probably something along the lines of this - which also shows how to set default parallel workers on a table, so you don't need every session to keep setting that variable.

    DO $$
    DECLARE
        table_name TEXT := 'your_table_name';  -- Replace with your table name
        schema_name TEXT := 'public';          -- Replace with your schema
        data_file TEXT := '/path/to/your/data.csv'; -- Replace with your data file path
        index_info RECORD;
        index_sql TEXT;
    BEGIN
        -- 1. Store definitions of the indexes we are about to drop.
        --    Skip indexes that back constraints (PRIMARY KEY, UNIQUE);
        --    those cannot be removed with DROP INDEX anyway.
        --    (pg_indexes.indexdef shows PK indexes as plain
        --    CREATE UNIQUE INDEX, so a LIKE '%PRIMARY KEY%' filter misses them.)
        CREATE TEMP TABLE index_definitions AS
        SELECT i.indexname, i.indexdef
        FROM pg_indexes i
        WHERE i.schemaname = schema_name
          AND i.tablename = table_name
          AND NOT EXISTS (
              SELECT 1
              FROM pg_constraint c
              JOIN pg_class ic ON ic.oid = c.conindid
              JOIN pg_namespace ns ON ns.oid = ic.relnamespace
              WHERE ic.relname = i.indexname AND ns.nspname = i.schemaname
          );
        
        -- 2. Drop them (format's %I quotes identifiers safely)
        FOR index_info IN SELECT indexname FROM index_definitions LOOP
            EXECUTE format('DROP INDEX %I.%I', schema_name, index_info.indexname);
            RAISE NOTICE 'Dropped index: %', index_info.indexname;
        END LOOP;
        
        -- 3. Optimize PostgreSQL for bulk loading (non-sysadmin settings only)
        -- Memory settings
        SET maintenance_work_mem = '1GB';        -- Increase for faster index creation
        SET work_mem = '256MB';                  -- Increase for better sort performance
        
        -- WAL settings (synchronous_commit is session-settable;
        -- checkpoint_timeout is not: it needs ALTER SYSTEM or
        -- postgresql.conf plus a reload, so it is omitted here)
        SET synchronous_commit = off;            -- Don't wait for WAL flush on commit
        
        -- Worker/parallel settings
        SET max_parallel_workers_per_gather = 8; -- Increase parallel workers
        SET max_parallel_workers = 16;           -- Maximum parallel workers
        SET effective_io_concurrency = 200;      -- Better IO performance for SSDs
        SET random_page_cost = 1.1;              -- Optimize for SSD storage
        
        -- 4. Set parallel workers on the target table
        EXECUTE format('ALTER TABLE %I.%I SET (parallel_workers = 8)',
                       schema_name, table_name);
        
        -- 5. Perform the COPY. Note: server-side COPY FROM a file needs
        -- superuser or the pg_read_server_files role; otherwise run
        -- \copy from psql on the client instead.
        EXECUTE format('COPY %I.%I FROM %L WITH (FORMAT csv, HEADER true)',
                       schema_name, table_name, data_file);
        
        -- 6. Rebuild all indexes (using the stored definitions)
        FOR index_info IN SELECT * FROM index_definitions LOOP
            index_sql := index_info.indexdef;
            RAISE NOTICE 'Recreating index: %', index_info.indexname;
            EXECUTE index_sql;
        END LOOP;
        
        -- 7. Drop temporary table
        DROP TABLE index_definitions;
        
        RAISE NOTICE 'Data loading completed successfully';
    END $$;


If batch sizes are sufficiently large, for example by staging updates, is this really necessary to achieve good insert performance?
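One common way to stage, sketched here with placeholder names (not from the comment above): COPY into an indexless staging table, then merge in a single statement, so the target's indexes are maintained in one bulk pass rather than per small batch:

    -- Placeholder names throughout. The staging table has no indexes,
    -- so COPY into it is cheap; the single INSERT ... SELECT then
    -- updates the target's indexes once, in bulk.
    CREATE UNLOGGED TABLE staging_events (LIKE events INCLUDING DEFAULTS);

    COPY staging_events FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);

    INSERT INTO events
    SELECT * FROM staging_events
    ON CONFLICT (id) DO NOTHING;  -- or DO UPDATE, if upsert semantics are needed

    DROP TABLE staging_events;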



