
    Optimizing Bulk Create Operations in Rails

    Software Engineer
    Ruby on Rails
    published 2026-01-03 19:48:24 +0700 · 4 mins read
    Recently, my team ran into performance issues while handling bulk CUD (Create, Update, Delete) operations in a Rails application.

    Active Record provides validations and callbacks, which are excellent tools for maintaining data consistency. Rails controllers also offer convenient create, update, and destroy actions that work very well for single-record workflows. However, in most applications where users manage tabular data, bulk actions are unavoidable. Users upload CSV files, select multiple rows, or trigger batch actions in a single request. In these cases, Active Record’s per-record lifecycle can quickly become a performance bottleneck.

    In this post, I focus only on bulk create. I will cover update and delete in separate posts.

    Example: Why Bulk Create Is Expensive

    Consider the following models:
    // language: ruby
    class Tag < ApplicationRecord
    end
    
    class Office < ApplicationRecord
    end
    
    class Book < ApplicationRecord
      validates :name, length: { maximum: 200 }, presence: true, uniqueness: true
      validates :tags, presence: true
      validates :office, presence: true
    
      validate :office_must_be_active
    
      has_many :book_tags
      has_many :tags, through: :book_tags
      belongs_to :office
    
      after_create :sync_search_index
    
      private
    
      def office_must_be_active
        return if office.nil?
        errors.add(:office, "is inactive") unless office.active?
      end
    
      def sync_search_index
        BookSearchIndexer.perform(id: id)
      end
    end
    

    Creating a single record works fine:
    // language: ruby
    Book.create(name: "Math Basic", office_id: 100, tag_ids: [10, 12])

    Now imagine importing 100 books from a CSV file:
    // language: ruby
    CSV.foreach("books.csv", headers: true) do |row|
      Book.create(
        name: row["name"],
        office_id: row["office_id"],
        tag_ids: row["tag_ids"].split(",")
      )
    end

    For each book, Active Record may trigger:
    • One SELECT for name uniqueness
    • One SELECT to load the office for presence and custom validation
    • One or more SELECTs for tags
    • One INSERT into books
    • Multiple INSERTs into book_tags
    • All callbacks, including BookSearchIndexer.perform
    Multiply this by 100 rows, and the cost becomes obvious.

    activerecord-import Is Fast, But Not Always a Fit

    The activerecord-import gem is often suggested for bulk inserts. It is fast and uses bulk SQL under the hood.

    // language: ruby
    books = CSV.read("books.csv", headers: true).map do |row|
      Book.new(
        name: row["name"],
        office_id: row["office_id"],
        tag_ids: row["tag_ids"].split(",")
      )
    end
    
    Book.import(books, validate: true)

    It relies on multi-row INSERT statements for performance; validations are optional, and per-record callbacks such as after_create are skipped. While this is fast, it is often a poor fit for public APIs or user-uploaded data, where correctness, predictable behavior, and clear error reporting matter more than raw speed.
    In most real-world cases, we still need validations and callbacks.

    Optimized Bulk Create Strategy

    The idea is simple: move expensive work out of the per-record lifecycle and do it once, in batches.
    Core principles:
    • Validate data in batches upfront
    • Avoid repeated queries per record
    • Keep database constraints as the final safety net
    • Skip expensive per-record validations and callbacks during bulk inserts

    1. Always Start With Database Constraints

    Database constraints are non-negotiable. They are the final line of defense.
    // language: ruby
    add_index :books, :name, unique: true
    add_foreign_key :books, :offices
    add_foreign_key :book_tags, :books
    add_foreign_key :book_tags, :tags
    Why this matters:
    • Protects data even if validations are skipped
    • Handles race conditions correctly (see the sketch below)
    • Covers bugs, direct SQL, or missed checks
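    With the unique index in place, a duplicate that slips past application-level checks (for example, two concurrent imports racing on the same name) surfaces as ActiveRecord::RecordNotUnique. Here is a minimal sketch of rescuing it, assuming office and tags were loaded as in the earlier examples; the importer at the end of this post uses the same pattern:
    // language: ruby
    book = Book.new(name: "Math Basic", office: office, tags: tags)

    begin
      book.save!
    rescue ActiveRecord::RecordNotUnique
      # The unique index on books.name rejected a duplicate that
      # application-level validation did not catch in time.
      Rails.logger.warn("Duplicate book name: #{book.name}")
    end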

    2. Move Side Effects to Background Jobs

    In the example above, BookSearchIndexer.perform might call Elasticsearch or another external service. This is expensive and unnecessary during bulk inserts. If indexing is not required immediately, move it to a background job:
    // language: ruby
    BookSearchIndexer.perform_later(id: id)
    For bulk operations, it is often better to index once for all records instead of once per record.
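    If you own the job class, it helps to accept a list of ids as well, so a bulk import can enqueue a single job for the whole batch. The sketch below is an assumption about BookSearchIndexer's internals (the post does not show the job itself), and SearchClient is a placeholder for whatever search backend you use:
    // language: ruby
    class BookSearchIndexer < ApplicationJob
      queue_as :default

      # Accept a single id or a list of ids so a bulk import can enqueue
      # one job for the whole batch instead of one job per record.
      def perform(id: nil, ids: nil)
        book_ids = Array(ids || id)

        Book.where(id: book_ids).find_each do |book|
          # Placeholder call: replace with your real indexing client.
          SearchClient.index(book.as_json)
        end
      end
    end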

    3. Make Expensive Validations and Callbacks Skippable

    Some validations are expensive when repeated many times, such as uniqueness checks or external side effects. Make them conditional:
    // language: ruby
    class Book < ApplicationRecord
      attr_accessor :skip_name_uniqueness, :skip_sync_search_index
    
      validates :name, presence: true
      validates :name, uniqueness: true, unless: :skip_name_uniqueness
    
      validate :office_must_be_active
    
      belongs_to :office
      has_many :book_tags
      has_many :tags, through: :book_tags
    
      after_create :sync_search_index, unless: :skip_sync_search_index
    
      private
    
      def office_must_be_active
        errors.add(:office, "is inactive") unless office&.active?
      end
    
      def sync_search_index
        BookSearchIndexer.perform_later(id: id)
      end
    end
    Rule of thumb:
    • Single create: run all validations and callbacks
    • Bulk create: validate once upfront, skip expensive per-record checks, and run callbacks once for the whole batch (see the sketch below)
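    A rough sketch of the two call patterns, reusing the skip flags defined above (office, tags, and row are assumed to be preloaded records and an input hash, as in the importer below):
    // language: ruby
    # Single create: all validations and callbacks run as usual
    Book.create!(name: "Math Basic", office: office, tags: tags)

    # Bulk create: the uniqueness check and indexing callback are skipped per
    # record because the batch is validated upfront and indexed once at the end
    book = Book.new(name: row[:name], office: office, tags: tags)
    book.skip_name_uniqueness   = true
    book.skip_sync_search_index = true
    book.save!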

    4. Validate in Batches, Not Per Record

    Instead of validating each row individually:
    • One query to check existing book names
    • One query to load all offices
    • One query to load all tags
    This turns N per-row queries into a small, fixed number of queries and lets you return all errors at once, as sketched below.
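    A minimal sketch of those batch lookups, assuming names, office_ids, and tag_ids have already been collected from the input rows:
    // language: ruby
    # One query per concern instead of one per row
    existing_names = Book.where(name: names).pluck(:name)
    offices_by_id  = Office.where(id: office_ids).index_by(&:id)
    tags_by_id     = Tag.where(id: tag_ids).index_by(&:id)

    duplicate_names = names & existing_names
    missing_offices = office_ids - offices_by_id.keys
    missing_tags    = tag_ids - tags_by_id.keys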

    5. Pass Objects, Not IDs

    Passing IDs often triggers hidden SELECTs during validation (e.g., office_must_be_active).
    // language: ruby
    # Good
    Book.new(name: name, office: office, tags: tags)
    
    # Bad
    Book.new(name: name, office_id: office_id, tag_ids: tag_ids)
    Passing objects avoids redundant lookups and makes validation behavior predictable.

    BulkBookImporter Service Example

    // language: ruby
    class BulkBookImporter
      class ImportError < StandardError; end
    
      def initialize(rows:)
        @rows = rows
        @office_ids = rows.pluck(:office_id).uniq
        @tag_ids    = rows.pluck(:tag_ids).flatten.uniq
      end
    
      # All or nothing: if one row fails, nothing is created
      def perform
        raise ImportError, "No data provided" if @rows.empty?
    
        validate_offices!
        validate_tags!
        validate_uniqueness!
    
        created_books = []
    
        ActiveRecord::Base.transaction do
          @rows.each do |row|
            created_books << create_book!(row)
          end
        end
    
        sync_search_index!(created_books)
        created_books
      end
    
      private
    
      # ---------- Preloading ----------
    
      def offices_by_id
        @offices_by_id ||= Office.where(id: @office_ids).index_by(&:id)
      end
    
      def tags_by_id
        @tags_by_id ||= Tag.where(id: @tag_ids).index_by(&:id)
      end
    
      # ---------- Batch validations ----------
    
      def validate_offices!
        if offices_by_id.size != @office_ids.size
          raise ImportError, "One or more offices are missing"
        end
      end
    
      def validate_tags!
        if tags_by_id.size != @tag_ids.size
          raise ImportError, "One or more tags are missing"
        end
      end
    
      def validate_uniqueness!
        names = @rows.pluck(:name)
    
        # Duplicates inside the input
        if names.uniq.size != names.size
          raise ImportError, "Duplicate names in input"
        end
    
        # Duplicates against existing data
        if Book.exists?(name: names)
          raise ImportError, "One or more names already exist"
        end
      end
    
      # ---------- Creation ----------
    
      def create_book!(row)
        book = Book.new(
          name:   row[:name],
          office: offices_by_id[row[:office_id]],
          tags:   row[:tag_ids].map { |id| tags_by_id[id] }
        )
    
        # Skip expensive per-record work
        book.skip_name_uniqueness     = true
        book.skip_sync_search_index   = true
    
        book.save!
        book
      rescue ActiveRecord::RecordNotUnique
        # Final safety net for race conditions
        raise ImportError, "Name already taken"
      end
    
      # ---------- Side effects ----------
    
      def sync_search_index!(books)
        return if books.empty?
    
        BookSearchIndexer.perform_later(ids: books.map(&:id))
      end
    end
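    For completeness, here is how the importer might be wired to the CSV example from earlier. The column names and integer coercion are assumptions carried over from that example, not part of the service itself:
    // language: ruby
    rows = CSV.read("books.csv", headers: true).map do |row|
      {
        name:      row["name"],
        office_id: row["office_id"].to_i,
        tag_ids:   row["tag_ids"].split(",").map(&:to_i)
      }
    end

    begin
      books = BulkBookImporter.new(rows: rows).perform
      puts "Imported #{books.size} books"
    rescue BulkBookImporter::ImportError => e
      puts "Import failed: #{e.message}"
    end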

    Conclusion

    Bulk operations in Rails do not require abandoning Active Record or blindly reaching for external gems. By understanding what validations and callbacks actually do under the hood, and by moving expensive work into batch operations, you can achieve significant performance improvements while still maintaining data correctness.
