Table of contents
Solving the "must appear in the GROUP BY clause" Error in Rails and PostgreSQL
Software Engineer
Software Engineer
Ruby on Rails
Ruby on Rails
This blog will give an example of how to resolve this common error when using group_by in Active Record
1. Example problem
First, take a look at this SQL problem. Imagine you're working on a booking application where users can have multiple sessions. You have two models:
// language: ruby # User # email: string class User < ApplicationRecord has_many :sessions end # Session # tag: string class Session < ApplicationRecord belongs_to :user end
Now, you need to find users with the same email address and have sessions with a specific tag. This might be necessary for identifying potential duplicate accounts or for a data cleanup task.
2. The wrong approach
Your first attempt might look something like this:
// language: ruby User.joins(:sessions) .where(sessions: { tag: Session::SPECIFIC_TAG }) .group(:email) .having('COUNT(DISTINCT users.id) > 1')
The generated SQL:
// language: sql SELECT "users".* FROM "users" INNER JOIN "sessions" ON "sessions"."user_id" = "users"."id" WHERE "sessions"."tag" = 'SPECIFIC_TAG' GROUP BY "users"."email" HAVING COUNT(DISTINCT users.id) > 1
However, running this query results in an error:
// language: bash PG::GroupingError: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT "users".* FROM "users" INNER JOIN "sessions" ON "sessions"...
This error occurs because of how SQL handles grouping and non-aggregated columns. When using a GROUP BY clause, every column in the SELECT statement must either:
- Be included in the GROUP BY clause, or
- Be used in an aggregate function (like COUNT, SUM, AVG, etc.)
In our query, we're grouping by email, but Rails' default SELECT statement (SELECT "users".*) is trying to select all columns from the users table. This creates a conflict because columns like id, which are not in the GROUP BY clause, are being selected without being aggregated.
3. Solution
To solve this problem and efficiently retrieve the data we need, we can use PostgreSQL's array_agg function along with pluck. Here's the working query:
// language: ruby User .joins(:sessions) .where(sessions: { tag: Session::SPECIFIC_TAG }) .group(:email) .having('COUNT(DISTINCT users.id) > 1') .pluck("users.email, array_agg(users.id)")
It will return an array of arrays, where each inner array contains:
- The shared email
- An array of user IDs that share that email (using PostgreSQL's array_agg function)
// language: ruby [["[email protected]", [1, 99, 543]]
You might wonder why we use pluck instead of select in our solution. Let's explore what happens when we try to use select:
// language: ruby User .joins(:sessions) .where(sessions: { tag: Session::SPECIFIC_TAG }) .group(:email) .having('COUNT(DISTINCT users.id) > 1') .select("users.email, array_agg(users.id) as user_ids")
This query still works, but it comes with some caveats:
- Partial Objects: This query returns User objects, but they're partial objects. They only have the email attribute populated, plus a custom user_ids attribute that doesn't correspond to any column in the users table.
- Type Casting: The user_ids attribute will be a string representation of the PostgreSQL array, not a Ruby array. You'd need to parse it manually.
- ActiveRecord Expectations: Some ActiveRecord methods might not work as expected on these partial objects, potentially leading to subtle bugs.
- Performance: Creating ActiveRecord objects is more resource-intensive than just returning the raw data.
// language: ruby [#<User:0x00000001309c3098 id: nil, primary_email: "[email protected]">]
Remember, when dealing with complex queries involving grouping and aggregation, it's often helpful to think about exactly what data you need and how you can retrieve it most efficiently from the database.
Created at
2024-07-10 11:17:29 +0700
Related blogs
One Design Pattern a Week: Week 3
Welcome back to my "One Design Pattern a Week" series!
Try to solve this real problem: Incompatible InterfacesImagine you're developing a large web ap...
Software Engineer
Software Engineer
2024-09-21 15:53:29 +0700
How Google achieves seamless SSO across multiple domains like Gmail and Youtube?
Hey there! Ever wondered how you can log into Gmail and then magically find yourself logged into YouTube, Google Drive, and all other Google services ...
Software Engineer
Software Engineer
2024-09-24 22:52:06 +0700