0%
Reading Settings
Font Size
18px
Line Height
1.5
Letter Spacing
0.01em
Font Family
Table of contents

Solving the "must appear in the GROUP BY clause" Error in Rails and PostgreSQL
Software Engineer
Software Engineer
Ruby on Rails
Ruby on Rails

This blog will give an example of how to resolve this common error when using group_by in Active Record
1. Example problem
First, take a look at this SQL problem. Imagine you're working on a booking application where users can have multiple sessions. You have two models:
// language: ruby # User # email: string class User < ApplicationRecord has_many :sessions end # Session # tag: string class Session < ApplicationRecord belongs_to :user end
Now, you need to find users that having the same email address and joined sessions with a specific tag. This might be necessary for identifying potential duplicate accounts or for a data cleanup task.
2. The wrong approach
Your first attempt might look something like this:
// language: ruby User .joins(:sessions) .where(sessions: { tag: Session::SPECIFIC_TAG }) .group(:email) .having('COUNT(DISTINCT users.id) > 1')
The condition COUNT(DISTINCT users.id) > 1 is checking if each email group has more than 1 distinct user
The generated SQL:
The generated SQL:
// language: sql SELECT "users".* FROM "users" INNER JOIN "sessions" ON "sessions"."user_id" = "users"."id" WHERE "sessions"."tag" = 'SPECIFIC_TAG' GROUP BY "users"."email" HAVING COUNT(DISTINCT users.id) > 1
However, running this query results in an error:
// language: bash PG::GroupingError: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT "users".* FROM "users" INNER JOIN "sessions" ON "sessions"...
This error occurs because of how SQL handles grouping and non-aggregated columns. When using a
- Be included in the
GROUP BY clause, or - Be used in an aggregate function (like
COUNT ,SUM ,AVG , etc.)
In our query, we're grouping by email, but Rails' default
3. Solution
To solve this problem and efficiently retrieve the data we need, we can use PostgreSQL's array_agg function along with pluck. Here's the working query:
// language: ruby User .joins(:sessions) .where(sessions: { tag: Session::SPECIFIC_TAG }) .group(:email) .having('COUNT(DISTINCT users.id) > 1') .pluck("users.email, array_agg(users.id)")
It will return an array of arrays, where each inner array contains:
- The shared email
- An array of user IDs that share that email (using PostgreSQL's
array_agg function)
// language: ruby [["[email protected]", [1, 99, 543]]
You might wonder why we use pluck instead of select in our solution. Let's explore what happens when we try to use select:
// language: ruby User .joins(:sessions) .where(sessions: { tag: Session::SPECIFIC_TAG }) .group(:email) .having('COUNT(DISTINCT users.id) > 1') .select('users.email, array_agg(users.id) as user_ids')
This query still works, but it comes with some caveats:
- Partial Objects: This query returns
User objects, but they're partial objects. They only have the email attribute populated, plus a custom user_ids attribute that doesn't correspond to any column in the users table. - Type Casting: The user_ids attribute will be a string representation of the PostgreSQL array, not a Ruby array. You'd need to parse it manually.
- ActiveRecord Expectations: Some ActiveRecord methods might not work as expected on these partial objects, potentially leading to subtle bugs.
- Performance: Creating ActiveRecord objects is more resource-intensive than just returning the raw data.
Remember, when dealing with complex queries involving grouping and aggregation, it's often helpful to think about exactly what data you need and how you can retrieve it most efficiently from the database.
Related blogs

Jekyll Asset Caching Strategy for AWS S3 + CloudFront Deployment
Deploying a static Jekyll site to AWS can be fast, but without the right caching strategy, users might see stale content or you might waste bandwidth re-downloading unchanged assets.This blog explains a dual-cache policy that combines long-term cachi...
Software Engineer
Software Engineer


Quick Test Your IP Knowledge
You don’t need to be a software engineer to have heard of IP addresses. They pop up everywhere in our digital lives, from setting up Wi-Fi to troubleshooting connection issues. In this post, I’ve collected some common questions so you can quickly che...
Software Engineer
Software Engineer
