Better Queries, Less Code - How to Use COALESCE, IFNULL and NULLIF

So I wouldn’t generally advocate putting raw SQL in your code. But there are some functions that can be worth it, especially compared to the hassles that translating things into ActiveRecord and Arel can sometimes get you into.

Especially when that translation effort leaves you with more code – and more tangled queries – than really seem worth it.

All of this and more was at play recently when I started using COALESCE, NULLIF and IFNULL while querying around in my db browser.

What I wound up needing to do was going to be a giant pain to translate into ActiveRecord, and likely would have left me with something like this:

some_records.joins(other_records: :still_more).
  where(boolean: true).
  where(StillMore.arel_table[:this_thing].gt(1.week.ago)).
  select("some_records.name, other_records.description, still_more.preferred_attribute AS preferred, still_more.default_attribute AS `default`")

Kind of real ugly, and that’s about half of what I was doing. More to the point, each of the objects returned would need to be sorted through so I could display and/or use the correct attribute. And it would have left each of the returned objects in an unpredictable state – maybe both attributes would have a value, maybe one or the other would be null.

So I’d still have more coding (and likely iterating) to do to deal with all that.

And that’s just for a simple select. If I needed to join there, it woulda gotten messier.

If only I could let the database figure this out for me, so that whatever needed these attributes had a single, predictable, non-null value to read when the time came.

Short takeaway: These MySQL functions move some routine, low-level and potentially n+1 logic having to do with null-handling and preferred vs. default values down into the database level, leveraging the approximately 200 years of db development and production to make your queries more useful for whatever other classes, views, helpers, methods or what-have-you need that info.

Read on for a run-down of these functions and, if yer really digging this, some skeletal use-cases afterward.

ENTER COALESCE, IFNULL, NULLIF

The documentation on this stuff is actually pretty good, but there aren’t a lot of Ruby examples there. I come from a Ruby background, so this matters to me.

Anyway, here’s what’s going on:

First up, COALESCE. This is by no means the sole, or even primary, use of the function, but it’s the one I’m mostly looking at in this case: it’s often going to be useful to have your db compare some values and return the first non-null one, and that’s exactly what COALESCE does with whatever list of arguments you hand it.

For my purposes here, you can think of this as operating much like Ruby’s double pipes (||), only on the db level. If you plan on using this value in another query right away, it also saves you the hassle (and time) of selecting your rows, iterating over them, then running at least another query.
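For instance, here’s a minimal sketch. It borrows the made-up PaymentProfile columns from later in this post, with profile standing in for any loaded record:

  # in the db: one pass, and every row comes back with the first non-null of the two
  PaymentProfile.select("COALESCE(preferred_account, default_account) AS account")

  # in Ruby: the same idea, but only after you've already loaded the record
  profile.preferred_account || profile.default_account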

Null checks themselves are easy enough to code around, of course; you can add a try(:some_col) or a some_col.present? or bang in an if some_col at the end of a particular line, and there you go.

Letting the db itself sort this out for you, though, lets your other methods and functions do what they’re intended to do without worrying about what your db’s up to or what shape your data’s in.

As for the ugliness of tossing raw SQL in there, it’s easy (and useful!) enough to stash away in a scope like this (the function here is IFNULL, the second one on the list):

  class PaymentProfile < ActiveRecord::Base
    scope :account, -> { select("IFNULL(preferred_account, default_account) AS account") }
  end
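Used something like this, more or less. The AS account alias is what lets ActiveRecord hand the value back as a plain method on each record (some_user and the user_id column are just stand-ins here):

  # each record in the relation answers to #account with whichever column was non-null
  PaymentProfile.where(user_id: some_user.id).account.map(&:account)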

If this looks, to you, a lot like COALESCE, you’re right. I’ll break out some of the differences below.

NULLIF is the next piece: it takes two arguments and returns NULL if they’re equal. That makes it great for dealing with columns that can be populated both with empty strings and NULL, despite those being, from your system’s point of view, the same thing.

So in our brief little use case, if a bunch of empty strings (or even invalid values) snuck into preferred_account, we can feed that in here to make sure we’re getting the default one.

This is how NULLIF makes COALESCE a lot more useful, in my book: you can use NULLIF to “sanitize” one column’s value, so to speak, and make sure to select the one with the valid value.
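In code, that combination looks something like the sketch below, still with the same made-up columns. NULLIF turns an empty-string preferred_account into NULL, so COALESCE falls through to the default:

  PaymentProfile.select("COALESCE(NULLIF(preferred_account, ''), default_account) AS account")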

And things of this sort come up from time to time. Developers from the past have this habit of not having predicted what we in the present will be wanting from the database. Or not seeing all the ways that other developers in the future might need protection from themselves.

So if a big backfill to clean up the db is out of the question – or really not worth the time in cases when its only effect is to require slightly trickier querying – this function comes in handy.

COALESCE AND IFNULL LOOK AWFULLY SIMILAR

That’s mainly down to the kinds of use cases I’ve mentioned here, which are just about the only things I’ve used them for.

In reality, they’re up to different things. IFNULL takes exactly two arguments and returns the second only when the first is NULL, while COALESCE takes any number of arguments and returns the first non-null one, which makes IFNULL effectively the two-argument special case. (COALESCE is also standard SQL; IFNULL is a MySQL-ism.)
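Roughly, in the same ActiveRecord terms as above (backup_account here is just an imaginary third column to show the point):

  PaymentProfile.select("IFNULL(preferred_account, default_account) AS account")                    # exactly two arguments
  PaymentProfile.select("COALESCE(preferred_account, backup_account, default_account) AS account")  # any number; first non-null wins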

(Very) Simple Use Case

So in the case above, I was dealing with a situation like the following:

Consider an app that stores users’ payment information (PaymentProfile), including bank account numbers. You let your users add multiple accounts and specify which one they prefer. But they have to have at least one. Call ‘em preferred_account and default_account.

What you need, much of the time, is just a valid account number. Yeah, you want to choose preferred_account if it’s there, but you don’t need to know whether that’s NULL. You just need to be able to call something like user.account and get a non-null value.

For one record at a time, this is no sweat. Just do something like

def account
  preferred_account || default_account
end

Even truthy but useless junk is easy enough to work around:

def account
  preferred_account.present? ? preferred_account : default_account
end

And that’s fine, but you’re still reading that row in the db and introspecting it, just to get a non-null value. If you need to get these numbers from, say, a couple hundred thousand user records, then … well, meet tmux.
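Or, as a sketch of where this is headed, you let the db hand you one clean value per row in a single query. (Depending on your Rails version, you may need to wrap the raw string in Arel.sql.)

  # one query, one account value per profile, no per-record checks in Ruby
  PaymentProfile.pluck("COALESCE(preferred_account, default_account)")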

A Slightly Different Use Case

So let’s say the values stored at preferred_account and default_account aren’t actual account numbers because you aren’t crazy people.

Instead, they’re a kind of industry-standard identifier for records, people, accounts, whatever. And you’re building out a feature to integrate some info from some of those places onto your users’ accounts.

In this case, it’s publicly-available profile information, let’s say. You don’t need any OAuth-level security, and the main thing your users want, aside from incorporating some of this info, is not to be bothered with setting it up.

So you’re going to need a way to identify users on that other service and query their API for that info, hence this account field. And you’re probably going to be running jobs regularly to check for/update whole batches of users, maybe in cron or something.

So far, so easy:

users_for_query = User.joins(:this_thing).
  where(boolean: true, this_things: { some_other_attr: "yup" }).
  pluck(:account)

some_query_logic(users_for_query)
do_the_thing_with_that_other_stuff

But for whatever reason, there’s two types of identifiers. Say a new industry standard came out that’s more secure and/or efficient, but everyone’s agreed to be backward-compatible with the old one. So now you have to deal with that.

This works:

users_for_query = User.joins(:this_thing).
  where(boolean: true, this_things: { some_other_attr: "yup" }).
  map(&:find_account)

def find_account
  preferred_account || default_account
end

But seriously, meh.

Also, you find out that there are no callbacks/constraints/filters or anything else controlling how a preferred_account that Ruby would read as nil actually gets recorded. Or maybe it’s some other part of the code that’s not deleting these properly.

You don’t know – all you know is this table is used by things all over the codebase and, for one reason or another, a backfill isn’t an option here.

So now find_account has to check for a null value, and whatever uses users_for_query or similar things will have to account for this. If you need something from all of these tables, but the preferred/default attributes live on a table that isn’t the class you’re working in, it gets a bit more complicated. Plus, you’re first querying the db and then iterating over the response.

And we haven’t even looked at the other end of this: matching the response objects from your API calls to users in your db so you can update the right fields.

As it happens, you’ve got a service that isn’t really specifying whether this is one of the old IDs or the new ones. And there’s no particularly easy way to identify these in code.

Let’s say the response object is simple enough for you to extract something like this:

{ 
  info_we_want: 19893875,
  identifier: "90980asdga098"
}

And since you’re searching the API for a lot of IDs all at once, you really aren’t going to want to run a separate INSERT for every one.

Here’s how COALESCE can help:

First, make sure that users_for_query thing has an account field from the word go:

users_for_query = User.joins(:this_thing).
  where(boolean: true, this_things: { some_other_attr: "yup" }).
  select("users.id, COALESCE(whatever.preferred_attribute, whatever.default_attribute) AS account")
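Every record in users_for_query now responds to both id and account, with account already collapsed to a single usable value:

  users_for_query.first.account  # => the preferred attribute if it had one, otherwise the default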

This makes it relatively easy to set up your inserts by mapping users_for_query to response_objects: take the identifier attribute from each response object, look into users_for_query for that number, grab the user id (your internal identifier), and update whatever table you’re working on accordingly.

If you’re simply adding new lines to a table (again, this is a new feature, so you’re pulling in data more than you are modifying and updating existing records), then you can hold your nose, put in some more raw SQL, and make it even easier.

Here’s how the process could work, more or less:

# key the selected users by their coalesced account value so the API responses can be matched back up
users_by_account = users_for_query.index_by(&:account)
inserts = response_objects.map { |obj| { user_id: users_by_account[obj["identifier"]].id, info_we_want: obj["info_we_want"] } }
bulk_insert_statement = "INSERT INTO new_table (user_id, info_we_want) VALUES ("
inserts.each { |row| bulk_insert_statement << "#{row[:user_id]}, #{row[:info_we_want]}), (" }

Not super-pretty, especially all that string manipulation (and the chomping you’ll have to do to make sure it closes correctly).
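That cleanup can be as small as chopping off the trailing ", (" that the loop leaves behind and handing the finished statement to the connection, something like:

  ActiveRecord::Base.connection.execute(bulk_insert_statement.chomp(", ("))  # both values are integers here, so plain interpolation is tolerable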

But you’ll make exactly 2 db calls: one to grab all the user info you need for the API call and one – regardless of how many records you get back – to insert the new rows.

So that’s a lot of stuff about a couple of relatively simple functions. But spending a little more time thinking about how to lean on the things the db is optimized to do, simplifying my code and speeding up my queries along the way, has kept on paying off even beyond the benefits I’ve so far wrung from these functions.