Building Lists in Elixir with Reduce

Building Lists in Elixir with Enum.reduce/3

My first experience with Elixir was building an internal tool for TuneCore’s accounting department. We needed an application to process tens of millions of transactional records into a single report. Since the amount of data to process was massive we wanted something that could process it all as fast as possible, and with its built-in concurrency, Elixir was an easy choice.

The project was my first attempt at purely functional programming so I came up against quite a few challenges. One of the trickiest things I encountered was how to create subsets of data from larger collections of data. In Ruby or Javascript, I routinely build up collections iteratively, adding or subtracting items along the way. I never thought much about how I was mutating an array or a hash by altering its contents. It worked and that was it. But then came Elixir and that all changed.

Elixir data types are immutable, so once you create a list or a map (or any other data type) you cannot change it. For example, you can’t add or remove items from a list. If you want a subset of items in a list, you create an entirely new list with the select items you want.

I needed to create many subsets of thousands of transaction records over and over again. The GenServer module provides functionality that can solve this problem with a stack implementation. However, a GenServer seemed like overkill and with some experimentation I found a much simpler solution.

The simpler solution I found was to use the reduce/3 function from the Enum module. It makes it easy to build lists in place by iteratively appending to a list’s head.

However, before we look at reduce/3 let’s look at reduce/2.

reduce/2

The common example of reduce/2 is a basic arithmetic function such as getting the sum of a list of numbers.

Enum.reduce([1,2,3,4,5], fn(x, acc) -> x + acc end)

# (returns 15)

This iterates through the supplied Enumerable (here, the list of numbers 1 to 5), where with each pass x is the current item from the Enumerable that is being operated on and acc is the accumulator.

It’s important to notice that the accumulator is initially set to the first item in the Enumerable and then is reset to the “return” value after each pass. To illustrate this clearly, we can print out the accumulator on each pass using IO.puts.

Enum.reduce([1,2,3,4,5], fn(x, acc) ->
  IO puts "x = #{x}, acc = #{acc}"
  x + acc
end)

# x = 2, acc = 1
# x = 3, acc = 3
# x = 4, acc = 6
# x = 5, acc = 10
# (returns 15)

reduce/3

The key difference with reduce/3 is that you set the starting accumulator with an argument. For example, we could pass in 100 as our starting accumulator.

Enum.reduce([1,2,3,4,5], 100, fn(x, acc) ->
  IO puts "x = #{x}, acc = #{acc}"
  x + acc
end)

# x = 1, acc = 100
# x = 2, acc = 101
# x = 3, acc = 103
# x = 4, acc = 106
# x = 5, acc = 110
# returns (115)

Supplying the starting accumulator is how we can build up lists (or any other sort of collection) iteratively. We simply supply an empty list as the starting accumulator and add to the list by conditionally setting an item to the list head.

Let’s look at a contrived example of simplified music distribution records. Each record contains the following data: [id, store, status]

distributions = [
  {1001, 'itunes', 'complete'},
  {1002, 'spotify', 'complete'},
  {1003, 'spotify', 'error'},
  {1004, 'amazon', 'complete'},
  {1005, 'itunes', 'error'},
  {1006, 'amazon', 'complete'},
  {1007, 'amazon', 'complete'},
  {1008, 'itunes', 'processing'},
  {1009, 'itunes', 'processing'},
  {1010, 'spotify', 'processing'}
]

Now, using an empty list as the starting accumulator and checking each item’s status, we can easily get all the distributions in an “error” state:

errors =
  Enum.reduce(distributions, [], fn(x, acc) ->
    state = elem(x, 2)
    if state == 'error', do: [x | acc], else: acc
  end)

# errors = [{1005, 'itunes', 'error'}, {1003, 'spotify', 'error'}]

Or we can check each item’s store to create a list of distributions for the “spotify” store:

spotify_distros =
  Enum.reduce(distributions, [], fn(x, acc) ->
    store = elem(x, 1)
    if store == 'spotify', do: [x | acc], else: acc
  end)

# spotify_distros = [{1010, 'spotify', 'processing'}, {1003, 'spotify', 'error'}, {1002, 'spotify', 'complete'}]

Next time you need to create subsets of data in Elixir consider using reduce/3 like in the examples above. You can view the full documentation here

* Note:

After reading this, a colleague pointed out Enum’s filter/2 function as a simpler way to create sublists. In many cases, it’s probably true that filter/2 is the more straightforward solution. Nonetheless, I think this trick with reduce/3 is pretty cool and worth learning anyway. It’s just one example of the many ways you can use reduce to do all sorts of things in functional programming.