cult3

Working with Enumerables and Streams in Elixir

Jun 13, 2016

Table of contents:

  1. What are Enumerables?
  2. Enumerables are eager
  3. Using Streams instead
  4. Conclusion

So far in this introduction to Elixir series we’ve touched upon the Enum module a couple of times. The Enum module is a collection of functions that act on enumerable data structures.

The Enum module is extremely useful, and it will probably be something that you use a lot in your day-to-day Elixir work.

Elixir also has the Stream module, which, like the Enum module, allows you to act on enumerables in much the same way.

So what is the difference between the two modules, and when should you use one or the other?

In today’s tutorial we are going to be exploring these two very useful modules, and understanding when and where you should use them in your Elixir code.

What are Enumerables?

Before we get into the difference between the Enum module and the Stream module, I shouldn’t assume you already know what enumerables are.

Enumerables are data structures that can be enumerated. For example, in Elixir, lists, maps, and ranges are all enumerable types because you can enumerate their values.

To enumerate simply means to take one item at a time and do something with it. For example, iterating through each item in a list.

The Enum module provides generic functions that can be applied to enumerable data structures.

A few of the most common Enum functions you will find yourself using are map, filter, reduce, sort, and group_by.
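As a quick sketch of what a few of these look like in practice, here are some common Enum functions applied to one list. The numbers list and the variable names are made up for illustration:

```elixir
# Hypothetical sample data, used only for this illustration.
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Keep only the even numbers.
evens = Enum.filter(numbers, &(rem(&1, 2) == 0))
IO.inspect(evens, label: "evens")
# evens: [4, 2, 6]

# Sort in ascending order.
sorted = Enum.sort(numbers)
IO.inspect(sorted, label: "sorted")
# sorted: [1, 1, 2, 3, 4, 5, 6, 9]

# Group the numbers by whether they are even (true) or odd (false).
grouped = Enum.group_by(numbers, &(rem(&1, 2) == 0))
IO.inspect(grouped, label: "grouped")
# grouped: %{false => [3, 1, 1, 5, 9], true => [4, 2, 6]}

# Fold the whole list into a single value, here a running total.
total = Enum.reduce(numbers, 0, fn n, acc -> n + acc end)
IO.inspect(total, label: "total")
# total: 31
```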

Here are a couple of examples of using the map function:

# List
Enum.map([1, 2, 3], &(&1 * 2))

# Range
Enum.map(1..3, &(&1 * 2))

# Map
# Each entry arrives as a {key, value} tuple; the key is unused here
Enum.map(%{1 => 1, 2 => 2, 3 => 3}, fn {_k, v} -> v * 2 end)

As you can see, we can use the same Enum.map/2 function with many different types of enumerable data structure in Elixir.
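One detail worth knowing is that Enum.map/2 always returns a list, even when you give it a map. Here is a small sketch of that, along with one way to get a map back using Map.new/2. The variable names are only for illustration:

```elixir
source = %{1 => 1, 2 => 2, 3 => 3}

# Enum.map/2 over a map returns a list, with one element per key-value pair.
doubled_values = Enum.map(source, fn {_k, v} -> v * 2 end)
IO.inspect(is_list(doubled_values))
# true

# Map.new/2 calls the function for each pair and collects the
# returned {key, value} tuples into a new map.
doubled_map = Map.new(source, fn {k, v} -> {k, v * 2} end)
IO.inspect(doubled_map)
# %{1 => 2, 2 => 4, 3 => 6}
```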

The enumerable data structures that can be used with the Enum module all implement the Enumerable protocol. We haven’t covered protocols in Elixir just yet, but we will in the coming weeks. You don’t need to worry about this for now.
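If you are curious, you can already check whether a value implements the protocol. Every protocol gets an impl_for/1 function that returns the implementing module, or nil when there isn’t one. This is only a quick peek, not something you need day to day:

```elixir
# impl_for/1 returns the module that implements Enumerable for a value,
# or nil if the value is not enumerable.
IO.inspect(Enumerable.impl_for([1, 2, 3]))
# Enumerable.List

IO.inspect(Enumerable.impl_for(1..3))
# Enumerable.Range

IO.inspect(Enumerable.impl_for(%{a: 1}))
# Enumerable.Map

IO.inspect(Enumerable.impl_for(42))
# nil
```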

Enumerables are eager

One of the characteristics of the functions in the Enum module is eagerness. This means each function processes the whole data structure immediately and returns the finished result before the next function is called.

For example:

Enum.map([1, 2, 3], &(&1 * 3))
|> Enum.filter(&(rem(&1, 2) == 0))

In this example we pass the list [1, 2, 3] into the map function and multiply each value by 3.

This returns a new list (because data in Elixir is immutable) containing the values [3, 6, 9].

This list is then passed into the filter function using the pipe operator (see Using the Pipe Operator in Elixir).

In the filter function we filter out any odd numbers. This returns a new list that looks like this [6].

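The important thing to note here is that each function works through its entire input and produces a complete new list before the next function starts. So for this process we make two full passes: one over the original list and one over the intermediate list.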

If we were to add another function, we would be iterating through the list for a third time.
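To make the two passes visible, here is a sketch of the same pipeline with an IO.puts call added to each function. The printing is my addition and is only there to trace the order of the work:

```elixir
result =
  [1, 2, 3]
  |> Enum.map(fn n ->
    IO.puts("map #{n}")
    n * 3
  end)
  |> Enum.filter(fn n ->
    IO.puts("filter #{n}")
    rem(n, 2) == 0
  end)

# Printed order: map 1, map 2, map 3, filter 3, filter 6, filter 9
# The map stage finishes the whole list before filter sees anything.
IO.inspect(result)
# [6]
```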

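This isn’t a big problem with such a small set of data. But once you start working with big data structures, this “eager” execution becomes wasteful, because every stage allocates a full intermediate list that is thrown away as soon as the next stage has read it.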

Using Streams instead

The Stream module offers an alternative to the Enum module for acting on enumerable data structures.

Instead of eagerly acting on the data structure, the Stream module will return a stream that records the function to apply, without actually running it straight away.

This is known as lazy, as opposed to eager.

These streams can be composed together in a pipeline and then acted on in one go, which avoids the overhead of producing a list after each individual function call.

For example, we could write the example from above using the Stream module:

Stream.map([1, 2, 3], &(&1 * 3))
|> Enum.filter(&(rem(&1, 2) == 0))

In this example I’ve replaced the first use of the Enum module with the Stream module. As you can see, you can switch the two modules without having to change the function that is passed as the second parameter.

After the first stage of the pipeline, instead of passing a new list to the filter function, a Stream will be passed instead:

#Stream<[enum: [1, 2, 3], funs: [#Function<30.103178510/1 in Stream.map/2>]]>

This is like a latent enumerable data structure that has the knowledge of the previous stage, but it hasn’t acted upon it yet.

The filter function accepts the Stream and enumerates it. This is what wakes it up: each element is pulled through the deferred map function and then handed to filter.
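Here is a sketch of that behaviour, again with IO.puts calls that I’ve added purely to trace when each function runs:

```elixir
stream =
  Stream.map([1, 2, 3], fn n ->
    IO.puts("map #{n}")
    n * 3
  end)

# Nothing has been printed so far. The stream is a struct that
# describes the work, not a list of results.
IO.inspect(is_list(stream))
# false

result =
  Enum.filter(stream, fn n ->
    IO.puts("filter #{n}")
    rem(n, 2) == 0
  end)

# Printed order: map 1, filter 3, map 2, filter 6, map 3, filter 9
# Each element passes through both functions before the next one starts.
IO.inspect(result)
# [6]
```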

Whilst this is a simple example, it is possible to compose many lazy Stream function calls together.

For example:

1..1_000_000
|> Stream.map(&(&1 * 2))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.sum()

In this example I create a range from 1 to 1,000,000.

I then create a new Stream that doubles each number.

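I then keep only the even numbers. Every value is already even after doubling, so nothing is actually removed here; the filter is just there to show a second lazy stage.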

I then add up all of the remaining numbers.

Now, instead of creating an intermediate list after each stage, we use the Stream module and only act on the data structure at the end of the pipeline, when Enum.sum/1 consumes the stream.
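As a sanity check, here is a sketch that runs the same pipeline both eagerly and lazily. The eager and lazy variable names are mine. Both versions give the same answer; the difference is only in how the work is carried out:

```elixir
# Eager: builds a 1,000,000-element list after map and another after filter.
eager =
  1..1_000_000
  |> Enum.map(&(&1 * 2))
  |> Enum.filter(&(rem(&1, 2) == 0))
  |> Enum.sum()

# Lazy: no intermediate lists; each number flows through map and
# filter and is added to the running sum in a single pass.
lazy =
  1..1_000_000
  |> Stream.map(&(&1 * 2))
  |> Stream.filter(&(rem(&1, 2) == 0))
  |> Enum.sum()

IO.inspect(eager == lazy)
# true

IO.inspect(lazy)
# 1000001000000
```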

Conclusion

The Enum module offers a number of really useful functions for acting on enumerable data structures. You will likely find yourself using these functions a lot in your day-to-day programming.

The Enum module is “eager” and so it will act on the data straight away. For most use cases this is perfectly fine.

However, if you find yourself in the situation where you need to make a pipeline of transformations on a large dataset, it will probably be better to use the Stream module instead.

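The Stream module will lazily act on the data, rather than creating a new list at each step of the process. Many Stream functions share their names and arguments with their Enum counterparts (map, filter, and so on), so you don’t need to significantly rewrite your code. Functions that need the whole collection, such as sort, live only in Enum, and a stream does nothing until an Enum function (or Stream.run/1) consumes it.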
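Laziness also means a pipeline can stop early. In this sketch (the first_three name and the IO.puts trace are mine), Enum.take/2 asks for three results, so the stream only processes as many numbers as it takes to find them rather than all 1,000,000:

```elixir
first_three =
  1..1_000_000
  |> Stream.map(fn n ->
    # Trace output: this runs only for the elements that are actually needed.
    IO.puts("doubling #{n}")
    n * 2
  end)
  |> Stream.filter(&(rem(&1, 4) == 0))
  |> Enum.take(3)

IO.inspect(first_three)
# [4, 8, 12]
```

With Enum.map/2 and Enum.filter/2 in place of the Stream calls, the result would be the same, but all 1,000,000 numbers would be doubled and filtered before take could pick the first three.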

For the most part you will typically be using the Enum module, but you will likely come across a situation where the Stream module is a better choice.

Philip Brown

@philipbrown

© Yellow Flag Ltd 2024.