Lodash's chain vs native methods

2021-08-25

Taking a closer look at lodash's chain, and how it differs from using the native map, reduce and filter methods of Array.

An old pocket watch with a chain attached to it, representing that using lodash's chain is sometimes faster than using the native map, filter and reduce.
Photo by Jason D

Intro

Lodash is one of my favorite JavaScript utility libraries: it includes almost anything you might need for manipulating JavaScript data structures. Whenever I write code that does some serious data wrangling, I check lodash first.

That being said, I usually just use the native map, reduce and filter methods of Array in my React code, instead of using the lodash equivalents.

In this post I want to explore the differences between using the lodash array methods and the native array methods, and what using _.chain brings us. Plus answer the question of why I use the native methods in React.

Native vs Lodash

Basically there is no difference between using the lodash equivalents of map, reduce, every, some, or filter; they are all equivalent to the Array methods. Yes, the code differs, but you end up with the same results.
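
For example, given a hypothetical persons array of objects with an age property, these two calls produce the same result:

_.filter(persons, p => p.age >= 30);

persons.filter(p => p.age >= 30);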

If this is true then why does lodash even have them? I think the answers are:

  • lodash has been around for a long time
  • lodash offers DX shortcuts, and lodash's map, filter and reduce also work on objects
  • lodash has a very powerful chain method

I'll expand these answers a bit.

Lodash has been around

So lodash is quite old in internet years, and existed at a time when ECMAScript 5, which introduced map, filter, reduce, etc., was not yet fully supported by all browsers. Using lodash was a safe way to support older browsers.

Developer experience

Lodash also offers some nifty DX shortcuts:

const persons = [
  { name: "Maarten", age: 32 },
  { name: "Tosca", age: 30 },
  { name: "Owen", age: 4 },
  { name: "Jane", age: 1 },
];

_.map(persons, 'age');

// Results in  [32, 30, 4, 1]

Now this does not seem very useful vs using:

persons.map(person => person.age);

But remember before ECMAScript 6 we didn't have arrow / lambda functions, so we had to write this instead:

persons.map(function (person) {
  return person.age;
});

Also lodash's functions work on both arrays and objects:


const person = { name: "Maarten", age: 32, email: null };

_.filter(person, (x) => !!x);

// ["Maarten", 32]

This can sometimes be useful, but I have to admit it is not used very often; even lodash's documentation does not show examples of passing in objects.
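
For completeness, here is a quick sketch of _.map over the same object; for objects the iteratee receives the value and the key:

_.map(person, (value, key) => `${key}: ${value}`);

// ["name: Maarten", "age: 32", "email: null"]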

Lodash's chain / sequence

One killer feature of lodash is chains / sequences.

Chains / sequences are lazy, in contrast with the native Array methods, which are eager. This can give lodash better performance, because it minimizes the amount of work it does.

Let's look at how lazy evaluation works versus how eager evaluation works.

Say we have an array of people with the following type signature:

type Person = {
  id: number;
  name: string;
  age: number;
};

Where the array is defined as such:

export const persons: Person[] = [
  { id: 1, name: "Thompson Gregory", age: 24 },
  { id: 2, name: "Orr Mcdonald", age: 37 },
  { id: 3, name: "Patterson Cline", age: 36 },
  { id: 4, name: "Ryan Roy", age: 26 },
  { id: 5, name: "Johnston Foreman", age: 36 },
  { id: 6, name: "Bowers Cortez", age: 29 },
  { id: 7, name: "Bernard Pena", age: 38 },
  { id: 8, name: "Gladys Spears", age: 26 },
  { id: 9, name: "Pittman Cain", age: 40 },
  { id: 10, name: "Carver Wall", age: 26 },
  { id: 11, name: "Mara Massey", age: 37 },
  { id: 12, name: "Fletcher Peck", age: 20 },
  { id: 13, name: "Bentley Moran", age: 34 },
  { id: 14, name: "Ellison Witt", age: 33 },
  { id: 15, name: "Freida Ramsey", age: 35 },
  { id: 16, name: "Hines Zimmerman", age: 27 },
  { id: 17, name: "Miranda Gilliam", age: 31 },
  { id: 18, name: "Alejandra Dunlap", age: 29 },
  { id: 19, name: "Lynnette Sawyer", age: 39 },
  { id: 20, name: "Katy Adams", age: 33 },
  { id: 21, name: "Howe Wilcox", age: 21 },
  { id: 22, name: "Paige Holmes", age: 32 },
  { id: 23, name: "Mable Estrada", age: 34 },
  { id: 24, name: "York Lopez", age: 36 },
  { id: 25, name: "Kris Mejia", age: 29 },
  { id: 26, name: "Carlson Payne", age: 27 },
  { id: 27, name: "Staci Flores", age: 34 },
  { id: 28, name: "Eloise Lowe", age: 25 },
  { id: 29, name: "Lyons Hicks", age: 30 },
  { id: 30, name: "Goodwin Castro", age: 31  }
];

Say we want to get: "The first 5 people who are in their thirties now, or are going to be thirty after their next birthday." We could write the following code using native eager Array methods:

persons
  .map(p => ({ ...p, age: p.age + 1 }))
  .filter(p => p.age >= 30)
  .slice(0, 5);

It is pretty straightforward: first, increase the age by one via the map. Second, select all persons whose incremented age is thirty or older via the filter. Third, slice to get the first five.

Here is the lazy lodash variant using chain:

There are two ways of making a chain / sequence in lodash: one is via the lodash function, and the other via the chain function. I'll focus on chain here; the difference is that when using chain you must end the sequence with .value(). I'll use chain throughout this post because it has a more distinct name.

chain(persons)
  .map(p => ({ ...p, age: p.age + 1 }))
  .filter(p => p.age >= 30)
  .take(5)
  .value();

The only difference between the two versions is that instead of slice we use a convenience method called take.

Both versions have the exact same result; the difference is that the lodash variant is lazy, does not do as much work as the native version, and is therefore more performant.

The lodash version does 7 maps and 7 filters; the native one does 30 maps and 30 filters. You can check the numbers here.
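
If you want to verify the counts yourself, one simple (hypothetical) way is to wrap the callbacks with counters and compare both versions:

import { chain } from "lodash";

let mapCalls = 0;
let filterCalls = 0;

chain(persons)
  .map(p => {
    mapCalls += 1;
    return { ...p, age: p.age + 1 };
  })
  .filter(p => {
    filterCalls += 1;
    return p.age >= 30;
  })
  .take(5)
  .value();

// Compare these counters against the same counters wrapped
// around the native map and filter version.
console.log(mapCalls, filterCalls);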

To understand how laziness works, let's look at how a human would solve this problem:

Imagine you are given the task from the example in real life. There is a neat, orderly queue of 30 people standing in a line. You walk past each person, ask them their current age, and do a calculation: age + 1. If the result is 30 or older, you ask the person to follow you.

After "evaluating" just 7 people you realise that you now have 5 people following you. Now ask yourself the question: will you continue "processing" the other 23 people, or will you stop and go home?

The reason we go home is... that we humans understand that our task is now complete. There is no point in "evaluating" the other 23, because even if 10 people still match the criteria, we already have the first 5. Nothing will change the result at this point.

We humans have a great intuition about these sort of things.

What chain / sequences will do is mimic our human intuition by being lazy.

Native array methods are eager: they will first do the map 30 times, then do the filter 30 times, and only then take the first 5 results.

So with 30 people in the array, and 5 matches among the first 7 people, chain will have looped through just 7 items.

This is pretty impressive.

Implementing our own chain

Let's make a toy example of how we could implement our own lazy array methods like lodash does. It is always fun to see how the magic works. Just promise me you won't use it in production...

The trickiest part we should get out of the way first: understanding why we need to build a chain of operations.

There are two types of operations: commutative operations, and non-commutative operations. Commutative means that you can re-order the operations without changing the result.

Take filter for example: it does not matter in a chain of sequential filters which filter is done first. The result is always the same. This means that filter is commutative. The reason that filter is commutative is that it does not transform the item it is given.

For map this is not the case: the order of sequential maps is essential. There is a difference between doubling the age first and then adding one year, versus adding one year and then doubling it.
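
A quick sketch with plain numbers (not part of the original example) makes this concrete:

const numbers = [1, 2, 3, 4, 5, 6];

// Filters commute: both orders result in [4, 6].
numbers.filter(n => n > 3).filter(n => n % 2 === 0);
numbers.filter(n => n % 2 === 0).filter(n => n > 3);

// Maps do not commute: doubling then adding one differs
// from adding one then doubling.
numbers.map(n => n * 2).map(n => n + 1); // [3, 5, 7, 9, 11, 13]
numbers.map(n => n + 1).map(n => n * 2); // [4, 6, 8, 10, 12, 14]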

This means that when we encounter a map we need to start a new array of operations in the chain, and when we encounter a filter we can add it to the current one.

So by knowing which operations we can bundle together, we can perform them sequentially, which is what makes it more performant.

Before we program anything, let's look at the usage of our own function first:

lazy(persons)
  .map(p => ({ ...p, age: p.age + 1 }))
  .filter(p => p.age >= 30)
  .take(5);

As you can see, the usage is the same as with lodash's chain, except we call it lazy.

Here is the full code in all its glory, before we take it apart:

type FilterFn<F> = (item: F) => boolean;

type FilterOperation<T> = {
  type: "filter";
  fn: FilterFn<T>;
};

type MapFn<T, TResult> = (item: T) => TResult;

type MapOperation<T, TResult> = {
  type: "map";
  fn: MapFn<T, TResult>;
};

type Operation<T> = FilterOperation<T> | MapOperation<T, any>;

type Chain<T> = Operation<T>[][];

type Lazy<T> = {
  filter(fn: FilterFn<T>): Lazy<T>;
  map<TResult>(fn: MapFn<T, TResult>): Lazy<TResult>;
  take(n: number): T[];
};

const terminated = Symbol("terminated");

type OperationResult<T> = T | typeof terminated;

export function lazy<T>(collection: T[]): Lazy<T> {
  const chain: Chain<T> = [[]];

  const wrapper: Lazy<T> = { filter, map, take };

  function filter(fn: FilterFn<T>): Lazy<T> {
    const operation: FilterOperation<T> = { type: "filter", fn };

    chain[chain.length - 1].push(operation);

    return wrapper;
  }

  function map<TResult>(fn: MapFn<T, TResult>): Lazy<TResult> {
    const operation: MapOperation<T, TResult> = { type: "map", fn };

    chain.push([operation]);

    const wrapperResult = (wrapper as any) as Lazy<TResult>;
    return wrapperResult;
  }

  function take(n: number): T[] {
    const result: T[] = [];

    for (let i = 0; i < collection.length; i++) {
      let item: OperationResult<T> = collection[i];

      for (const operations of chain) {
        if (isTerminated(item)) {
          break;
        }

        for (const operation of operations) {
          item = performOperation(operation, item);

          if (isTerminated(item)) {
            break;
          }
        }
      }

      if (!isTerminated(item)) {
        result.push(item);

        if (result.length === n) {
          return result;
        }
      }
    }

    return result;
  }

  return wrapper;
}

function performOperation<T>(
  operation: Operation<T>,
  item: T
): OperationResult<T> {
  const { fn, type } = operation;

  switch (type) {
    case "filter":
      return fn(item) ? item : terminated;

    case "map":
      return fn(item);
  }
}

function isTerminated<T>(item: OperationResult<T>): item is typeof terminated {
  return item === terminated;
}

You can play with it here and see how it fares against lodash.

Now let's break it down, one step at a time, starting with the types for the chain:

type FilterFn<F> = (item: F) => boolean;

type FilterOperation<T> = {
  type: "filter";
  fn: FilterFn<T>;
};

type MapFn<T, TResult> = (item: T) => TResult;

type MapOperation<T, TResult> = {
  type: "map";
  fn: MapFn<T, TResult>;
};

type Operation<T> = FilterOperation<T> | MapOperation<T, any>;

type Chain<T> = Operation<T>[][];

First we define two operations, FilterOperation and MapOperation; together these form the Operation type. The Operation type represents all operations our lazy can perform.

Every Operation has a type key, which is a string literal. This is how we know which operation to actually perform in the performOperation function, but more on that later.

The FilterFn and MapFn types represent the functions we are going to be given by the users of lazy when they call filter and map respectively. These are the functions which perform the actual operation, but they will only be called when needed.

Next is the Chain type: a chain is a nested array, where each nested array is of type Operation<T>[].

The nesting works like this: each commutative operation, such as filter, adds itself to the last array inside the Chain. Each non-commutative operation, such as map, starts a new nested array. This way each nested array represents operations which can happen sequentially.
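
For example, with hypothetical callbacks a, b, c and d, chaining two filters, then a map, then another filter would group the operations like this:

lazy(persons)
  .filter(a)
  .filter(b)
  .map(c)
  .filter(d);

// chain is now:
// [
//   [{ type: "filter", fn: a }, { type: "filter", fn: b }],
//   [{ type: "map", fn: c }, { type: "filter", fn: d }]
// ]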

Now let's look at the lazy function's skeleton:

type Lazy<T> = {
  filter(fn: FilterFn<T>): Lazy<T>;
  map<TResult>(fn: MapFn<T, TResult>): Lazy<TResult>;
  take(n: number): T[];
};

export function lazy<T>(collection: T[]): Lazy<T> {
  const chain: Chain<T> = [[]];

  const wrapper: Lazy<T> = { filter, map, take };

  function filter(fn: FilterFn<T>): Lazy<T> {
    // shortened
  }

  function map<TResult>(fn: MapFn<T, TResult>): Lazy<TResult> {
    // shortened
  }

  function take(n: number): T[] {
    // shortened
  }

  return wrapper;
}

The Lazy type is self-referencing. To make a nice fluent API we need to make sure that we always return Lazy from all chainable operations; these are map and filter in this case.

A fluent API means that the methods of the API can be chained together. For a more detailed explanation see: wikipedia.

The exception is take, which is a terminating operation; in take we need to actually return a result. It is terminating because it ends the chain, which is why it returns T[].

At the top of lazy we initialize the chain, which is where all the operations will live.

Next the wrapper is initialized; it is the return value of filter, map, and of lazy itself.

Now that we understand the way the chain works, filter is kind of easy:

export function lazy<T>(collection: T[]): Lazy<T> {
  const chain: Chain<T> = [[]];

  function filter(fn: FilterFn<T>): Lazy<T> {
    const operation: FilterOperation<T> = { type: "filter", fn };

    chain[chain.length - 1].push(operation);

    return wrapper;
  }

  // shortened
}

As you can see, it creates a FilterOperation and pushes it onto the last array of operations in the chain. This is because filter is commutative, so we can perform all sequential filters in one go.

This means that as long as we keep chaining filters, they will end up in the same Operations array.

map, the non-commutative operation, works like this:

export function lazy<T>(coll: T[]): Lazy<T> {
  const chain: Chain<T> = [[]];

  function map<TResult>(fn: MapFn<T, TResult>): Lazy<TResult> {
    const operation: MapOperation<T, TResult> = { type: "map", fn };

    chain.push([operation]);

    const wrapperResult = (wrapper as any) as Lazy<TResult>;
    return wrapperResult;
  }

  // shortened
}

Because map is non-commutative, we know we always need to start a new Operations array in the chain at this point. So we create a MapOperation and push it onto the chain inside of a new array.

By calling push we ensure that the new array is added at the end of the chain.

The cast at the end makes sure TypeScript understands that the type of T has now changed to the return type (TResult) of the MapFn. Otherwise the developer using map would get wrong type hints in subsequent maps and filters.
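
For example, after mapping a person to their name the chain becomes a Lazy<string>, so the next callbacks are typed with string:

const names: string[] = lazy(persons)
  .map(p => p.name)
  .filter(name => name.length > 10)
  .take(3);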

Now it is time to examine take, which is where the operations are actually performed, but first we have to understand terminated:

const terminated = Symbol("terminated");

type OperationResult<T> = T | typeof terminated;

function isTerminated<T>(item: OperationResult<T>): item is typeof terminated {
  return item === terminated;
}

First we declare a Symbol called terminated. A Symbol in JavaScript is a unique value that is only ever equal to itself:

const terminated = Symbol("terminated");

// This is true
terminated === terminated; 

// But this is false
terminated === Symbol("terminated");

The terminated Symbol represents that an item in the collection has finished processing early. This happens when an item does not make it past a filter. By using a Symbol for this we are guaranteed that nobody can return it from a map by accident.

If we had used null, undefined, -1 or NaN instead, every map which accidentally returned one of those values would get filtered out.

By not exporting / exposing the terminated symbol our users can never accidentally access it.

The OperationResult type represents the fact that each operation can either result in a termination (the terminated symbol) or a continuation (T).

The isTerminated function is a TypeScript type guard which tells us if an OperationResult is terminated or continued.

Now that we understand what terminated means and does, let's look at take:

To clarify take, it helps to show what the value of the chain variable could look like at this point:

[
  [{"type":"map"}, {"type":"filter"}],
  [{"type":"map"}]
]

export function lazy<T>(collection: T[]): Lazy<T> {
  const chain: Chain<T> = [[]];

  function take(n: number): T[] {
    const result: T[] = [];

    for (let i = 0; i < collection.length; i++) {
      let item: OperationResult<T> = collection[i];

      for (const operations of chain) {
        if (isTerminated(item)) {
          break;
        }

        for (const operation of operations) {
          item = performOperation(operation, item);

          if (isTerminated(item)) {
            break;
          }
        }
      }

      if (!isTerminated(item)) {
        result.push(item);

        if (result.length === n) {
          return result;
        }
      }
    }

    return result;
  }

  // shortened
}

  • First we create a result array which will contain the result of the take function.
  • We then loop through the collection and create a let item, which is either going to stay of type T, or is going to become terminated when it fails to get past a filter.
  • Remember that the chain is an array of arrays containing Operations, so we loop over each Operations array one at a time, but we stop whenever the item is terminated.
  • The inner loop executes every Operation in the current Operations array, and breaks out of the loop whenever the item becomes terminated.
  • The if after the loops acts as a gatekeeper: only non-terminated values end up in the result array. If the result has reached the desired length of n, take can stop and return early.
  • The final return is hit if nothing could be taken, because the collection is empty, or not enough values got past the filters. In that case we just return what we could take.

The actual operations are performed in performOperation:

function performOperation<T>(
  operation: Operation<T>,
  item: T
): OperationResult<T> {
  const { fn, type } = operation;

  switch (type) {
    case "filter":
      return fn(item) ? item : terminated;

    case "map":
      return fn(item);
  }
}

It is basically a router function, which performs an operation based on the type of Operation.

For filter: when the FilterFn returns false we return terminated, which will cause the item to be removed. When it returns true we return the item so more operations can be performed on it.

For map we simply apply the MapFn and return the result. Remember that because we do not export the terminated symbol, we can never accidentally have MapFns that terminate.

Now we are done; we have created a tiny toy version of lodash's chain.

Here is the full example. If you want to challenge yourself, add some more methods such as some, every and first.
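
As a hint for the challenge, here is a minimal sketch of how a first method could reuse take; this is hypothetical code that would live inside the lazy function next to the other methods:

function first(): T | undefined {
  // The first item that survives all operations, or undefined when nothing does.
  return take(1)[0];
}

You would also need to add first to the Lazy type and to the wrapper object.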

Conclusion

Now that we have seen how we can make our own chain I want to reflect on my current use of lodash.

"Lodash is dead long live lodash." would be a good summary.

A couple of years ago all my projects would include lodash right from the start by default. Today the story is a little different: I no longer use it for the things that the native Array methods already provide.

Adding a library comes with the following penalties:

  • A library adds to the size of the application / page.
  • Using a library adds some cognitive load. It is one more thing to worry about / know about.
  • Libraries can have security issues; by not using a library you remove an attack vector.

That being said, I will still use lodash, even on the front-end, when I need one of its functions. Having battle-tested functions just beats writing them yourself; most bugs and kinks have long since been removed.

To get around the bundle size you can import lodash functions like this: import times from 'lodash/times';.

This prevents you from importing the whole of lodash as explained here.
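
To make the difference concrete, here is a sketch assuming you only need lodash's times:

// Imports the whole of lodash into the bundle:
import _ from "lodash";
_.times(3, index => console.log(index));

// Imports only the times module:
import times from "lodash/times";
times(3, index => console.log(index));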

In version 5 (currently on master, but not released) lodash will be marked as "sideEffects": false. This means that bundlers, such as webpack, can more aggressively tree-shake lodash.

So what about chain / sequences?

Chains are costly bundle-wise, because they have to include a large part of lodash due to the fluent API.

When processing data on a Node.js / Deno back-end I would definitely still use lodash's sequences / chain. On a server the penalty for the end user bundle size does not exist.

On the front-end it would have to be weighed against the increased bundle size and the number of items being processed. When processing a small number of items, both approaches are faster than the end user can blink, so using lodash would be overkill.

I hope you enjoyed reading this article.