Transducers From The Ground Up

It is not an exaggeration to say transducers are the greatest thing since sliced bread. They open up new frontiers in performance and composition.

In this series, we'll take a deep look at transducers. We'll derive them naturally from existing concepts, see the wide variety of use cases in which they apply, and how much performance we gain thereby.

All the examples in this post work in the REPL. I recommend keeping one open and following along, especially if you're starting to feel lost.

Hope you brought your swimming trunks, we're going on a deep dive.

Reducing Like It's 1979

Remember this book?

If you've been hanging long enough around programmers, you've probably met one who swears up and down by it. "It changed my life, dude!".

I can't very well argue with that because it changed mine, too, but lest I get biographical, let's have a crack at one of my favorite exercises in the book: implementing reduce.

We'll only implement foldl here. Skip to the appendix for an implementation of foldr if you're interested.

(defn -reduce
  [f init coll]
  (if-let [s  (seq coll)]
    (recur f (f init (first s)) (rest s))
    init))

(-reduce conj [] [1 2 3]) ;; => [1 2 3]
(-reduce + 0 [1 2 3]) ;; => 6

In general, the function reduce takes has the following signature:

(fn rf [accum x] ...) ;; returns new accum

We dub it rf for reducing function.

Standing On The Shoulders Of Reducers

Now that we have implemented reduce, how about we implement map?

The following implementation is tail recursive, and requires introducing a "state" variable, init in this example.

(defn -map
  [f init coll]
  (if-let [s (seq coll)]
    (recur f (conj init (f (first s))) (rest s))
    init))

(-map inc [] (range 3)) ;; => [1 2 3]

Nice. It works. It also looks very similar to reduce. So similar in fact, that we can factor out the differences and implement map with reduce:

(defn -map
  [f init coll]
  (-reduce (fn [init x] (conj init (f x))) init coll))

(-map inc [] (range 3)) ;; => [1 2 3]

That actually worked. But what does it mean?

Let's take a deeper look at the function we pass to reduce. It's only a function of f. We can factor it out as well:

(defn -mapper
  [f]
  (fn [init x] (conj init (f x))))

(defn -map
  [f init coll]
  (-reduce (-mapper f) init coll))

(-map inc [] (range 3)) ;; => [1 2 3]

There's only one problem left with our mapper. It might not look like a problem, but it's there: Its behavior is still concreted via conj. What is its purpose?

conj is a reducing function itself. Remember this example?

(-reduce conj [] [1 2 3]) ;; => [1 2 3]

conj is just a function which takes an aggregate value and an element, and adds the element to the aggregate. Let's factor it out as rf:

(defn -mapper
  [f]
  (fn [rf]
    (fn [accum x] (rf accum (f x)))))

(defn -map
  [f init coll]
  (-reduce ((-mapper f) conj) init coll))

(-map inc [] (range 3)) ;; => [1 2 3]

Now we've made something interesting. It just looks even more complicated.

I also performed a slight of hand and renamed init to accum to fully embed us in the context of reducing functions.

Let's drop map for a while and remain in a reducing context:

(defn -mapper
  [f]
  (fn [rf]
    (fn [accum x] (rf accum (f x)))))

(-reduce ((-mapper inc) conj) [] (range 3)) ;; => [1 2 3]
(-reduce ((-mapper inc) +) 0 (range 3)) ;; => 6

Hold up. That's illegal.

Or is it? Let's take a look and understand what we did here: we have completely separated the behavior of mapping over a sequence, from the behavior of accumulating the results of the mapping into something.

There's some magic to it, but we can actually do it with every function which can be implemented with reduce. There's also a nice equivalence between functions that can be implemented with reduce and with a loop, so every function you can think of that can be implemented by one of those, can turn into a function which changes the behavior of a reducing process. Transforms it. A trans-ducer.

Finally, we can derive our very own transduce:

(defn -transduce
  [xf rf init coll]
  (-reduce (xf rf) init coll))

Do these things even compose?

Since Clojure is functional, we can use simple substitution, no need to eval the next section in the REPL. I use the REPL ;; => output prompt symbol as a visual line-break. Every stage is an expansion of the previous one or function application.

We want to see what's the result of composing two mappers:

(comp (-mapper g) (-mapper f))

Remember the definition of -mapper:

(defn -mapper
  [f]
  (fn [rf]
    (fn [accum x] (rf accum (f x)))))

Transducers are applied to reducing functions. Let's substitute the definition and apply it:

((-mapper f) rf)
;; =>
(fn [accum x] (rf accum (f x)))

Let's now apply a second mapper to it:

((-mapper g) ((-mapper f) rf))
;; =>
(fn [accum x] ((fn [accum' x'] (rf accum' (f x'))) accum (g x)))

Apply the internal function, compose f and g and extract a mapper:

(fn [accum x] ((fn [accum' x'] (rf accum' (f x'))) accum (g x)))
;; =>
(fn [accum x] (rf accum (f (g x))))
;; =>
(fn [accum x] (rf accum ((comp f g) x)))
;; =>
((-mapper (comp f g)) rf)

Finally:

((-mapper (comp f g)) rf) == ((comp (-mapper g) (-mapper f)) rf)
;; =>
(-mapper (comp f g)) == (comp (-mapper g) (-mapper f))

Now that's interesting. The order of composition of mappers is the order of execution of mapped functions.

We have two questions to answer:

Why is the execution order reversed relative to regular comp?
Why do they actually compose?

Reversed order

Let's interpret the comp body:

((-mapper g) ((-mapper f) rf))

(-mapper g) takes a function as its argument. Looking at the definition of mapper, notice it will only be called after (g x) is evaluated. This is how g is called before f.

Why do they compose?

Let's try to understand the type of mapper.

We have a function which takes a function, returns a function of a function. quite involved to write in words, so let's try to write the type signatures:

f :: a -> b
mapper :: (a -> b) -> (rf -> (accum -> a -> accum))
rf :: accum -> a -> accum
--
mapper :: (a -> b) -> ((accum -> a -> accum) -> (accum' -> a' -> accum'))
mapper :: (a -> b) -> (rf -> rf')

By looking at the types and implementation we can start to make sense of what mapper does. It takes a function, and returns a function which, deep breath, takes a reducing function, and returns a modified reducing function. The transformation encapsulates a computational process.

That way, we can chain transducers on top of another to create a computational chain, and only activate it in the end with a reducer.

A reducer is any function which accumulates an element into an accumulator. + and conj both count as reducing functions. We don't have to accumulate scalar values.

Landing Back In The Land Of Clojure

We have reached the end of the first part.

Up until now, we have rolled our own. Let's tie the work we have done here to the Clojure APIs, naming conventions, and usage:

rf: some reducing function
xf: some transducer
composition: (comp xf1 xf2) returns a new transducer which executes from left to right.
application: (xf rf) returns a new reducing function.

A short list of functions with new arities (as of Clojure 1.7) which return or consume transducers:

returns transducer: dedupe, distinct, drop, drop-while, filter, halt-when, interpose, keep, keep-indexed, map, map-indexed, mapcat, partition-all, partition-by, random-sample, remove, replace, take, take-nth, take-while
consume transducers: sequence, into.
is a transducer: cat

Moreover, the following two functions have been added: eduction and transduce.

Now that we have laid down the theoretical foundations for understanding transduscers, we can continue on the next post to see them in practice.

Any correction and all feedback are always welcome.

Happy hacking

Appendix

foldr

(defn reduce-rec
  [f init coll]
  (if-let [s  (seq coll)]
    (f (reduce-rec f init (rest coll)) (first s))
    init))

(reduce-rec conj () [1 2 3])
;; => (1 2 3)

filter

(defn filter-rec
  ([pred coll]
   (when-let [s (seq coll)]
     (let [f (first s) r (rest s)]
       (if (pred f)
         (cons f (filter pred r))
         (filter pred r))))))

(filter-rec even? (range 6))
;; => (0 2 4)

(defn filter-iter
  ([pred init coll]
   (if-let [s (seq coll)]
     (let [f (first s) r (rest s)]
       (recur pred (if (pred f) (conj init f) init) r))
     init)))

(filter-iter even? [] (range 6))
;; => [0 2 4]

(defn filter-red
  [pred]
  (fn [init coll]
    (fn [rf]
      (reduce-iter
       (fn [init x]
         (if (pred x) (rf init x) init))
       init
       coll))))

(defn filterer
  [pred]
  (fn [init x]
    (if (pred x) (conj init x) init)))

(defn filterer
  [pred]
  (fn [comb]
    (fn [init x]
      (if (pred x) (comb init x) init))))

core transducers implementation

The complete core transducer API has three arities for the returned reducing function:

(defn map-xf
  ([f]
   (fn [rf]
     (fn
       ([] (rf))
       ([result] (rf result))
       ([result input]
        (rf result (f input)))
       ([result input & inputs]
        (rf result (apply f input inputs)))))))

(defn filter-xf
  ([pred]
   (fn [rf]
     (fn
       ([] (rf))
       ([result] (rf result))
       ([result input]
        (if (pred input)
          (rf result input)
          result))))))

Acknowledgment

There has already been a post called Transducers From The Ground Up by Uswitch Labs. ~~That post is unfortunately gone and is only found on archiving websites~~. Republished!

This isn't a reprint, but my personal take on the subject.