It is not an exaggeration to say transducers are the greatest thing since sliced bread. They open up new frontiers in performance and composition.
In this series, we'll take a deep look at transducers: we'll derive them naturally from existing concepts, survey the wide variety of use cases in which they apply, and measure how much performance we gain along the way.
All the examples in this post work in the REPL. I recommend keeping one open and following along, especially if you're starting to feel lost.
Hope you brought your swimming trunks, we're going on a deep dive.
Reducing Like It's 1979
Remember this book?
If you've been hanging around programmers long enough, you've probably met one who swears up and down by it. "It changed my life, dude!"
I can't very well argue with that because it changed mine, too. But lest I get biographical, let's have a crack at one of my favorite exercises in the book: implementing reduce.
We'll only implement foldl here. Skip to the appendix for an implementation of foldr if you're interested.
```clojure
(defn -reduce [f init coll]
  (if-let [s (seq coll)]
    (recur f (f init (first s)) (rest s))
    init))

(-reduce conj [] [1 2 3])
;; => [1 2 3]
(-reduce + 0 [1 2 3])
;; => 6
```
In general, the function that reduce takes has the following signature:
```clojure
(fn rf [accum x] ...) ;; returns new accum
```
We dub it rf, for reducing function.
Standing On The Shoulders Of Reducers
Now that we have implemented reduce, how about we implement map?
The following implementation is tail recursive, and requires introducing a "state" variable, init in this example.
```clojure
(defn -map [f init coll]
  (if-let [s (seq coll)]
    (recur f (conj init (f (first s))) (rest s))
    init))

(-map inc [] (range 3))
;; => [1 2 3]
```
Nice. It works. It also looks very similar to reduce. So similar, in fact, that we can factor out the differences and implement map with reduce:
```clojure
(defn -map [f init coll]
  (-reduce (fn [init x] (conj init (f x))) init coll))

(-map inc [] (range 3))
;; => [1 2 3]
```
That actually worked. But what does it mean?
Let's take a deeper look at the function we pass to reduce. It's only a function of f. We can factor it out as well:
```clojure
(defn -mapper [f]
  (fn [init x] (conj init (f x))))

(defn -map [f init coll]
  (-reduce (-mapper f) init coll))

(-map inc [] (range 3))
;; => [1 2 3]
```
There's only one problem left with our mapper. It might not look like a problem, but it's there: its behavior is still hardwired to conj. What is conj's purpose, anyway?
conj is a reducing function itself. Remember this example?
```clojure
(-reduce conj [] [1 2 3])
;; => [1 2 3]
```
conj is just a function which takes an aggregate value and an element, and adds the element to the aggregate. Let's factor it out as rf:
```clojure
(defn -mapper [f]
  (fn [rf]
    (fn [accum x] (rf accum (f x)))))

(defn -map [f init coll]
  (-reduce ((-mapper f) conj) init coll))

(-map inc [] (range 3))
;; => [1 2 3]
```
Now we've made something interesting, even if it looks more complicated. I also performed a sleight of hand and renamed init to accum, to embed us fully in the context of reducing functions.
Let's drop map for a while and remain in a reducing context:
```clojure
(defn -mapper [f]
  (fn [rf]
    (fn [accum x] (rf accum (f x)))))

(-reduce ((-mapper inc) conj) [] (range 3))
;; => [1 2 3]
(-reduce ((-mapper inc) +) 0 (range 3))
;; => 6
```
Hold up. That's illegal.
Or is it? Let's take a look and understand what we did here: we have completely separated the behavior of mapping over a sequence from the behavior of accumulating the results of that mapping into something.

There's some magic to it, but we can actually do this with every function which can be implemented with reduce. There's also a nice equivalence between functions that can be implemented with reduce and those that can be implemented with a loop, so every function you can think of that can be implemented by one of those can be turned into a function which changes the behavior of a reducing process. Transforms it. A trans-ducer.
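To make that claim concrete, here's a sketch of a filtering transducer derived the same way we derived our mapper, using the -reduce defined earlier. The name -filterer is our own for this post, not part of any library:

```clojure
;; A filtering transducer: wraps rf so that only elements
;; passing pred are handed to the underlying reducing function;
;; everything else leaves the accumulator untouched.
(defn -filterer [pred]
  (fn [rf]
    (fn [accum x]
      (if (pred x)
        (rf accum x)
        accum))))

(-reduce ((-filterer even?) conj) [] (range 6))
;; => [0 2 4]
```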
Finally, we can derive our very own transduce:
```clojure
(defn -transduce [xf rf init coll]
  (-reduce (xf rf) init coll))
```
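A quick sanity check in the REPL, reusing the -mapper defined above:

```clojure
;; Same mapping transducer, activated by two different reducing functions.
(-transduce (-mapper inc) conj [] (range 3))
;; => [1 2 3]
(-transduce (-mapper inc) + 0 (range 3))
;; => 6
```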
Do these things even compose?

Since Clojure is functional, we can use simple substitution; there's no need to eval the next section in the REPL. I use the ;; => REPL output prompt as a visual line break. Every stage is either an expansion of the previous one or a function application.
We want to see what's the result of composing two mappers:
```clojure
(comp (-mapper g) (-mapper f))
```
Remember the definition of -mapper:
```clojure
(defn -mapper [f]
  (fn [rf]
    (fn [accum x] (rf accum (f x)))))
```
Transducers are applied to reducing functions. Let's substitute the definition and apply it:
```clojure
((-mapper f) rf)
;; =>
(fn [accum x] (rf accum (f x)))
```
Let's now apply a second mapper to it:
```clojure
((-mapper g) ((-mapper f) rf))
;; =>
(fn [accum x]
  ((fn [accum' x'] (rf accum' (f x'))) accum (g x)))
```
Apply the internal function, compose f and g, and extract a mapper:
```clojure
(fn [accum x]
  ((fn [accum' x'] (rf accum' (f x'))) accum (g x)))
;; =>
(fn [accum x] (rf accum (f (g x))))
;; =>
(fn [accum x] (rf accum ((comp f g) x)))
;; =>
((-mapper (comp f g)) rf)
```
Finally:
```clojure
((-mapper (comp f g)) rf) == ((comp (-mapper g) (-mapper f)) rf)
;; =>
(-mapper (comp f g)) == (comp (-mapper g) (-mapper f))
```
Now that's interesting. The order of composition of mappers is the order of execution of mapped functions.
We have two questions to answer:

- Why is the execution order reversed relative to regular comp?
- Why do they actually compose?
Reversed order
Let's interpret the comp body:
```clojure
((-mapper g) ((-mapper f) rf))
```
(-mapper g) takes a function as its argument. Looking at the definition of -mapper, notice that the wrapped function it receives is only called after (g x) is evaluated. This is how g is called before f.
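We can also verify the order in the REPL, using the -reduce and -mapper defined above: the leftmost transducer's function runs first.

```clojure
;; inc is applied to each element first, then doubling.
(-reduce ((comp (-mapper inc) (-mapper #(* 2 %))) conj) [] (range 3))
;; => [2 4 6]
```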
Why do they compose?
Let's try to understand the type of mapper.

We have a function which takes a function and returns a function of a function. That's quite involved to write in words, so let's try to write the type signatures:
```haskell
f :: a -> b
mapper :: (a -> b) -> (rf -> (accum -> a -> accum))
rf :: accum -> a -> accum

-- mapper :: (a -> b) -> ((accum -> a -> accum) -> (accum' -> a' -> accum'))
mapper :: (a -> b) -> (rf -> rf')
```
By looking at the types and implementation we can start to make sense of what mapper does. It takes a function, and returns a function which, deep breath, takes a reducing function, and returns a modified reducing function. The transformation encapsulates a computational process.
That way, we can chain transducers on top of one another to create a computational chain, and only activate it at the end with a reducer.
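For instance, with the helpers defined earlier in this post, a single chain can be activated by different reducers:

```clojure
;; One transformation chain: increment, then square.
(def xf (comp (-mapper inc) (-mapper #(* % %))))

;; Activate it with different reducing functions.
(-reduce (xf conj) [] (range 3))
;; => [1 4 9]
(-reduce (xf +) 0 (range 3))
;; => 14
```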
A reducer is any function which accumulates an element into an accumulator. + and conj both count as reducing functions. We don't have to accumulate scalar values.
Landing Back In The Land Of Clojure
We have reached the end of the first part.
Up until now, we have rolled our own. Let's tie the work we have done here to the Clojure APIs, naming conventions, and usage:

- rf: some reducing function
- xf: some transducer
- composition: (comp xf1 xf2) returns a new transducer which executes from left to right.
- application: (xf rf) returns a new reducing function.
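In core this looks just like our homegrown version; map and filter called with a single argument return transducers:

```clojure
;; Increment, keep the evens, and sum them up.
(transduce (comp (map inc) (filter even?)) + 0 (range 10))
;; => 30

;; Same chain, different reducing function.
(transduce (comp (map inc) (filter even?)) conj [] (range 10))
;; => [2 4 6 8 10]
```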
A short list of functions with new arities (as of Clojure 1.7) which return or consume transducers:
- returns transducer: dedupe, distinct, drop, drop-while, filter, halt-when, interpose, keep, keep-indexed, map, map-indexed, mapcat, partition-all, partition-by, random-sample, remove, replace, take, take-nth, take-while
- consume transducers: sequence, into.
- is a transducer: cat
Moreover, the following two functions have been added: eduction and transduce.
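A brief taste of each; note that eduction returns a reducible view rather than a realized collection:

```clojure
(into [] (map inc) (range 3))
;; => [1 2 3]

(sequence (map inc) (range 3))
;; => (1 2 3)

;; An eduction delays the work until something reduces it.
(reduce + 0 (eduction (map inc) (range 3)))
;; => 6
```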
Now that we have laid down the theoretical foundations for understanding transducers, we can continue in the next post to see them in practice.
Corrections and feedback are always welcome.
Happy hacking
Appendix
foldr
```clojure
(defn reduce-rec [f init coll]
  (if-let [s (seq coll)]
    (f (reduce-rec f init (rest coll)) (first s))
    init))

(reduce-rec conj () [1 2 3])
;; => (1 2 3)
```
filter
```clojure
(defn filter-rec
  ([pred coll]
   (when-let [s (seq coll)]
     (let [f (first s)
           r (rest s)]
       (if (pred f)
         (cons f (filter-rec pred r))
         (filter-rec pred r))))))

(filter-rec even? (range 6))
;; => (0 2 4)

(defn filter-iter
  ([pred init coll]
   (if-let [s (seq coll)]
     (let [f (first s)
           r (rest s)]
       (recur pred (if (pred f) (conj init f) init) r))
     init)))

(filter-iter even? [] (range 6))
;; => [0 2 4]

(defn filter-red [pred]
  (fn [rf]
    (fn [init coll]
      (-reduce (fn [init x] (if (pred x) (rf init x) init)) init coll))))

(defn filterer [pred]
  (fn [init x] (if (pred x) (conj init x) init)))

(defn filterer [pred]
  (fn [comb]
    (fn [init x] (if (pred x) (comb init x) init))))
```
core transducers implementation
The complete core transducer API has three arities for the returned reducing function:
```clojure
(defn map-xf [f]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input] (rf result (f input)))
      ([result input & inputs] (rf result (apply f input inputs))))))

(defn filter-xf [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input] (if (pred input) (rf result input) result)))))
```
Acknowledgment
There has already been a post called Transducers From The Ground Up by Uswitch Labs. That post is unfortunately gone and can now only be found on archiving websites. Republished!
This isn't a reprint, but my personal take on the subject.
Other tutorials worth looking at: