Understanding Clojure transducers through types
Yesterday, Rich Hickey published a blog post, “Transducers are Coming”, which attracted a lot of attention.
I have a confession to make, which I have made before: I find it very difficult to understand ideas or code not presented with types. So I decided that the only way I could possibly understand what “transducers” are would be to actually implement them in a typed language. I ended up doing so and am sharing my findings here.
Vague types in the original blog post
Rich informally gave some type signatures in his blog post:
;;reducing function signature
whatever, input -> whatever
;;transducer signature
(whatever, input -> whatever) -> (whatever, input -> whatever)
This was, unfortunately, not very helpful. It is hard to make sense of this pseudo-notation for types. What is quantified over what? And what is bound to what? I’ll explain later what I mean by these questions.
First discussion thread I saw
There was much tweeting online about transducers after Rich Hickey’s initial announcement; the tweets did not help me, except for links posted to discussion elsewhere.
One of them was on Hacker News. I browsed through it but found it mostly not useful. The problem was that although a lot of interesting Haskell code was thrown around, it tended to be related to transducers but not an exact translation of the concept. I already had my own intuitions about transducers being related to well-known types such as foldables, iteratees, lenses, etc. That “ordinary function composition” was involved immediately suggested the connections, because function composition is huge in these existing Haskell libraries.
But what I wanted was to understand transducers as they are, before even thinking about generalizations and comparisons.
What are the types?
Rich Hickey informally offered some types (which he said were “a la Haskell”) to try to help out:
;;reducing fn
x->a->x
;;transducer fn
(x->a->x)->(x->b->x)
OK, by using type variables a, b, and x, that indicates what is bound to what. The blog post should have used this notation rather than
;;transducer signature
(whatever, input -> whatever) -> (whatever, input -> whatever)
Sample Clojure code
He also posted some sample Clojure code:
Second discussion thread I saw
Then today, I saw a discussion thread on Reddit, titled “Clojure’s Transducers are Perverse Lenses”.
Actual runnable Haskell code
Rich finally posted some actual type-checked, runnable Haskell code!
-- Transducers in Haskell
mapping :: (a -> b) -> (r -> b -> r) -> (r -> a -> r)
-- Original was (b -> a) -> (r -> a -> r) -> (r -> b -> r)
-- but Michael O'Keefe in comment pointed out this is misleading
mapping f xf r a = xf r (f a)
filtering :: (a -> Bool) -> (r -> a -> r) -> (r -> a -> r)
filtering p xf r a = if p a then xf r a else r
flatmapping :: (a -> [b]) -> (r -> b -> r) -> (r -> a -> r)
flatmapping f xf r a = foldl xf r (f a)
-- for exposition only, yes, conj is gross for lazy lists
-- in Clojure conj and left folds dominate
conj xs x = xs ++ [x]
xlist xf = foldl (xf conj) []
-- build any old list function with its transducer, all the same way
xmap :: (a -> b) -> [a] -> [b]
xmap f = xlist $ mapping f
xfilter :: (a -> Bool) -> [a] -> [a]
xfilter p = xlist $ filtering p
xflatmap :: (a -> [b]) -> [a] -> [b]
xflatmap f = xlist $ flatmapping f
-- again, not interesting for lists, but the same transform
-- can be put to use wherever there's a step fn
xform :: (r -> Integer -> r) -> (r -> Integer -> r)
xform = mapping (+ 1) . filtering even . flatmapping (\x -> [0 .. x])
print $ xlist xform [1..5]
-- [0,1,2,0,1,2,3,4,0,1,2,3,4,5,6]
After this post, I knew it would not take me long to figure out transducers.
Refactoring his Haskell code
Two things to notice about the original code:
- It has long, low-level function types rather than types that actually name the concepts being discussed (reducers and transducers).
- It uses hardcoded list types
[a].
Type synonyms and higher-rank types
Defining lots and lots of types (whether synonyms or newtypes is standard practice when programming in a modern typed language. OK, so I defined a type synonym
-- Left reduce
type Reducer a r = r -> a -> r
But what about transducer? This is trickier.
An invalid attempt at a type would be
-- Illegal!
type Transducer a b = Reducer a r -> Reducer b r
because the type variable r is not bound in the type definition. And it would be incorrect to just randomly add r on the left hand side as an extra parameter to the Transducer type, because in fact it is critical that a transducer does not care about the underlying reducer’s return type r. How do we write the desired type?
It turns out you need higher-rank types. Rank-1 types are not sufficient; we need a rank-2 type to quantify r, to say that a transducer from a to b is a transformation that takes a reducer to a specific r and returns another reducer to the same r.
-- Here's where the rank-2 type is needed
type Transducer a b = forall r . Reducer a r -> Reducer b r
Now we can see more clearly some completely generic ways to create a transducer:
mapping :: (a -> b) -> Transducer b a
mapping f xf r a = xf r (f a)
filtering :: (a -> Bool) -> Transducer a a
filtering p xf r a = if p a then xf r a else r
A bit of history
Higher-rank types are a powerful technique for expressing “hiding” of unnecessary details about types going on somewhere. My first recollection of the real world use of rank-2 types is from 1994 (the year I started using Haskell, although I did not actually use it in my work as a software engineer until 1995), when I was excited to read a paper by John Launchbury and Simon Peyton Jones that solved, using a rank-2 type, a specific, important, practical problem, “Lazy Functional State Threads”; twenty years later, their ST monad is still part of the standard library!
Introducing type classes
Clojure uses protocols as an abstraction mechanism, and the “magic” of transducers uses protocols. In Haskell, type classes are the major abstraction mechanism (this is true of Scala also).
So I abstracted away from the hardcoded list-oriented functions and values in Rich Hickey’s Haskell code:
foldlabstracted to aclass Foldableconjand empty list[]abstracted to aclass Conjable
-- Left fold
class Foldable f where
fold :: Transducer a (f a)
class Conjable f where
empty :: f a
conj :: Reducer a (f a)
Note our reliance on transducing and reducing from one type a to another, f a.
Foldable constraint
Unlike mapping and filtering, flatmapping is not completely generic, because it depends on something being Foldable (implementing a fold):
flatmapping :: Foldable f => (a -> f b) -> Transducer b a
flatmapping f xf r a = fold xf r (f a)
Conjable constraint
Finally, here’s the originally list-specific code that now depends only on Foldable and Conjable:
-- I changed Rich Hickey's code to be more general than just list
-- but accept anything Conjable
xlist :: (Foldable f, Conjable f) => Transducer b a -> f a -> f b
xlist xf = fold (xf conj) empty
-- build any old Foldable function with its transducer, all the same way
xmap :: (Foldable f, Conjable f) => (a -> b) -> f a -> f b
xmap f = xlist $ mapping f
xfilter :: (Foldable f, Conjable f) => (a -> Bool) -> f a -> f a
xfilter p = xlist $ filtering p
xflatmap :: (Foldable f, Conjable f) => (a -> f b) -> f a -> f b
xflatmap f = xlist $ flatmapping f
List-specific stuff
Here is the list-specific code:
-- Stuff specialized to lists.
-- To use another type, just make it a Foldable and Conjable.
instance Foldable [] where
fold = foldl
-- for exposition only, yes, conj is gross for lazy lists
-- in Clojure conj and left folds dominate
instance Conjable [] where
empty = []
conj xs x = xs ++ [x]
-- Note: the type does not say anything about Foldable or Conjable,
-- even though the implementation just happens to use a list!
xform :: Transducer Integer Integer
xform = mapping (+ 1) . filtering even . flatmapping (\x -> [0 .. x])
-- Again, this can munge anything Foldable and Conjable, not just a list.
munge :: (Foldable f, Conjable f) => f Integer -> f Integer
munge = xlist xform
Notice some very important properties of this code:
xformhas a type that does not mention lists at all, even though it is implemented using a list and cannot compile without the listinstanceimplementations.mungealso does not mention lists, and can transform anything that isFoldableandConjable.
Example:
-- munge a list
-- [0,1,2,0,1,2,3,4,0,1,2,3,4,5,6]
example1 :: [Integer]
example1 = munge [1..5]
Implementing another type to illustrate the genericity of transducers
To illustrate Rich Hickey’s main point, I implemented instances of Foldable and Conjable for a standard Haskell Vector library as an alternate “collection-like” type.
-- For example using Vector instead of list
import qualified Data.Vector as V
-- Implement Foldable, Conjable type classes for Vector
instance Foldable V.Vector where
fold = V.foldl
instance Conjable V.Vector where
empty = V.empty
conj = V.snoc
And we can run munge directly on a vector instead of a list, without making any changes:
-- return a vector rather than a list; note the fact that munge actually
-- internally uses a list
example2 :: V.Vector Integer
example2 = munge $ V.enumFromN 1 5
This is code reuse at its best.
Note that there is nothing that ties transducers to any concrete “collection” type. We could write instances of Foldable and Conjable for some kind of “channel” abstraction, for example, and instantaneously be able to munge data coming from it and to another. In fact, this is already what is done in the real world, where Haskell and Scala are used in production at places like Facebook and Twitter to efficiently handle large amounts of data.
My code repository
My complete code is available on GitHub.
Conclusion
It was pretty exciting to see the announcement of the transducers library for Clojure, because it represents a level of abstraction that I think has not been expressed much in the world of dynamically typed languages, although the techniques are two decades old in the Haskell community in a statically typed setting. And I hope that I was able to convey the sheer elegance of Haskell as a way to express interesting types with practical ramifications for abstraction and pluggability.
The transducers code in my Haskell example ends on line 10. The rest is an example of use. To consider transducers something that requires some T be Foldable or Conjable is beside the point and confusing. (I know you are not saying that, but people following along need to realize it is *only* about reducing function transformation, it is not about Foldable things) Also, while it may be convenient to factor out r, the facts of reducing functions in dynamic languages with runtime polymorphism, and of applying transducers to processes, are much more nuanced than r -> a -> r can capture. It's a convenient shorthand that when reified like this is significantly less expressive and less correct than the truth of the rules.
Thanks for the feedback. Yes, I tried to make clear, through types, that Transducer is just "forall a b r . Reducer a r -> Reducer b r" and therefore has NOTHING to do with Foldable or Conjable. This type is really the key point I tried to make, with examples to back up why it is important: the beauty of the type annotations is that they make clear exactly which functons of the sample code actually involve Foldable or Conjable (in the type constraints).
Also, I know I have only scratched the surface (by exactly duplicating some sample code of yours). I look forward to learning more about uses of transducers and improving the types in different examples in order to aid my understanding of what is going on. I am, of course, particularly interested in whether a wall will be hit in terms of the expressiveness of static type systems to capture the full truth of your concept. I will report on any new understanding I get. I am not at all claiming that "all this stuff has already been done".
To be clear, I think you did a good job. The rank-2 type in particular captures an important property. I wanted as much to make a point about my code, where I didn't make the transition from transducers lib to example clear. The biggest thrust of transducers is moving away from thinking about functions from Collection -> Collection, or Xable -> Xable. If you can express your process in terms of a reducing function, you are in business.
I'd be interested to see examples where the factored r is insufficiently expressive. I think that's one of the most interesting things I ended up chasing when I tried to push on this reification.
Interesting post, thanks. Just a note when looking at the type signature versus the argument list for "mapping" -- please check my understanding (which may be wrong): argument f has type (b -> a), argument xf has type (r -> a -> r), argument r has type r, and argument a has type b. If that's a correct interpretation, switching the argument name or type signature may help with the understandability.
Yes, the "mapping" type signature Rich gave in his code is backwards in spirit (everything should focus on "a" to "b" transformations), and I copied it and also I see that my "xlist" annotation is also backwards. I have fixed these. Thanks for spotting the glitch: naming type parameters consistently is critical documentation!
This "backwards" property is really interesting. For instance:
sequence (map (*2) . map (+1)) [1..10]is
[3,5,7,9,...]not
[4,6,8,10,...]like you might expect. I'm anticipating that transducers naturally "compose backwards". Although, you can just CPS things as much as you like to make composition work backwards or forwards.
You might make the UX a little nicer by defining
type Transducer a b = forall r . Reducer b r -> Reducer a rand then just noting that
(.)works backwards. Or maybe evennewtypeing it and defining a backwardsCategoryinstance.I'm not sure what the best solution is, since something will be backwards somewhere. I remember the CPS trick Olivier Danvy showed me in 1998 for his type-safe printf but it seemed too clever.
You can minimize the backwardness with a
newtype. Lens was criticized a lot for not doing this until people started to realize the mileage Ed was getting out of abusing Haskell's subtyping relation. I don't think a similar advantage applies here—although it does force everything to be abstract.Haskell's subtyping relation? Could you please say more about that?
Whenever you have a constrained, quantified type like
forall a . Fractional a => athen there's an implicit subtyping relationship. If we're in a positive context then we can always "forget" some constraints and use our type in this less constrained setting, for instance ifx :: forall a . Fractional a => athenx + xuses this implicit subtyping. If we're in a negative context, then the same thing applies just in the opposite direction.I'm trying to piece together why I would use a transducer over something like `fmap`. There are two things I've noticed that are interesting to me. First is that `xmap` has almost the exact same type as `fmap`. Why is that? Does it do the same thing? The second thing I noticed was that `foldl` has a type that looks to match the required type for a transducer `(a -> b -> a) -> a -> [b] -> a`. So is `foldl` a transducer? If so what does that mean for transducers and `foldl`?
You're absolutely right, fold actually has the type of a transducer. I'm changing the type signature to make this clear.
As for xmap and xflatmap, I've deliberately omitted, for now, discussion about whether we're deriving Functor and Monad from other principles. The intended laws for Foldable and Conjable definitely need to be made explicit (one would expect, in the case of "a" and "r" types being the same, a Monoid property, for example). I've left all this for later investigation pending examination of actual use cases in the Clojure library.
I believe the intention of all this machinery is to allow for configurability and optimization of the implementation of the overall computation that is different from the "obvious" sequential pipeline of intermediate "collections".
Thanks for the response, I'm very curious about the use cases for transducers as well.
one use of transducers is to express effectful processes, such as consuming a stream delivered by async IO while at the same time producing an output stream after performing some sort of transformation on the data
Thanks for your insightful post !
To each his own, I tried to understand transducers trough C++ https://gist.github.com/sci... ☺.
Coming late to this party, I noticed 3 things:
1. Very nice.
2. It's might be clearer to say ReducingFn instead of Reducer here. As I understand it, the latter is, essentially, the reducible, which might or might not have had a Transducer embedded in it.
3. The polymorphism can be expressed as a core.typed type function, in pretty much the same way as in Haskell:
(t/defalias Transducer (t/TFn [[a :variance :covariant]
[b :variance :contravariant]]
(t/All [r] [(ReducingFn a r) -> (ReducingFn b r)])))
Hi Franklin, you may be interested in how this comes out in scala. Here is my effort: http://notes.langdale.com.a... Cheers, Arnold