The inconceivable effectiveness of multiple dispatch

Under the cut are offered the transcript of the report, Stefan Karpinski, one of the key developers of the Julia language. In the report he talks about what unexpected results have led convenient and efficient multiple dispatch, is taken as the basic paradigm Julia.

From the translator: the title of the report refers to the article by Eugene Wigner “the Inconceivable effectiveness of mathematics in natural Sciences”.

Multiple dispatch is a key paradigm of the Julia language, and during its existence we, the creators of the language noticed something expected, but at the same time perplexing. At least we didn’t expect it to the extent in which they saw. It is something is a stunning level of reuse of code in the Julia ecosystem, which is much higher than any other known to me language.

We constantly see that some people write generic code, someone else defines a new data type, these people are not familiar with, and then someone uses this code to this unusual type of data… And everything just works. And it happens surprisingly often.
I always thought that such behavior should be expected from object-oriented programming, but I used many object-oriented languages, and it turns out that they are usually so simple is not working. So at some point I wondered why Julia is such an efficient language in this regard? why is there such a high level of reusing the code? And what lessons can be learned that other languages could borrow from Julia to become better?

Sometimes, when I say it, the audience doesn’t believe me, but you’re already on JuliaCon, so you know what is happening, so I will focus on why, in my view, this is happening.

But first — one of my favorite examples.

On the slide is the work of Chris Rackauckas. He writes all sorts of very generic packages for solving differential equations. You can feed back the dual number, or BigFloat, whatever you want. And somehow he decided that he wanted to see the error of the result of integration. And there was a package of Measurements, which can track how the value of some physical quantity, and the spread of errors through a sequence of formulas. Also, this package supports an elegant syntax for quantities with uncertainty, using the Unicode character ±. This slide shows that the acceleration of free fall, the length of the pendulum, the initial velocity, the angle of deflection — all known with some error. So, you define a simple pendulum, gives his equations of motion using ODE solver and — BAM! all works. And you can see a graph with the moustache errors. And that I did not show that code for drawing graphics too generalized, and you just let go, and the magnitude with an error of Measurements.jl and get a graph with errors.

The level of compatibility of different packages and generalization of the code is just shit-kicking. Like, it just works? It turns out, Yes.

Well, not that we really did not expect this. After all, the concept of multiple dispatch we have included in the language precisely because it allows to Express generic algorithms. So all of the above is not so crazy. But one thing to know it in theory and another to see it in practice, that approach really works. Because single-dispatch and overloading operators in C++, you also have to give a similar result — but in reality often do not work as desired.

In addition, we are seeing more than we anticipated, in developing the language: written not just a generic code. I’ll try to tell me what, in my opinion, it is more.

So, there are two ways of reusing code, and they are quite different. One is the generalized algorithms, and it’s the first thing I remember. The second, less obvious, but I guess more important aspect is the ease with which Julia the same data types are used in a variety of packages. To some extent this is because the methods of the type does not become an obstacle to its use: you don’t need to agree with the author about the type of interfaces and methods that it inherits; you could just say, “Oh, I like this type of RGB. Operations over him I will come up with, but its structure is like”.

Preface. Multiple dispatch vs. overloading functions

Now I have to mention the overload of functions in C++ or Java, because I kept asking questions about them. At first glance, it is no different from multiple dispatch. What’s the difference and what function overloading is worse?

Start with the example in Julia:

abstract type Pet end struct Dog <: Pet; name::String end
struct Cat <: Pet; name::String end function encounter(a: Pet, b::Pet) verb = meets(a, b) println("$(a.name) meets $(b.name) and $verb")
end meets(a::Dog, b::Dog) = "sniffs"
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses"
meets(a::Cat, b::Cat) = "slinks"

We define an abstract type Pet, placed on it by the subtypes of the Dog and Cat, they have a name field (code is a bit repeated, but tolerable) and user-defined aggregate function “meeting” that takes arguments two objects of type Pet. We first calculated action determined by the result of a call to generalized function meet()and then print the proposal, describing the meeting. In the function meets() we use multiple dispatching to determine the action that makes one animal when meeting others.

Add a couple dogs and a few cats and look at the results of the meeting:

fido = Dog("Fido")
rex = Dog("Rex")
whiskers = Cat("Whiskers")
spots = Cat("Spots") encounter(fido, rex)
encounter(rex, whiskers)
encounter(spots, fido)
encounter(whiskers, spots)

Now the same “translate” to C++ as possible, literally. We define the class Pet with a field name in C++ we can do it (by the way, one of the advantages of C++ — data fields you can add even abstract types. Then define a base function meets(), define a function encounter() for two objects of type Pet , and finally, we define derived classes Dog and Cat will do overload meets() for them:

class Pet {
public: string name;
}; string meets(Pet a, Pet b) { return "FALLBACK"; } void encounter(Pet a, Pet b) { string verb = meets(a, b); cout << a.name << "meets" << b. name << " and " << verb << endl;
} class Cat : public Pet {};
class Dog : public Pet {}; string meets(Dog a, Dog b) { return "sniffs"; }
string meets(Dog a, Cat b) { return "chases"; }
string meets(Cat a, Dog b) { return "hisses"; }
string meets(Cat a, Cat b) { return "slinks"; }

Function main()as in the code in Julia, creates dogs and cats and makes them meet:

int main() { Dog fido; fido.name = "Fido"; Dog rex; rex.name = "Rex"; Cat whiskers; whiskers.name = "Whiskers"; Cat spots; spots.name = "Spots"; encounter(fido, rex); encounter(rex, whiskers); encounter(spots, fido); encounter(whiskers, spots); return 0;
}

So multiple dispatch against overload functions. Gong!

What, in your opinion, will return the code with multiple dispatch?

$ julia pets.jl

Fido meets Rex and sniffs
Rex meets Whiskers and chases
Spots meets Fido and hisses
Whiskers meets Spots and slinks

Animals meet, obnubilate, hiss and play catch-up — as it was intended.

$ g++ -o pets pets.cpp && ./pets

Fido meets Rex and FALLBACK
Rex meets Whiskers and FALLBACK
Spots meets Fido and FALLBACK
Whiskers meets Spots and FALLBACK

In all cases, it returns a “spare” option.

Why? Because this is function overloading. If I worked multiple dispatch, then meets(a, b) inside encounter() would invoke with specific types that a and b have at the time of the call. But applies overload, therefore, meets() is invoked for static types of a and b, both of which, in this case, Pet.

So, in the approach of C++ is a direct “translation” of the generalized Julia code does not give the desired behavior because the compiler uses the types derived statically at compile time. But the whole point is that we want to call a function based on the actual concrete types that have variables in runtime. Template function, although things are looking better, still require knowledge of all of the expression types statically at compile time, and it is easy to think of an example where it would be impossible.

For me, such examples show that multiple dispatch does is right and all other approaches is just not a very good approximation to the correct result.

Now look at this table. I hope you find it meaningful:

In languages without dispatching you just write f(x, y, ...), the types of all arguments are fixed, i.e. the call to f() is a call to a single function f(), which can be in the program. The degree of a constant expression: call f() always does one thing and only one thing. Single dispatching was a big breakthrough in the transition to the PLO in the 1990s and 2000s. Usually you should use the dot syntax that people like. And an additional expressive capability: the call will dispetcherizaciya by object type x1. Expressive ability is characterized by the cardinality of the set |X1| types that have a method f(). In multiple dispatch, the number of potential variants for the function f() is equal to the capacity of the Cartesian product of the sets of types that may belong to the argument. In reality, of course, hardly anyone need so many different functions in one program. But the key point here is that the programmer is given a simple and natural way to use any element of this diversity, and this leads to exponential growth of possibilities.

Part 1. Generic programming

Let’s talk now about the generic code — the main feature of multiple dispatch.

Here’s a (completely artificial) example of a generalized code:

using LinearAlgebra inner_sum function(A, vs) t = zero(eltype(A)) for v in vs t += inner(v, A, v) # multiple dispatch!
end return t
end inner(v, A, w) = dot(v, A * w) # a very generic definition

Here A is something macrocephalae (although I did not specify the types and guess what-it is possible is that the name), vs is a vector of any Vectora, similar items, and then using this “matrix” is a scalar product, which is given a generalized definition without specifying any types. Generic programming here is the call to the function inner() in a loop (an insider tip: if you want to write generic code — simply remove any restrictions on the types).

So, “look, mom, it works”:

julia> A = rand(3, 3)
A 3×3 Array{Float64,2}: 0.934255 0.712883 0.734033 0.145575 0.148775 0.131786 0.631839 0.688701 0.632088 julia> vs = [rand(3) for _ in 1:4]
4-element Array{Array{Float64,1},1}: [0.424535, 0.536761, 0.854301] [0.715483, 0.986452, 0.82681] [0.487955, 0.43354, 0.634452] [0.100029, 0.448316, 0.603441] julia> inner_sum(A, vs)
6.825340887556694

Nothing special, it calculates some value. But code written in a generalized style and will work for any A and vs, if only they could make the corresponding operations.

About the effectiveness for specific types of data — as lucky. I mean that for dense vectors and matrices, this code will do a “how to” — generates machine code with a call to BLAS operations, etc., etc. If you pass a static array the compiler and it will take into account, deploy cycles, apply vectorization — all right.

But more importantly — the code will work for new types, and it is possible to make it not just super effective, and super-duper effective! Let’s cooperdale new type (this is a real data type that is used in machine learning), the unitary vector (one-hot vector). It is a vector, whose one component is 1 and all others zero. To imagine it can be very compact: everything you need to store is the length of the vector and the number of the nonzero component.

import Base: size, getindex, * struct OneHotVector <: AbstractVector{Int} len :: Int ind :: Int
end size(v::OneHotVector) = (v.len) getindex(v::OneHotVector, i::Integer) = Int(i == v.ind)

Actually, it’s really all the definition of the type of package that it adds. And with this definition inner_sum() also works:

julia> vs = [OneHotVector(3, rand(1:3)) for _ in 1:4]
4-element Array{OneHotVector,1}: [0, 1, 0] [0, 0, 1] [1, 0, 0] [1, 0, 0] julia> inner_sum(A, vs)
2.6493739294755123

But the dot product is used here is a General definition for this type of data is slow, not cool!

So, a General definition of the work, but not always optimally, and this when using Julia, you can periodically face: “and, then is called a General definition, that is why this GPU code has been working for five hours…”

In inner() by default is called a General definition of the product matrix vector that when multiplied by a unitary vector returns a copy of one of the columns of type Vector{Float64}. Then called on the General definition of the scalar product dot() with a unitary vector and this column, which makes a lot of unnecessary work. In fact, each component is checked “you are equal to one? and you?” etc.

We can greatly optimize this procedure. For example, replace the multiplication of a matrix on OneHotVector just a column is selected. Well, we define this method, and that’s all.

*(A::AbstractMatrix, v::OneHotVector) = A[:, v.ind]

And here it is, the power: we say “want to dispatch on the second argument, “no matter what’s in the ground. This definition simply pull the line from the matrix, and will be much faster than the General method — removed iteration and summation of columns.

But one can go further and directly optimize inner(), because the multiplication of two unitary vectors through the matrix just pulls out the element of this matrix:

inner(v::OneHotVector, A, w::OneHotVector) = A[v.ind, w.ind]

Here is the promised super-duper-efficiency. And all that is necessary is to define this method inner().

This example shows one use of multiple dispatch: there is a General definition of the function, but for some data types it does not function optimally. And then we add point method, which preserves the behaviour of the function for these types, but it works much more efficiently.

But there is another area — when the total of the function definition there, and I want to add functionality for some types. Then it is possible with minimal effort to add it.

A third option — just want to have the same function name but with different behavior for different data types — for example, so that the function behave differently when working with dictionaries and arrays.

How to get similar behavior in languages with single dispatch? Possible, but difficult. Problem: overload function * needed to do the dispatching on the second argument, not the first. You can do double dispatch: first dispetcherskoe on the first argument and call the method AbstractMatrix.*(v). And this method in turn calls something like a v.__rmul__(A), i.e. the second argument in the original call has now become the object whose method is actually invoked. __rmul__ here is taken from Python, where this behavior is standard pattern, but it works, it seems that only addition and multiplication. I.e. double dispatch problem is solved, if we want to call a function called + or *, otherwise — alas, not our day. In C++, and other languages — need to build your bike.

OK, and what inner()? Now there are three arguments, and dispatch is on the first and third. What to do in languages with single dispatch — is unclear. “Triple dispatch” I live never met. Good solutions there. Usually, when there is such a need (in numerical codes, it appears very often), people eventually sell their system multiple dispatch. If you look at large projects for the numerical calculations in Python, you’ll be amazed how many of them go on this way. Of course, this implementation work ad hoc, poorly designed, full of bugs and slow (a reference to the tenth rule Greenspan — approx. transl.), because not on one of these projects was not working Jeff bezanson (author and main developer of the dispatch system types in Julia — approx. transl.).

Part 2. Common types

Go to the back of the paradigm Julia — common types. This, in my opinion, the main “workhorse” language, because this is an area I see a high level of reuse of code.

For example, suppose you have a RGB type, such as that available in ColorTypes.jl. There is nothing complicated, just gathered together three values. For the sake of simplicity, we assume that the type is not parametric (but could be), and the author has set for him a few basic operations that found it useful. You take this type and think “Hmm, I’d like to add another operation on this type.” For example, imagine RGB as a vector space (which, strictly speaking, wrong, but as a first approximation will do). In Julia you can just go ahead and add in your code, all transactions that are missing.

The question arises — and what? Why am I on this focus? It turns out that object-oriented languages based on classes, this approach is surprisingly difficult to implement. Since in these languages, the method definitions are inside the class definition, the add method can only be in two ways — either edit the class code, adding the desired behavior, or create a class that inherits with the right methods.

The first option inflates the base class definition and forces the developer of the base class to take care of the support of all the added methods when the code changes. That may one day make such a unsupported class.

Inheritance is the classical “recommended” option, but also not devoid of disadvantages. First, you need to change the name of the class — let it will now not RGB, and MyRGB. In addition, new methods will no longer work for the original class RGB; if I want to apply my new method to the object RGBcreated in someone else’s code — you need to do a conversion or wrapper in MyRGB. But it’s not so bad. If I did a class MyRGB with some added functionality, someone else OurRGB etc., then if someone wants a class that has all of the new functionality, you need to use multiple inheritance (and that’s only if the programming language allows it at all!).

So, both options are so-so. There are, however, other solutions:

  • To make the functionality in the external function instead of a class method — go to f(x, y) instead of x.f(y). But this is a generalized behavior.
  • Spit on reuse of code (and I think in many cases are). Just copy others class RGB and add what is missing.

A key feature of Julia in terms of reusing the code is almost entirely limited to the fact that the method is defined outside the class. All. To do the same in languages with single dispatch, and the types can be reused with the same ease. This whole “let’s make methods part of the class” — so-so idea, in fact. There is, however, a good point — using classes as namespaces. If I write x.f(y)f() is not required to be in the current namespace, you can look for it in the namespace x. Yes, it is a good thing — but it is worth all the other trouble? I do not know. In my opinion, no (though my opinion, as you can guess, slightly biased).

Epilogue. The expression problem

There is a problem in programming that was seen in the 70s. It is largely associated with the static type checking, because in these languages it came from. I really think she has nothing to do with static type checking. The problem is the following: is it possible to change the data model and a set of operations on the data simultaneously without resorting to questionable methods.

The problem more or less boils down to this:

  1. can you easily and without error to add new data typesto which the applicable available techniques and
  2. is it possible to add new operations on existing types.

(1) easily done in object-oriented languages and difficult functional, (2) — on the contrary. In this respect we can talk about the duality of OOP and FP approaches.

In languages with multiple dispatching both operations are done easily. (1) can be solved by adding methods to existing functions for new types, (2) — definition of new functions on existing types. And this is not new, we invented it. If you go to the Wikipedia page about the expression problem (https://en.wikipedia.org/wiki/Expression_problem), multiple dispatch is just number one in the list of possible solutions. Why is it no one uses? I have the feeling that the majority believes that the problem is very niche. But if you think about it, “adding new types to existing operations work” — it’s “generic programming works as it should” in other words. And “add new operations that work for existing types” is possible, because methods can be added after the types.

And these two aspects of the problem of the expression correspond exactly to the types of code reuse that we observed. This seemingly theoretical problem — the key to a high degree of code reusing.

Thus, to Julia freed from the problem of expressions (the simplest and oldest of proposed methods), we have an incredible degree of reusing code. Who would have thought.

Source

Leave a Reply

Your email address will not be published. Required fields are marked *