JVM/Dynamic Languages: Very Fast Return Path Trick

Nothing fishy happening here!
Java has a simple policy for returning from methods and escaping from loops to labels. Some JVM languages, especially dynamic ones, require a more powerful 'return dispatch' system:

I call this 'return dispatch' because it is a little like the reverse of method dispatch. In method dispatch, logic in the program causes different methods to be called. In return dispatch, the called method has logic which determines where the program continues to execute from after the return. In Java, break and return are very different things. break is implemented using jump bytecode instructions where as return is implemented via return bytecodes. Further, break is only ever within one method where as return escapes from a method to the previous one on the call stack. However, these differences are implementational; loop bodies can be considered as and implemented by methods or closures. Doing so makes 'return' and 'break' semantics the same:

The JVM has a very powerful system for offering typed, multiple return routes. This mechanism is the try/catch/finally system. This is hard wired into the JVM and tightly integrated into the JVM type system. The beauty of it is that a JVM method can throw a Throwable at any point and the stack will unwind until an appropriate catch is reached. 

You are probably already hating this!
  1. This feels like an abuse of the Throwable mechanism.
  2. This will perform badly.
No - it is not an abuse.
3 years ago, before I started working on bytecode generating compilers, I would have agreed with you. Looked at it from the point of view of Java, my suggestion looks terrible. But we are not looking at it from the point of view of Java; we are looking at it from the point of view of the JVM and bytecode. Throwable in Java is usually just a super class of Exception or Error. Both Exception and Error are used to denoting just those things, exceptions and errors. If one were to use Exception or Error objects to for return dispatch then I would agree that it would be an abuse. However, Throwable is just something which can be thrown and caught; it carries no more semantic information as to why it was thrown. What I am saying here is 'we throw a Throwable up the call stack until the appropriate catch statement catches it'. 

It is very fast indeed.
Traditionally, throwing Exceptions and Errors is seen as expensive. Well, actually, it is expensive. This is because the throw action in the JVM causes the fillInStackTrace() method on Throwable to be called. It is not throwing which is expensive its self, but the execution of that method. Fortunately, fillInStackTrace() is not final, so we can override it with a null-op version. This makes throwing Throwables very fast indeed. If we can avoid creating them as well, then the throw/catch mechanism becomes as fast as normal method return!

How it works.
The approach I am suggesting is to create a base class which extends Throwable and overrides fillInStackTrace. For example, we could consider the situation where a language implements 'break to label' for escape out of nested loops, but the loops might be implemented using closures. In this case the compiler would create a synthetic class as a sub-type of our 'no stack Throwable' (which I will call ReturnDispatch) for each label in the method. These can be implemented as tiny static inner classes. What this means is that each label now has a unique type associated with it. Where break/return/jump to label happens, the compiler can generate a throw for the appropriate inner class type. The objects thrown can actually be JVM statics (to avoid creating instance more than once).

Here is a very simple example illustrating how fast this approach can actually be:

package nerdscentral;

public class LoopTest {

    static long count = 0;

    static class ReturnDispatch extends RuntimeException{
        
        @Override
        public Throwable fillInStackTrace(){
            return this;
        }
    }
    
    static class TestDispatch extends ReturnDispatch{}
    static final TestDispatch ret = new TestDispatch();
    
    public static void main(String[] args){
        long time = 0;
        for(int j=0;j<10000;++j){
            if(j==10){
                time=System.currentTimeMillis();
            }
            for(int i=0;i<1000000;++i){
                try{
                    loop2();
                }catch(Exception e){count++;}
            }
        }
        System.out.println("Dispatch:    " + (System.currentTimeMillis() - time));
        for(int j=0;j<10000;++j){
            if(j==10){
                time=System.currentTimeMillis();
            }
            for(int i=0;i<1000000;++i){
                loop1();
                count++;
            }
        }
        System.out.println("No Dispatch: " + (System.currentTimeMillis() - time));
        System.out.println(""+count);
    }
    
    private static void loop1(){
        count++;
    }
    
    private static void loop2(){
        count++;
        throw ret;
    }
    
}

...

Dispatch:    477
No Dispatch: 713
40000000000

In the above code I have implemented a super simplified closure system where the loops call out to methods which are acting as closures around the static field count. In a more realistic example the closures would update state in a some inner object (see my explanation of closures in Scala in a previous post). I have updated count in the invoked methods and in the loop to demonstrate that the code is being executed correctly. The resulting value for count should be and is 4x10**10.

We can see in the above code that the method loop2 uses my return dispatch approach by throwing a static instance of a Throwable which has had the fillinStackTrace method overridden. Because I am using Java and it has the defined throws clause stuff I have used a subclass of RuntimeException. In a non Java implementation I would suggest directly subclassing Throwable (see my explanation of why this is not an abuse above).

The amazing thing in the example code is that on the 1.7 JVM on Linux 64 bit I am using, the method using the return dispatch approach runs FASTER than the one with normal return. I can only assume that the JVM has made more aggressive inlining choices or some other magic under the hood. However, this result illustrated that there is no performance hit taken for using this approach. 

Thoughts? Does anyone have an ideas on how to improve this and make it even easier to implement?