Perl 6 Core Hacking: QASTalicious

2018-01-26 | 6528 words | Overview of "Q" Abstract Syntax Trees + bug fix tutorial

Over the past month, I spent some time in Rakudo's QAST land writing a few optimizations, fixing bugs involving warnings, as well as squashing a monster hive of 10 thunk scoping bugs with a single commit. In today's article, we'll go over that last feat in detail, as well as learn what QAST is and how to work with it.

PART I: The QAST

"QAST" stands for "Q" Abstract Syntax Tree. The "Q" is there because it's comes after letter "P", and "P" used to be in "PAST" to stand for "Parrot", the name of an earlier, experimental Perl 6 implementation (or rather, its virtual machine). Let's see what QAST is all about!

Dumping QAST

Every Rakudo Perl 6 program compiles down to a tree of QAST nodes and you can dump that tree if you specify --target=ast or --target=optimize command line option to perl6 when compiling a program or a module:

$ perl6 --target=ast -e 'say "Hello, World!"'
[...]
- QAST::Op(call &say) <sunk> :statement_id<?> say \"Hello, World!\"
  - QAST::Want <wanted> Hello, World!
    - QAST::WVal(Str)
    - Ss
    - QAST::SVal(Hello, World!)
[...]

The difference between the --target=ast and --target=optimize is that the former shows the QAST tree as soon as it has been generated, while the later shows the QAST tree after the static optimizer has had a go at it.

While the command line option gives you the QAST for the entire program (excluding modules pre-compiled separately), each QAST::Node object has a .dump method you can use to dump specific QAST pieces of interest from within Rakudo's source code.

For example, to examine the QAST generated by the statement token, I'd find method statement in src/Perl6/Actions.nqp and stick nqp::say('statement QAST: ' ~ $past.dump) close to the end of the method.

Since Rakudo's compilation takes a couple of minutes for each go, I like to key my debug dumps on env variables, like this:

nqp::atkey(nqp::getenvhash(),'ZZ1') && nqp::say('ZZ1: something or other');
...
nqp::atkey(nqp::getenvhash(),'ZZ2') && nqp::say('ZZ2: something else');

Then, I can execute the compiled ./perl6 as if I didn't add anything, and enable my dumps by running ZZ1=1 ./perl6, ZZ2=1 ./perl6, or both dumps at the same time with ZZ1=1 ZZ2=1 ./perl6.

Viewing QAST

Looking at the output of --target dumps in the terminal is sufficient for a quickie glance at the trees, but for extra assistance you can install CoreHackers::Q module that brings in q command line utility.

Simply prefix your regular perl6 invocation with q a or q o to produce --target=ast and --target=optimize QAST dumps respectively. The program will generate out.html file in the current directory:

$ q a perl6 -e 'say "Hello, World!"'
$ firefox out.html

Pop open the generated HTML file and reap these benefits:

  • Color-coded QAST nodes
  • Color hints for sunk nodes
  • Ctrl+Click on any node to collapse it
  • Muted view of QAST::Want alternatives, makes it easier to ignore them

Eventually, I hope to extend this tool and make it more helpful, but at the time of this writing, that's all it does.

The QAST Forest

There are four main files in rakudo's source where you'd expect to be working with QAST nodes: src/Perl6/Grammar.nqp, src/Perl6/Actions.nqp, src/Perl6/World.nqp, and src/Perl6/Optimizer.nqp. If you're using Z-Script utility, you can even run z q command to open these four files in Atom editor.

Grammar.nqp is the Perl 6 grammar. Actions.nqp are the actions for it. World.nqp contains all sorts of helpful routines used by both Grammar.nqp and Actions.nqp that access them via the $*W dynamic variable containing a Perl6::World object. Lastly, Optimizer.nqp contains Rakudo's static optimizer.

The root (of all evil) is the QAST::Node object, with all the other QAST nodes being its subclasses. Let's review some of the popular ones:

QAST::Op

QAST::Op nodes are the workhorse of the QAST world. The :op named argument specifies the name of an NQP op or the name of a Rakudo's NQP extension op and its children are the arguments:

Here's a say op printing a string value:

QAST::Op.new: :op<say>,
  QAST::SVal.new: :value('Hello, World!');

And here's a QAST node for a call op that calls Perl 6's infix:<+> operator; notice how the name of the routine we call is given via :name named argument:

QAST::Op.new: :op<call>, :name('&infix:<+>'),
  QAST::IVal.new( :value(2)),
  QAST::IVal.new: :value(2)

QAST::*Val

The QAST::SVal, QAST::IVal, QAST::NVal, and QAST::WVal nodes, specify string, integer, float, and "World" object values respectively. The first three are the "unboxed" raw values, while World objects are everything else, such as DateTime, Block, or Str objects.

QAST::Want

Some of the objects can be represented by multiple QAST::*Val nodes, where the most appropriate value is used depending on what is wanted in the current context. QAST::Want node contains these alternatives, interleaved with string markers indicating what those alternatives are.

For example, numeric value 42 in Perl 6 could be wanted as an object to call some method on, or as a raw value to be assigned to a native int variable. The QAST::Want node for it would look like this:

QAST::Want.new:
  QAST::WVal.new(:value($Int-obj))),
  'Ii',
  QAST::IVal.new: :value(42)

The $Int-obj above would contain an instance of Int type with value set to 42. The Ii marker indicates the following alternative is an integer value and we provide a QAST::IVal object containing it. The other possible markers are Nn (float), Ss (string), and v (void context) alternatives.

When these nodes are later converted to bytecode, the most appropriate value will be selected, with the first child being the "default" value, to be used when none of the available alternatives make the cut.

QAST::Var

These nodes are used for variables and parameters. The :name named argument specifies the name of the variable and :scope its scope:

QAST::Op.new: :op('bind'),
  QAST::Var.new(:name<$x>, :scope<lexical>, :decl<var>, :returns(int)),
  QAST::IVal.new: :value(0)

The :decl named arg is present when the node is used for the variable's declaration (when it's absent, we simply reference the variable) and its value dictates what sort of variable it is: var for variables and param for routine parameters. Several other :decl types, as well as optional arguments specifying additional configuration of the variable exist. You can find them discussed in the QAST documentation

QAST::Stmt / QAST::Stmts

These are statement grouping constructs. For example, here, the truthy branch of an nqp::if contains three nqp::say statements, all grouped inside QAST::Stmts:

QAST::Op.new: :op<if>,
  QAST::IVal.new(:value(42)),
  QAST::Stmts.new(
    QAST::Op.new( :op<say>, QAST::SVal.new: :value<foo>),
    QAST::Op.new( :op<say>, QAST::SVal.new: :value<bar>),
    QAST::Op.new: :op<say>, QAST::SVal.new: :value<ber>),
  QAST::Op.new: :op<say>, QAST::SVal.new: :value<meow>,

The singular QAST::Stmt is similar. The difference is it marks a register allocation boundary, beyond which, any temporaries are free to be reused. When used correctly, this alternative can result in better code generation.

QAST::Block

This node is both a unit of invocation and a unit of lexical scoping. For example, code sub foo { say "hello" } might compile to a QAST::Block like this:

Block (:cuid(1)) <wanted> :IN_DECL<sub> { say \"hello\" }
[...]
  Stmts <wanted> say \"hello\"
    Stmt <wanted final> say \"hello\"
      Want <wanted>
        Op (call &say) <wanted> :statement_id<?> say \"hello\"
          Want <wanted> hello
            WVal (Str)
            - Ss
            SVal (hello)
        - v
        Op (p6sink)
          Op (call &say) <wanted> :statement_id<?> say \"hello\"
            Want <wanted> hello
              WVal (Str)
              - Ss
              SVal (hello)
[...]

Each block demarcates a lexical scope boundary—this detail comes into play in Part II of this article, when we'll be going over a fix for a bug.

Others

A few more QAST nodes exist. They're out of scope of this article, but you may wish to read the documentation or, since some of them are not appear in those docs, go straight to the source.

Executing QAST Trees

Having a decent familarity with nqp ops (as well as Rakudo's nqp extensions) is helpful when working with QAST. A sharp eye would notice in QAST dumps that many QAST::Op nodes correspond to nqp::* op calls, where :op named argument specifies the name of the op.

When writing large QAST trees, it's handy to write them down using pure NQP ops first, and then translate the result into a tree of QAST node objects. Let's look at a simplified example:

nqp::if(
  nqp::isgt_n(nqp::rand_n(1e0), .5e0),
  nqp::say('Glass half full'),
  nqp::say('Glass half empty'));

We have NQP op, so we'll start with QAST::Op node, using 'if' as the value for :op. The op takes three positional arguments—the three ops used for the conditional, the truthy branch, and the falsy branch. Some of the ops also take float and string values, so we'll use QAST::NVal and QAST::SVal nodes for those. The result is:

QAST::Op.new(:op('if'),
  QAST::Op.new(:op('isgt_n'),
    QAST::Op.new(:op('rand_n'),
      QAST::NVal.new(:value(1e0))
    ),
    QAST::NVal.new(:value(.5e0))
  ),
  QAST::Op.new(:op('say'),
    QAST::SVal.new(:value('Glass half full'))
  ),
  QAST::Op.new(:op('say'),
    QAST::SVal.new(:value('Glass half empty'))
  )
)

I find it easier to track the tree's nesting by using parentheses only when necessary, preferring colon method call syntax whenever possible:

QAST::Op.new: :op<if>,
  QAST::Op.new(:op<isgt_n>,
    QAST::Op.new(:op<rand_n>,
      QAST::NVal.new: :value(1e0)),
    QAST::NVal.new: :value(.5e0)),
  QAST::Op.new(:op<say>,
    QAST::SVal.new: :value('Glass half full')),
  QAST::Op.new: :op<say>,
    QAST::SVal.new: :value('Glass half empty')

If a .new is followed by a colon, there aren't any more nodes on the same level. If .new is followed by an opening parentheses, there are more sister nodes yet to come.

Due to Rakudo's lengthy compilation, it can be handy to execute your QAST tree without having to stick it into src/Perl6/Actions.nqp or similar file first. To some extent, it's possible to do that with a regular Perl 6 program. We'll simply access Perl6::World object in $*W variable inside a BEGIN block, where it still exists, and call .compile_time_evaluate method, giving it an empty variable as the first positional (it expects a Match object for the tree) and our QAST tree as the second positional:

use QAST:from<NQP>;
BEGIN $*W.compile_time_evaluate: $,
    QAST::Op.new: :op<if>,
      QAST::Op.new(:op<isgt_n>,
        QAST::Op.new(:op<rand_n>,
          QAST::NVal.new: :value(1e0)),
        QAST::NVal.new: :value(.5e0)),
      QAST::Op.new(:op<say>,
        QAST::SVal.new: :value('Glass half full')),
      QAST::Op.new: :op<say>,
        QAST::SVal.new: :value('Glass half empty')

The one caveat with this method is we're using full-blown Perl 6 language, whereas in src/Perl6/Actions.nqp and related files, as .nqp extension suggests, we're using NQP language only. Keep an eye out for strange explosions; it's possible your QAST tree that explodes in Perl 6 will compile just fine in the land of pure NQP.

Annotating QAST Nodes

All QAST nodes support annotations that allow you to attach an arbitrary value to a node and then read that value elsewhere. To add an annotation, use .annotate method, which takes two positional arguments—a string containing name of the annotation and the value to attach to it—and returns that value. Recent versions of NQP also have .annotate_self method that works the same, except it returns the QAST node itself:

$qast.annotate_self('foo', 42).annotate: 'bar', 'meow';

Later, you can read that value using .ann method that takes the name of the annotation as the argument. If the annotation doesn't exist, NQPMu is returned instead:

note($qast.ann: 'foo'); # OUTPUT: «42␤»

You can also check for whether an annotation merely exists using .has_ann method that returns 1 (true) or 0 (false):

note($qast.has_ann: 'bar'); # OUTPUT: «1␤»

Or dump all of the annotations on the node (to prevent potential flood of output, most values will be dumped as simply a question mark):

note($qast.dump_annotations); # OUTPUT: « :bar<?> :foo<?>␤»);

Lasty, to clear all annotations on the node, simply call .clear_annotations method.

Mutating QAST Nodes

A handy thing to do with QAST node objects is to mutate them into something better. That's essentially all the static optimizer in src/Perl6/Optimizer.nqp does. Named arguments can be mutated by calling them as methods and providing a value. For example, $qast.op('callstatic') will change the value of :op from whatever it is to callstatic. Positional arguments can be altered by re-assignment to a positional index, as well as shift, push, unshift, pop operations performed either via method calls with those names or nqp:: ops. Some nodes also support nqp::elems calls on them, which is slightly faster than the generic pattern of +@($qast) that can be used on all nodes to find out the number of children a node contains.

As an exercise, let's write a small optimization: some operations, like $foo < $bar < $ber compile to nqp::chain ops. That is so even if we have only two children, e.g. $foo < $bar. In such cases, rewriting the op to be nqp::call has performance advantages: not only nqp::call on its own is a little bit faster than nqp::chain, the static optimizer knows how to do further optimizations on nqp::call ops.

Let's take a look at what both 2-child and 2+-child nqp::chain chains look like:

$ perl6 --target=ast -e '2 < 3 < 4; 2 < 3'

The first statement compiled to this (I removed QAST::Wants for clarity):

- QAST::Op(chain &infix:«<»)  :statement_id<?> <
  - QAST::Op(chain &infix:«<») <wanted> <
    - QAST::IVal(2)
    - QAST::IVal(3)
  - QAST::IVal(4)

And the second one to:

- QAST::Op(chain &infix:«<»)  :statement_id<?> <
  - QAST::IVal(2)
  - QAST::IVal(3)

Thus, to target our optimization correctly, we need to ensure neither child of our chain op is a chain op. In addition, we need to ensure that the op we're optimizing is not itself a child of another chain op.

Raking the code of the optimizer, we can spot that chain depth is already tracked via $!chain_depth attribute, so we merely need to ensure we're at the first link of the chain. The code then becomes:

$qast.op: 'call'
  if nqp::istype($qast, QAST::Op)
  && $qast.op eq 'chain'
  && $!chain_depth == 1
  && ! (nqp::istype($qast[0], QAST::Op) && $qast[0].op eq 'chain')
  && ! (nqp::istype($qast[1], QAST::Op) && $qast[1].op eq 'chain');

Once we find a chain QAST::Op, we index into it and use nqp::istype to check the type of kid nodes, and if those happen to be QAST::Op nodes, we ensure the :op parameter is not a chain op. If all of the conditions are met, we simply call .op method on our node with value 'call' to convert it into a call op.

We then stick our optimization early enough into .visit_op method of the optimizer and its later portions will further optimize our call.

A fairly easy and straightforward optimization that can bring a lot of benefit.

PART II: A Thunk in The Trunk


Note: it took me three evenings to debug and fix the following tickets. To learn the solution I tried many dead ends that I won't be covering, to keep you from getting bored, and instead will instantly jump to conclusions. The point I'm making is that fixing core bugs is a lot easier than may seem from reading this article—you just need to be willing to spend some time on them.


Now that we have some familiarity with QAST, let's try to fix a bug that existed in Rakudo v2018.01.30.ga.5.c.2398.cc and earlier. The ticket in question is R#1212, that shows the following problem:

$ perl6 -e 'say <a b c>[$_ xx 2] with 1'

Use of Nil in string context
  in block  at -e line 1
Unable to call postcircumfix [ (Any) ] with a type object
Indexing requires a defined object
  in block <unit> at -e line 1

It looks like the $_ topical variable inside the indexing brackets fails to get the value from with statement modifier and ends up being undefined. Sounds like a challenge!

It's A Hive!

Both with and xx operator create thunks (thunks are like blocks of code, without having explicit blocks in the code; this, for example, lets rand xx 10 to produce 10 different random values; rand is thunked and the thunk is called for each iteration). This reminded me of some other tickets I've seen, so I went to fail.rakudo.party and looked through open tickets for anything that mentioned thunking or wrong scoping.

I ended up with a list of 7 tickets, and with the help of dogbert++ later increased the number to 9, which with the original Issue gives us a total of 10 different manifestations of a bug. The other tickets are RT#130575, RT#132337, RT#131548, RT#132211, RT#126569, RT#128054, RT#126413, RT#126984, and RT#132172. Quite a bug hive!

Test It Out

Our starting point is to cover each manifestation of the bug with a test. Make all the test pass and you know you've fixed the bug, plus you already have something to place into roast, to cover the tickets. My tests ended up looking like this, where I've used gather/take duo to capture what the tickets' code printed to the screen:

use Test;
plan 1;
subtest 'thunking closure scoping' => {
    plan 10;

    # https://github.com/rakudo/rakudo/issues/1212
    is-deeply <a b c>[$_ xx 2], <b b>.Seq, 'xx inside `with`' with 1;

    # RT #130575
    is-deeply gather {
        sub itcavuc ($c) { try {take $c} andthen 42 };
        itcavuc $_ for 2, 4, 6;
    }, (2, 4, 6).Seq, 'try with block and andthen';

    # RT #132337
    is-deeply gather {
        sub foo ($str) { { take $str }() orelse Nil }
        foo "cc"; foo "dd";
    }, <cc dd>.Seq, 'block in a sub with orelse';

    # RT #131548
    is-deeply gather for ^7 {
        my $x = 1;
        1 andthen $x.take andthen $x = 2 andthen $x = 3 andthen $x = 4;
    }, 1 xx 7, 'loop + lexical variable plus chain of andthens';

    # RT #132211
    is-deeply gather for <a b c> { $^v.uc andthen $v.take orelse .say },
        <a b c>.Seq, 'loop + andthen + orelse';

    # RT #126569
    is-deeply gather { (.take xx 10) given 42 }, 42 xx 10,
        'parentheses + xx + given';

    # RT #128054
    is-deeply gather { take ("{$_}") for <aa bb> }, <aa bb>.Seq,
        'postfix for + take + block in a string';

    # RT #126413
    is-deeply gather { take (* + $_)(32) given 10 }, 42.Seq,
        'given + whatever code closure execution';

    # RT #126984
    is-deeply gather {
        sub foo($x) { (* ~ $x)($_).take given $x }; foo(1); foo(2)
    }, ("11", "22").Seq, 'sub + given + whatevercode closure execution';

    # RT #132172
    is-deeply gather { sub {
        my $ver =.lines.uc with "totally-not-there".IO.open
            orelse "meow {$_ ~~ Failure}".take and return 42;
    }() }, 'meow True'.Seq, 'sub with `with` + orelse + block interpolation';
}

When I brought up the first bug in our dev chatroom, jnthn++ pointed out that such bugs are often due to mis-scoped blocks, as p6capturelex op that's involved needs to be called in the immediate outer of the block it references.

Looking through the tickets, I also spotted skids++'s note that changing a conditional for statement_id in block migrator predicate fixed one of the tickets. This wasn't the full story of the fix, as the many still-failing tests showed, but it was a good start.

What's Your Problem?

In order to find the best solution for a bug, it's important to understand what exactly is the problem. We know mis-scoped blocks are the cause of the bug, so lets grab each of our tests, dump their QAST (--target=ast), and write out how mis-scoped the blocks are.

To make it easier to match the QAST::Blocks with the QAST::WVals referencing them, I made a modification to QAST::Node.dump to include CUID numbers and statement_id annotations in the dumps.

Going through mosts of the buggy code chunks, we have these results:

is-deeply <a b c>[$_ xx 2], <b b>.Seq, 'xx inside `with`' with 1;
# QAST for `xx` is ALONGSIDE RHS `andthen` thunk, but needs to be INSIDE

is-deeply gather {
    sub itcavuc ($c) { try {take $c} andthen 42 };
    itcavuc $_ for 2, 4, 6;
}, (2, 4, 6).Seq, 'try with block and andthen';
# QAST for try block is INSIDE RHS `andthen` thunk, but needs to be ALONGSIDE

is-deeply gather {
    sub foo ($str) { { take $str }() orelse Nil }
    foo "cc"; foo "dd";
}, <cc dd>.Seq, 'block in a sub with orelse';
# QAST for block is INSIDE RHS `andthen` thunk, but needs to be ALONGSIDE

is-deeply gather for ^7 {
    my $x = 1;
    1 andthen $x.take andthen $x = 2 andthen $x = 3 andthen $x = 4;
}, 1 xx 7, 'loop + lexical variable plus chain of andthens';
# each andthen thunk is nested inside the previous one, but all need to be
# ALONGSIDE each other

is-deeply gather for <a b c> { $^v.uc andthen $v.take orelse .say },
    <a b c>.Seq, 'loop + andthen + orelse';
# andthen's block is INSIDE orelse's but needs to be ALONGSIDE each other

is-deeply gather { (.take xx 10) given 42 }, 42 xx 10,
    'parentheses + xx + given';
# .take thunk is ALONGSIDE given's thunk, but needs to be INSIDE of it

is-deeply gather { take ("{$_}") for <aa bb> }, <aa bb>.Seq,
    'postfix for + take + block in a string';
# the $_ is ALONGSIDE `for`'s thunk, but needs to be INSIDE

is-deeply gather { take (* + $_)(32) given 10 }, 42.Seq,
    'given + whatever code closure execution';
# the WhateverCode ain't got no statement_id and is ALONGSIDE given
# block but needs to be INSIDE of it

So far, we can see a couple of patterns:

  • xx and WhateverCode thunks don't get migrated, even though they should
  • andthen thunks get migrated, even though they shouldn't

The first one is fairly straightforward. Looking at the QAST dump, we see xx thunk has a higher statement_id than the block it was meant to be in. This is what skids++'s hint addresses, so we'll change the statement_id conditional from == to >= to look for statement IDs higher than our current one as well, since those would be from any substatements, such as our xx inside the positional indexing operator:

($b.ann('statement_id') // -1) >= $migrate_stmt_id

The cause is very similar for the WhateverCode case, as it's missing statement_id annotation altogether, so we'll just annotate the generated QAST::Block with the statement ID. Some basic detective work gives us the location where that node is created: we search src/Perl6/Actions.nqp for word "whatever" until we spot whatever_curry method and in its guts we find the QAST::Block we want. For the statement ID, we'll grep the source for statement_id:

$ grep -FIRn 'statement_id' src/Perl6/
src/Perl6/Actions.nqp:1497:            $past.annotate('statement_id', $id);
src/Perl6/Actions.nqp:2326:                $_.annotate('statement_id', $*STATEMENT_ID);
src/Perl6/Actions.nqp:2488:                -> $b { ($b.ann('statement_id') // -1) == $stmt.ann('statement_id') });
src/Perl6/Actions.nqp:9235:                && ($b.ann('statement_id') // -1) >= $migrate_stmt_id
src/Perl6/Actions.nqp:9616:            ).annotate_self: 'statement_id', $*STATEMENT_ID;
src/Perl6/World.nqp:256:            $pad.annotate('statement_id', $*STATEMENT_ID);

From the output, we can see the ID is stored in $*STATEMENT_ID dynamic variable, so we'll use that for our annotation on the WhateverCode's QAST::Block:

my $block := QAST::Block.new(
    QAST::Stmts.new(), $past
).annotate_self: 'statement_id', $*STATEMENT_ID;

Let's compile and run our bug tests. If you're using Z-Script, you can re-compile Rakudo by running z command with no arguments:

$ z
[...]
$ ./perl6 bug-tests.t
1..1
    1..10
    ok 1 - xx inside `with`
    not ok 2 - try with block and andthen
    # Failed test 'try with block and andthen'
    # at bug-tests.t line 10
    # expected: $(2, 4, 6)
    #      got: $(2, 2, 4)
    not ok 3 - block in a sub with orelse
    # Failed test 'block in a sub with orelse'
    # at bug-tests.t line 16
    # expected: $("cc", "dd")
    #      got: $("cc", "cc")
    not ok 4 - loop + lexical variable plus chain of andthens
    # Failed test 'loop + lexical variable plus chain of andthens'
    # at bug-tests.t line 22
    # expected: $(1, 1, 1, 1, 1, 1, 1)
    #      got: $(1, 4, 3, 3, 3, 3, 3)
    not ok 5 - loop + andthen + orelse
    # Failed test 'loop + andthen + orelse'
    # at bug-tests.t line 28
    # expected: $("a", "b", "c")
    #      got: $("a", "a", "a")
    ok 6 - parentheses + xx + given
    ok 7 - postfix for + take + block in a string
    ok 8 - given + whatever code closure execution
    ok 9 - sub + given + whatevercode closure execution
    not ok 10 - sub with `with` + orelse + block interpolation
    # Failed test 'sub with `with` + orelse + block interpolation'
    # at bug-tests.t line 49
    # expected: $("meow True",)
    #      got: $("meow False",)
    # Looks like you failed 5 tests of 10
not ok 1 - thunking closure scoping
# Failed test 'thunking closure scoping'
# at bug-tests.t line 3
# Looks like you failed 1 test of 1

Looks like that fixed half of the issues already. That's pretty good!

Extra Debugging

Let's now look at the remaining failures and figure out why block migration isn't how we want it in those cases. To assists with our sleuthing efforts, let's make a couple of changes to produce more debugging info.

First, let's modify QAST::Node.dump method in NQP's repo to dump the value of in_stmt_mod annotation, by telling it to dump out the value verbatim if the key is in_stmt_mod:

if $k eq 'IN_DECL' || $k eq 'BY' || $k eq 'statement_id'
|| $k eq 'in_stmt_mod' {
    ...

Next, let's go to sub migrate_blocks in Actions.nqp and add a bunch of debug dumps inside most of the conditionals. This will let us track when a block is compared and to see whether migration occurs. As mentioned earlier, I like to key my dumps on env vars using nqp::getenvhash op, so after modifications my migrate_blocks routine looks like this; note the use of .dump method to dump QAST node guts (tip: .dump method also exists on Perl6::Grammar's match objects!):

sub migrate_blocks($from, $to, $predicate?) {
    my @decls := @($from[0]);
    my int $n := nqp::elems(@decls);
    my int $i := 0;
    while $i < $n {
        my $decl := @decls[$i];
        if nqp::istype($decl, QAST::Block) {
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: trying to grab ' ~ $decl.dump);
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: to move to ' ~ $to.dump);
            if !$predicate || $predicate($decl) {
                nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: grabbed');
                $to[0].push($decl);
                @decls[$i] := QAST::Op.new( :op('null') );
            }
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
        }
        elsif (nqp::istype($decl, QAST::Stmt) || nqp::istype($decl, QAST::Stmts)) &&
              nqp::istype($decl[0], QAST::Block) {
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: trying to grab ' ~ $decl[0].dump);
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: to move to ' ~ $to.dump);
            if !$predicate || $predicate($decl[0]) {
                nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: grabbed');
                $to[0].push($decl[0]);
                $decl[0] := QAST::Op.new( :op('null') );
            }
            nqp::atkey(nqp::getenvhash(),'ZZ') && nqp::say('ZZ1: -----------------');
        }
        elsif nqp::istype($decl, QAST::Var) && $predicate && $predicate($decl) {
            $to[0].push($decl);
            @decls[$i] := QAST::Op.new( :op('null') );
        }
        $i++;
    }
}

After making the changes, we need to recompile both NQP and Rakudo. With Z-Script, we can just run z n to do that:

$ z n
[...]

Now, we'll grab the first failing code and take a look at its QAST. I'm going to use the CoreHackers::Q tool:

$ q a ./perl6 -e '
    sub itcavuc ($c) { try {say $c} andthen 42 };
    itcavuc $_ for 2, 4, 6;'
$ firefox out.html

We can see that our buggy say call lives in QAST::Block with cuid 1, which gets called from within QAST::Block with cuid 3, but is actually located within QAST::Block with cuid 2:

- QAST::Block(:cuid(3)) <wanted> :statement_id<1>
        :count<?> :signatured<?> :IN_DECL<sub>
        :in_stmt_mod<0> :code_object<?>
        :outer<?> { try {say $c} andthen 42 }
    [...]
        - QAST::Block(:cuid(2)) <wanted> :statement_id<2>
                :count<?> :in_stmt_mod<0> :code_object<?> :outer<?>
            [...]
            - QAST::Block(:cuid(1)) <wanted> :statement_id<2>
                    :IN_DECL<> :in_stmt_mod<0> :code_object<?>
                    :also_uses<?> :outer<?> {say $c}
                [...]
                - QAST::Op(call &say)  say $c
    [...]
    - QAST::Op(p6typecheckrv)
        [...]
        - QAST::WVal(Block :cuid(1))

Looks like cuid 2 block steals our cuid 1 block. Let's enable the debug env var and look at the dumps to see why exactly:

$ ZZ=1 ./perl6 -e '
    sub itcavuc ($c) { try {say $c} andthen 42 };
    itcavuc $_ for 2, 4, 6;'

ZZ1: -----------------
ZZ1: trying to grab - QAST::Block(:cuid(1)) <wanted>
    :statement_id<2> :IN_DECL<> :in_stmt_mod<0> :code_object<?>
    :also_uses<?> :outer<?> {say $c}
[...]

ZZ1: to move to - QAST::Block  :statement_id<2>
    :in_stmt_mod<0> :outer<?>

ZZ1: grabbed
ZZ1: -----------------

We can see the theft in progress. Let's take a look at our migration predicate again:

! $b.ann('in_stmt_mod')
&& ($b.ann('statement_id') // -1) >= $migrate_stmt_id

In the dump we can see in_stmt_mod is false. Were it set to a true value, the block would not be migrated—exactly what we're trying to accomplish. Let's investigate the in_stmt_mod annotation, to see when it gets set:

$ G 'in_stmt_mod' src/Perl6/Actions.nqp
2327:                $_.annotate('in_stmt_mod', $*IN_STMT_MOD);
9206:                !$b.ann('in_stmt_mod') && ($b.ann('statement_id') // -1) >= $migrate_stmt_id

$ G '$*IN_STMT_MOD' src/Perl6/Grammar.nqp
1200:        :my $*IN_STMT_MOD := 0;                    # are we inside a statement modifier?
1328:        :my $*IN_STMT_MOD := 0;
1338:        | <EXPR> :dba('statement end') { $*IN_STMT_MOD := 1 }

Looks like it's a marker for statement modifier conditions. Statement modifiers have a lot of relevance to our andthen thunks, because $foo with $bar gets turned into $bar andthen $foo during parsing. Since, as we can see in src/Perl6/Grammar.nqp, in_stmt_mod annotation gets set for with statement modifiers, we can hypothesize that if we turn our buggy andthen into a with, the bug will disappear:

$ ./perl6 -e 'sub itcavuc ($c) { 42 with try {say $c} };
    itcavuc $_ for 2, 4, 6;'
2
4
6

And indeed it does! Then, we have a way forward: we need to set in_stmt_mod annotation to a truthy value for just the first argument of andthen (and its relatives notandthen and orelse).

Glancing at the Grammar it doesn't look like it immediatelly offers a similar opportunity for how in_stmt_mod is set for the with statement modifier. Let's approach it differently. Since we care about this when thunks are created, let's watch for andthen QAST inside sub thunkity_thunk in Actions, then descend into its first kid and add the in_stmt_mod annotation by cheating and using the past_block annotation on QAST::WVal with the thunk that contains the reference to QAST::Block we wish to annotate. The code will look something like this:

sub mark_blocks_as_andnotelse_first_arg($ast) {
    if $ast && nqp::can($ast, 'ann') && $ast.ann('past_block') {
        $ast.ann('past_block').annotate: 'in_stmt_mod', 1;
    }
    elsif nqp::istype($ast, QAST::Op)
    || nqp::istype($ast, QAST::Stmt)
    || nqp::istype($ast, QAST::Stmts) {
        mark_blocks_as_andnotelse_first_arg($_) for @($ast)
    }
}

sub thunkity_thunk($/,$thunky,$past,@clause) {
    [...]

    my $andnotelse_thunk := nqp::istype($past, QAST::Op)
      && $past.op eq 'call'
      && ( $past.name eq '&infix:<andthen>'
        || $past.name eq '&infix:<notandthen>'
        || $past.name eq '&infix:<orelse>');

    while $i < $e {
        my $ast := @clause[$i];
        $ast := $ast.ast if nqp::can($ast,'ast');
        mark_blocks_as_andnotelse_first_arg($ast)
            if $andnotelse_thunk && $i == 0;
        [...]

First, we rake $past argument given to thunkity_thunk for a QAST::Op for nqp::call that calls one of our ops—when we found one, we set a variable to a truthy value. Then, in the loop, when we're iterating over the first child node ($i == 0) of these ops, we'll pass its QAST to our newly minted mark_blocks_as_andnotelse_first_arg routine, inside of which we recurse over any ops that can have kids and mark anything that has past_block annotation with truthy in_stmt_mod annotation.

Let's compile our concoction and give the tests another run. Once again, I'm using Z-Script to recompile Rakudo:

$ z
[...]
$ ./perl6 bug-tests.t
1..1
    1..10
    ok 1 - xx inside `with`
    ok 2 - try with block and andthen
    ok 3 - block in a sub with orelse
    not ok 4 - loop + lexical variable plus chain of andthens
    # Failed test 'loop + lexical variable plus chain of andthens'
    # at bug-tests.t line 23
    # expected: $(1, 1, 1, 1, 1, 1, 1)
    #      got: $(1, 4, 3, 3, 3, 3, 3)
    ok 5 - loop + andthen + orelse
    ok 6 - parentheses + xx + given
    ok 7 - postfix for + take + block in a string
    ok 8 - given + whatever code closure execution
    ok 9 - sub + given + whatevercode closure execution
    not ok 10 - sub with `with` + orelse + block interpolation
    # Failed test 'sub with `with` + orelse + block interpolation'
    # at bug-tests.t line 50
    # expected: $("meow True",)
    #      got: $("meow False",)
    # Looks like you failed 2 tests of 10
not ok 1 - thunking closure scoping
# Failed test 'thunking closure scoping'
# at bug-tests.t line 4
# Looks like you failed 1 test of 1

We got closer to the goal, with 80% of the tests now passing! In the first remaining failure, we already know from our original examination that chained andthen thunks get nested when they should not—we haven't done anything to fix that yet. Let's take care of that first.

Playing Chinese Food Mind Games

Looking back out at the fixes we applied already, we have a marker for when we're working with andthen or its sister ops: the $andnotelse_thunk variable. It seems fairly straight-forward that if we don't want the thunks of these ops to migrate, we just need to annotate them appropriately and stick the check for that annotation into the migration predicate.

In Grammar.nqp, we can see our ops are configured with the .b thunky, so we'll locate that branch in sub thunkity_thunk and pass $andnotelse_thunk variable as a new named param to the make_topic_block_ref block maker:

...
elsif $type eq 'b' {  # thunk and topicalize to a block
    unless $ast.ann('bare_block') || $ast.ann('past_block') {
        $ast := block_closure(make_topic_block_ref(@clause[$i],
          $ast, :$andnotelse_thunk,
          migrate_stmt_id => $*STATEMENT_ID));
    }
    $past.push($ast);
}
...

The block maker will shove it into the migration predicate, so our block maker code becomes this:

 sub make_topic_block_ref(
    $/, $past, :$copy, :$andnotelse_thunk, :$migrate_stmt_id,
 ) {
    my $block := $*W.push_lexpad($/);

    # Add annotation to thunks of our ops:
    $block.annotate: 'andnotelse_thunk', 1 if $andnotelse_thunk;

    $block[0].push
        QAST::Var.new( :name('$_'), :scope('lexical'), :decl('var') );
    $block.push($past);
    $*W.pop_lexpad();
    if nqp::defined($migrate_stmt_id) {
        migrate_blocks($*W.cur_lexpad(), $block, -> $b {
               ! $b.ann('in_stmt_mod')

            # Don't migrate thunks of our ops:
            && ! $b.ann('andnotelse_thunk')

            && ($b.ann('statement_id') // -1) >= $migrate_stmt_id
        });
    }
    ...

One more compilation cycle and test run:

$ z
[...]
$ ./perl6 bug-tests.t
1..1
    1..10
    ok 1 - xx inside `with`
    ok 2 - try with block and andthen
    ok 3 - block in a sub with orelse
    ok 4 - loop + lexical variable plus chain of andthens
    ok 5 - loop + andthen + orelse
    ok 6 - parentheses + xx + given
    ok 7 - postfix for + take + block in a string
    ok 8 - given + whatever code closure execution
    ok 9 - sub + given + whatevercode closure execution
    not ok 10 - sub with `with` + orelse + block interpolation
    # Failed test 'sub with `with` + orelse + block interpolation'
    # at bug-tests.t line 50
    # expected: $("meow True",)
    #      got: $("meow False",)
    # Looks like you failed 1 test of 10
not ok 1 - thunking closure scoping
# Failed test 'thunking closure scoping'
# at bug-tests.t line 4
# Looks like you failed 1 test of 1

So close! Just a single test failure remains. Let's give it a close look.

Within and Without

Let's repeat our procedure of dumping QASTs as well as enabing the ZZ env var and looking at what's causing the thunk mis-migration. I'm going to run a slightly simplified version of the failing test, to keep the cruft out of QAST dumps. If you're following along, when looking at full QAST dump keep in mind what I mentioned earlier: with gets rewritten into andthen op call during parsing.

$ q a ./perl6 -e '.uc with +"a" orelse "meow {$_ ~~ Failure}".say and 42'
$ firefox out.html

- QAST::Block(:cuid(4)) :in_stmt_mod<0>
    [...]
    - QAST::Block(:cuid(1))  :statement_id<1> :in_stmt_mod<1>
      [...]
      - QAST::Op(chain &infix:<~~>) <wanted> :statement_id<2> ~~
        - QAST::Var(lexical $_) <wanted> $_
        - QAST::WVal(Failure) <wanted> Failure
    - QAST::Block(:cuid(2)) :statement_id<1>
        :in_stmt_mod<1> :andnotelse_thunk<1>
      [...]
      - QAST::Op(callmethod Stringy) <wanted>
        - QAST::Op(call) <wanted> {$_ ~~ Failure}
          - QAST::Op(p6capturelex) <wanted> :code_object<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Block)

$ ZZ=1 ./perl6 -e '.uc with +"a" orelse "meow {$_ ~~ Failure}".say and 42'
[...]
ZZ1: -----------------
ZZ1: trying to grab - QAST::Block(:cuid(1))
  :statement_id<1> :in_stmt_mod<1>
  [...]
ZZ1: to move to - QAST::Block
  :statement_id<1> :andnotelse_thunk<1> :in_stmt_mod<1>
  [...]
ZZ1: -----------------

Although QAST::WVal lacks .past_block annotation and so doesn't show the block's CUID in the dump, just by reading the code dumped around that QAST, we can see that the CUID-less block is our QAST::Block :cuid(1), whose immediate outer is QAST::Block :cuid(4), yet it's called from within QAST::Block :cuid(2). It's supposed to get migrated, but that migration never happens, as we can see when we use the ZZ env var to enable our debug dumps in the sub migrate_blocks.

We can see why. Here's our current migration predicate (where $b is the examined block, which in our case is QAST::Block :cuid(1)):

   ! $b.ann('in_stmt_mod')
&& ! $b.ann('andnotelse_thunk')
&& ($b.ann('statement_id') // -1) >= $migrate_stmt_id

The very first condition prevents our migration, as our block has truthy in_stmt_mod annotation, because it's part of the with's condition. At the same time, it does need to be migrated because it's part of the andthen thunk that's inside the statement modifier!

Since we already have $andnotelse_thunk variable in the vicinity of the migration predicate we can use it to tell us whether we're migrating for the benefit of our andthen thunk and not the statement modifier. However, recall that we've used the very same in_stmt_mod annotation to mark the first argument of andthen and its brother ops. We need to alter that first.

And so, the sub mark_blocks_as_andnotelse_first_arg we added earlier becomes:

sub mark_blocks_as_andnotelse_first_arg($ast) {
    if $ast && nqp::can($ast, 'ann') && $ast.ann('past_block') {
        $ast.ann('past_block').annotate: 'in_stmt_mod_andnotelse', 1;
    }
    ...

And then we tweak the migration predicate to watch for this altered annotation and to consider the value of $andnotelse_thunk variable:

migrate_blocks($*W.cur_lexpad(), $block, -> $b {
    (    (! $b.ann('in_stmt_mod_andnotelse') &&   $andnotelse_thunk)
      || (! $b.ann('in_stmt_mod')            && ! $andnotelse_thunk)
    )
    && ($b.ann('statement_id') // -1) >= $migrate_stmt_id
    && ! $b.has_ann('andnotelse_thunk')
});

Thus, we migrate all the blocks with statement_id equal to or higher than ours and are all of the following:

  • Not thunks of actual andthen, notandthen, or orelse
  • Not thunks inside a statement modifier, unless they're inside thunks of andthen or related ops
  • If we're considering migrating them inside one of the andthen's thunks, then also not part of the first argument to andthen (or related ops), .

That's a fancy-pants predicate. Let's compile and see if it gets the job done:

$ z
[...]
$ ./perl6 bug-tests.t
  1..1
    1..10
    ok 1 - xx inside `with`
    ok 2 - try with block and andthen
    ok 3 - block in a sub with orelse
    ok 4 - loop + lexical variable plus chain of andthens
    ok 5 - loop + andthen + orelse
    ok 6 - parentheses + xx + given
    ok 7 - postfix for + take + block in a string
    ok 8 - given + whatever code closure execution
    ok 9 - sub + given + whatevercode closure execution
    ok 10 - sub with `with` + orelse + block interpolation
ok 1 - thunking closure scoping

Success! Now, let's remove all of the debug statements we added. Then, recompile and run make stresstest, to ensure we did not break anything else. With Z-Script, we can do all that by just running z ss:

$ z ss
[...]
All tests successful.
Files=1287, Tests=153127, 159 wallclock secs (21.40 usr  3.27 sys + 3418.56 cusr 179.32 csys = 3622.55 CPU)
Result: PASS

All is green. We can now commit our fix to Rakudo's repo, then commit our tests to the roast repo, and all that remains is closing those 10 tickets we fixed!

Job well done.

Conclusion

Today, we learned quite a bit about QAST: the Abstract Syntax Trees Perl 6 code compiles to in the Rakudo compiler. We examined the common types of QAST and how to create, annotate, mutate, execute, and dump them for examination.

In the second part of the article, we applied our new knowledge to fix a hive of mis-scoped thunking bugs that plagued various Perl 6 constructs. We introspected the generated QAST nodes to specially annotate them, and then used those annotations to reconfigure migration predicate, so that it migrates the blocks correctly.

Hopefully, this knowledge inspires you to fix the many other bugs we have on the RT tracker as well as our GitHub Issue tracker

-Ofun