Random thoughts: July 2009

Tuesday, July 28, 2009

Debugging perl programs/internals in gdb

When it comes to perl internals, print-based debugging doesn't work that well. Compilation and installation are too slow, so you cannot just place a print and quickly see the output. At some point gdb has to be used. In the perl world we have Devel::Peek's Dump function to look behind the curtain. In the C world there is sv_dump.

    # threaded perl:
    (gdb) call Perl_sv_dump(my_perl, variable)
    # not threaded perl:
    (gdb) call Perl_sv_dump(variable)

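The Perl_ prefix looks like magic, but the short names used in the perl source, such as sv_dump, are macros that expand to the Perl_-prefixed functions, and gdb only knows the real function names. So in most cases you need to add the prefix.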
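For comparison, the perl-side view mentioned above takes only a couple of lines. This is a minimal sketch; the variable name and its value are made up for illustration. Dump writes its report to STDERR.

```perl
use strict;
use warnings;
use Devel::Peek;    # exports Dump by default

# A throwaway scalar, invented for this illustration.
my $name = "foo";

# Prints the internal representation of $name (type, REFCNT, FLAGS, ...)
# to STDERR, much like sv_dump does from inside gdb.
Dump($name);
```

The same call works on references, arrays and hashes, which makes it handy for checking what to expect before you go looking at the same value in gdb.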

Using breakpoints is a must. Break on pp_* functions, for example Perl_pp_entersub. Here is a simple session where we stop before entering a sub and use sv_dump to figure out the sub's name:

    > gdb ./perl
    GNU gdb 6.3.50-20050815 (Apple version gdb-962) (Sat Jul 26 08:14:40 UTC 2008)

    (gdb) break Perl_pp_entersub
    Breakpoint 1 at 0xe512c: file pp_hot.c, line 2663.

    (gdb) run -e 'sub foo { return $x } foo()'

    Breakpoint 1, Perl_pp_entersub (my_perl=0x800000) at pp_hot.c:2663
    2663        dVAR; dSP; dPOPss;
    (gdb) n
    2662    {
    (gdb) 
    2663        dVAR; dSP; dPOPss;
    (gdb) 
    2668        const bool hasargs = (PL_op->op_flags & OPf_STACKED) != 0;
    (gdb) 
    2670        if (!sv)

    (gdb) call Perl_sv_dump(my_perl, sv)
    SV = PVGV(0x8103fc) at 0x813ef0
      REFCNT = 2
      FLAGS = (MULTI,IN_PAD)
      NAME = "foo"
      NAMELEN = 3
      GvSTASH = 0x8038f0    "main"
      GP = 0x3078f0
        SV = 0x0
        REFCNT = 1
        IO = 0x0
        FORM = 0x0  
        AV = 0x0
        HV = 0x0
        CV = 0x813eb0
        CVGEN = 0x0
        LINE = 1
        FILE = "-e"
        FLAGS = 0xa
        EGV = 0x813ef0      "foo"

Quite simple, but very helpful when you start investigating internals.

Monday, July 27, 2009

Proper double linked list

A doubly linked list is a well-known structure. Each element references the previous and the next element in the chain:

    use strict;
    use warnings;

    package List;

    sub new {
        my $proto = shift;
        my $self = bless {@_}, ref($proto) || $proto;
    }

    sub prev {
        my $self = shift;
        if ( @_ ) {
            my $prev = $self->{'prev'} = shift;
            $prev->{'next'} = $self;
        }
        return $self->{'prev'};
    }

    sub next {
        my $self = shift;
        if ( @_ ) {
            my $next = $self->{'next'} = shift;
            $next->{'prev'} = $self;
        }
        return $self->{'next'};
    }

    package main;

    my $head = List->new(v=>1);
    $head->next( List->new(v=>3)->prev( List->new(v=>2) ) );

Clean and simple. If you are experienced in perl, you know that such a thing leaks memory. Each element is always referenced by at least one neighbor, so its reference count never drops to zero and perl never frees the structure. This is called a reference cycle; google for it. You may also know that weaken from the Scalar::Util module can help solve this:

    use Scalar::Util qw(weaken);

    sub prev {
        my $self = shift;
        if ( @_ ) {
            my $prev = $self->{'prev'} = shift;
            $prev->{'next'} = $self;
            weaken $self->{'prev'};
        }
        return $self->{'prev'};
    }

    # next is similar: it weakens $next->{'prev'}

So we weaken one group of references, in our example the prev links. It's a win and a loss. Yes, perl frees the elements before exit and there are no more memory leaks; that's our win. But, and there is always a but, you cannot let the pointer to the first element go out of scope, or some elements can be freed against your wish. For a while I thought it was impossible to solve this problem, but recent hacking, reading about perl internals and a question on a mailing list rang a bell. After a short discussion with Matt Trout on the #p5p irc channel, a solution was found. Actually it has all been there for a while, and Matt has even started a module that may make it all easier, but here we're going to look at the guts.
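The downside of weak prev links is easy to reproduce. The following is a minimal sketch, not the code from this post: the WeakList package is a hypothetical stand-in that has only a next setter, and the scenario (keeping just the tail) is chosen to trigger the problem.

```perl
use strict;
use warnings;

package WeakList;
use Scalar::Util qw(weaken);

sub new { my $class = shift; return bless {@_}, $class }

# Link $next after $self: the forward link is strong, the back link is weak.
sub next {
    my $self = shift;
    if ( @_ ) {
        my $next = $self->{'next'} = shift;
        $next->{'prev'} = $self;
        weaken $next->{'prev'};
    }
    return $self->{'next'};
}

package main;

my $tail;
{
    my $head = WeakList->new( v => 1 );
    $tail = $head->next( WeakList->new( v => 2 ) );
}

# $head went out of scope. Only the weak back link pointed at the first
# element, so perl freed it and reset the weak reference to undef.
print defined $tail->{'prev'} ? "head is alive\n" : "head was freed\n";
```

Holding the tail is not enough to keep the list together, which is exactly the limitation the rest of this post works around.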

The DESTROY method is called on destruction, we all know that, but few people know that we can prevent the actual destruction by incrementing the object's reference count. One woe: you shouldn't do it during global destruction, but there is a module to check whether that is when we are called:

    use Devel::GlobalDestruction;

    sub DESTROY {
        return if in_global_destruction;
        do_something_a_little_tricky();
    }

What can we do with this? We have two links: from the current element to the next one and from that next one back to the current. One of them is weak, and on destroy we can swap them if the element that is about to be destroyed is referenced by a weak link. It's easier in code than in words:

    use Scalar::Util qw(weaken isweak);
    use Devel::GlobalDestruction;

    sub DESTROY {
        return if in_global_destruction();

        my $self = shift;
        if ( $self->{'next'} && isweak $self->{'next'}{'prev'} ) {
            $self->{'next'}{'prev'} = $self;
            weaken $self->{'next'};
        }
        if ( $self->{'prev'} && isweak $self->{'prev'}{'next'} ) {
            $self->{'prev'}{'next'} = $self;
            weaken $self->{'prev'};
        }
    }

That's it, now you can forget about the heads of the lists and pass around any element you like. isweak is also part of the Scalar::Util module. Good luck with cool lists and other linked structures. Matt is looking for help with his module. You can always find user mst on irc.perl.org to chat about this.
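Putting the pieces together, here is a minimal two-element sketch of the swap in action. Two assumptions are mine and not part of the code above: it checks the core variable ${^GLOBAL_PHASE} (perl 5.14 and later) instead of using Devel::GlobalDestruction, so that it runs without CPAN modules, and it defines only the next setter. Only the tail is kept, yet the head stays reachable.

```perl
use strict;
use warnings;

package List;
use Scalar::Util qw(weaken isweak);

sub new { my $class = shift; return bless {@_}, $class }

# Forward link is strong, back link is weak.
sub next {
    my $self = shift;
    if ( @_ ) {
        my $next = $self->{'next'} = shift;
        $next->{'prev'} = $self;
        weaken $next->{'prev'};
    }
    return $self->{'next'};
}

sub DESTROY {
    # Assumption: stand-in for in_global_destruction(), needs perl 5.14+.
    return if ${^GLOBAL_PHASE} eq 'DESTRUCT';

    my $self = shift;
    # If a neighbor holds us only weakly, let it own us and weaken our side.
    if ( $self->{'next'} && isweak $self->{'next'}{'prev'} ) {
        $self->{'next'}{'prev'} = $self;
        weaken $self->{'next'};
    }
    if ( $self->{'prev'} && isweak $self->{'prev'}{'next'} ) {
        $self->{'prev'}{'next'} = $self;
        weaken $self->{'prev'};
    }
}

package main;

my $tail;
{
    my $head = List->new( v => 1 );
    $tail = $head->next( List->new( v => 2 ) );
}

# The head's DESTROY ran when $head went out of scope, handed ownership
# to the tail, and so the head is still reachable from here.
print "prev value: $tail->{'prev'}{'v'}\n";
```

This exercises only a two-element list. Keep in mind that weaken can itself drop the last strong reference and trigger a nested DESTROY call, so longer chains deserve their own tests before you rely on the trick.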

Thursday, July 02, 2009

Nice article on perl internals nothingmuch wrote

If you are interested in perl5's internals even a little, you will find this article useful. It doesn't cover SVs, AVs, HVs and other representations of perl structures, which are already quite well described elsewhere, but uses examples to introduce how perl code is executed.

I know a few things about internals, but the author's point of view helped me better understand the RETURN and PUSHBACK macros, the stack pointer and the op tree.

It's one tiny step towards understanding how cool things like Devel::NYTProf work.

Enjoy reading!