use Perl

All the Perl that's Practical to Extract and Report

Alias (5735)

Alias
http://ali.as/

Journal of Alias (5735)

Tuesday January 29, 2008
12:40 AM

Why Perl 6 scares the hell out of me

[ #35508 ]

(I wanted to title this journal entry "The relationship between a language and its toolchain, and why Perl 6 scares the hell out of me" but it didn't fit)

For the record, this is not an anti-Perl6 rant. It is a warning.

For language designers, one of the foundational concepts is how simple the grammar is going to be. Because languages are human interface devices, they go beyond math (where there is generally a single truth) and engineering (where there is generally a limited set of widely known best practices) and into the realm of personal preference and fashion, with few clear guides to the "best" way to do something.

The reason this dimension is so important is that the language imposes limitations on the types of tools it is possible to write. Some types of tools simply are not possible to write at all if the grammar has certain features.

I hypothesize that we can break down the human "language experience" for a general-purpose language into a combination of the language syntax itself, the tools for it, and the size and quality of the available libraries. These are by no means all the success factors, but in my opinion they are the ones that most influence the individual user.

(I explicitly ignore languages with features like "proof-carrying" that are essential for specific domains like cryptography and make it worth the pain of learning Haskell or proof-carrying Ada)

If my hypothesis holds, the danger is adding excessive expressiveness to a language with the intention of being pro-user, at the cost of fatally crippling your toolchain and hurting the user more than the language-level benefits help them.

The second risk here is what I'll call the "Personal Language Anti-Pattern". I first heard examples of this from some old-school Lisp hackers. The typical way of describing it goes something like this...

"I can write something in 4 days in Lisp that takes most people 20 in some other language. I just spend 3 days modifying Lisp to solve that type of problem, then 1 day solving the problem".

The anti-pattern here is that you end up going way beyond TMTOWTDI. You don't get two, or three, or five ways to do it. You end up with a different language for every single person and every single project.

Forget about maintaining projects written in crufty Perl. Imagine maintaining code where every single project is written in its own mini-language (although they all look a bit lispish).

In Perl 5 we achieved this depressing state with source filters, but mostly managed to keep it under control with culture. "Source filters == bad" is our cultural norm.

The risk with Perl 6 here is that the ability to safely modify the language is going to be taken as permission to modify it early and often.

I'm not talking about obvious positives like "use physics;" here. I'm talking about My::Project::Lang, in the same vein as the "God Object" anti-pattern. Where I don't want to end up is "BioPerl... the Language!", where DNA sequences are a built-in literal (because they start with a capital G, T, C or A).

my $sequence = GATTACA;

This seems like a seductive option, but the cost is you throw away most of your developer tools.

I had originally thought that "easier to parse" was part of Perl 6, but that has apparently been removed, or was never what I thought it was.

What Perl 6 actually is, is easier to IMPLEMENT.

That is to say, we won't be in the worst possible situation of having a dynamic grammar that can't be reimplemented at all, because there is no grammar beyond "what the implementation does".

So we will have A grammar, but now it is a grammar that is EXPLICITLY changeable.

So consider this step 1 to toolchain bliss.

It still removes the possibility of implementing most useful tools.

So, as I see it anyway, here are the rest of the steps.

Step 2 - Deterministic

The key to the really awesome tools is that you need to have a way to READ the code, without necessarily having to RUN the code.

BEGIN blocks (and everything similar) really screw this up for us.

If you need to execute code to read code, then you need to execute arbitrary code in order to read arbitrary code. And right there the phrase "execute arbitrary code" should be more than enough to explain the problem.

In one hit, it limits you to creating tools that run your OWN code; you can never write tools that run anyone else's code.

As an example of how this can hit in unexpected ways, in Perl 5 if you have Komodo (or anything else that does background linting) installed it's quite easy to create a totally innocent-looking link on a webpage that will delete your home directory.
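Perl 5's own compile-only check shows the problem. perlrun documents that `perl -c` checks syntax without running the program, but still executes BEGIN, UNITCHECK and CHECK blocks and `use` statements, and any background linter built on that check inherits the behaviour. A minimal sketch, assuming a Unix-like system and core modules only; the temporary file is a harmless stand-in for someone else's code:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for someone else's file. Its BEGIN block only prints, but it
# could just as easily delete files.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'CODE';
BEGIN { print "BEGIN block ran during a compile-only check\n" }
print "main program body ran\n";
CODE
close $fh or die "close failed: $!";

# Run `perl -c` on it without a shell and capture its STDOUT. Perl's own
# "syntax OK" message goes to STDERR and is not captured.
open my $pipe, '-|', $^X, '-c', $path or die "cannot run $^X: $!";
my $output = do { local $/; <$pipe> };
close $pipe;

print $output;
```

Only the BEGIN line shows up in the captured output; the main body never runs, yet the "syntax check" has already executed code from the file.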

So compile-time string-eval has to go (along with BEGIN blocks and anything else that does the same thing). It's also the death of having a "sub import" that can do anything you like.
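perlfunc defines `use Module LIST` as exactly equivalent to `BEGIN { require Module; Module->import( LIST ); }`, so whatever `import` does happens while the caller is still being compiled, including changing how the rest of the caller's source parses. A minimal sketch, with a hypothetical `My::Innocent` module (and its hypothetical `shout` export) inlined so everything fits in one file:

```perl
use strict;
use warnings;

# Hypothetical module, inlined so the example is a single file.
package My::Innocent;

sub import {
    my $caller = caller;
    print "My::Innocent::import ran at compile time\n";
    # Install a sub into the importing package. Because this happens while
    # the caller is still being compiled, it changes how the caller parses.
    no strict 'refs';
    *{"${caller}::shout"} = sub { return uc join ' ', @_ };
    return;
}

package main;

# `use My::Innocent;` means BEGIN { require My::Innocent; My::Innocent->import }.
# The module is already inlined above, so only the import call is needed.
BEGIN { My::Innocent->import }

print "run time starts\n";

# Without the import this line would be a syntax error (a bareword followed
# by a string). After it, the hypothetical shout() is a list operator.
my $loud = shout "hello", "world";
print "$loud\n";
```

The import message is printed before "run time starts", and whether the last statement even compiles depends on code that ran inside `import`.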

In exchange, you get code you KNOW is going to be parsable, if the entire program is valid. None of this BEGIN { exit if rand > 0.5 } stuff.

Since it is so trivial to implement (one line of code in PIL), the next Parrot release should have an experimental --xdeterministic flag you can pass to perl6 to forbid compile-time execution.

None of this makes interesting tools more POSSIBLE; it just makes it safe to open up a project from a third party without wondering whether it is going to install a rootkit. That goes a long way toward creating the incentive to write the tools at all.

Step 3 - Finalized Grammar

The other ugly problem is the idea of having a compile-time-morphing grammar AT ALL. It completely blows away the possibility of a "PPI6" that is complete enough to handle all documents (and along with it kills perlcritic6, sqlinjectiondetector6, perltidy6 and other stuff).

If the syntax and semantics of your document are not stable WHILE the document is being parsed, how are you supposed to generate any form of semantic model (à la method-name completion or SOAP API auto-generation), or even a syntactic document model (à la PPI)?

You can't describe anything, simply because you have no idea what you will need to describe in advance.

Perl 5 is horrible in this respect because its grammar contains an "operator/operand" state that flips back and forth every other character (which is the underlying cause of the "/" problem, and of the same problem with 6 other characters).

But at least it's only a boolean flag, and I could fudge my way around the problem by heuristically guessing well enough.
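The "/" problem fits in a few lines of Perl 5. In this sketch (the sub name `zany` and the helper `run_after` are made up for the demo) the same line of source is compiled twice: once after declaring `zany` with an empty prototype, where `/` is division and everything after `#` is a comment, and once with no prototype, where `/ 25 ; # /` becomes a regex argument and the `die` after it is live code. Because that declaration could just as well be chosen inside a BEGIN block, a tool cannot know which parse is right without running code.

```perl
use strict;
use warnings;

# One line of source whose parse depends on how the hypothetical zany()
# was declared.
my $line = 'my $x = zany / 25 ; # / ; die "this dies!\n";';

my $n = 0;

# Hypothetical helper: declare zany with $decl, then compile and run $line.
# Returns 'lived', or the error message if the code died.
sub run_after {
    my ($decl) = @_;
    $n++;              # fresh package per run, so nothing is redefined
    local $_ = '';     # target for the regex match, if one gets parsed
    my $result = eval "package Demo$n; $decl\n$line\n'lived'";
    return defined $result ? $result : $@;
}

# Empty prototype: zany is a term, so `/` is division.
my $as_term   = run_after('sub zany () { 1 }');
# No prototype: zany is a list operator, so `/` starts a regex.
my $as_listop = run_after('sub zany { 1 }');

print "declared as zany ():  $as_term\n";
print "declared as zany:     $as_listop";
```

The first run survives and the second dies, even though the line being parsed is byte-for-byte identical.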

Once grammar modification is easy, any reasonable likelihood of fudging goes out the window because it can change in so many more ways.

You end up with a language which is expressive as hell, but where the most sophisticated editor you can create is vi (with no syntax highlighting allowed).

Assuming such a thing can be created, my bet for Perl 6's equivalent of "use strict;" would be something like "use v6-static;", which would guarantee the file sticks to the official primary grammar for that document and allow you to safely use source code analysis tools on the file.

This could even be useful without determinism, as it would at least let you prove that the compile-time code wouldn't modify the grammar, and so you could still safely model the source code without having to risk compiling and running the BEGIN block.

It would also solve the other big stepping stone for tools: the ability to do useful things with a document BEFORE it is a legally correct program.

This covers everything from parsing "use Win32;" on Unix to "use Not::Written::Yet". The ideal for a parser is a context-free (only needs one file) semantic parser you can safely fuzz-test without it exploding.
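Perl 5's compiler cannot do this, because `use` loads modules during compilation. A minimal sketch, assuming a Unix-like shell for the quoting and the `2>&1` redirect: a file that is perfectly well-formed Perl is rejected by `perl -c` simply because a module it names does not exist yet.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Well-formed source that depends on a module nobody has written yet.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'CODE';
use strict;
use Not::Written::Yet;
print "hello\n";
CODE
close $fh or die "close failed: $!";

# Compile only, capturing STDERR too, since that is where perl reports errors.
my $report   = qx{"$^X" -c "$path" 2>&1};
my $compiled = ($? == 0);

print $compiled ? "compiled\n" : "perl -c rejected it:\n$report";
```

The report is a "Can't locate Not/Written/Yet.pm" error, which says nothing about whether the file itself is syntactically sound.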

PPI is merely a syntactic parser you can safely fuzz-test.
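For comparison, PPI treats the same kind of source as a document rather than a program. A minimal sketch, assuming PPI is installed from CPAN: it lists the modules a file asks for without loading any of them, so `use Win32;` on Unix and a module that does not exist yet are no obstacle.

```perl
use strict;
use warnings;
use PPI;   # from CPAN; assumed to be installed

# Source that could not be compiled here, handed to PPI as a string.
my $source = <<'CODE';
use strict;
use Win32;
use Not::Written::Yet;
print "hello\n";
CODE

# PPI builds a document tree; it never loads or runs the modules named.
my $doc = PPI::Document->new(\$source) or die "PPI could not parse the source";

# find() returns an array ref of matches, or a false value if there are none.
my $includes = $doc->find('PPI::Statement::Include') || [];
my @modules  = map { $_->module } @$includes;

print "$_\n" for @modules;
```

This is the purely syntactic level: PPI can tell you which modules are named, but not what any of them would do to the grammar.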

And be under no illusions that it can be replicated in Perl 6. It was only BARELY possible to implement it for Perl 5 and even I wasn't sure I was going to find a good enough path through all the impossible problems until a few months before it was finished.

Now if we are indeed walking into this trap, there are certainly ways to avoid it.

Enforcing determinism would be useful (and easy to implement) but possibly impractical, as it removes the ability to write platform-adaptive code that picks dependencies at compile-time. Goodbye File::Spec...
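File::Spec is the classic case. As far as I can tell from its source, File/Spec.pm looks at `$^O` when it is loaded, requires the matching backend (File::Spec::Unix, File::Spec::Win32 and so on) and points `@File::Spec::ISA` at it, all during the caller's compile-time `use`. A quick way to see the choice it made on a given machine:

```perl
use strict;
use warnings;
use File::Spec;   # the backend is chosen while this line is being compiled

# The platform class was picked once, at load time, based on $^O.
print "\$^O is $^O\n";
print "File::Spec inherits from @File::Spec::ISA\n";
print "catfile gives ", File::Spec->catfile('lib', 'My', 'Module.pm'), "\n";
```

A ban on compile-time execution would rule out exactly this kind of load-time decision.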

Finalizing the grammar by default somehow is far more interesting in my opinion. It enables all syntactic and some of the semantic tools, and will discourage flippant grammar modifications.

If you REALLY need to do something exotic, you should be willing to pay the price for it by doing something like "no determinism;" or "no tools;", so that it's clear to anything parsing your code that it should stay the hell away unless it is either running the code, or willing to take the risk of exploding violently.

Or at least the tool is CONFIGURABLE to act safely, instead of being vulnerable to exploits by default. And before you mention Safe.pm, consider "BEGIN { while(1) { $_++ } }", or see "Safe isn't", a great talk by an Australian university lecturer who teaches a Perl course, in which she explores all the ways her students have violated her computer from inside Safe containers.
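The `BEGIN { while(1) { $_++ } }` case is easy to reproduce with plain `perl -c`: the compile-only check never returns, so a tool that compiles third-party code has to bring its own limits. A minimal sketch of one such limit, assuming a Unix-like system and core modules only; the 2-second budget is arbitrary, and a timeout on its own does nothing about code that deletes files instead of looping.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Third-party file whose BEGIN block never finishes.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "BEGIN { while(1) { \$_++ } }\n";
close $fh or die "close failed: $!";

# Start the compile-only check as a child process.
my $pid = open my $pipe, '-|', $^X, '-c', $path or die "cannot run $^X: $!";

my $timed_out = 0;
eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm 2;
    my $ignored = do { local $/; <$pipe> };   # blocks while the child loops
    alarm 0;
    1;
} or do {
    die $@ unless $@ eq "timeout\n";   # re-throw anything unexpected
    $timed_out = 1;
    kill 'KILL', $pid;                 # stop the stuck compile-only check
};
close $pipe;                           # reap the child

print $timed_out ? "gave up: compile-only check did not finish\n"
                 : "compile-only check finished\n";
```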

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • You are working under the assumption that you won't be able to reuse the existing toolchain to write these tools.

    The approach that everyone seems to be aiming for is that instead of writing PPI all over again for Perl 6, you are supposed to get decent enough support from the actual grammar that ships with the compiler in order to do your own interesting things with it.

    Secondly, since Perl 6 supports separate compilation units in many ways this is actually much simpler than Perl 5 - there is no more possibili
    • in the absence of any BEGIN { } declarations or importing of macros/grammars from other compilation units (this is something you can statically check for if the unit is already compiled)


      I am probably being thick, but once the code is compiled, haven't you already run the BEGIN block and all its arbitrary contents?
      --
      rjbs
      • The BEGIN block itself has to be fully parsed before it's run.

        Similarly a macro or grammar extension coming from another compilation unit has already been compiled.

        At this point you can examine their code in a manner much like Safe does (existing problems in safe are an implementation issue, not a conceptual one), and run the code with some resource limitation if necessary (if this wasn't possible then we wouldn't have javascript ;-)

        Furthermore, if you deduced by static analysis that these blocks cannot aff
        • > At this point you can examine their code in a manner much like Safe does (existing problems in safe are an implementation issue, not a conceptual one), and run the code with some resource limitation if necessary (if this wasn't possible then we wouldn't have javascript ;-)

          You don't need to run Javascript in order to parse it, since it has (I think) a static syntax.

          Also, this comes down to practicality.

          "What percentage of CPAN can this parsing strategy handle?"

          As a context-free document parser, PPI can
          • With respect to dependencies that will indeed fail to work, but for use Foo that's untrue - Perl 6's importing semantics will support real linkage of symbols for the benefit of compilation units. The method 'import' and glob assignments are not supposed to be the only way to actually import symbols anymore. This solves a lot of issues.

            As for reading files etc in BEGIN - that's also handled differently - there is no guarantee that a BEGIN block will run every single time you run the program, it is fair game
    • > In other cases, if it's safety you're after, in not running the compile time code, then theoretically you just use something like perl 5's Safe on all the macros and grammar extensions.

      Only if you can solve the Halting Problem.

      In Perl 5, even trivial Perl examples involve BEGIN blocks (use strict) and grammar modification (operator/operand switching).

      This problem applies to Perl 5 too.

      Simon Cozens has a never-released parser based on the Perl internal parser.

      It works just fine, as long as the code compi
      • Only if you can solve the Halting Problem.

        No version of the Perl compiler or processor for any version of the language attempts to solve the Halting Problem. They tend to do a pretty good job on most reasonably correct code (and plenty of unreasonably incorrect code as well). You don't need to solve the Halting Problem. You only need to decide if it's worth it at any particular point to Halt and say "Sorry, I'm not going to continue processing from here," and you can do that if you control the runloo

      • The halting problem only applies to code you cannot introspect.

        If you have a function, and that's all, then you can't find out what's in it.

        But given a compiled optree, you have much more information.

        If you parse the BEGIN { } block under the current rules, then you wind up with an optree which you can then examine, to see what it does.

        As for Simon's project - Perl 5's parser was never designed to make this easy, it was designed to emit an interpreter-optimized optree. This is very different from the design
        • We probably need to escalate this to a formally trained mathematician here, but as I understand it, it applies to any case with "arbitrary" code whether introspectable or not.

          You CAN prove something will finish in finite time, you just can't prove how long that finite time is, which may be longer than the heat death of the universe.
  • The mutable Perl 6 grammar scares me too, but I'm hoping that people never use it and it doesn't become a problem.

    In my "Bird's Eye View of Perl", the talk I give to managers, I talk about Perl being a single language that comes from the same source. The idea of multiple implementations looks good on paper, but it doesn't work out in practice. Besides knowing the core language and its libraries, now the mere mortal users have to wrestle with peculiarities of each implementation and grammar. It the reason I s
    • The mutable Perl 6 grammar scares me too, but I'm hoping that people never use it and it doesn't become a problem.

      Yet, whenever I raise my biggest objection I have against Perl6 (meaningful whitespace), I always get thrown back "well, you can change the grammar you know...".

      • Yeah, I was just going to say that. "Just change the grammar" was used to end Perl 6 language debate sort of like "God works in mysterious ways" is used to end religious debate.

        Thankfully I haven't seen it come up as much lately. Maybe folks are starting to realize that easily mutable grammars are a powerful and awesome tool but not the sort of thing you want every kid on the block to use.
    • I see multiple implementations as a strength.

      At some point you simply HAVE to be able to have an IronPerl6 and JPerl6 simply for long term language flexibility and health.
      • I thought that was the point of parrot---you didn't need different implementations if you had the byte code.

        I don't see different implementations as necessary to anything. Some people might like it, but in reality people will code to the implementation's features. It happens in Java, Javascript, Lisp, Smalltalk, and probably a lot of others that I haven't used. The conversations at the pub are about who supports what and what you have to do to make good code on one implementation work on another.

        It's not so
          • Catalyst dealt with the Class::DBI disaster by using something different. That's not how I want people to deal with Perl. :)
  • Seems to me the trick is to separate grammar changes from BEGIN blocks. That is, you have explicit blocks which do nothing but change the grammar. They can't execute code, they can't call eval, they can't declare variables, they can't load modules (except other grammar-only modules). They would then allow two critical things for tools:

    1) They're safe to execute.
    2) The tools can be made aware of grammar changes.

    It might not even need to be as restrictive as all that, maybe just that grammar changes happ
    • The problem is that the grammar is code itself.
      • That may not be a problem if the code is never run...

        If "=" is mapped to sub equals, you shouldn't need to run equals while parsing, right?
        • You don't understand. The grammar itself is a class. The parsing primitives are methods. It is actual Perl 6 code that does the parsing, you don't get a BNF for it.
  • I'm afraid that I can't restate your concern without putting words in your mouth - so I won't try. So I will look at this from my own perspective.

    I think it would be great to be able to have parsers/editors/refactor-ers that can "statically" (whatever that means) analyze Perl 6 code and do neat(tm) things with the output. I will write the majority of my Perl 6+ code in the standard grammar using the future best practices for doing so because I think there will be great tools that will give great insigh
    • > I see no reason why there can't be a standard PPI6 - it is just the standard grammar. Done.

      There is a VAST gap between what anyone thinks is true and what they can prove.

      Personally I DO see reasons why, plenty. Because I spent three years wrestling with them in Perl 5's grammar, several of which are based on mathematically provable impossibilities. And these grammar problems remain in Perl 6 unchanged.

      You can't just invoke the "standard grammar" as some kind of magic cure-all.

      SOMEONE has to eventually write
      • I am fully aware of your work with PPI on Perl 5. I am aware of the "proof" that it is impossible to statically parse Perl 5. I think they are amazing accomplishments. That said, I haven't had any need for PPI on Perl 5 and I think that it is obvious that BEGIN blocks potentially make it impossible to parse Perl 5. I believe your zero-ary sub followed by regex issue is actually taken care of in Perl 6 - at least the standard parser will be able to handle it.

        You can't just invoke the "standard grammar"

  • Actually, because Perl 6 does have a grammar, a tool could include its own parser which could lex the code including any grammar changes without doing any of the calls which don't change the language. That wasn't really possible when filters were reaching in and twisting about the compiler innards. Consequently, tools like vim with its own syntax coloring language which is not dynamic, will be less useful :( while others with plug-in token labelers will flourish -- well until someone replaces the cu
    • You've outlined the situation perfectly.

      Perl 6 breaks a ton of existing tools, while relying on the existence of new tools which everybody assumes will exist but nobody has actually proven can be written.
      • Perl 6 breaks a ton of existing tools,

        Besides the fact that the existing tools were written before Perl 6, there is no guarantee they would work with Perl 6 if grammar modifications were disabled. There is no existing tool today that works with Perl 6 (other than basic syntax highlighting of various editors). Your entire basis in this thread is about the creation of "new tools which everybody assumes will exist but nobody has actually proven can be written."