

  • Just a quibble, but I believe it is more accurate to say a string in Perl 6 is made up of graphemes rather than characters. The two Perl 5 strings "\x{F6}" (LATIN SMALL LETTER O WITH DIAERESIS) and "\x{6F}\x{308}" (LATIN SMALL LETTER O and COMBINING DIAERESIS) should be the same string in Perl 6 (unless the codes pragma is turned on).

    • Yes, you are right. The Perl 6 spec is full of references to "characters", but in a few places it mentions that this term defaults to being the same as "graphemes".

      I think my growing familiarity with Unicode is not yet at the stage where I immediately reach for the term "graphemes". :) Maybe some day.
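
To make the code point versus grapheme distinction above concrete, here is a minimal sketch in Python (not Perl 6, and not anything from the spec). Python strings are sequences of code points, so the two spellings from the first comment compare unequal until they are normalized. NFC normalization is only a stand-in here: it happens to unify this particular pair, but it is not the same thing as grapheme-level string semantics. The variable names are made up for illustration.

```python
import unicodedata

precomposed = "\u00F6"        # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "\u006F\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS

# Python compares code point by code point, so the two spellings differ.
same_codepoints = precomposed == decomposed

# After NFC normalization both collapse to the single precomposed code point.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(len(precomposed), len(decomposed))   # 1 2
print(same_codepoints, same_after_nfc)     # False True
```

A language with grapheme-level strings would, in effect, give the second answer without being asked.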

  • This "you can't think the wrong thoughts" target reminds me of the novel Babel-17 by Samuel R. Delany, which uses the same sort of concept, but in a human language: anyone who thinks in that language automatically thinks the right thoughts in a very powerful way.
    • I think the idea is quite old. Umberto Eco details various attempts made over the years in his book The Search for a Perfect Language [amazon.com]. Newspeak in 1984 [amazon.com] has words selectively pruned from it so that dangerous thought becomes difficult or impossible.

      • I think it is strongly related to the Sapir-Whorf Hypothesis - http://en.wikipedia.org/wiki/Linguistic_relativity
  • So if I want to find a “magic number” byte sequence in a binary file, how do I do that in Perl 6?

    • According to S32/IO [perlcabal.org], the return type of slurp (which reads a whole file at once) is Str|Buf. A Buf is returned when a parameter :bin is passed to slurp. After that, you can treat the Buf you get as an array (because Buf does Positional), and do whatever advanced indexing operations you need to find your byte sequence.

      I wish I could show this with real, working code, but Buf isn't implemented just yet in Rakudo.

      • What kind of pattern matching facilities does Buf support?

        • The spec is a bit silent on that point, so I asked on #perl6 [perlgeek.de]. The conclusion seems to be "convert it to a string if you want to pattern match".

          Then again, if smartmatching with list semantics is what you're after, that should work. Something like $buf ~~ (*, 104, 101, 108, 108, 111, *) to find "hello" in an ASCII-encoded Buf.
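
Since Buf isn't available in Rakudo yet, here is a rough sketch of the same "find a magic number" idea in Python, which already separates bytes from strings. It only illustrates searching a byte sequence without decoding it; it says nothing about how Buf will eventually behave. The helper name find_magic and the sample data are invented for the example.

```python
def find_magic(data: bytes, magic: bytes) -> int:
    """Return the offset of the first occurrence of magic in data, or -1."""
    return data.find(magic)


# Hypothetical in-memory stand-in for a file read in binary mode,
# e.g. the result of open(path, "rb").read().
sample = b"\x00\x01hello\xff"

# The same byte values as in the smartmatch example: "hello" in ASCII.
magic = bytes([104, 101, 108, 108, 111])

offset = find_magic(sample, magic)
print(offset)  # 2
```

No encoding is involved anywhere: the search compares raw byte values, which is what you want for a binary file.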

  • In Java, a java.lang.String and a byte[] have nothing in common, and there are no (non-deprecated) ways of converting between them without specifying an encoding. It's one of the very few features I like about Java…
    • There are (at least) two things wrong with Java's encoding support:
      1. No way to avoid UnsupportedEncodingException, even for UTF-8, which is guaranteed to be present.
      2. The concept of a "system default encoding" is flawed and leads to bugs in portability. You should be forced to always specify an encoding.
        1. true
        2. I thought all methods that converted without specifying an encoding were deprecated… anyway, yes, implicit encodings are a very bad idea

        and while we're at it,

        3. internal encoding is UTF-16, and it's visible at the language level, so that the "length" method gives you completely useless information

        • Very true about the 16-bit character. Thankfully, it's less of a problem for me right now.
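
To put numbers on the "length" complaint, here is a small Python sketch (Python rather than Java, purely for illustration). It takes a string containing one character outside the Basic Multilingual Plane and counts it three ways: code points, UTF-16 code units (what a UTF-16-based length would report), and UTF-8 bytes. The encoding is named explicitly at every conversion, in the spirit of point 2 above. The variable names are made up.

```python
# 'a' followed by MUSICAL SYMBOL G CLEF (U+1D11E), which lies outside the BMP.
text = "a\U0001D11E"

# Conversions between text and bytes name their encoding explicitly.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

code_points = len(text)                            # counts code points
utf16_units = len(text.encode("utf-16-le")) // 2   # 2 bytes per UTF-16 code unit
utf8_bytes = len(encoded)

print(code_points, utf16_units, utf8_bytes)  # 2 3 5
print(decoded == text)                       # True
```

The three counts differ for the same two-character string, which is why a length that silently means "UTF-16 code units" is easy to misuse.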