
ChrisDolan (2855)

http://www.chrisdolan.net/

Journal of ChrisDolan (2855)

Tuesday October 03, 2006
02:16 AM

Perl syntax for numbers

[ #31201 ]

I've been working on implementing $token->literal() support for PPI::Token::Number subclasses. This method takes a string representing a number and tries to parse it like Perl does. Thus:

  '-.00_1' -> -0.001
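
As a rough illustration of the kind of conversion described here (not PPI's actual implementation), a minimal sketch might strip the underscores and let Perl numify what is left. The name literal_value() is hypothetical, and the sketch assumes an unsigned literal for the hex, binary, and octal forms.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PPI: turn the source text of a numeric
# literal into its value. Assumes the text is a well-formed literal.
sub literal_value {
    my ($src) = @_;

    # Underscores are purely visual separators, so drop them first.
    (my $clean = $src) =~ tr/_//d;

    # oct() understands the 0x and 0b prefixes as well as plain octal.
    return oct($clean)
        if $clean =~ /\A0[xXbB]/ || $clean =~ /\A0[0-7]+\z/;

    # Everything else (decimals, floats, exponents) numifies directly.
    return 0 + $clean;
}

print literal_value('-.00_1'), "\n";
print literal_value('0xdead_beef'), "\n";
```

This ignores version strings such as 6_0.6_0.6_0 and does nothing to reject or warn about misplaced underscores.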

Egads, Perl has a lot of numeric formats with a lot of special parsing! Tonight, figuring out where Perl allows underscores in numbers is driving me a little batty. I've got PPI's tests working with valid underscore placement, but I still need to write some tests for invalid underscore placement. For example:

  1_000          # valid
  1__000         # valid, but warns
  100_           # valid, but warns
  1_000.000_001  # valid
  1_.000         # valid, but warns
  1._000         # valid, but warns
  0xdead_beef    # valid
  0_xdeadbeef    # syntax error
  0_755          # syntax error
  0b1010_1010    # valid
  0b_10          # valid, but warns
  0_b10          # syntax error
  1e1_0          # valid
  1e_10          # valid, but warns
  1e_-10         # valid, but warns
  6_0.6_0.6_0    # valid
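
One way to check a table like this against a given perl is to compile each literal with a string eval and watch for warnings. The sketch below assumes that approach; classify_literal() is a hypothetical name, and the results depend on the Perl version that runs it.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a literal with string eval and report whether
# it is 'valid', 'valid, but warns', or a 'syntax error' on this perl.
sub classify_literal {
    my ($src) = @_;
    my @warnings;

    # Collect compile-time warnings instead of printing them to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # The string eval inherits "use warnings" from this file.
    my $value = eval $src;

    return 'syntax error' if $@;
    return @warnings ? 'valid, but warns' : 'valid';
}

for my $src (qw( 1_000 1__000 100_ 0xdead_beef 0_xdeadbeef 1e1_0 1e_10 )) {
    printf "%-14s # %s\n", $src, classify_literal($src);
}
```

Only literals are passed to eval here; the same trick should not be applied to untrusted input.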

  • You might want to look at the perl API function looks_like_number(), accessible from XS. (see perldoc perlapi.)
      • Aristotle, Rafael,

        Much appreciated! I was not aware of that function. In this particular case, PPI is designed to be more lenient than Perl (it must be round-trip-safe even on invalid syntax) so we'll stick to our custom tokenizing. That said, I'll probably look deeply at looks_like_number to see if I can find inconsistencies in our tokenizer.
      • Hmm, I just looked at looks_like_number() in Scalar::Util and it's a different beast entirely. That's used for numification, not tokenization. looks_like_number() does not support '_', '[eE]', binary/octal/hex numbers nor version strings. The internal grok_number() in numeric.c is similarly limited.

        Instead, I've discovered scan_num() in toke.c. That's what I want to emulate (leniently).
  • Prior to 5.6 the placement of _ was restricted something like every third character. Now it's completely open. Isn't it? I think so. Probably.
  • As a summary, am I right in interpreting that an underscore is only accepted, without warning, if there is a digit (in the current base) on either side? And no, the "0" prefix in "0x" or "0b" doesn't count. Ditto for the leading "0" for octal.
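
The rule proposed in that last comment can be written down as a small check. This is only a sketch of the commenter's interpretation, not of what scan_num() does; underscores_well_placed() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every underscore in $literal sits between two
# digits of the literal's base, which is the rule proposed above.
sub underscores_well_placed {
    my ($literal) = @_;

    # Choose the digit class from the prefix. The prefix is stripped so that
    # its leading "0" cannot count as a neighbouring digit.
    my $digit = '[0-9]';
    if    ($literal =~ s/\A0[xX]//)     { $digit = '[0-9a-fA-F]' }
    elsif ($literal =~ s/\A0[bB]//)     { $digit = '[01]' }
    elsif ($literal =~ /\A0[0-7_]+\z/)  { $literal =~ s/\A0//; $digit = '[0-7]' }

    # Fail if any underscore lacks a digit immediately before or after it.
    return $literal !~ /(?<!$digit)_|_(?!$digit)/;
}

for my $src (qw( 1_000 1__000 0xdead_beef 0_755 0b_10 1e1_0 1e_10 )) {
    printf "%-12s %s\n", $src,
        underscores_well_placed($src) ? 'clean' : 'misplaced underscore';
}
```

The check lumps together literals that merely warn and literals that fail to compile, so it answers only the "accepted without warning" half of the question.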