All the Perl that's Practical to Extract and Report

Java finally catches up to Perl (and Python, Tcl)

posted by pudge on 2002.02.14 18:07
rjray writes "Over at /. they're reporting that Sun has finally released the first official version of their Java 2 SDK version 1.4. Read the release notes here. Java finally has native support for regular expressions, one of the first things I found lacking when I took a shot at Java programming some time ago. The regexes are even referred to as 'Perl-like' in at least one place, maybe more!"
  •'s still dirt slow

    ...and overly verbose

  • The example program takes 92 lines to implement a file grep. It implements the pattern .*\r?\n, and if you wonder why that's bad, read Ovid's Death to Dot Star post over at PerlMonks. (I'm presuming the Java regex engine backtracks, like other regex engines I know of.)

    • Hmmm... now I'm rather curious as to what's going on here. Here's the regex code to match a line:

      // Pattern used to parse lines
      private static Pattern linePattern
      = Pattern.compile(".*\r?\n");

      I don't think VSarkiss' criticism of the dot star is appropriate in this case, as the Java documentation states that the dot does not match a line terminator. However, they appear to have goofed up the line terminator! What about \r on Macs? From what I can tell from their docs, the carriage return/newli

    • Argh. Don't have to convince me that Java is a big fat slug, but let's be fair: The first 43 lines of that are comments. A far cry from Perl, but only half as far ...
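The line-terminator question raised in this thread can be checked with a small sketch. It assumes the default `java.util.regex` behaviour, where the dot does not match a line terminator. The class name `LinePatternDemo`, the helper `lines`, and the `TOLERANT` pattern are illustrative names and are not part of Sun's example program; the code also uses generics, so it needs a newer JDK than 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinePatternDemo {
    // The pattern quoted from the example program in this thread.
    public static final Pattern ORIGINAL = Pattern.compile(".*\r?\n");

    // Hypothetical alternative (an assumption, not from the example program):
    // accept \r\n, a bare \r, or a bare \n as the end of a line.
    public static final Pattern TOLERANT = Pattern.compile(".*(?:\r\n|\r|\n)");

    // Collect every match of the pattern, terminator included.
    public static List<String> lines(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Four lines: DOS, Unix, classic Mac (bare \r), Unix.
        String text = "one\r\ntwo\nthree\rfour\n";
        System.out.println(lines(ORIGINAL, text).size()); // prints 3
        System.out.println(lines(TOLERANT, text).size()); // prints 4
    }
}
```

Because the dot stops at `\r`, the original pattern cannot match the line that ends in a bare `\r`, so that line is skipped rather than swallowed into the next one.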
  • Severely b0rken (Score:3, Informative)

    by Matts (1087) on 2002.02.15 2:08 (#4557)
    I believe there's something broken about Java's split() function from the regexp class. If you do split(/=/, "foo=bar=20", 2) in Java, then you get two return values, as you would expect. What you might not expect is for those return values to be "foo" and "bar". I know this was the case in the betas, but I haven't checked if they fixed this or not. I'd love it if someone could confirm that.
    • This was fixed between beta 2 and beta 3. E.g.:

      import java.util.regex.*;

      public class Regexp {
          public static void main(String[] args) {
              Pattern p = Pattern.compile("=");
              // A limit of 2 asks for at most two fields; the second keeps the rest of the input.
              String[] theStrings = p.split("foo=bar=20", 2);
              for (int i = 0; i < theStrings.length; i++) {
                  System.out.println(i + " : " + theStrings[i]);
              }
          }
      }

      Gives

      0 : foo
      1 : bar=20

      as expected on 1.4.0 beta3 (and final release), but

      0 : foo
      1 : bar

      on the beta 2.
    • I don't understand your example: split(/=/, "foo=bar=20", 2). I presume the "2" means "only give me two return values," because you say we'd expect to get two return values. Without the 2 I'd expect to get a list of {foo bar 20}; with the 2 I'd expect to get what you say I wouldn't expect: {foo bar}. Huh?

      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
      • Oh, wow! I just learned something about Perl.

        I've rarely used the final parameter to split, so I didn't know.

        • That final parameter is *incredibly* useful. Imagine parsing email headers, or cookies, or config files, or... the list goes on. Glad to hear they fixed it though.
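To illustrate why the limit argument matters for something like header parsing, here is a minimal sketch using `Pattern.split(CharSequence, int)` from `java.util.regex`. The class name `SplitLimitDemo`, the helper `parseHeader`, and the sample header line are made up for illustration; the behaviour shown is that of the final 1.4 release and later, as described in the reply above.

```java
import java.util.regex.Pattern;

public class SplitLimitDemo {
    // A colon followed by optional whitespace separates name from value.
    public static final Pattern COLON = Pattern.compile(":\\s*");

    // Split a "Name: value" line into at most two fields, so that
    // colons inside the value are left alone.
    public static String[] parseHeader(String line) {
        return COLON.split(line, 2);
    }

    public static void main(String[] args) {
        String[] kv = parseHeader("Subject: Re: Java finally catches up");
        System.out.println(kv[0]); // prints Subject
        System.out.println(kv[1]); // prints Re: Java finally catches up

        // Without a limit, every separator splits the input.
        String[] all = Pattern.compile("=").split("foo=bar=20");
        System.out.println(all.length); // prints 3
    }
}
```

With a limit of 2 the second field carries the unsplit remainder, which is the same result Perl's split gives with a third argument of 2.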
  • Perl has other features besides regexps, which I like very much. Stuff like nested data structures, dynamic typing, functions as first-class values, closures, eval, multiple inheritance, etc.

    I don't think Java has all that, or should. That's why I still prefer Perl for most purposes.

  • Yes, the new Java regex package is almost identical to Perl regexes. The description is here, including a description of the differences, most of which seem to be insignificant (although I like having character classes be an allowable part of a character class).
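The remark about character classes inside character classes presumably refers to the union and intersection syntax that `java.util.regex.Pattern` documents, such as `[a-d[m-p]]` and `[a-z&&[^aeiou]]`. A short sketch of both follows; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

public class NestedClassDemo {
    // Intersection: lowercase letters that are not vowels.
    public static final Pattern CONSONANTS = Pattern.compile("[a-z&&[^aeiou]]+");

    // Union: a through d, or m through p.
    public static final Pattern UNION = Pattern.compile("[a-d[m-p]]+");

    // True when the whole string consists of lowercase consonants.
    public static boolean allConsonants(String s) {
        return CONSONANTS.matcher(s).matches();
    }

    // True when every character falls in a-d or m-p.
    public static boolean inUnion(String s) {
        return UNION.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(allConsonants("rhythm")); // prints true
        System.out.println(allConsonants("perl"));   // prints false
        System.out.println(inUnion("camp"));         // prints true
        System.out.println(inUnion("dog"));          // prints false
    }
}
```

Perl 5 has no direct equivalent of the `&&` intersection; the usual workaround there is a negative lookahead such as `(?![aeiou])[a-z]`.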
  • The Jakarta subproject Oro offers Perl5-compatible regexps. I never really stress-tested it, but it did what I needed (basic sanity testing on email addresses).