
Journal of cog (4665)

Thursday October 14, 2004
05:41 AM

Regex for a word

[ #21338 ]

How would you define a regex for a word?

No, it wouldn't be [a-z]+, as that would also match things such as "z", which I don't think is a word in any language (am I wrong?)

So... do all words in every language contain at least one vowel? I think it would be too simple if it were so, but I can't think of any counterexample...
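As a starting point, the "at least one vowel" idea can be sketched as a lookahead in front of the plain letter class. This is only an illustration of the hypothesis, not a claim that it holds for any particular language; the helper name looks_like_word is made up for the example, and it assumes lowercase ASCII input.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word consists only of lowercase ASCII
# letters and contains at least one of a, e, i, o, u.
sub looks_like_word {
    my ($word) = @_;
    # The lookahead requires a vowel somewhere; [a-z]+ then consumes the word.
    return $word =~ /\A(?=[a-z]*[aeiou])[a-z]+\z/ ? 1 : 0;
}

for my $candidate (qw(z perl a krk)) {
    printf "%-5s %s\n", $candidate,
        looks_like_word($candidate) ? 'word' : 'not a word';
}
```

With this pattern "z" is rejected while "a" and "perl" pass; anything without one of the five listed vowels is rejected too.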

Any other rules?

  • In French many single-syllable words are elided before a vowel, so d', l', j', m', t' are valid words. t is a dummy word inserted to ease some liaisons (comment va-t-il ?). y is an adverb of location (j'y vais); it counts as a semi-vowel.
    • Hmm... this is *very* useful information for what I'm doing :-) Thanks :-)

      Would anyone else like to say something about his/her own language? :-) Or any other, for that matter :-)
      • There are several "words" that do not contain what is normally called a vowel.

        In English, y is not a vowel (I learnt that in school in Ireland, so I might be wrong), so the word rhythm does not contain a vowel.

        I had a Yugoslav friend a long time ago, whose last name was Hrs, which he claimed was perfectly pronounceable (somewhat like 'hearse'), but I can't find any vowels there either.

        • We spent our holiday on the island of "Krk" in Croatia.

          One of the things I remember from my Ancient Greek classes in school is that there is a family of consonants called something like "muta cum liquida" which can behave like vowels. I think they are:

          r, m, n, l

          You can say those consonants for a prolonged period of time ("mmmmmmmmmmmmmmmmmm") just like a vowel ("aaaaaaaaaaaaaa"), which is something you cannot do with 'proper' consonants ("t-t-t-t-t-t-t"). Hence they are called 'con-sonants'.

          I gue

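  • use File::Slurp qw(read_file);
    # Load the system word list, one word per line.
    my @words = read_file '/usr/share/dict/words';
    chomp @words;
    # Escape each word and join them into one big alternation.
    my $regex = join '|', map quotemeta, @words;
    # Anchor it so only whole words match, not substrings.
    $regex = qr/\A(?:$regex)\z/;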
    • OK, that was a good answer... what I forgot to say was: I don't know the language nor do I have a list of its words :-)

      I'm doing this for something like fifteen different languages.
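
Without a word list, one option is a purely structural heuristic built from the observations in this thread. The sketch below treats vowels, the semi-vowel y, and the consonants r, m, n, l as letters that can carry a syllable, and allows a French-style elided prefix. The names ($nucleus, $elision, plausible_word) are invented for the example, it assumes unaccented ASCII letters, and it will happily accept plenty of strings that are not words in any language; accented alphabets would need something like \p{L} and a larger nucleus class.

```perl
use strict;
use warnings;

# Letters that may carry a syllable: vowels, the semi-vowel y, and the
# consonants r, m, n, l mentioned in the thread (assumption: ASCII only).
my $nucleus = qr/[aeiouyrmnl]/i;

# Optional French-style elided prefix such as d', l', j', m', t'.
my $elision = qr/[dljmt]'/i;

# Hypothetical heuristic: optional elision, then a run of letters that
# contains at least one nucleus letter.
my $word_re = qr/\A(?:$elision)?(?=[a-z]*$nucleus)[a-z]+\z/i;

sub plausible_word {
    my ($token) = @_;
    return $token =~ $word_re ? 1 : 0;
}

for my $token (qw(rhythm Krk Hrs y l'ami z tttt)) {
    printf "%-7s %s\n", $token,
        plausible_word($token) ? 'plausible' : 'rejected';
}
```

The nucleus class is the part to tune per language; the rest only says "letters, optionally after an elided article or pronoun".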