Journal of cog (4665)

Wednesday July 28, 2004
05:50 AM

Lingua::Identify

[ #20101 ]
I'd better get on with this, or else it will never be finished :-)

I think I've made up my mind in regard to this:

Lingua::Identify
Lingua::Identify::Standard::Words
Lingua::Identify::Standard::Ngrams
Lingua::Identify::Standard::...

These will hold, respectively, the main program, the information on words, the information on ngrams, etc.

It also leaves room for something such as

Lingua::Identify::Extended::FI
Lingua::Identify::Extended::KL

for new languages taught by the user :-)

I guess I'll do it this way :-)
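
To make the layout concrete, here is a minimal sketch of how the main module might pull data out of the others. Everything beyond the module names is an assumption made for illustration: the %DATA hashes, the data_for function, the sample word and ngram lists, and the idea that an Extended module keeps all of its information in one place.

```perl
use strict;
use warnings;

# Hypothetical data modules: one per kind of information, each covering
# every built-in language. The entries are illustrative placeholders.
package Lingua::Identify::Standard::Words {
    our %DATA = (
        en => [qw(the of and to)],
        pt => [qw(de a o que)],
    );
}

package Lingua::Identify::Standard::Ngrams {
    our %DATA = (
        en => [qw(th he in)],
        pt => [qw(de os qu)],
    );
}

# Assumption: a language taught by the user lives in a single module
# that carries all of its information kinds together.
package Lingua::Identify::Extended::FI {
    our %DATA = (
        Words  => [qw(ja on ei)],
        Ngrams => [qw(en in ta)],
    );
}

package Lingua::Identify {
    # Return the data of one kind ("Words", "Ngrams", ...) for one language.
    # Standard::<Kind> is checked first, then Extended::<LANG>.
    sub data_for {
        my ($kind, $lang) = @_;
        my $code = lc $lang;
        no strict 'refs';    # package names are built at run time
        my $standard = \%{"Lingua::Identify::Standard::${kind}::DATA"};
        return $standard->{$code} if exists $standard->{$code};
        my $extended = \%{ 'Lingua::Identify::Extended::' . uc($lang) . '::DATA' };
        return $extended->{$kind};    # undef if the language is unknown
    }
}

package main;

print join(' ', @{ Lingua::Identify::data_for('Words',  'pt') }), "\n";  # de a o que
print join(' ', @{ Lingua::Identify::data_for('Ngrams', 'fi') }), "\n";  # en in ta
```

With this shape, adding a new kind of information (prefixes, say) would mean adding one Standard module, and a new user-taught language would mean adding one Extended module.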

  • Lingua::Identify::EN
    Lingua::Identify::PT
    where the EN and PT modules include words, ngrams and such. New languages would appear as
    Lingua::Identify::FI
    and so on, following the same idea as the built-in languages.

    This is my quick reflection about this issue.

    • My approach was so that the user wouldn't get 20 modules by default :-|
      • True, but on the other hand, this way you separate the languages completely. That can give some benefit, I think.
        • Not when we talk about maintaining those modules... :-)
          • The idea is to have in those modules ONLY the language information. That is the information you will need to maintain. In one, two or a hundred modules, that information will always be the same. The real identification code should be generic and use the data from any of the other modules.
            • No, no, no.

              Suppose I only have words and ngrams, and now I want to add prefixes... Instead of adding one new module, this way I'm going to have to dig through 20-something different ones :-| or even more, in the future! :-|
              • Lingua::Identify::EN::ngrams
                Lingua::Identify::PT::words
                ...
                ??
                • 20 * 5, for instance, makes 100 modules... is that worth it? :-|
                  • Not 5... you don't have them all for all languages. Not 20, you don't have all those languages. But if you do, why not? They are small modules, quite well organized in the directory hierarchy, and it will be very easy for a new user/hacker to look at it and find what he/she/it is looking for.
                    •   Not 5... you don't have them all for all languages.

                      Sure I do... For every single language in the tool, I have that information (words, ngrams, prefixes, suffixes, etc).

                        Not 20, you don't have all those languages.

                      But I will...

                        But if you do, why not? They are small modules, quite well organized in the directory hierarchy, and it will be very easy for a new user/hacker to look at it and find what he/she/it is looking for.

                      That's true... but changes will be harder this w