

ethan (3163)

ethan
  reversethis-{ed. ... rap.nov.olissat}

A 25-year-old chap living in the westernmost town of Germany, studying communication and information science, and a huge fan of XS-related things.

Journal of ethan (3163)

Wednesday November 13, 2002
06:34 AM

Stupid crippled HTML, stupid vim, stupid regexen

[ #8927 ]

As I move on with my blogger, things have gotten much more complicated than I had ever wished.

1st nuisance: Entries can be made in several formats. When retrieving entries by other people, I had to realize that some write them in Plain Old Text while others prefer HTML. My initial thought was to run the raw text through html2text. This was quite horrible in several respects. It uses some odd backspace escapes to highlight text, and naturally vim understands none of them. Fortunately, those can be turned off (html2text's -nobs switch). The newline issue is trickier: newlines don't mean much in HTML, and html2text expects HTML, so I had to replace newlines with <br>. Fair enough. The next issue was this proprietary <ecode> thingy. It seems to be some form of preformatting tag that preserves indentation. So I first convert it to <code> and afterwards replace whitespace with the non-breaking space entity. The whole sequence now looks as follows (there is more to come, undoubtedly):

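        # <ecode> is the site's own preformatting tag; map it to plain <code>
        $entry->{body} =~ s"<(/?)ecode>"<$1code>"g;
        # html2text treats its input as HTML, where newlines mean nothing
        $entry->{body} =~ s"\n"<br>"g;
        # inside <code>, make every whitespace character non-breaking so
        # the indenting survives the conversion
        $entry->{body} =~
                s"<code>(.*?)</code>"
                    '<code>' . do { (my $s = $1) =~ s/\s/&nbsp;/g; $s } . '</code>'"gsex;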

That should leave indentation intact and is also understood by html2text. Of course, other deficiencies remain: URLs are currently lost, and there is this sword of Damocles hanging over me as long as I tackle these HTML issues with regexen.
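One possible workaround for the lost URLs is to inline each link target into the text before the body is handed to html2text. This is a sketch under stated assumptions, not part of the blogger itself: the helper name `inline_urls` is made up for illustration, and it only copes with simple, non-nested `<a href=...>...</a>` anchors, which is exactly the kind of regex-on-HTML shortcut that keeps that sword hanging.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <a href="URL">text</a> as "text [URL]" so the
# link target survives a later pass through html2text. Only simple,
# non-nested anchors are handled.
sub inline_urls {
    my ($html) = @_;
    $html =~ s{
        <a \b [^>]*? \b href \s* = \s*           # opening tag up to href=
        (?: "([^"]*)" | '([^']*)' | ([^\s>]+) )  # quoted or bare URL
        [^>]* >                                  # rest of the opening tag
        (.*?)                                    # link text
        </a \s* >                                # closing tag
    }{
        my $url = $1 // $2 // $3;                # whichever quoting style matched
        $4 . ' [' . $url . ']';
    }gisex;
    return $html;
}

print inline_urls('See <a href="http://use.perl.org/">use Perl</a> now.'), "\n";
# prints: See use Perl [http://use.perl.org/] now.
```

It would run before the newline and `<code>` substitutions shown in the post; anything more irregular than plain anchors is better left to a real parser such as HTML::Parser.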

2nd nuisance: There doesn't seem to be any vim script, macro, plugin or whatsoever available that does basic HTML rendering in a buffer. Disappointing.

3rd nuisance: Regexen are stupid. Perl's regex engine did not like my fancy first attempt at doing this with look-ahead and look-behind. This one is arguable, though: perhaps I was the stupid one here.
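The first attempt isn't shown, so this is only a guess at what the engine objected to: Perl's look-behind must have a bounded length, so a pattern such as `(?<=<code>.*)\s` is rejected when the regex is compiled. The same job can be done with look-ahead alone, assuming `<code>` blocks are well-formed and never nested: a whitespace character lies inside a code block exactly when a `</code>` follows it before any further `<code>`. The name `nbsp_in_code` is hypothetical.

```perl
use strict;
use warnings;

# An unbounded look-behind: Perl refuses to compile it. The pattern is
# built from a string so the failure happens at run time inside eval
# instead of aborting the whole script at compile time.
my $lookbehind = '(?<=<code>.*)\s';
my $compiled   = eval { my $re = qr/$lookbehind/; 1 };
print $compiled ? "compiled\n" : "rejected: $@";

# Look-ahead only: replace a whitespace character when a </code> follows it
# before any further <code>, i.e. when it sits inside a code block.
# Assumes <code> blocks are well-formed and never nested.
sub nbsp_in_code {
    my ($html) = @_;
    $html =~ s{\s(?=(?:(?!<code>).)*</code>)}{&nbsp;}gs;
    return $html;
}

print nbsp_in_code('a b <code>x  y</code> c <code>p q</code>'), "\n";
# prints: a b <code>x&nbsp;&nbsp;y</code> c <code>p&nbsp;q</code>
```

The look-ahead rescans up to the next tag for every whitespace character, so it is quadratic in the worst case; the capture-plus-/e version from the post avoids that and is easier to read.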

Nonetheless, the most recent version of the blogger is, as always, available through here. The user interface is now fairly consistent: the active window is always maximized, and it is easy to toggle between the compose window and the index window. Entries by other people can be retrieved by their nickname and are at least rendered in a readable fashion.

  • I don't know why you're not having any luck with HTML syntax highlighting in vim. Try
    :set ft=html
    --
    xoa

    • Oh, that always worked; it's what I use for the compose window. But when reading an entry, I'd rather not see the highlighted HTML source but a nicely laid-out text document. I would have expected something like that to exist for vim already, but I couldn't find anything suitable.
      • Oh, oh, I see. Could you pipe the file to lynx? Something like:
        :%!lynx -
        --
        xoa

        • Could you pipe the file to lynx?

          Hmmh, it looks as though lynx can't read HTML from a stream. The same limitation applies to links. w3m, however, can do it, but doesn't appear to understand this sort of HTML bastardism: the result is basically identical to what got piped in. ":%!html2text -nobs" works, but then there are no line breaks for entries in Plain Old Text format.
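For doing the conversion inside the module rather than from vim, here is a minimal sketch of pushing a string through an external filter with IPC::Open2. The helper name `filter_through` is hypothetical and is not the author's `get_entry`; html2text and its -nobs switch are taken from the message above.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Hypothetical helper: feed $text to an external command's STDIN and return
# everything it prints on STDOUT. Writing all input before reading is fine
# for small entries; with very large input the child can fill its output
# pipe while we are still writing, and both processes block.
sub filter_through {
    my ($text, @cmd) = @_;
    my $pid = open2(my $from_child, my $to_child, @cmd);
    print {$to_child} $text;
    close $to_child;                          # send EOF so the filter can finish
    my $output = do { local $/; <$from_child> };
    close $from_child;
    waitpid $pid, 0;                          # reap the child, sets $?
    die "@cmd exited with status ", $? >> 8, "\n" if $?;
    return $output;
}

# Usage, assuming html2text is installed:
#   my $text = filter_through($entry->{body}, 'html2text', '-nobs');
```

Passing the command as a list means no shell is started to parse it.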

          Currently, I use IPC::Open2 to do it inside my module:


          sub get_entry {
                  my $id = shift or return;
                  my $