I recently had to split up a huge (1 GB) mbox file. I KNOW I should be using Maildir, but c'mon, people.
So I whipped this up - feel free to use/tweak it for your own purposes:
#!/usr/bin/perl -wT
# Process:
# 1) cp
# 2) Run this script
# 3) chmod/chown the INBOX.GigSplitNN files
# chown person:users
# chmod 0600
# 4) mv
# 5) mail the person and see if the
# 6) diff
# i ended up just tailing the file with the right number of differing lines
# and >>'ing that into
# b/c diff'ing two 1GB files takes WAY too long!
use strict;
open( MBOX, '/var/mail/person.bak' ) || die "Cannot open person.bak: $!";
# go through the mbox file
my $message = '';
my $line_count = 0;
my $message_count = 0;
my $file_base = '/home/person/INBOX.GigSplit';
my $file_i = 1;
my $line_count_limit = 580000; # this ends up with ~40MB files, which are more tolerable
my $need_to_write_init = 1;
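while( <MBOX> ) {
    $line_count++;
    # NOTE: the test on this line was lost from the post; matching the mbox
    # "From " separator line is an assumption based on what the loop does.
    if ( /^From / ) {
        append_message() if length( $message ) > 0;
        $message = $_;
    } else {
        $message .= $_;
    }
}
# the final message has no "From " line after it, so flush it here
append_message() if length( $message ) > 0;
close( MBOX );

# append the buffered $message to the current split file, moving on to the
# next file once the line limit has been passed
sub append_message {
    $message_count++;
    my $file = $file_base . sprintf( "%02d", $file_i );
    print "Got message # $message_count - appending to $file...\n";
    if ( $need_to_write_init ) {
        write_initial_msg( $file );
        $need_to_write_init = 0;
    }
    open( SPLIT, ">>$file" ) || die "Cannot append to $file: $!";
    print SPLIT $message;
    close( SPLIT );
    if ( $line_count > $line_count_limit ) {
        print "Line Count exceeded $line_count_limit, so incrementing \$file_i...\n";
        $file_i++;
        $line_count = 0;
        $need_to_write_init = 1;
    }
}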
print "All done!\n";
sub write_initial_msg {
    my $file = shift;
    open( FILE, ">$file" ) || die "Cannot open $file to put in initial msg: $!";
    print FILE <<"_EOF_";
From MAILER-DAEMON Mon Aug 14 13:00:31 2006
Date: 14 Aug 2006 13:00:31 -0400
From: Mail System Internal Data <MAILER-DAEMON\@mail.example.com>
Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
Message-ID: <1155574831\@mail.example.com>
X-IMAP: 1134739889 0000025473
Status: RO

This text is part of the internal format of your mail folder, and is not
a real message. It is created automatically by the mail system software.
If deleted, important folder data will be lost, and it will be re-created
with the data reset to initial values.

_EOF_
    close( FILE );
}
So that will create INBOX.GigSplit01
Yes, I KNOW that could be optimized and probably even done in one line (go for it, golfers!).
That's just the way I roll!*
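Since diff'ing two 1 GB files takes so long, a cheaper sanity check is to compare message counts. Here is a rough sketch of that idea, not something I ran as part of the original process: count_messages and the check_split.pl command line are made-up names, and it assumes body lines starting with "From " were escaped as ">From " on delivery. Each split file starts with one internal-data pseudo-message, so one is subtracted per split file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the messages in an mbox file by counting "From " separator lines.
# Assumption: body lines beginning with "From " were escaped (">From ").
sub count_messages {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ if $line =~ /^From /;
    }
    close( $fh );
    return $count;
}

# Hypothetical usage: check_split.pl ORIGINAL SPLIT01 SPLIT02 ...
if ( @ARGV >= 2 ) {
    my ( $original, @splits ) = @ARGV;
    my $expected = count_messages( $original );
    my $found    = 0;
    # every split file begins with one internal-data pseudo-message
    $found += count_messages( $_ ) - 1 for @splits;
    print "original: $expected, splits: $found\n";
    print $expected == $found ? "Counts match\n" : "Counts DIFFER\n";
}
```

Matching counts don't prove the contents are identical, but a mismatch tells you right away that a message went missing.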
Speaking of coding, Google has their Code Jam going on, but where's the love for Perl? You can program in C++, C#, Java, Python and VB.NET, but not Perl. It probably has to do with what TopCoder supports, but something should really be done to get Perl on that list, for longevity's sake.
Peace,
Jason
* = My new favorite saying
formail (Score:1)
DESCRIPTION
formail is a filter that can be used to force mail into mailbox format,
perform ‘From ’ escaping, generate auto-replying headers, do simple
header munging/extracting or split up a mailbox/digest/articles file.
The mail/mailbox/article contents will be expe
Re: (Score:2)
- Jason
Last mail not saved (Score:1)