Journal of jjore (6662)

Wednesday September 09, 2009
06:32 PM

Starting to chart a map of where perl locates data

[ #39604 ]

At work, I've got a problem on the back burner which is kind of interesting. We've got some mod_perl processes with big data sets. The processes fork and then serve requests. I've heard from Operations that they're not using Linux's Copy-on-Write feature to the extent desired so I'm trying to understand just what's being shared and not shared.
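One way to get a first number for "shared versus not shared" on Linux is to sum the Shared_* and Private_* fields the kernel reports in /proc/<pid>/smaps. Pages still shared after a fork generally show up under the Shared_* fields, and pages that CoW has copied show up under Private_Dirty. The sketch below is Python rather than Perl. The function name `summarize_smaps` is hypothetical and the sample text is invented for illustration, not taken from a real mod_perl process.

```python
from collections import Counter

# Field names as they appear in /proc/<pid>/smaps on Linux.
FIELDS = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")

def summarize_smaps(text):
    """Sum the kB values of the shared/private fields across all mappings."""
    totals = Counter()
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in FIELDS:
            totals[name] += int(rest.split()[0])  # values are reported in kB
    return dict(totals)

# Invented sample in the smaps field format. Real use would read
# open(f"/proc/{pid}/smaps").read() for each forked child.
sample = """\
Shared_Clean:        120 kB
Shared_Dirty:       2048 kB
Private_Clean:         0 kB
Private_Dirty:       512 kB
Shared_Dirty:       1024 kB
Private_Dirty:       256 kB
"""

print(summarize_smaps(sample))
```

Comparing these totals across children, and over time as they serve requests, gives a rough picture of how much memory has stopped being shared. It does not say which perl data structures are responsible.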

To that end, I wanted to map out where perl puts its data. I made a picture (http://diotalevi.isa-geek.net/~josh/090909/memory-0.png), a strip showing the visible linear memory layout from 0x3042e0 to 0x8b2990. The left edge shows where the arenas are. The tightly clustered lines toward the middle show the pointers from the arenas to the SV heads. The widely splayed lines from the middle to the right show the SvANY() pointers from the SV heads to the SV bodies.

I now kind of suspect that the pages CoW unshares because of reference counting, the ones containing SV heads, are compact rather than sparse. They sure seem to be highly clustered, so maybe it's a-ok to go get a bunch of values shared between two forked processes and not worry about reference counts. Sure, the SV head pages are going to be unshared, but maybe those pages are just full of other SV heads and it's not a big deal. If SV heads weren't clustered, then reference count changes could affect lots of other pages.
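A toy model of the clustering argument, in Python: count how many distinct pages a set of reference count writes would dirty. It assumes 4096-byte pages, and the addresses and the 24-byte spacing are made up. `pages_touched` is a hypothetical helper, not something from perl or from the repo.

```python
PAGE_SIZE = 4096  # assumption: typical x86 Linux page size

def pages_touched(addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers that contain the given addresses."""
    return {addr // page_size for addr in addresses}

# Made-up example: 100 SV heads packed 24 bytes apart in one arena...
clustered = [0x304000 + 24 * i for i in range(100)]
# ...versus the same number of heads scattered one per page.
scattered = [0x304000 + PAGE_SIZE * i for i in range(100)]

print(len(pages_touched(clustered)))  # 1
print(len(pages_touched(scattered)))  # 100
```

Touching every reference count in the clustered case unshares a single page, while the scattered case unshares a hundred, which is the difference between "not a big deal" and a real loss of sharing.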

Anyway, there's a nice little set of pics at http://diotalevi.isa-geek.net/~josh/090909/. I started truncating precision by powers of two to get things to visually chunk up more. So when you look at memory-0.png, there's no chunking, but when you look at memory-4.png, the bottom 4 bits are zeroed out.
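The truncation amounts to a bit mask. This is a minimal illustration in Python, not the code from the repo. The name `truncate_address` and the sample address are made up for the example.

```python
def truncate_address(addr: int, bits: int) -> int:
    """Zero out the lowest `bits` bits of addr (hypothetical helper)."""
    return addr & ~((1 << bits) - 1)

# bits=0 corresponds to memory-0.png (no chunking), bits=4 to memory-4.png.
sample = 0x3042e7  # made-up address inside the charted range
for bits in (0, 4, 8, 12):
    print(bits, hex(truncate_address(sample, bits)))
```

With 12 bits zeroed, every address collapses onto the start of its 4096-byte page, so at that level the picture is effectively a map of pages rather than of individual SVs.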

There's a github repo of this at http://github.com/jbenjore/Internals-GraphArenas/tree/master for the interested.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • It would be interesting to see how many of these are reused storage allocated on pad variable introduction.

    In principle the static overhead of PADLISTs, etc. could be completely shared (once allocated they never change), but the actual SV bodies they store change all the time.

    Maybe priming the callstack by invoking all of the CVs in order to share that stuff could be worthwhile, though it's probably not that much storage at the end of the day.

    Secondly, Stefan O'Rear has been working on memory compaction, loo

    • Sorear's work is interesting. I've used http://search.cpan.org/dist/Judy to get compact data as well. While writing the scripts for http://github.com/jbenjore/runops-movie/tree/master/scripts I found I'd often write the code in Perl, then would occasionally share bits of data with some Inline::C.

      But separately, my interest right now is in what happens to Linux's CoW. I've got data that is theorized to be both large and unshared between mod_perl processes. I want it both compact and shared.

      • (Note: I know nothing about this, so this may make no sense at all)

        As I understand it, the heap in Perl contains both code and variables, so if, in a forked process, a variable (which happens to share a page with some code) is changed, then that entire page becomes unshared.

        Code seldom needs to change in a new fork. Would it not be possible to separate code and variables, so that the pages occupied by your code would remain shared?

        I'd imagine that this would be a net win for memory usage in mod_perl proces

        • Yes, that would be nice. I didn't map out where compiled perl goes in memory and how much it shares pages with things likely to change. It's likely to be intermingled because it's also on the heap.