As you may or may not know, people on CPAN own modules (technically they own the namespace). Each Foo::Bar is owned by one or more CPAN accounts. Usually you gain ownership on a "first-come" basis, but it can also be transferred. Only the "official" tarball for a given namespace is indexed. So if the owner of Foo::Bar uploads Foo-Bar-1.23.tar.gz, Foo::Bar will point at Foo-Bar-1.23.tar.gz. If I (presumably unauthorized) then upload Foo-Bar-1.24.tar.gz, the index will still point at Foo-Bar-1.23.tar.gz.
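To make the indexing rule concrete, here is a toy sketch in Python. It is not how PAUSE is actually implemented. The `owners` and `index` dictionaries, the `upload` function, and the account names OWNER and SOMEONE are all made up for illustration.

```python
# Toy model of the rule described above. All names here are hypothetical.
owners = {}  # namespace -> set of accounts that own it
index = {}   # namespace -> tarball the index currently points at


def upload(account, tarball, namespaces):
    """Accept any upload, but only index the namespaces the uploader owns.

    Returns the list of namespaces that now point at this tarball.
    """
    indexed = []
    for namespace in namespaces:
        # First-come: an unclaimed namespace goes to whoever uploads it first.
        owners.setdefault(namespace, {account})
        if account in owners[namespace]:
            index[namespace] = tarball
            indexed.append(namespace)
    return indexed


upload("OWNER", "Foo-Bar-1.23.tar.gz", ["Foo::Bar"])
# The second upload is accepted, but nothing in it gets indexed.
upload("SOMEONE", "Foo-Bar-1.24.tar.gz", ["Foo::Bar"])
print(index["Foo::Bar"])  # Foo-Bar-1.23.tar.gz
```

Note that the unauthorized upload still happens. It just never shows up in the index.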
Here's the rub. Not owning a module doesn't stop you from uploading. Module ownership also says nothing about who owns the distribution, and gitpan is organized by distribution. Now it gets a little more difficult to figure out who owns what. For example, look at MQSeries-1.30. All but two modules are unauthorized, BUT notice that MQSeries.pm is authorized. The CPAN index does point MQSeries at M/MQ/MQSERIES/MQSeries-1.30.tar.gz (everything else is at 1.29). Likely what we have here is a botched ownership transfer.
How do you mark that? search.cpan.org seems to take the strict approach: if anything's unauthorized, it's out. The CPAN uploads database I have available does the opposite: if anything is authorized, it's in. What to do?
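The two approaches boil down to "all" versus "any". Here is a small Python sketch of the difference. It assumes a release can be described as a mapping from module name to an authorized flag, which is my simplification, and the module names and function names are hypothetical.

```python
# Hypothetical release: module name -> was the uploader authorized for it?
release = {
    "Some::Module": True,
    "Some::Module::Helper": False,
    "Some::Module::Util": False,
}


def strict_policy(modules):
    """Strict approach: one unauthorized module throws the release out."""
    return all(modules.values())


def permissive_policy(modules):
    """Permissive approach: one authorized module lets the release in."""
    return any(modules.values())


print(strict_policy(release))      # False
print(permissive_policy(release))  # True
```

A partly authorized release, like the MQSeries one, is exactly where the two policies disagree. One edge case to watch in this sketch: for a release with no modules at all, `all` returns True and `any` returns False, so the policies swap verdicts.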
Then there's stuff like lcwa. Looks like junk, but here's the thing. CPAN has a global module index to worry about, gitpan doesn't. Each distribution is its own distinct unit. So lcwa does no harm on gitpan, and it can be recorded.
What does matter? The continuity of a distribution's releases, and this is precisely what CPAN does not track. It doesn't even have a concept of a distribution, just modules inside tarballs. CPAN authors playing nice with tarball naming conventions gives the illusion of a continuous distribution.
So... for a given release of a distribution (i.e. a tarball), how does gitpan determine if the release should be included in the distribution's history? If we go strict, like search.cpan.org, we're going to lose legit releases and even entire distributions (like lcwa). If we let anything in, gitpan is not showing an accurate history.
Add the complication that authorization changes. For example, the MQSeries module ownership will eventually be fixed. What then?
First pass through, gitpan is ignoring this problem. It's just chucking everything from BackPAN in. The second pass will rebuild individual repos with collected improvements. This is the first thing I'm not sure what to do about.