months and months of PAQ

Exciting stuff... PAQ8M has narrowed the gap with StuffIt in terms of lossless JPEG compression. If you're unfamiliar with PAQ compression, I highly recommend checking it out. You're probably best off trying LPAQ1 on your non-JPEG data first, as it can run in a sane amount of time.

Google Calculator... an excellent desktop application

Ages ago I decided to come up with the craziest unit conversion request that Google could answer. You'll all be pleased to know that 80 cubic light years in half teaspoons is a valid conversion... Anyone have a better one? Also... cubic light years. Did anyone beat me to typing that into Google?
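For the curious, the arithmetic behind that conversion can be sketched in a few lines. This assumes a Julian light year and a US customary teaspoon; which teaspoon Google actually uses is my assumption, and the function name is made up for illustration.

```python
# Light year in metres: speed of light (299792458 m/s) times one Julian year.
LIGHT_YEAR_M = 299_792_458 * 365.25 * 86_400

# US customary teaspoon in cubic metres (assumed unit, about 4.93 mL).
US_TEASPOON_M3 = 4.92892159375e-6


def cubic_light_years_to_half_teaspoons(cubic_light_years: float) -> float:
    """Convert a volume in cubic light years to a count of half teaspoons."""
    volume_m3 = cubic_light_years * LIGHT_YEAR_M ** 3
    return volume_m3 / (US_TEASPOON_M3 / 2)


# Comes out at roughly 2.7e55 half teaspoons.
print(f"{cubic_light_years_to_half_teaspoons(80):.3e}")
```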

CBC Radio 3 Podcast CUE files...

So I've been recommending CBC Radio 3 Podcast to everyone I know. The only thing they're missing for their multi-format podcast (they even have OGGs!) would be CUE files. So here's my contribution: CBCR3_2007-06-01.cue Cue file for #106?

Strange Research Encounters

Ran into this strange link while digging up references. Also, nerd lolz: "Do not plan a bridge capacity by counting the number of people who swim across the river today" - Heard at a presentation.

WRT PAQ XML Compression

While brainstorming ideas for testing the metascheduler I'm building, I thought to look into data compressors again... Specifically the PAQ family of compressors. The latest update is PAQ8jc (fixed tarball). I whipped up an ebuild and took it to town using Intel's C++ library. I tested it out on a 1.8M XML file:

   reference   1.8M
   gzip -9     168K
   bzip2 -9    108K
   PAQ8jc -5   61K
   PAQ8jc -7   61K (2 bytes smaller, but longer runtime/memusage)
   

Okay... So this shows that if I feel like getting my hands dirty with C++, there's actually some value in parallelizing this algorithm.
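If you want to reproduce the gzip and bzip2 rows of a table like the one above on your own file, here is a minimal sketch using only the Python standard library. PAQ has no stdlib binding, so it is left out; the function names are hypothetical, and the sizes will differ slightly from the command-line tools because of header differences.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` raw and under each compressor."""
    return {
        "reference": len(data),
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
    }


def print_table(path: str) -> None:
    """Read a file and print one row per compressor, smallest unit bytes."""
    with open(path, "rb") as fh:
        sizes = compressed_sizes(fh.read())
    for name, size in sizes.items():
        print(f"   {name:<12}{size:>12,d}")
```

Calling print_table("some.xml") on a file of your choosing prints the rows in the same order as the table.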

Something that caught my eye while looking into this is XML-WRT. It's a fantastic project which scratches an itch I developed in the middle of a lecture on web services some time ago. XML-WRT can be thought of as working in two distinct steps: substitute common tag names, attributes, etc. with shortened tokens, then run the result through zlib or FastPAQ depending on user preference. I tested its WRTified zlib/FastPAQ targets on the 15M Locations.xml file from gnome-applets (wow, that's big):

   reference   15M 
   gzip -9     2.0M
   bzip2 -9    1.2M
   xml-wrt -2  1.8M (zlib default after wrt)
   xml-wrt -3  1.7M (zlib best after wrt)
   xml-wrt -10 693K (FastPAQ normal)
   xml-wrt -11 693K (FastPAQ best)
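
The two-step idea (swap frequent words for short tokens, then hand the result to a general-purpose compressor) can be illustrated with a toy sketch. This is not XML-WRT's actual algorithm or token format; the function names, the word pattern, and the use of Unicode private-use characters as tokens are all assumptions made for illustration, and the input is assumed not to contain those characters already.

```python
import re
import zlib
from collections import Counter

WORD_CHARS = r"[\w.-]"


def build_dictionary(text: str, max_tokens: int = 20) -> dict:
    """Map the most frequent tag/attribute-like words to one-character tokens."""
    words = Counter(re.findall(r"[A-Za-z_][\w.-]{3,}", text))
    # Private-use code points stand in for a real token encoding.
    return {
        word: chr(0xE000 + i)
        for i, (word, _) in enumerate(words.most_common(max_tokens))
    }


def substitute(text: str, dictionary: dict) -> str:
    """Step 1: replace each whole dictionary word with its short token."""
    if not dictionary:
        return text
    alternatives = "|".join(re.escape(word) for word in dictionary)
    pattern = re.compile(
        rf"(?<!{WORD_CHARS})({alternatives})(?!{WORD_CHARS})"
    )
    return pattern.sub(lambda m: dictionary[m.group(1)], text)


def restore(text: str, dictionary: dict) -> str:
    """Undo step 1 by expanding every token back to its word."""
    reverse = {token: word for word, token in dictionary.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


def wrt_then_zlib(text: str, level: int = 9):
    """Step 2: run the substituted text through zlib."""
    dictionary = build_dictionary(text)
    blob = zlib.compress(substitute(text, dictionary).encode("utf-8"), level)
    return dictionary, blob
```

The dictionary has to travel with the compressed blob so the receiver can call restore() after zlib.decompress(). Whether the substitution actually helps zlib depends on the input; the sketch only shows the shape of the pipeline.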
   

I also tested it on a 684M XML database. The default buffer size is too small for dictionary generation on this particular file, so the plain -l10 run gave no result and I had to rerun it with -b100:

   reference           684M
   gzip -9             102M
   bzip2 -9             74M
   xml-wrt -l10        ----
   xml-wrt -l10 -b100   51M

What I want you to take away from this is that xml-wrt/PAQ is pretty slick and actually quite usable. xml-wrt -10 will actually complete in a sane timeframe. PAQ8jc on the same file, however, will take ages and probably won't serve any practical purpose for you...

Silly typo

If you get the following obscure error message while submitting a seemingly simple service request to a Globus service, it's because you're attempting to speak HTTP to an HTTPS container, as the last reply in this bug tells us...

java.io.IOException: Token length 1347375956 > 33554432
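
The odd-looking number makes more sense once you look at its bytes. A small sketch, assuming the container reads the first four bytes of the incoming stream as a big-endian 32-bit length field (that part is my assumption, not something confirmed from the Globus source); the function name is made up for illustration.

```python
import struct


def token_length_as_ascii(length: int) -> str:
    """Render a 32-bit big-endian length field as the four ASCII bytes it holds."""
    return struct.pack(">I", length).decode("ascii", errors="replace")


print(token_length_as_ascii(1347375956))  # prints POST
print(33554432 == 2 ** 25)                # prints True: the limit is 32 MiB
```

In other words, the "token length" is just the start of a plain HTTP POST request line being read as a number.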

lo disappeared you say?

sigh and lol:

%ifconfig
?_        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:176 errors:0 dropped:0 overruns:0 frame:0
          TX packets:176 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:31442 (30.7 Kb)  TX bytes:31442 (30.7 Kb)