Team Canada needs some more muscle over at Project Euler. There are no Canadians at 100% genius level. Currently rayfil is at 96%, which is the closest we have, although you have to give him extra points for writing his solutions in Assembler. I answered a few this weekend myself, but I'm back to the grind again now, so I'll have to put that pastime on the back burner for a while.
I've been running wallpapers I find online through pngout for quite some time now and am thoroughly impressed with the results. I unleash it upon /usr/share/pixmaps/*png yearly as well. How can one not get satisfaction out of making a file smaller and losing nothing in the process? This habit has left me a little shocked with the results I've been gathering. The subjective observation I'm left with is that over 90% of the images I find lying around can be recompressed losslessly with a fairly sizable saving. It should be standard procedure for a publish-once, subscribed-many medium to ensure optimal size to save on bandwidth, but this just isn't happening....
I don't use browser disk caches so I searched my homedir for a cache of PNGs. It turns out the only place I hadn't unleashed pngout on was my .themes directory. There weren't too many files in there but I only have a few minutes to whip this together so someday I'll post more convincing results, but for the moment here's what I have...
299 files. Before pngout they took 468473 bytes of disk space; after pngout, 164470 bytes. That's 304003 bytes saved (64.9% savings). If you want to duplicate these tests, here's my terrible sample data. (Compressed paq8o2 tarball)
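If you'd rather check the arithmetic on your own directory, here's a minimal sketch. The function names are mine, not part of pngout; the idea is to total the PNG sizes once before and once after running pngout, then compare.

```python
from pathlib import Path


def total_png_bytes(root):
    """Sum the on-disk size of every *.png file under root (recursive)."""
    return sum(p.stat().st_size for p in Path(root).rglob("*.png") if p.is_file())


def savings(before, after):
    """Return (bytes saved, percent saved) for a before/after size pair."""
    saved = before - after
    return saved, 100.0 * saved / before


if __name__ == "__main__":
    # The numbers from my .themes run above.
    saved, pct = savings(468473, 164470)
    print(f"{saved} bytes saved ({pct:.1f}% savings)")
```

Call total_png_bytes() on the directory before and after the pngout run and feed the two totals to savings().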
Multiply every unoptimized png downloaded from the web by every uncached browser hit it receives and realize how big of a deal this is...
Exciting stuff... PAQ8M has narrowed the gap with StuffIt in terms of lossless JPEG compression. If you're unfamiliar with PAQ compression, I highly recommend checking it out. You're probably best off trying LPAQ1 on your non-JPEG data first, as it can run in a sane amount of time.
Ages ago I decided to come up with the craziest unit conversion request that Google could answer. You'll all be pleased to know that 80 cubic light years in half teaspoons is a valid conversion.... Anyone have any better? Also... Cubic light years. Anyone beat me to typing that into Google?
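For the curious, the conversion is easy enough to sketch by hand. I'm assuming a Julian light year and a US teaspoon here; I don't know which teaspoon Google picks, so treat the result as a ballpark figure rather than a match for Google's answer.

```python
# Assumption: Julian light year, i.e. c * 365.25 days, in metres.
LIGHT_YEAR_M = 299_792_458 * 31_557_600

# Assumption: US customary teaspoon (4.92892159375 mL), in cubic metres.
US_TEASPOON_M3 = 4.92892159375e-6


def cubic_light_years_to_half_teaspoons(n):
    """Convert n cubic light years to a count of half teaspoons."""
    volume_m3 = n * LIGHT_YEAR_M ** 3
    return volume_m3 / (US_TEASPOON_M3 / 2)


if __name__ == "__main__":
    print(f"{cubic_light_years_to_half_teaspoons(80):.4e} half teaspoons")
```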
So I've been recommending CBC Radio 3 Podcast to everyone I know. The only thing they're missing for their multi-format podcast (they even have OGGs!) would be CUE files. So here's my contribution: CBCR3_2007-06-01.cue Cue file for #106?
Ran into this strange link while digging up references. Also, nerd lolz: "Do not plan a bridge capacity by counting the number of people who swim across the river today" - Heard at a presentation.
While brainstorming ideas for testing the metascheduler I'm building, I thought to look into data compressors again... Specifically the PAQ family of compressors. The latest update is PAQ8jc (fixed tarball). I whipped up an ebuild and took it to town using Intel's C++ library. I tested it out on a 1.8M XML file:
reference    1.8M
gzip -9      168K
bzip -9      108K
PAQ8jc -5    61K
PAQ8jc -7    61K   (2 bytes smaller, but longer runtime/memusage)
Okay... So this shows that if I feel like getting my hands dirty with C++, there's actually some value in parallelizing this algorithm.
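If you want to reproduce the gzip and bzip2 rows on a file of your own, here's a rough sketch using nothing but Python's standard library. PAQ has no stdlib binding that I know of, so it is left out, and compressed_sizes is just a name I made up.

```python
import bz2
import gzip


def compressed_sizes(data):
    """Return the byte count of data raw and after gzip -9 / bzip2 -9."""
    return {
        "reference": len(data),
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    import sys

    # Usage: python sizes.py somefile.xml
    with open(sys.argv[1], "rb") as f:
        for name, size in compressed_sizes(f.read()).items():
            print(f"{name:10s} {size}")
```

The sizes should land close to what the command-line tools produce, though headers may differ by a few bytes.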
Something that caught my eye while looking into this is XML-WRT. It's a fantastic project which scratches an itch I developed in the middle of a lecture on WebServices some time ago. XML-WRT can be thought of as working in two distinct steps: substitute common tagnames, attributes etc. with shortened tokens, then run the result through zlib or FastPAQ depending on user preference. I tested its WRTified zlib/fastpaq targets on the 15M Locations.xml file from gnome-applets (wow that's big):
reference     15M
gzip -9       2.0M
bzip2 -9      1.2M
xml-wrt -2    1.8M   (zlib default after wrt)
xml-wrt -3    1.7M   (zlib best after wrt)
xml-wrt -10   693K   (FastPAQ normal)
xml-wrt -11   693K   (FastPAQ best)
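To make the two-step idea concrete, here's a toy sketch of it. This is emphatically not XML-WRT's actual algorithm or file format, just my own illustration: it swaps the most frequent tag names for single control bytes that don't occur in the input, then hands the result to zlib. All of the names below are hypothetical.

```python
import re
import zlib
from collections import Counter

# Matches "<name" or "</name"; attributes and text are left untouched.
TAG = re.compile(rb"(</?)([A-Za-z_][\w.\-]*)")
TOKEN = re.compile(rb"(</?)([\x01-\x1f])")


def wrt_like_encode(xml, max_tokens=24):
    """Step 1: replace frequent tag names with unused one-byte tokens."""
    free = [bytes([b]) for b in range(1, 32)
            if b not in (9, 10, 13) and bytes([b]) not in xml]
    counts = Counter(m.group(2) for m in TAG.finditer(xml))
    names = [n for n, _ in counts.most_common() if len(n) > 1]
    table = dict(zip(names[:max_tokens], free))
    encoded = TAG.sub(
        lambda m: m.group(1) + table.get(m.group(2), m.group(2)), xml)
    return encoded, table


def wrt_like_decode(encoded, table):
    """Undo step 1 by mapping each token back to its tag name."""
    reverse = {tok: name for name, tok in table.items()}
    return TOKEN.sub(
        lambda m: m.group(1) + reverse.get(m.group(2), m.group(2)), encoded)


def compare(xml):
    """Step 2: compress with zlib, with and without the substitution."""
    encoded, _ = wrt_like_encode(xml)
    return {
        "zlib -9": len(zlib.compress(xml, 9)),
        "tokens + zlib -9": len(zlib.compress(encoded, 9)),
    }
```

A real tool also has to store the token table alongside the compressed data so the file can be decoded later; I've skipped that here, so the numbers from compare() flatter the token version a little.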
I also tested it on a 684M XML database (the default buffer size is too small for dictionary generation on this particular file):
reference              684M
gzip -9                102M
bzip2 -9               74M
xml-wrt -l10           ----
xml-wrt -l10 -b100     51M
What I want you to take away from this is that xml-wrt/PAQ is pretty slick and actually quite usable. xml-wrt -10 will actually complete in a sane timeframe. PAQ8jc on the same file however will take literally ages and probably won't serve any practical purpose for you...