So I'm blogging, again...

Brasso

Team Canada needs some more muscle over at Project Euler. There are no Canadians at 100% genius level. Currently rayfil is at 96%, but that's the closest we have. You have to give him extra points for writing his solutions in Assembler, though. I answered a few this weekend myself, but I'm back to the grind again now so I'll have to put that pastime on the back burner for a while.

PNG Rant

I've been running wallpapers I find online through pngout for quite some time now and am thoroughly impressed with the results. I unleash it upon /usr/share/pixmaps/*png yearly as well. How can one not get satisfaction out of making a file smaller and losing nothing in the process? This habit has left me a little shocked with the results I've been gathering. The subjective observation I'm left with is that over 90% of the images I find lying around can be recompressed losslessly with a fairly sizable saving. It should be standard procedure for a publish-once, subscribed-many medium to ensure optimal size to save on bandwidth, but this just isn't happening....

Quickly, pngout is a closed-source PNG recompressor written by Ken Silverman (yep, of Ken's Labyrinth fame) which pretty much always beats optipng -o7, pngcrush and advpng in terms of filesize.

I don't use browser disk caches so I searched my homedir for a cache of PNGs. It turns out the only place I hadn't unleashed pngout on was my .themes directory. There weren't too many files in there but I only have a few minutes to whip this together so someday I'll post more convincing results, but for the moment here's what I have...

299 files. Before pngout they took 468473 bytes of disk space; after pngout, 164470 bytes. That's 304003 bytes saved (about 65% savings). If you want to duplicate these tests, here's my terrible sample data. (Compressed paq8o2 tarball)
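If you'd like to gather the same kind of numbers on your own directories, here's a minimal sketch. The helper names (total_png_bytes, savings) are made up for illustration; the idea is to call total_png_bytes once before running pngout and once after, then feed both totals to savings. The figures plugged in at the bottom are the ones from my .themes run above.

```python
from pathlib import Path


def total_png_bytes(root):
    """Sum the sizes of all .png files found under root (recursively)."""
    return sum(p.stat().st_size for p in Path(root).rglob("*.png") if p.is_file())


def savings(before, after):
    """Return (bytes saved, percent saved) for a before/after byte count."""
    saved = before - after
    return saved, 100.0 * saved / before


if __name__ == "__main__":
    # Byte counts quoted in the post for the .themes directory.
    saved, pct = savings(468473, 164470)
    print(f"{saved} bytes saved ({pct:.1f}% savings)")
```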

Multiply every unoptimized png downloaded from the web by every uncached browser hit it receives and realize how big of a deal this is...

months and months of PAQ

Exciting stuff... PAQ8M has narrowed the gap with StuffIt in terms of lossless JPEG compression. For those unfamiliar with PAQ compression, I highly recommend checking it out. You're probably best off trying LPAQ1 on your non-JPEG data first, as it can run in a sane amount of time.

Google Calculator... an excellent desktop application

Ages ago I decided to come up with the craziest unit conversion request that Google could answer. You'll all be pleased to know that 80 cubic light years in half teaspoons is a valid conversion.... Anyone have any better? Also... Cubic light years. Anyone beat me to typing that into Google?
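For the curious, the arithmetic behind that conversion can be sketched in a few lines. I'm assuming the Julian light year and the US customary teaspoon here; I don't know which definitions Google actually uses, so its answer may differ slightly. The function name is made up for illustration.

```python
# Assumed definitions (not necessarily the ones Google uses):
LIGHT_YEAR_M = 9_460_730_472_580_800.0  # metres in one Julian light year
US_TEASPOON_M3 = 4.92892159375e-6       # cubic metres in one US customary teaspoon


def cubic_light_years_to_half_teaspoons(n):
    """Convert n cubic light years into a count of half teaspoons."""
    volume_m3 = n * LIGHT_YEAR_M ** 3
    return volume_m3 / (US_TEASPOON_M3 / 2)


if __name__ == "__main__":
    print(f"{cubic_light_years_to_half_teaspoons(80):.3e} half teaspoons")
```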

CBC Radio 3 Podcast CUE files...

So I've been recommending CBC Radio 3 Podcast to everyone I know. The only thing they're missing for their multi-format podcast (they even have OGGs!) would be CUE files. So here's my contribution: CBCR3_2007-06-01.cue Cue file for #106?

Strange Research Encounters

Ran into this strange link while digging up references. Also, nerd lolz: "Do not plan a bridge capacity by counting the number of people who swim across the river today" - Heard at a presentation.

WRT PAQ XML Compression

While brainstorming ideas for testing the metascheduler I'm building I thought to look into data compressors again... Specifically the PAQ family of compressors. The latest update is PAQ8jc (fixed tarball). I whipped up an ebuild and took it to town using Intel's C++ library. I tested it out on a 1.8M XML file:

   reference   1.8M
   gzip -9     168K
   bzip2 -9    108K
   PAQ8jc -5   61K
   PAQ8jc -7   61K (2 bytes smaller, but longer runtime/memusage)

Okay... So this shows that if I feel like getting my hands dirty with C++, there's actually some value in parallelizing this algorithm.
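If you want to reproduce the gzip and bzip2 rows of a table like the one above without leaving Python, here's a rough sketch using only the standard library. PAQ isn't in the standard library, so it's left out, and the byte counts can differ slightly from the command-line tools because of header details. The function name and the sample data are made up for illustration.

```python
import bz2
import gzip


def compressed_sizes(data):
    """Return the size in bytes of data as-is and after maximum-level gzip and bzip2."""
    return {
        "reference": len(data),
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    # Synthetic stand-in for an XML file; swap in open(path, "rb").read() for real data.
    sample = b"".join(b"<item id='%d'><name>thing</name></item>\n" % i for i in range(5000))
    for name, size in compressed_sizes(sample).items():
        print(f"{name:<12}{size}")
```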

Something that caught my eye while looking into this is XML-WRT. It's a fantastic project which scratches an itch I developed in the middle of a lecture on WebServices some time ago. XML-WRT can be thought of as working in two distinct steps: substitute common tagnames, attributes etc. with shortened tokens, then run the result through zlib or FastPAQ depending on user preference. I tested its WRTified zlib/fastpaq targets on the 15M Locations.xml file from gnome-applets (wow that's big):

   reference   15M 
   gzip -9     2.0M
   bzip2 -9    1.2M
   xml-wrt -2  1.8M (zlib default after wrt)
   xml-wrt -3  1.7M (zlib best after wrt)
   xml-wrt -10 693K (FastPAQ normal)
   xml-wrt -11 693K (FastPAQ best)
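To make the two-step idea concrete (swap common words for short tokens, then hand the result to a general-purpose compressor), here's a toy sketch. This is NOT XML-WRT's actual algorithm or file format; the token scheme, the word pattern and every function name are my own assumptions, and a real tool would also have to store the dictionary alongside the data.

```python
import re
import zlib
from collections import Counter

WORD = re.compile(r"[A-Za-z_]{4,}")  # candidate words: tag names, attributes, ...
TOKEN = "\x01"                       # escape character, assumed absent from the input


def build_dictionary(text, max_words=90):
    """Pick the most frequent words; 90 keeps every token index in printable ASCII."""
    return [w for w, _ in Counter(WORD.findall(text)).most_common(max_words)]


def encode(text, words):
    """Step 1: replace each dictionary word with TOKEN plus one index character."""
    if TOKEN in text:
        raise ValueError("input already contains the escape character")
    index = {w: i for i, w in enumerate(words)}

    def substitute(match):
        word = match.group(0)
        return TOKEN + chr(0x21 + index[word]) if word in index else word

    return WORD.sub(substitute, text)


def decode(encoded, words):
    """Undo encode(): expand every TOKEN plus index character back into its word."""
    return re.sub(TOKEN + "(.)", lambda m: words[ord(m.group(1)) - 0x21], encoded, flags=re.S)


if __name__ == "__main__":
    # Synthetic XML, made up for this demo.
    xml = "".join(
        f'<location name="city{i}"><latitude>{i}</latitude>'
        f"<longitude>{-i}</longitude></location>\n"
        for i in range(2000)
    )
    words = build_dictionary(xml)
    encoded = encode(xml, words)
    assert decode(encoded, words) == xml
    # Step 2: run both versions through zlib and compare the sizes.
    print("zlib only:     ", len(zlib.compress(xml.encode(), 9)))
    print("tokens + zlib: ", len(zlib.compress(encoded.encode(), 9)))
```

Whether the token step helps depends a lot on the data and on the back-end compressor, so treat the printed numbers as a curiosity and not as a benchmark.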

I also tested it on a 684M XML database (the default buffer size is too small for dictionary generation on this particular file):

   reference           684M
   gzip -9             102M
   bzip2 -9             74M
   xml-wrt -l10        ----
   xml-wrt -l10 -b100   51M

What I want you to take away from this is that xml-wrt/PAQ is pretty slick and actually quite usable. xml-wrt -10 will actually complete in a sane timeframe. PAQ8jc on the same file however will take literally ages and probably won't serve any practical purpose for you...