Log in

No account? Create an account

Previous Entry | Next Entry

MLP:RiM Episode 2

Protip: when modifying values returned by a memoized function, make sure you're appending copies of the table/cache values, and not references or pointers into the table. I managed to get burned by this in two different languages. Notably, having your core function return a copy of what it's been working on won't help, if its memoized cousin is returning references that you call ++ or .append() on.

Relatedly, the list version of my Python edit-finding program works now. It's 100x slower than the string version, though. This despite the fact that the string version is actually repeatedly copy-modifying strings it might not need to, while the list defers any copying until its made its final choice. I'm guessing slicing lists sucks. It was also using tons of memory earlier than other versions.

I'd wondered if the Python optimization problems in the previous RiM post were actually algorithm problems. Happily I had a C version right there to check. Or after a bit of work, multiple C versions.

Without memoization, C starts off as 3^n, as analysis expects, then seems to slide to the 5^n of the unmemoed Python versionn. With memo, simple distance finding goes up 3x for every 2x in input, which is better than the 4x expected... up to a point, when it slides to 5x. I suspect CPU cache overflow, but that's a cargo cult answer, I can't visualize what's going on. Python distance finding is 14x slower than C for N=200, 150x slower for N=1600.

Getting the C version to generate paths led to some exciting hours of debugging. This problem has weird boundary conditions (that's not just me being lazy: Skiena's version assumes you've padded your input strings with a beginning space) so there's a lot of -1 in my code. Which was fine for simple distance finding, where I could just return a number when the input was -1, but when I ended up trying to use -1 as an index into a C array... not so good. Sadly, it *does* work often enough to be hard to pin down.

And then, I actually had been copying my return values (see the beginning), but changed that in my flailing attempts to find the mysterious segfaults, which combined with what had been concise and elegant code, led to incrementing my table values.

But all that was eventually fixed, and it scales roughly quadratically. The last doubling takes 6.5x longer than the previous, which is still better than O(n^3). It also takes 3.6G of RAM which is why it's the last step; *memory* use certainly is cubic in this implementation, with N^2 entries of char[N] arrays. At modest N it's 5x slower than not finding strings, and 5x faster than the fastest Python version that does.

Of course, I've just been trusting that lru_cache is actually smart. Might be worth trying an manually memoed Python version, using a list of lists. *Pre-allocated* list of lists, probably.

* Memoization is definitely an easy way of turning a slow algorithm into a fast one
* ...not so easy in pure C, though my final code is still pretty simple (but not re-entrant, if I tried making a library out of it.)
* It's definitely not a sure path to being optimal, or even a constant factor of optimal
* ...especially if using a language that makes it easy, at the cost of who knows what mechanisms chewing up time or memory.

I still think it'd be useful at least as a stepping stone to testing more optimal versions, being able to run on much bigger input than naive code while still being high-probability correct. Though in this problem, there can be multiple optimal-length edits, so the path returned by two correct programs needn't be the same. Ah well.

See the comment count unavailable DW comments at http://mindstalk.dreamwidth.org/439212.html#comments


Damien Sullivan

Latest Month

September 2017


Powered by LiveJournal.com
Designed by Lilia Ahner