Lucene and the App Engine Datastore, They Play Nice After All
As it turns out, most of my troubles to date with getting Lucene to save “files” to the datastore have not been troubles with the datastore interaction at all. The problem was more basic: I was saving the “file” in Lucene’s pre-commit phase and not in its actual commit phase. This caused me all kinds of craziness, because I could tell that the array of bytes written to the “file” and the array subsequently read back from it were exactly the same, and yet the checksums didn’t match. Huh? Yeah, I know. So here’s the skinny.
When Lucene initializes an IndexWriter, it tests at runtime that it is able to write a long to the IndexOutput. To do this it intentionally writes the checksum minus one and then flushes the IndexOutput. I’m still not totally clear on the intent of IndexOutput::flush; I had assumed it meant “flush to the file system,” which was way off. Because I saved on flush, the “file” I persisted held that deliberately wrong checksum, not the real one written later in the actual commit. Changing my IndexOutput so that the write to the file system happens in close rather than in flush solved this mind bender. The best part of debugging this was definitely finding the line with:
output.writeLong(checksum - 1);
After this I performed a few small refactorings so that DatastoreFiles are backed by Lists rather than arrays, and arrays are only created when the bytes are sent along with the datastore Entities for persisting.
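To make the flush-versus-close point concrete, here is a minimal sketch of the idea, not my actual code. Everything in it is hypothetical: the class name DatastoreFileSketch, the in-memory map standing in for the datastore, and the commitWithChecksum helper. It also assumes that the pre-commit write of checksum - 1 is followed by a seek back and a write of the real checksum in the actual commit, which is how I understand the sequence. The output is backed by a List, flush persists nothing, and only close converts to an array and saves it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an IndexOutput that saves to the datastore.
// None of these names come from Lucene or the App Engine SDK.
class DatastoreFileSketch {
    // Stand-in for the datastore: file name -> persisted bytes.
    static final Map<String, byte[]> STORE = new HashMap<>();

    private final String name;
    private final List<Byte> buffer = new ArrayList<>(); // growable backing list
    private int position = 0;

    DatastoreFileSketch(String name) {
        this.name = name;
    }

    void writeByte(byte b) {
        if (position < buffer.size()) {
            buffer.set(position, b); // overwrite existing bytes after a seek
        } else {
            buffer.add(b);           // append at the end
        }
        position++;
    }

    void writeLong(long value) {
        // Big-endian: most significant byte first.
        for (int shift = 56; shift >= 0; shift -= 8) {
            writeByte((byte) (value >>> shift));
        }
    }

    int getFilePointer() {
        return position;
    }

    void seek(int pos) {
        position = pos;
    }

    // Deliberately a no-op: flushing must not persist a half-committed file.
    void flush() {
    }

    // Only here is the List copied into an array and handed to the store.
    void close() {
        byte[] bytes = new byte[buffer.size()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(i);
        }
        STORE.put(name, bytes);
    }

    static long readLong(byte[] bytes, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xFFL);
        }
        return value;
    }

    // Assumed commit sequence: wrong checksum, flush, seek back, real checksum, close.
    static long commitWithChecksum(String name, long checksum) {
        DatastoreFileSketch out = new DatastoreFileSketch(name);
        int start = out.getFilePointer();
        out.writeLong(checksum - 1); // pre-commit: intentionally mismatched
        out.flush();                 // nothing reaches the store here
        out.seek(start);
        out.writeLong(checksum);     // actual commit: overwrite with the real value
        out.close();                 // now the bytes are persisted
        return readLong(STORE.get(name), 0);
    }

    public static void main(String[] args) {
        long stored = commitWithChecksum("segments_demo", 42L);
        System.out.println("stored checksum = " + stored);
    }
}
```

If flush persisted the buffer instead, the stored “file” would end with checksum - 1, which is exactly the mismatch I was chasing.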