Quick and dirty file-based database (for Java)

That was harder than I expected. Background: While mining for a collaborative recommendation tool from my delicious links, I ran into some performance issues trying to cram a few megs of data into a java Preferences file – especially when I had it saving every time I added something.  Ok, time to stop using a hack, time to start using a “real” solution.  Saga follows.
  1. I’m spoiled by PHP: mysql is there if you need something fast and robust, and you can always use sqlite if you want something fast and thrown togther with no relational integrity.  Plus it saves to a local small file, perfect for distribution!  So I start out looking for a small, native, easy persistent storage solution.
  2. Java… database… how about JavaDB?  Do you need the person you send your file to have already installed JavaDB?  Can’t find out.  Still searching.
  3. Ah-ha.  It seems that the standard is hsqldb.     Ok, trying hsqldb, it is “what they use for open office” so it can’t be that bad, right?  I give it a run – except, my data keeps vanishing!  Where did the data go once the program runs?  (Hello, data? are you there?)  Is it because the default table type is “Memory?”  Turns out that is a red herring – “Memory” just means an efficient type, if you want to save anything (why would you ever want to do that?)  you have to explicitly close the DB.    Grr.  And it dumps the data out to a huge SQL text file with bulky inserts for every row.  Not thrilled.  Doesn’t seem designed for persistence.  And it is a DB.  I’m confused.
  4. Digging through posts – how about hsqldb attached to CSV files?  Crashes the machine.  Ok, frustrated with hsqldb.
  5. I always liked sqlite for PHP, giving it another shot for Java.  There are several drivers all of which say they are the “better” one, and some say native code, which scares this Java coder.  I’m not putting a DLL in some random directory.   Finally settled on the first hit in google.
  6. Prepared statements work great, until they have an error, at which point they stop working.  Which isn’t so hot if you are intending for the DB to catch “is unique” constraint violations.   Can’t find a bug tracker on the home page.  Email the owner.
  7. Ok, doing it the “right” way and searching if a value exists before inserting it.   Slowest… inserts… ever.   Averaging 1 per second, not cool.
  8. Dig around in the forums.  Some people suggested a few optimizations involving dividing my total ram by… er, no.  I want to hand this Jar file to someone, no basing it on my computer!
  9. Try some journaling/locking pragma statements.  Data vanishes completely from the sqlite file.  Echos of hsqldb.  Starting to lose hope.
  10. Finally settle on what feels like a hack: Turn off Auto-Commit, and then manually call commit after a batch of inserts.  Seems to help enough to keep sqlite usable, but NOT thrilled that I have to weave in commit statements in the code.
  11. Back to the real task of collaborative filtering!
Hoping delicious.com doesn’t notice me, keeping the scraping to under 1 hit per second like their API in an attempt to stay under the radar/be nice.

Originally Posted: December 15th, 2008

Comments

Popular posts from this blog

Visualizing the user post migration paths across Reddit to extract linked communities

Why people don't like Product Managers

Inability to visualize the scale of our national debt