They aren’t really frontiers anymore, but I’m learning more about NoSQL databases and their use cases, specifically how best to build a better fossil specimen database. The use case is a simple “what do we have and where is it” database for the University of North Dakota paleo collections.
Ideal features are:
- Easy replication or download, in case someone wants to run an analysis on all or part of the database.
- Easy and/or flexible query capabilities. I’m getting the impression that I can have one or the other, but not both.
- Easy import.
- Versioning would be nice.
- Scalability (eventually up to 500,000 specimens).
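To make “easy import” and “versioning” a little more concrete, here is a minimal Python sketch. It assumes the existing catalog can be exported to CSV; the column names, catalog numbers, and the `version`/`history` scheme are all hypothetical, chosen for illustration rather than taken from any real UND schema or from CouchDB’s or MongoDB’s own revision features. The `_id` field is used because both CouchDB and MongoDB use that name for a document’s key.

```python
import csv
import io
import json

# Hypothetical CSV export from an existing catalog; the columns and records
# are illustrative assumptions, not the real UND collection schema.
SAMPLE_CSV = """catalog_number,taxon,formation,cabinet,drawer
UND-0001,Triceratops horridus,Hell Creek,A,3
UND-0002,Champsosaurus sp.,Sentinel Butte,B,1
"""


def rows_to_documents(csv_text):
    """Turn each CSV row into a nested, JSON-ready specimen document."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents.append({
            "_id": row["catalog_number"],
            "taxon": row["taxon"],
            "formation": row["formation"],
            # Group storage fields so "where is it" lives in one place.
            "location": {"cabinet": row["cabinet"], "drawer": int(row["drawer"])},
            "version": 1,
            "history": [],  # hypothetical versioning: prior states kept inline
        })
    return documents


def update_document(doc, changes):
    """Return a new version of doc, keeping the previous state in its history."""
    snapshot = {k: v for k, v in doc.items() if k != "history"}
    updated = dict(doc, **changes)
    updated["version"] = doc["version"] + 1
    updated["history"] = doc["history"] + [snapshot]
    return updated


if __name__ == "__main__":
    docs = rows_to_documents(SAMPLE_CSV)
    moved = update_document(docs[0], {"location": {"cabinet": "C", "drawer": 7}})
    print(json.dumps(moved, indent=2))
```

Keeping old states inside each document makes a download self-contained, at the cost of larger documents; a real system might instead store revisions separately.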
So far CouchDB and MongoDB are my frontrunners, and even though I think CouchDB has more of the features I’d like to use, the supposedly easier querying in MongoDB has me intrigued. This is definitely not going to be a post comparing the two. It’s possible that MySQL is the right choice, and I am re-exploring that possibility. I’m beginning to shy away from NodakPaleo as it exists right now, because Drupal adds a layer of complexity between researchers and the data. Sure, it’s nice to have all the extras and to be able to build a View that you can query a certain way through a GUI, but ideally the database would have an API that researchers could use for more interesting things: mapping exercises, mashups with other datasets, and so on. For that to happen, I think a JSON/BSON/XML document database may be the best choice in the end.
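As a rough illustration of what “talking to” a document database could look like for a researcher, here is a minimal Python sketch. It uses plain lists and dicts rather than either database’s actual query API, and the specimen records and field names are made up for the example: it answers a “what do we have and where is it” question and exports a matching subset as JSON for offline analysis.

```python
import json

# Made-up specimen documents, shaped the way a document database might
# store them; the field names are illustrative assumptions.
SPECIMENS = [
    {"_id": "UND-0001", "taxon": "Triceratops horridus", "formation": "Hell Creek",
     "location": {"cabinet": "A", "drawer": 3}},
    {"_id": "UND-0002", "taxon": "Champsosaurus sp.", "formation": "Sentinel Butte",
     "location": {"cabinet": "B", "drawer": 1}},
    {"_id": "UND-0003", "taxon": "Triceratops horridus", "formation": "Hell Creek",
     "location": {"cabinet": "A", "drawer": 4}},
]


def find(documents, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [d for d in documents
            if all(d.get(key) == value for key, value in criteria.items())]


def export_subset(documents, **criteria):
    """Serialize a filtered subset as JSON, e.g. for download and offline analysis."""
    return json.dumps(find(documents, **criteria), indent=2)


if __name__ == "__main__":
    # "What do we have from the Hell Creek, and where is it?"
    for doc in find(SPECIMENS, formation="Hell Creek"):
        loc = doc["location"]
        print(doc["_id"], doc["taxon"], "cabinet", loc["cabinet"], "drawer", loc["drawer"])
```

Because each record is already a self-describing JSON document, the same output could feed a mapping script or be merged with another dataset without going through a GUI layer.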
I just want people to be able to talk to it and use it in a way that makes sense to them. As I go, I hope to post more coherent thoughts about each option and about how I’m learning them (because I know very little about these systems).
Finally, if you want to see where my free time went in 2007, check out SpecimenDB, the PHP/MySQL version I created. I cribbed much of the backend from ‘PHP and MySQL for Dummies,’ but apparently I learned a lot that summer. This is a new installation, so it doesn’t have any records in it yet.
