A database legend who authored core aspects of the traditional relational database is now looking beyond SQL to the database future, with the aim of improving the way databases fetch data for websites.
Jim Starkey recently founded NuoDB, a startup that claims to have solved the thorny problem of running a distributed database under a familiar transactional interface. Now, he has another project, dubbed Amorphous, running in a unique type of home lab.
SearchITOperations caught up with Starkey to ask about this latest frontier in database technology, how it could change Web application development and what it all has to do with Edward Snowden.
Are there newer database projects you're working on?
Jim Starkey: I'm doing a private project, where I'm going to the opposite end. What I'm looking at is something that scales even to a greater degree than NuoDB, but doesn't use the SQL data model -- but still gives you ACID concurrency control on a lot more nonstandard data, so it can handle documents as well as it handles a table. And I think that's the direction the world needs to go.
What problem are you trying to solve there?
Starkey: It's pushing what I consider the state of the art in at least three different dimensions. One is that it's not relational -- it doesn't have tables; it just has records. Each record is self-describing, so you can just pour your data into it, and still be able to do regular transactional queries and updates.
Two, it's not a SQL language at all. Take, for example, an Amazon webpage. If that were being done against a SQL database, just about every link would require several SQL queries -- you're probably talking four or five dozen SQL queries per page. Even assuming the database system can answer all queries in zero time, the amount of latency between the virtual machine that's generating the page over the network to the database server and back, times four or five dozen queries, would make it completely untenable.
The access language and the API [are] designed so that you could get an arbitrary set of data -- everything you would need to generate that page in one round trip to the database server. You'd get back a hierarchy of results, everything in one hunk, which would have a huge effect on how websites are designed. Right now, you have a huge amount of caching locally, which is inefficient.
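To make the round-trip argument concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `FakeServer`, `get_tree`, the record keys and the 0.5 ms round-trip time are hypothetical and are not Amorphous's actual API or measured figures. It only shows why one nested request beats dozens of separate ones when network latency dominates.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for a database server; counts round trips."""

    def __init__(self, records):
        self.records = records
        self.round_trips = 0

    def get(self, key):
        # One network round trip per individual lookup.
        self.round_trips += 1
        return self.records[key]

    def get_tree(self, spec):
        # One round trip for the whole nested request.
        self.round_trips += 1
        return self._resolve(spec)

    def _resolve(self, spec):
        # A dict value is a nested group; a string value is a record key.
        return {
            name: self._resolve(sub) if isinstance(sub, dict) else self.records[sub]
            for name, sub in spec.items()
        }


def network_latency_ms(round_trips, rtt_ms):
    """Latency spent on the wire, assuming the server answers in zero time."""
    return round_trips * rtt_ms


# Illustrative data only.
RECORDS = {
    "product:1": {"title": "Widget"},
    "review:9": {"stars": 5},
    "user:7": {"name": "Pat"},
}
PAGE_SPEC = {
    "product": "product:1",
    "reviews": {"top": "review:9"},
    "account": "user:7",
}

batched_server = FakeServer(RECORDS)
page = batched_server.get_tree(PAGE_SPEC)  # a hierarchy of results in one hunk

per_query_server = FakeServer(RECORDS)
for key in ("product:1", "review:9", "user:7"):
    per_query_server.get(key)  # one round trip each
```

With an assumed 0.5 ms round trip, `network_latency_ms(60, 0.5)` gives 30 ms of pure network wait for a 60-query page, against 0.5 ms for a single batched request.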
The third dimension is hard -- this is the hardest part about it. NuoDB has two types of computing nodes: transaction engines, which pump SQL transactions; and storage nodes, which store persistent copies of atoms. Every transaction engine will have a whole bunch of atoms in memory to replicate to everyone else who wants a copy. If an engine needs an atom to execute a transaction, it'll try to find a node which has it in memory. If no node has it in memory, it'll go on and ask the storage manager to suck it off its disk. This model basically involves bringing the data to the transaction.
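The lookup order Starkey describes (local memory, then a peer's memory, then disk) can be sketched as follows. This is a simplified illustration in Python, not NuoDB code: the class names, the `fetch_atom` method and the returned source labels are all hypothetical.

```python
class StorageManager:
    """Hypothetical node holding the persistent copies of atoms."""

    def __init__(self, disk):
        self.disk = dict(disk)
        self.reads = 0

    def load(self, atom_id):
        self.reads += 1  # count disk reads so the fallback is observable
        return self.disk[atom_id]


class TransactionEngine:
    """Hypothetical engine that pulls atoms toward the transaction."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.memory = {}  # atoms currently held in memory
        self.peers = []   # other transaction engines

    def fetch_atom(self, atom_id):
        # 1. Already in this engine's memory.
        if atom_id in self.memory:
            return self.memory[atom_id], "local"
        # 2. Copy it from a peer that has it in memory.
        for peer in self.peers:
            if atom_id in peer.memory:
                self.memory[atom_id] = peer.memory[atom_id]
                return self.memory[atom_id], "peer:" + peer.name
        # 3. Fall back to the storage manager's disk.
        atom = self.storage.load(atom_id)
        self.memory[atom_id] = atom
        return atom, "storage"


storage = StorageManager({"atom-1": "rows 0-99"})
te1 = TransactionEngine("te1", storage)
te2 = TransactionEngine("te2", storage)
te1.peers.append(te2)
te2.peers.append(te1)

first = te1.fetch_atom("atom-1")   # nobody has it yet, so it comes from disk
second = te2.fetch_atom("atom-1")  # te1 now holds it in memory
third = te2.fetch_atom("atom-1")   # te2 has its own copy
```

In every branch the atom ends up in the memory of the engine running the transaction, which is the "bring the data to the transaction" model.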
For the type of data that NuoDB is designed for, that works very well. When you go beyond SQL and say, 'We're going to put Word documents, spreadsheets, any kind of data you can think of -- it's all going to be indexed, with search engine-type semantics, but all transactional,' you're now talking potentially petabytes and exabytes of data, and the idea that you can bring everything to one site to execute a transaction just doesn't cut it.
So, what Amorphous has to do is keep track of all of the data, and rather than asking that node to send [data], [it] will say, 'Here's part of my larger statement; do what you can, send it off to the next guy and then return it back to me.' It's a very hard problem, and I'm in the middle of it, where I have stuff working, but that's the problem I'm taking on.
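The "do what you can, send it off to the next guy" pattern can be illustrated with a toy chain of nodes. This Python sketch is an assumption-laden simplification, not Amorphous's design: the `Node` class, the fixed forwarding order and the predicate-based query are all hypothetical. The point is only that the statement travels while the records stay where they are.

```python
class Node:
    """Hypothetical node that holds some self-describing records."""

    def __init__(self, name, records, next_node=None):
        self.name = name
        self.records = records
        self.next_node = next_node

    def run(self, predicate, partial=None):
        # Do what this node can with its own data...
        results = list(partial or [])
        results.extend(r for r in self.records if predicate(r))
        # ...then pass the statement and partial result to the next node,
        # or hand the finished result back if this is the last node.
        if self.next_node is None:
            return results
        return self.next_node.run(predicate, results)


# Three nodes, each holding different records; none ships its data anywhere.
node_c = Node("c", [{"type": "doc", "title": "Q3 report"}])
node_b = Node("b", [{"type": "sheet", "title": "Budget"}], next_node=node_c)
node_a = Node(
    "a",
    [{"type": "doc", "title": "Memo"}, {"type": "sheet", "title": "Forecast"}],
    next_node=node_b,
)

docs = node_a.run(lambda record: record.get("type") == "doc")
titles = [record["title"] for record in docs]
```

Only the matching records cross the network, so the cost tracks the size of the answer rather than the size of the data each node holds.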
Where are you working on this -- in a home lab?
Starkey: Here's the interesting thing: I decided I couldn't really stand buying another rack full of pizza-box servers to put in my house ... so the development environment is Raspberry Pis. It's really cute. There's no cooling, no fans, you don't need anything but the wires to the power supplies and we're in business. [Laughs.]
So, how do you bring the transaction to the data?
Starkey: That is a very complicated question. That was the first problem I had to solve. I'm kind of embarrassed to say that when the news came out about [Edward] Snowden, and he started revealing all this stuff that the [National Security Agency] was collecting, my first reaction was, 'That's an infinite amount of data, how can they possibly make sense of so much data?' I thought about that problem for a while, and then thought, 'Oh, that's how you do it.' That's where Amorphous came from. You basically have to export the transaction environment with the query, so the other guy can know the rules that would be applied if it were running locally. You export a very carefully chosen part of the transaction state to the other nodes, about how to interpret the data in this machine in the context of a transaction running in another node. Someday, I hope to get this technology into NuoDB -- but I hope to make it work first.
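One way to picture "exporting the transaction environment with the query" is a snapshot that travels with the request and tells the remote node which record versions the originating transaction may see. The Python sketch below assumes a multiversion-style visibility rule; that rule, along with `Snapshot`, `Version` and `remote_execute`, is a hypothetical illustration, since the interview does not describe Amorphous's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical slice of transaction state shipped along with a query."""
    txn_id: int
    committed: frozenset  # ids of transactions this one is allowed to see

    def can_see(self, writer_txn):
        return writer_txn == self.txn_id or writer_txn in self.committed


@dataclass(frozen=True)
class Version:
    writer_txn: int
    value: str


def visible_value(versions, snapshot):
    """Return the newest version the snapshot may see (versions are oldest first)."""
    for version in reversed(versions):
        if snapshot.can_see(version.writer_txn):
            return version.value
    return None


def remote_execute(store, keys, snapshot):
    """Runs on the remote node, applying the rules of the originating transaction."""
    return {key: visible_value(store.get(key, []), snapshot) for key in keys}


# The remote node's local data: two versions of one record.
remote_store = {"doc": [Version(1, "draft"), Version(5, "final")]}

# Transaction 4 started before transaction 5 committed.
older = remote_execute(remote_store, ["doc"], Snapshot(4, frozenset({1, 2})))
# Transaction 6 started after transaction 5 committed.
newer = remote_execute(remote_store, ["doc"], Snapshot(6, frozenset({1, 5})))
```

The remote node never needs the full transaction table; the snapshot carries just enough state for it to answer as if the query were running locally on the originating node.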
Want more forecasts for the database future?
In part one of this interview, read Starkey's take on why traditional transactional databases and newer NoSQL designs aren't up to the task of supporting modern applications.