how-fast-is-berkeley-db-sql-compared-to-sqlite

Solution

I participated in the beta evaluation of the BDB SQLite code and one of the things I tried to get a handle on was the performance difference. At this point, I cannot publish exactly what I found until I have at least one other person evaluate my code, run the tests, and confirm the numbers I got (which is being done). However, I can generalize here and say that there are cases where BDB offers significant performance improvements over SQLite, specifically in the area of handling heavy loads that involve write-concurrency.

There are, generally, two measures of "fast" right -- (1) efficiency: how long does it take for a single process to do XYZ vs. (2) concurrency: how many times can many processes do XYZ per unit time. The main problem BDB addresses is concurrency -- large-scale transaction processing. Thus you think of many concurrent connections writing to and/or modifying the contents of the database.

SQLite by design uses database-level locking so there is a maximum of one writer who can be working in the database at a time. Thus, SQLite's transaction rate stays more or less constant with the number of concurrent connections, so it's scalability in write-intensive applications is really measured by its efficiency (1).

BDB on the other hand uses page level locking, which allows multiple writers to be working in the database at a given time (provided that they are working on separate pages). Thus BDB's rate potentially increases with the number of connections and so its scalability is both a matter of efficiency (1) and concurrency (2), which can add up.

Mainly what it boils down to is (write) concurrency. BDB can push more TPS than SQLite for multiple writers. By transaction, I mean something that modifies the database (how are they of any real help for read-only operations?). That said, for read concurrency (apps that mainly do SELECTs), SQLite could very well go head to head with BDB because locking is no longer a critical issue.

As for the size of the dataset, I am not sure. I've not looked into that. Ultimately, they both use B-trees for storage. There may be factors in their respective implementations to consider, but I've not investigated that. I know that SQLite can gracefully handle data sets into the hundreds of MBs and double digit GBs (and perhaps more now that the dirty page map implementation has been changed).

Therefore, if you have an application which employs many connections that modify a given database and page contention is relatively low, then BDB can offer significant performance improvements. But page contention is a critical variable. In the limit, if you had a BDB database whose data consisted of a single page, then its performance would match that of SQLite in all cases because page-level locking here effectively degenerates into the equivalent of database level locking -- everybody is fighting over one thing. However, as the number of pages increases in BDB (and page contention decreases), then the maximum TPS will start to grow with the number of concurrent connections. Then from that point, memory becomes the next limiting factor. But that's another story.

BTW, I am in the process of writing an article about using BDB for those coming from SQLite.

Article links:

Oracle Berkeley DB SQL API vs. SQLite API – A Technical Evaluation

Oracle Berkeley DB SQL API vs. SQLite API – Integration, Benefits and Differences

2010-05-18

How fast is Berkeley DB SQL compared to SQLite?