I participated in the beta evaluation of the BDB SQLite code and one of the
things I tried to get a handle on was the performance difference. At this point,
I cannot publish exactly what I found until I have at least one other person
evaluate my code, run the tests, and confirm the numbers I got (which is being
done). However, I can generalize here and say that there are cases where BDB
offers significant performance improvements over SQLite, specifically in the
area of handling heavy loads that involve write-concurrency.
There are, generally, two measures of "fast" right -- (1) efficiency: how long
does it take for a single process to do XYZ vs. (2) concurrency: how many times
can many processes do XYZ per unit time. The main problem BDB addresses is
concurrency -- large-scale transaction processing. Thus you think of many
concurrent connections writing to and/or modifying the contents of the database.
SQLite by design uses database-level locking so there is a maximum of one
writer who can be working in the database at a time. Thus, SQLite's transaction
rate stays more or less constant with the number of concurrent connections, so
it's scalability in write-intensive applications is really measured by its
efficiency (1).
BDB on the other hand uses page level locking, which allows multiple writers to
be working in the database at a given time (provided that they are working on
separate pages). Thus BDB's rate potentially increases with the number of
connections and so its scalability is both a matter of efficiency (1) and
concurrency (2), which can add up.
Mainly what it boils down to is (write) concurrency. BDB can push more TPS than
SQLite for multiple writers. By transaction, I mean something that modifies the
database (how are they of any real help for read-only operations?). That said,
for read concurrency (apps that mainly do SELECTs), SQLite could very well go
head to head with BDB because locking is no longer a critical issue.
As for the size of the dataset, I am not sure. I've not looked into
that. Ultimately, they both use B-trees for storage. There may be factors in
their respective implementations to consider, but I've not investigated that. I
know that SQLite can gracefully handle data sets into the hundreds of MBs and
double digit GBs (and perhaps more now that the dirty page map implementation
has been changed).
Therefore, if you have an application which employs many connections that modify
a given database and page contention is relatively low, then BDB can offer
significant performance improvements. But page contention is a critical
variable. In the limit, if you had a BDB database whose data consisted of a
single page, then its performance would match that of SQLite in all cases
because page-level locking here effectively degenerates into the equivalent of
database level locking -- everybody is fighting over one thing. However, as the
number of pages increases in BDB (and page contention decreases), then the
maximum TPS will start to grow with the number of concurrent connections. Then
from that point, memory becomes the next limiting factor. But that's another
story.
BTW, I am in the process of writing an article about using BDB for those coming
from SQLite.
Article links:
Oracle Berkeley DB SQL API vs. SQLite API – A Technical Evaluation
Oracle Berkeley DB SQL API vs. SQLite API – Integration, Benefits and Differences