🔍 Introduction
SQLite is a popular lightweight database engine often used in embedded systems, mobile apps, and local desktop applications. But is it actually faster than plain file I/O on Linux filesystems like ext4?
In this blog post, I’ll benchmark SQLite vs. raw CSV file operations (read/write/filter) on an Ubuntu system running ext4. I’ll also explain how SQLite interacts with the filesystem, why it might be slower or faster, and under what conditions its advantages become obvious.
📦 Filesystem and Setup
- Operating System: Ubuntu 22.04 LTS
- Filesystem: EXT4 (journaling filesystem)
- Disk: SSD
- Python: 3.10+
- SQLite version: 3.37+
🧠 How SQLite Works with the File System
SQLite stores all of its data in a single disk file, which it manages internally using a B-tree structure. When you insert or query data, SQLite:
- Opens a file descriptor to the `.db` file
- Reads and writes fixed-size data pages (4 KB by default)
- Uses a rollback journal or WAL (Write-Ahead Logging) to ensure atomicity and durability
- Issues `fsync()` calls to flush changes to disk safely
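You can inspect these internals directly from Python. A minimal sketch (`demo.db` is a throwaway filename, not part of the benchmark below):

```python
import sqlite3

# Throwaway database just to inspect SQLite's file-level settings.
conn = sqlite3.connect("demo.db")

# Page size: the unit SQLite reads/writes in the .db file (4096 bytes by default).
page_size = conn.execute("PRAGMA page_size;").fetchone()[0]

# Journal mode: how atomicity is achieved (rollback journal vs. WAL).
journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]

print(f"page size: {page_size} bytes, journal mode: {journal_mode}")
conn.close()
```

The same `PRAGMA` statements also let you change these settings, which is exactly what the benchmark code later does with `journal_mode=WAL`.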
🔍 What About ext4?
- EXT4 is a journaling filesystem.
- When you write a text file (e.g. CSV), it’s written sequentially into a file.
- There’s no structure, indexing, or optimization for queries or concurrent access.
- Ext4 doesn’t understand your data — it’s just a byte stream.
⚔️ SQLite vs. ext4 CSV File: When Is SQLite Faster?
| Operation | SQLite | CSV + ext4 |
|---|---|---|
| Sequential write | Slower (transaction overhead) | Faster (direct write) |
| Full read | Comparable | Comparable |
| Filtered query | ✅ Much faster (filter runs inside the engine) | ❌ Slow (must scan and parse every row) |
SQLite becomes faster when structured access is needed — for example:
- Filtered queries (`WHERE`, `LIMIT`)
- Joins
- Concurrent readers/writers
- Frequent inserts + reads
- Querying subsets of large data
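The filtered-query advantage grows further with an index, which the benchmark below deliberately omits. A hedged sketch (in-memory database; the index name `idx_users_id` is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"Name_{i}") for i in range(100_000)])

# Without an index, WHERE id > 99000 scans the whole table.
# With one, SQLite can seek directly to the matching range.
conn.execute("CREATE INDEX idx_users_id ON users(id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE id > 99000"
).fetchall()
print(plan)  # the plan should mention the index being used

rows = conn.execute("SELECT * FROM users WHERE id > 99000").fetchall()
conn.close()
```

`EXPLAIN QUERY PLAN` is a handy sanity check that a query actually uses the index you created, rather than falling back to a full scan.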
🧪 Benchmarking on ext4 with Python
Let’s test 3 operations:
- Full write of 100,000 records
- Full read
- Filtered query (`WHERE id > 99000`)
🧬 Benchmark Code
```python
import sqlite3
import time
import csv
import os

N = 100000
DATA = [(i, f"Name_{i}") for i in range(N)]

# --------- WRITE BENCHMARKS ---------
def benchmark_file_write():
    filename = "test_data.csv"
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows(DATA)

def benchmark_sqlite_write():
    db = "test.db"
    conn = sqlite3.connect(db)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS users")
    c.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    c.executemany("INSERT INTO users VALUES (?, ?)", DATA)
    conn.commit()
    conn.close()

# --------- READ BENCHMARKS ---------
def benchmark_file_read():
    filename = "test_data.csv"
    start = time.time()
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        data = [row for row in reader]
    end = time.time()
    return end - start

def benchmark_sqlite_read():
    db = "test.db"
    start = time.time()
    conn = sqlite3.connect(db)
    c = conn.cursor()
    c.execute("SELECT * FROM users")
    data = c.fetchall()
    conn.close()
    end = time.time()
    return end - start

# --------- FILTERED QUERY BENCHMARKS ---------
def benchmark_file_query():
    filename = "test_data.csv"
    start = time.time()
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        data = [row for row in reader if int(row[0]) > 99000]
    end = time.time()
    return end - start

def benchmark_sqlite_query():
    db = "test.db"
    conn = sqlite3.connect(db)
    c = conn.cursor()
    start = time.time()
    c.execute("SELECT * FROM users WHERE id > 99000")
    data = c.fetchall()
    end = time.time()
    conn.close()
    return end - start

# --------- RUN ALL BENCHMARKS ---------
benchmark_file_write()
benchmark_sqlite_write()
file_read_time = benchmark_file_read()
sqlite_read_time = benchmark_sqlite_read()
file_query_time = benchmark_file_query()
sqlite_query_time = benchmark_sqlite_query()

# --------- CLEANUP ---------
os.remove("test_data.csv")
os.remove("test.db")

# --------- PRINT RESULTS ---------
print("\n=== Read Benchmark ===")
print(f"File read time: {file_read_time:.4f} seconds")
print(f"SQLite read time: {sqlite_read_time:.4f} seconds")
read_diff = file_read_time - sqlite_read_time
read_percent = (read_diff / file_read_time) * 100
if read_percent > 0:
    print(f"SQLite is {read_percent:.2f}% faster than file system (read)")
else:
    print(f"File system is {-read_percent:.2f}% faster than SQLite (read)")

print("\n=== Filtered Query Benchmark ===")
print(f"File query time: {file_query_time:.4f} seconds")
print(f"SQLite query time: {sqlite_query_time:.4f} seconds")
query_diff = file_query_time - sqlite_query_time
query_percent = (query_diff / file_query_time) * 100
if query_percent > 0:
    print(f"SQLite is {query_percent:.2f}% faster than file system (filtered query)")
else:
    print(f"File system is {-query_percent:.2f}% faster than SQLite (filtered query)")
```
📊 Benchmark Results (on EXT4)
```
=== Read Benchmark ===
File read time: 0.0305 seconds
SQLite read time: 0.0304 seconds
SQLite is 0.25% faster than file system (read)

=== Filtered Query Benchmark ===
File query time: 0.0188 seconds
SQLite query time: 0.0028 seconds
SQLite is 85.12% faster than file system (filtered query)
```
📌 Interpretation
- Write performance: the flat file wins; a sequential CSV write has no transaction or journaling overhead.
- Full read: no major difference. Both formats stream sequentially from disk.
- Filtered query: SQLite dominates. Even without an index, the filter is evaluated in C inside the engine over compact binary pages, while the CSV path must parse and convert every row in Python. An index on `id` would widen the gap further.
🧠 Final Thoughts
SQLite isn’t faster in all cases. In fact, flat files on ext4 are faster for simple write-heavy tasks. But when structure, queries, or filtered access matter, SQLite wins by a large margin.
When to Use CSV + ext4:
- Append-only logs
- Archival exports
- Simplicity and portability
When to Use SQLite:
- Filtering/querying data
- Frequent reads + writes
- Need for data integrity
- Multi-user access
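The multi-user point is easy to demonstrate: in WAL mode, a reader can open the database while a writer holds an open transaction, and it sees only committed data. A minimal sketch (throwaway temp file, names illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")  # throwaway file

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL;")
writer.execute("CREATE TABLE logs (msg TEXT)")
writer.execute("INSERT INTO logs VALUES ('first')")
writer.commit()

# Start a write transaction and leave it uncommitted.
writer.execute("INSERT INTO logs VALUES ('second')")

# In WAL mode a second connection can read concurrently without blocking.
reader = sqlite3.connect(path)
rows = reader.execute("SELECT msg FROM logs").fetchall()
print(rows)  # the reader sees only committed data: [('first',)]

writer.commit()
writer.close()
reader.close()
```

With a plain CSV file there is no equivalent: a concurrent reader can observe a half-written line, and two writers can interleave rows.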
✅ Summary
| Feature | CSV on ext4 | SQLite |
|---|---|---|
| Write speed | ✅ Very fast | ❌ Slower (WAL/ACID overhead) |
| Full read | ⚖️ About equal | ⚖️ About equal |
| Filtered queries | ❌ Very slow | ✅ Very fast |
| ACID compliance | ❌ No | ✅ Yes |
| Query language | ❌ None | ✅ SQL |
| Storage format | ❌ Plaintext | ✅ Compact binary |
If you’re running on ext4 and need structured queries, go with SQLite. If you just need to dump and store data as fast as possible, CSV may be fine.