🔍 Introduction
SQLite is a popular lightweight database engine often used in embedded systems, mobile apps, and local desktop applications. But is it actually faster than plain file I/O on Linux filesystems like ext4?
In this blog post, I’ll benchmark SQLite vs. raw CSV file operations (read/write/filter) on an Ubuntu system running ext4. I’ll also explain how SQLite interacts with the filesystem, why it might be slower or faster, and under what conditions its advantages become obvious.
📦 Filesystem and Setup
- Operating System: Ubuntu 22.04 LTS
- Filesystem: EXT4 (journaling filesystem)
- Disk: SSD
- Python: 3.10+
- SQLite version: 3.37+
🧠 How SQLite Works with the File System
SQLite stores all of its data in a single disk file, which it manages internally using a B-tree structure. When you insert or query data, SQLite:
- Opens a file descriptor to the `.db` file
- Reads and writes fixed-size data pages (4 KB by default)
- Uses a rollback journal or WAL (Write-Ahead Logging) to ensure atomicity and durability
- Issues `fsync()` calls to flush changes to disk safely
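You can inspect these internals directly from Python. A minimal sketch (`demo.db` is a throwaway filename, not part of the benchmark below):

```python
import sqlite3

# Throwaway database just to inspect SQLite's file-level settings.
conn = sqlite3.connect("demo.db")

# Page size: the unit SQLite reads/writes in the .db file (4096 bytes by default).
page_size = conn.execute("PRAGMA page_size;").fetchone()[0]

# Journal mode: how atomicity is achieved (rollback journal vs. WAL).
journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]

print(f"page size: {page_size} bytes, journal mode: {journal_mode}")
conn.close()
```

The same `PRAGMA` statements also let you change these settings, which is exactly what the benchmark code later does with `journal_mode=WAL`.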
🔍 What About ext4?
- EXT4 is a journaling filesystem.
- When you write a text file (e.g. CSV), it’s written sequentially into a file.
- There’s no structure, indexing, or optimization for queries or concurrent access.
- Ext4 doesn’t understand your data — it’s just a byte stream.
⚔️ SQLite vs. ext4 CSV File: When Is SQLite Faster?
| Operation | SQLite | CSV + ext4 |
|---|---|---|
| Sequential write | Slower (transaction overhead) | Faster (direct write) |
| Full read | Comparable | Comparable |
| Filtered query | ✅ Much faster (filter runs inside the engine) | ❌ Slow (must scan and parse every row) |
SQLite becomes faster when structured access is needed — for example:
- Filtered queries (`WHERE`, `LIMIT`)
- Joins
- Concurrent readers/writers
- Frequent inserts + reads
- Querying subsets of large data
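The filtered-query advantage grows further with an index, which the benchmark below deliberately omits. A hedged sketch (in-memory database; the index name `idx_users_id` is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"Name_{i}") for i in range(100_000)])

# Without an index, WHERE id > 99000 scans the whole table.
# With one, SQLite can seek directly to the matching range.
conn.execute("CREATE INDEX idx_users_id ON users(id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE id > 99000"
).fetchall()
print(plan)  # the plan should mention the index being used

rows = conn.execute("SELECT * FROM users WHERE id > 99000").fetchall()
conn.close()
```

`EXPLAIN QUERY PLAN` is a handy sanity check that a query actually uses the index you created, rather than falling back to a full scan.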
🧪 Benchmarking on ext4 with Python
Let’s test 3 operations:
- Full write of 100,000 records
- Full read
- Filtered query (`WHERE id > 99000`)
🧬 Benchmark Code
```python
import sqlite3
import time
import csv
import os

N = 100000
DATA = [(i, f"Name_{i}") for i in range(N)]

# --------- WRITE BENCHMARKS ---------
def benchmark_file_write():
    filename = "test_data.csv"
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows(DATA)

def benchmark_sqlite_write():
    db = "test.db"
    conn = sqlite3.connect(db)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS users")
    c.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    c.executemany("INSERT INTO users VALUES (?, ?)", DATA)
    conn.commit()
    conn.close()

# --------- READ BENCHMARKS ---------
def benchmark_file_read():
    filename = "test_data.csv"
    start = time.time()
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        data = [row for row in reader]
    end = time.time()
    return end - start

def benchmark_sqlite_read():
    db = "test.db"
    start = time.time()
    conn = sqlite3.connect(db)
    c = conn.cursor()
    c.execute("SELECT * FROM users")
    data = c.fetchall()
    conn.close()
    end = time.time()
    return end - start

# --------- FILTERED QUERY BENCHMARKS ---------
def benchmark_file_query():
    filename = "test_data.csv"
    start = time.time()
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        data = [row for row in reader if int(row[0]) > 99000]
    end = time.time()
    return end - start

def benchmark_sqlite_query():
    db = "test.db"
    conn = sqlite3.connect(db)
    c = conn.cursor()
    start = time.time()
    c.execute("SELECT * FROM users WHERE id > 99000")
    data = c.fetchall()
    end = time.time()
    conn.close()
    return end - start

# --------- RUN ALL BENCHMARKS ---------
benchmark_file_write()
benchmark_sqlite_write()
file_read_time = benchmark_file_read()
sqlite_read_time = benchmark_sqlite_read()
file_query_time = benchmark_file_query()
sqlite_query_time = benchmark_sqlite_query()

# --------- CLEANUP ---------
os.remove("test_data.csv")
os.remove("test.db")

# --------- PRINT RESULTS ---------
print("\n=== Read Benchmark ===")
print(f"File read time: {file_read_time:.4f} seconds")
print(f"SQLite read time: {sqlite_read_time:.4f} seconds")
read_diff = file_read_time - sqlite_read_time
read_percent = (read_diff / file_read_time) * 100
if read_percent > 0:
    print(f"SQLite is {read_percent:.2f}% faster than file system (read)")
else:
    print(f"File system is {-read_percent:.2f}% faster than SQLite (read)")

print("\n=== Filtered Query Benchmark ===")
print(f"File query time: {file_query_time:.4f} seconds")
print(f"SQLite query time: {sqlite_query_time:.4f} seconds")
query_diff = file_query_time - sqlite_query_time
query_percent = (query_diff / file_query_time) * 100
if query_percent > 0:
    print(f"SQLite is {query_percent:.2f}% faster than file system (filtered query)")
else:
    print(f"File system is {-query_percent:.2f}% faster than SQLite (filtered query)")
```
📊 Benchmark Results (on EXT4)
```
=== Read Benchmark ===
File read time: 0.0305 seconds
SQLite read time: 0.0304 seconds
SQLite is 0.25% faster than file system (read)

=== Filtered Query Benchmark ===
File query time: 0.0188 seconds
SQLite query time: 0.0028 seconds
SQLite is 85.12% faster than file system (filtered query)
```
📌 Interpretation
- Write performance: the flat file wins; a sequential CSV write has no transaction or journaling overhead.
- Full read: no major difference. Both formats stream sequentially from disk.
- Filtered query: SQLite dominates. Even without an index, the filter is evaluated in C inside the engine over compact binary pages, while the CSV path must parse and convert every row in Python. An index on `id` would widen the gap further.
🧠 Final Thoughts
SQLite isn’t faster in all cases. In fact, flat files on ext4 are faster for simple write-heavy tasks. But when structure, queries, or filtered access matter, SQLite wins by a large margin.
When to Use CSV + ext4:
- Append-only logs
- Archival exports
- Simplicity and portability
When to Use SQLite:
- Filtering/querying data
- Frequent reads + writes
- Need for data integrity
- Multi-user access
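The multi-user point is easy to demonstrate: in WAL mode, a reader can open the database while a writer holds an open transaction, and it sees only committed data. A minimal sketch (throwaway temp file, names illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")  # throwaway file

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL;")
writer.execute("CREATE TABLE logs (msg TEXT)")
writer.execute("INSERT INTO logs VALUES ('first')")
writer.commit()

# Start a write transaction and leave it uncommitted.
writer.execute("INSERT INTO logs VALUES ('second')")

# In WAL mode a second connection can read concurrently without blocking.
reader = sqlite3.connect(path)
rows = reader.execute("SELECT msg FROM logs").fetchall()
print(rows)  # the reader sees only committed data: [('first',)]

writer.commit()
writer.close()
reader.close()
```

With a plain CSV file there is no equivalent: a concurrent reader can observe a half-written line, and two writers can interleave rows.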
✅ Summary
| Feature | CSV on ext4 | SQLite |
|---|---|---|
| Write speed | ✅ Very fast | ❌ Slower (WAL/ACID overhead) |
| Full read | ⚖️ About equal | ⚖️ About equal |
| Filtered queries | ❌ Very slow | ✅ Very fast |
| ACID compliance | ❌ No | ✅ Yes |
| Query language | ❌ None | ✅ SQL |
| Storage format | ❌ Plaintext | ✅ Compact binary |
If you’re running on ext4 and need structured queries, go with SQLite. If you just need to dump and store data as fast as possible, CSV may be fine.