PostgreSQL 发展研究与展望

最后更新于:2025-11-21 02:14:13

The Evolution, Architecture, and Future Trajectory of PostgreSQL: An Exhaustive Technical Analysis

PostgreSQL 的演变、架构与未来轨迹:详尽技术分析

1. Introduction: The Paradigm Shift of the Post-Relational Era

1. 简介:后关系时代的范式转变

The history of database management systems (DBMS) is a narrative of continuous adaptation to the changing nature of data. While the 1970s were defined by the ascendancy of the Relational Model—championed by E.F. Codd and realized through systems like IBM's System R and the original INGRES—the 1980s presented a new set of challenges that rigid relational structures could not easily address. The genesis of PostgreSQL represents a critical pivot point in computer science history, where the theoretical purity of the relational model was deliberately expanded to accommodate the complexities of real-world objects. Originating from the computer science laboratories of the University of California, Berkeley, under the stewardship of Michael Stonebraker, the project initially known as POSTGRES was explicitly designed to define the "post-Ingres" era.1

数据库管理系统(DBMS)的历史是不断适应数据本质变化的历史。虽然 20 世纪 70 年代以关系模型的兴起为特征——由 E.F. Codd 倡导并通过 IBM 的 System R 和最初的 INGRES 等系统实现——但 20 世纪 80 年代提出了一系列刚性关系结构难以解决的新挑战。PostgreSQL 的起源代表了计算机科学史上的一个关键转折点,关系模型的理论纯粹性在此被刻意扩展,以适应现实世界对象的复杂性。该项目最初被称为 POSTGRES,源于加州大学伯克利分校的计算机科学实验室,在 Michael Stonebraker 的指导下,明确设计用于定义“后 Ingres”时代 1。

The impetus for POSTGRES was the "object-relational" impedance mismatch—a dissonance between how application developers modeled data in object-oriented programming languages and how databases stored it in flat tables. Stonebraker’s team sought to bridge this gap not by discarding the relational model, but by extending it. This ambition set the stage for a system that would not merely store data but would understand its structure through a rich, extensible type system. Today, nearly four decades later, PostgreSQL stands not merely as a relational database but as a universal data platform that underpins massive cloud architectures, geospatial systems, and increasingly, vector-based artificial intelligence workloads. This report provides an exhaustive technical examination of PostgreSQL, tracing its lineage from a university prototype to the "World's Most Advanced Open Source Relational Database," analyzing the architectural decisions that ensured its longevity, and projecting its future trajectory in the age of cloud-native storage and generative AI.1

POSTGRES 的动力来自“对象-关系”阻抗不匹配——即应用程序开发人员在面向对象编程语言中建模数据的方式与数据库在平面表中存储数据的方式之间的不协调。Stonebraker 的团队试图通过扩展关系模型而不是抛弃它来弥合这一差距。这一雄心壮志为这样一个系统奠定了基础:它不仅存储数据,而且通过丰富、可扩展的类型系统理解数据的结构。近四十年后的今天,PostgreSQL 不仅仅是一个关系数据库,而且是一个支撑大规模云架构、地理空间系统以及日益增长的基于向量的人工智能工作负载的通用数据平台。本报告对 PostgreSQL 进行了详尽的技术审查,追溯其从大学原型到“世界上最先进的开源关系数据库”的谱系,分析确保其长寿的架构决策,并预测其在云原生存储和生成式人工智能时代的未来轨迹 1。

2. The Berkeley Origins (1986–1994): The Design of POSTGRES

2. 伯克利起源 (1986–1994):POSTGRES 的设计

2.1 The Theoretical Foundation and "Object-Relational" Ambition

2.1 理论基础与“对象-关系”雄心

The POSTGRES project officially began in 1986 at UC Berkeley, funded by the Defense Advanced Research Projects Agency (DARPA), the Army Research Office (ARO), and the National Science Foundation (NSF).5 The primary motivation, as articulated in the seminal papers "The Design of POSTGRES" and subsequent technical reports, was to address the inherent limitations of the relational model as implemented in existing systems like INGRES and commercial counterparts.2 Specifically, the research team identified that traditional relational databases were incapable of efficiently handling complex objects—such as geometric shapes, CAD/CAM data, text documents, and arrays—and lacked a robust, integrated rules system.2

POSTGRES 项目于 1986 年在加州大学伯克利分校正式启动,由国防高级研究计划局 (DARPA)、陆军研究办公室 (ARO) 和国家科学基金会 (NSF) 资助 5。正如开创性论文《POSTGRES 的设计》及随后的技术报告所述,其主要动机是解决现有系统(如 INGRES 和商业同类系统)中实施的关系模型的固有局限性 2。具体而言,研究团队发现,传统的关系数据库无法有效地处理复杂对象(如几何形状、CAD/CAM 数据、文本文档和数组),并且缺乏健壮、集成的规则系统 2。

Stonebraker and his colleagues, including Lawrence A. Rowe, envisioned a system that introduced "object-relational" capabilities. This was a radical departure from the "pure" object-oriented database systems (OODBMS) emerging at the time, which often discarded SQL and the relational math entirely. Instead, POSTGRES proposed that data should remain in relations (tables), but the system must support inheritance, complex abstract data types (ADTs), and user-defined functions. This implied that the database engine itself could be taught new data types by the user, a feature that remains the cornerstone of PostgreSQL's extensibility today.1

Stonebraker 和他的同事(包括 Lawrence A. Rowe)构想了一个引入“对象-关系”功能的系统。这与当时出现的“纯”面向对象数据库系统 (OODBMS) 截然不同,后者通常完全抛弃 SQL 和关系数学。相反,POSTGRES 提出数据应保留在关系(表)中,但系统必须支持继承、复杂的抽象数据类型 (ADT) 和用户定义函数。这意味着用户可以向数据库引擎本身传授新的数据类型,这一特性至今仍是 PostgreSQL 可扩展性的基石 1。

2.2 The "No-Overwrite" Storage Manager and Time Travel

2.2 “无覆盖”存储管理器与时间旅行

A defining and controversial characteristic of the early POSTGRES architecture was its approach to storage management. Unlike contemporary systems (and modern PostgreSQL) that utilized a Write-Ahead Log (WAL) coupled with in-place updates to ensure durability, the original POSTGRES design proposed a "no-overwrite" storage manager. In this model, data was never physically overwritten on the disk. Instead, every update operation effectively created a new version of the record, leaving the old version intact.9

早期 POSTGRES 架构的一个决定性且具有争议的特征是其存储管理方法。与利用预写日志 (WAL) 结合就地更新来确保持久性的当代系统(以及现代 PostgreSQL)不同,最初的 POSTGRES 设计提出了一种“无覆盖”存储管理器。在这个模型中,数据从未在磁盘上被物理覆盖。相反,每次更新操作实际上都创建了记录的新版本,而保留旧版本完好无损 9。

This architecture was theoretically elegant because it inherently supported "time travel." Users could query the database as it existed at any specific timestamp in the past without needing a separate audit log or backup restoration. The system maintained a continuum of data states. However, making this construct perform well in practice proved exceptionally tricky. The "no-overwrite" model essentially turned the database into a write-optimized log, but reading current data required scanning past immense amounts of historical debris. Stonebraker noted that while the concept was sound, the I/O penalties on the hardware of the late 1980s—specifically the cost of maintaining separate delta structures and accessing magnetic disks—were prohibitive.7

这种架构在理论上是优雅的,因为它天生支持“时间旅行”。用户可以查询过去任何特定时间点存在的数据库,而无需单独的审计日志或备份恢复。系统维护了数据状态的连续体。然而,要在实践中使这种结构表现良好被证明是非常棘手的。“无覆盖”模型本质上将数据库变成了一个写优化的日志,但读取当前数据需要扫描大量的历史碎片。Stonebraker 指出,虽然这个概念是合理的,但在 20 世纪 80 年代末的硬件上——特别是维护单独的增量结构和访问磁盘的成本——I/O 惩罚是令人望而却步的 7。

Although the "no-overwrite" storage manager was eventually replaced to improve performance, its legacy persists in PostgreSQL's implementation of Multiversion Concurrency Control (MVCC), where multiple versions of a row (tuples) coexist to facilitate isolation, albeit with a vacuum process to clean them up.9

虽然“无覆盖”存储管理器最终为了提高性能而被取代,但其遗产仍保留在 PostgreSQL 的多版本并发控制 (MVCC) 实现中,其中一行的多个版本(元组)共存以促进隔离,尽管需要 vacuum 进程来清理它们 9。

2.3 PostQUEL, The Rules System, and The End of an Era

2.3 PostQUEL、规则系统与一个时代的终结

Initially, POSTGRES ignored the emerging SQL standard. Instead, it employed a query language known as PostQUEL, a derivative of the QUEL language used in the original INGRES.10 PostQUEL was specifically designed to support the advanced features of the system, such as the manipulation of complex objects and the definition of rules. The rules system was a core research component, intended to allow database administrators to define semantics for data integrity (e.g., "salary must never be negative") and views directly within the database engine using a syntax that was tightly coupled with the query language.2

最初,POSTGRES 忽略了新兴的 SQL 标准。相反,它采用了一种称为 PostQUEL 的查询语言,这是最初 INGRES 中使用的 QUEL 语言的衍生品 10。PostQUEL 专门设计用于支持系统的高级功能,例如复杂对象的操作和规则的定义。规则系统是一个核心研究组件,旨在允许数据库管理员使用与查询语言紧密耦合的语法,直接在数据库引擎中定义数据完整性(例如,“工资绝不能为负”)和视图的语义 2。

By 1993, the external user community for POSTGRES had grown significantly, nearly doubling in size that year alone. This growth was driven largely by the project's adoption as the primary data manager for the Sequoia 2000 scientific computing project, which utilized POSTGRES for managing large-scale satellite imagery and geographic information.3 However, the academic team faced a dilemma: the burden of maintaining prototype code, fixing bugs, and supporting a growing user base was consuming large amounts of time that should have been devoted to novel database research. Consequently, in an effort to reduce this support burden, the Berkeley POSTGRES project officially ended with Version 4.2.4

到 1993 年,POSTGRES 的外部用户社区显著增长,仅在那一年规模就几乎翻了一番。这种增长主要是由于该项目被采用为 Sequoia 2000 科学计算项目的主要数据管理器,该项目利用 POSTGRES 管理大规模卫星图像和地理信息 3。然而,学术团队面临两难境地:维护原型代码、修复错误和支持不断增长的用户群的负担消耗了大量本应用于新型数据库研究的时间。因此,为了减轻这种支持负担,伯克利 POSTGRES 项目在 4.2 版本正式结束 4。

3. The Renaissance (1994–2000): From Postgres95 to PostgreSQL

3. 复兴 (1994–2000):从 Postgres95 到 PostgreSQL

3.1 The SQL Transformation

3.1 SQL 转型

In 1994, the project underwent a metamorphosis that would determine its future survival. Two graduate students at UC Berkeley, Andrew Yu and Jolly Chen, undertook the critical engineering task of replacing the academic PostQUEL interpreter with a SQL language interpreter. This was not merely a syntactic translation; it required a fundamental reworking of how queries were parsed and planned. This modification resulted in the release of "Postgres95".3

1994 年,该项目经历了一次决定其未来生存的蜕变。加州大学伯克利分校的两名研究生 Andrew Yu 和 Jolly Chen 承担了用 SQL 语言解释器替换学术性 PostQUEL 解释器的关键工程任务。这不仅仅是句法翻译;它需要对查询的解析和规划方式进行根本性的重构。这一修改促成了“Postgres95”的发布 3。

This release was pivotal for several reasons beyond SQL support. The codebase was completely converted to ANSI C, improving portability across Unix systems. The size of the codebase was trimmed by approximately 25% by removing legacy research code, resulting in performance improvements of 30-50% compared to POSTGRES 4.2 on the Wisconsin Benchmark.11 While the instance-level rule system was removed, rewrite rules were retained, preserving some of the unique flexibility of the original design.

除了 SQL 支持之外,此版本之所以至关重要还有几个原因。代码库完全转换为 ANSI C,提高了在 Unix 系统上的可移植性。通过删除遗留的研究代码,代码库的大小削减了约 25%,导致在 Wisconsin 基准测试中,与 POSTGRES 4.2 相比,性能提高了 30-50% 11。虽然移除了实例级规则系统,但保留了重写规则,保留了原始设计的一些独特灵活性。

3.2 The Birth of PostgreSQL

3.2 PostgreSQL 的诞生

By 1996, it became evident that the name "Postgres95" would not age well, implying a specific year rather than a lasting platform. The project was renamed "PostgreSQL" to reflect the fusion of the original POSTGRES architecture with standard SQL capabilities. The version numbering was set to 6.0, returning the releases to the sequence originally begun by the Berkeley project, whose final version was 4.2.3

到了 1996 年,显而易见,“Postgres95”这个名字经不起时间考验,它暗示的是一个特定年份,而非一个持久的平台。该项目更名为“PostgreSQL”,以反映最初的 POSTGRES 架构与标准 SQL 功能的融合。版本号定为 6.0,使发布序列重新衔接上伯克利项目最初开启的编号序列(其最终版本为 4.2)3。

This era marked the definitive transition from an academic research prototype to a community-driven open-source project. The development focus shifted from experimental features to stability, standard compliance, and feature augmentation. Key features added during this renaissance included Multi-Version Concurrency Control (MVCC), which replaced the older locking strategies, and a robust implementation of SQL constraints.3

这个时代标志着从学术研究原型向社区驱动的开源项目的明确过渡。开发重点从实验性功能转向稳定性、标准合规性和功能增强。在此复兴期间添加的关键功能包括多版本并发控制 (MVCC),它取代了旧的锁定策略,以及 SQL 约束的健壮实现 3。

4. Core Architecture: Multiversion Concurrency Control (MVCC)

4. 核心架构:多版本并发控制 (MVCC)

4.1 The Mechanism of Isolation and Snapshot Visibility

4.1 隔离与快照可见性机制

A distinguishing feature of PostgreSQL is its sophisticated implementation of Multiversion Concurrency Control (MVCC). Traditional database systems often relied on "Two-Phase Locking" (2PL), where a reader would acquire a shared lock that blocked writers, and a writer would acquire an exclusive lock that blocked readers. This approach severely limited concurrency in high-throughput environments. In contrast, PostgreSQL's MVCC ensures that "reading never blocks writing and writing never blocks reading".13

PostgreSQL 的一个显著特征是其多版本并发控制 (MVCC) 的复杂实现。传统的数据库系统通常依赖于“两阶段锁定”(2PL),其中读者获取阻止写者的共享锁,而写者获取阻止读者的排他锁。这种方法严重限制了高吞吐量环境中的并发性。相比之下,PostgreSQL 的 MVCC 确保“读操作从不阻塞写操作,写操作也从不阻塞读操作” 13。

This is achieved by maintaining multiple versions of data rows (tuples) simultaneously. When a transaction reads data, the system provides a "snapshot" of the database as it existed at the start of the transaction (for REPEATABLE READ or SERIALIZABLE isolation) or the start of the statement (for READ COMMITTED). This snapshot logic is implemented using two hidden fields embedded in the header of every tuple: xmin and xmax.15

这是通过同时维护数据行(元组)的多个版本来实现的。当一个事务读取数据时,系统提供该事务开始时(对于 REPEATABLE READ 或 SERIALIZABLE 隔离)或语句开始时(对于 READ COMMITTED)数据库存在的“快照”。这种快照逻辑是利用嵌入在每个元组头部的两个隐藏字段来实现的:xmin 和 xmax 15。

xmin (Creation Transaction ID): Stores the transaction ID (XID) of the transaction that inserted the tuple. For a tuple to be visible to a current transaction, the xmin must be a committed transaction ID that is logically "older" than the current transaction's snapshot horizon.

xmax (Deletion Transaction ID): Stores the XID of the transaction that deleted or updated the tuple (in PostgreSQL, an UPDATE is implemented as a DELETE of the old version followed by an INSERT of the new version). If xmax is set and committed by a transaction visible to the current snapshot, the tuple represents a deleted version and is invisible.16

xmin (创建事务 ID):存储插入该元组的事务的事务 ID (XID)。为了使一个元组对当前事务可见,xmin 必须是一个已提交的事务 ID,且在逻辑上比当前事务的快照视界“更旧”。

xmax (删除事务 ID):存储删除或更新该元组的事务的 XID(在 PostgreSQL 中,UPDATE 被实现为旧版本的 DELETE 紧接着新版本的 INSERT)。如果 xmax 已设置且由对当前快照可见的事务提交,则该元组代表已删除的版本,因此不可见 16。
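The xmin/xmax rule above can be sketched as a toy model (all names here are invented for illustration; the real visibility check also consults hint bits, the commit log, and the snapshot's list of in-progress XIDs):

```python
def tuple_visible(xmin, xmax, snapshot_xid, committed):
    """Simplified MVCC visibility check for a single tuple version.

    xmin/xmax    : XIDs that created / deleted the tuple (xmax=None if live)
    snapshot_xid : snapshot horizon; XIDs below it count as "the past"
    committed    : set of XIDs known to have committed
    """
    # The creating transaction must be committed and within our snapshot.
    if xmin not in committed or xmin >= snapshot_xid:
        return False
    # A committed, in-snapshot deleter makes the version invisible to us.
    if xmax is not None and xmax in committed and xmax < snapshot_xid:
        return False
    return True

# An UPDATE leaves two versions behind: the old one with xmax stamped,
# and a new one carrying a fresh xmin.
committed = {100, 105}
old_ver = {"xmin": 100, "xmax": 105}   # updated away by txn 105
new_ver = {"xmin": 105, "xmax": None}

# A snapshot taken after txn 105 committed sees only the new version...
assert not tuple_visible(old_ver["xmin"], old_ver["xmax"], 110, committed)
assert tuple_visible(new_ver["xmin"], new_ver["xmax"], 110, committed)
# ...while an older snapshot (horizon 103) still sees the old version.
assert tuple_visible(old_ver["xmin"], old_ver["xmax"], 103, committed)
```

Both versions coexist in the heap; which one a reader sees depends only on its snapshot, which is why readers and writers never block each other.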

This elegant mechanism allows PostgreSQL to support strict isolation levels, including Serializable Snapshot Isolation (SSI), which detects serialization anomalies (like write skew) without the heavy performance penalty of pessimistic locking.13

这种优雅的机制允许 PostgreSQL 支持严格的隔离级别,包括可串行化快照隔离 (SSI),它无需悲观锁定的沉重性能损失即可检测串行化异常(如写偏斜)13。

4.2 The Necessity of Vacuuming and Transaction Freeze

4.2 Vacuum 与事务冻结的必要性

The side effect of the MVCC architecture is the rapid accumulation of "dead tuples"—versions of data that have been deleted or updated and are no longer visible to any active transaction. Unlike Oracle or MySQL (InnoDB), which use a separate undo log segment for old versions, PostgreSQL stores these versions in the main table structure (the heap). To reclaim this storage space, PostgreSQL employs a background process called VACUUM (and its automated daemon, autovacuum). Without regular vacuuming, the database size would grow indefinitely, known as "bloat".18

MVCC 架构的副作用是“死元组”的快速积累——即已被删除或更新且对任何活动事务不再可见的数据版本。与使用单独的撤消日志段来存储旧版本的 Oracle 或 MySQL (InnoDB) 不同,PostgreSQL 将这些版本存储在主表结构(堆)中。为了回收此存储空间,PostgreSQL 采用了一个称为 VACUUM 的后台进程(及其自动化守护进程 autovacuum)。如果没有定期的 vacuum 操作,数据库的大小将无限增长,即所谓的“膨胀” 18。

A critical aspect of VACUUM is the prevention of Transaction ID Wraparound. PostgreSQL uses 32-bit transaction IDs, providing roughly four billion distinct values; because XID comparisons are circular, however, only about two billion of them can lie "in the past" at any given moment. A high-throughput system can consume this headroom relatively quickly. Once the counter wraps around, transactions that committed long ago suddenly appear to lie in the future, rendering their rows invisible and effectively lost.18

VACUUM 的一个关键方面是防止事务 ID 回绕。PostgreSQL 使用 32 位事务 ID,大约有 40 亿个不同的取值;但由于 XID 比较是循环的,任一时刻只有约 20 亿个 XID 能处于“过去”。高吞吐量系统可以相对较快地耗尽这一余量。一旦计数器回绕,很久以前提交的事务会突然显示为发生在未来,导致其行不可见,造成事实上的数据丢失 18。

To mitigate this, VACUUM performs a Freeze operation. It scans pages and marks tuples with old XIDs with a special frozen bit (or replaces the XID with a simplified FrozenTransactionId in older versions). This indicates that the tuple is so old that it is visible to all current and future transactions, effectively decoupling it from the circular XID comparison logic.18 The management of this freeze mechanism, specifically the autovacuum_freeze_max_age parameter, is a vital administrative task for maintaining PostgreSQL stability. Recent versions (PostgreSQL 13–17) have introduced aggressive optimizations to the vacuum process, such as "failsafe" modes to prioritize wraparound prevention over other tasks when danger is imminent.19

为了减轻这种情况,VACUUM 执行冻结操作。它扫描页面并用特殊的 frozen 位标记具有旧 XID 的元组(或在旧版本中用简化的 FrozenTransactionId 替换 XID)。这表明该元组非常古老,以至于它对所有当前和未来的事务都可见,从而有效地将其与循环 XID 比较逻辑解耦 18。管理这种冻结机制,特别是 autovacuum_freeze_max_age 参数,是维护 PostgreSQL 稳定性的重要管理任务。最近的版本(PostgreSQL 13–17)对 vacuum 过程引入了积极的优化,例如“故障安全”模式,以便在危险迫在眉睫时优先考虑防止回绕而不是其他任务 19。
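The circular XID arithmetic and the hazard that freezing removes can be shown in a few lines (a sketch modeled on PostgreSQL's signed-difference comparison; the function name is illustrative):

```python
def xid_precedes(a: int, b: int) -> bool:
    """Circular XID comparison, modeled on PostgreSQL's
    TransactionIdPrecedes: a is "older" than b iff the signed 32-bit
    difference (a - b) is negative."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

# Ordinary case: XID 100 is older than XID 200.
assert xid_precedes(100, 200)

# Across the wraparound point, a huge XID is still "older" than a
# small post-wraparound XID:
assert xid_precedes(2**32 - 10, 5)

# But an XID more than ~2 billion (2^31) behind flips to the "future".
# This is the hazard VACUUM's freeze operation removes, by marking old
# tuples as visible to everyone regardless of XID arithmetic:
assert not xid_precedes(5, 5 + 2**31 + 1)
```

The last assertion is the whole wraparound problem in miniature: without freezing, sufficiently old committed rows would stop comparing as "in the past".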

4.3 JIT Compilation and Query Execution

4.3 JIT 编译与查询执行

In the domain of query execution, modern PostgreSQL has integrated Just-In-Time (JIT) compilation, a feature that leverages the LLVM (Low Level Virtual Machine) compiler infrastructure. Introduced in PostgreSQL 11, JIT compilation is designed to accelerate the evaluation of expressions (such as complex WHERE clauses) and tuple deforming (the process of decoding the on-disk tuple format into memory).23

在查询执行领域,现代 PostgreSQL 集成了即时 (JIT) 编译,这是一项利用 LLVM(底层虚拟机)编译器基础设施的功能。PostgreSQL 11 中引入了 JIT 编译,旨在加速表达式的评估(例如复杂的 WHERE 子句)和元组变形(将磁盘上的元组格式解码到内存中的过程)23。

For analytical queries (OLAP), which process millions of rows, the overhead of interpreting expressions for every row is significant. JIT compiles these expressions into native machine code at runtime. The decision to use JIT is cost-based; if the planner estimates a query's cost exceeds jit_above_cost, compilation is triggered. While beneficial for CPU-bound tasks, JIT introduces a startup overhead, making it less suitable for short, transactional queries (OLTP).24

对于处理数百万行的分析查询 (OLAP),为每一行解释表达式的开销是巨大的。JIT 在运行时将这些表达式编译为本机机器代码。是否使用 JIT 取决于成本:如果规划器估计查询的成本超过 jit_above_cost,则触发编译。虽然对 CPU 密集型任务有利,但 JIT 引入了启动开销,使其不太适合短小的事务性查询 (OLTP) 24。
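To see what expression compilation buys, here is a toy analogy (pure Python, not PostgreSQL's actual LLVM pipeline): a predicate held as an expression tree is either interpreted per row, or "compiled" once into a callable that evaluates with no tree-walking:

```python
# Interpreted execution: walk the expression tree for every row.
def interp(expr, row):
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    if op == ">":
        return interp(expr[1], row) > interp(expr[2], row)
    raise ValueError(op)

# "JIT": pay a one-time compilation cost up front, then evaluate each
# row directly (a stand-in for emitting machine code via LLVM).
def jit_compile(expr):
    def emit(e):
        op = e[0]
        if op == "col":
            return f"row[{e[1]!r}]"
        if op == "const":
            return repr(e[1])
        if op == ">":
            return f"({emit(e[1])} > {emit(e[2])})"
        raise ValueError(op)
    return eval(f"lambda row: {emit(expr)}")

# WHERE amount > 100
pred = (">", ("col", "amount"), ("const", 100))
compiled = jit_compile(pred)
rows = [{"amount": 50}, {"amount": 150}]
assert [interp(pred, r) for r in rows] == [compiled(r) for r in rows] == [False, True]
```

Over millions of rows the one-time compilation cost amortizes away, which is exactly the trade-off behind the jit_above_cost threshold.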

5. Extensibility: The "Object-Relational" Fulfillment

5. 可扩展性:“对象-关系”的实现

5.1 Generalized Inverted Indexes (GIN) and JSONB

5.1 广义倒排索引 (GIN) 与 JSONB

While the "object-relational" label was coined in the 1980s, its true power materialized with the rise of semi-structured data in the web era. PostgreSQL distinguishes itself from standard relational databases through its robust support for JSONB (Binary JSON). Unlike the JSON data type, which stores data as a text string and requires reparsing for every operation, JSONB stores data in a decomposed binary format. This allows for efficient indexing, compression, and rapid access to specific keys without reading or parsing the entire document.26

虽然“对象-关系”的标签是在 20 世纪 80 年代创造的,但其真正的威力随着网络时代半结构化数据的兴起而得以实现。PostgreSQL 通过对 JSONB(二进制 JSON)的健壮支持将自己与标准关系数据库区分开来。与将数据存储为文本字符串并需要为每次操作重新解析的 JSON 数据类型不同,JSONB 以分解的二进制格式存储数据。这允许进行有效的索引、压缩和快速访问特定键,而无需读取或解析整个文档 26。

To effectively index these complex structures, PostgreSQL utilizes GIN (Generalized Inverted Index). GIN is an inverted index structure designed to handle values that contain multiple internal elements (like a JSON document containing many keys, or an array of integers). By mapping individual keys to lists of row IDs (TIDs), GIN indexes enable performance that rivals or exceeds dedicated NoSQL document stores like MongoDB for many workloads, allowing PostgreSQL to serve as a unified hybrid data platform.27 The dominance of this feature is evidenced by the introduction of specific optimizations for GIN in versions leading up to PostgreSQL 17, such as better handling of high-frequency keys and compression of posting lists.29

为了有效地索引这些复杂的结构,PostgreSQL 利用了 GIN(广义倒排索引)。GIN 是一种倒排索引结构,旨在处理包含多个内部元素的值(如包含许多键的 JSON 文档或整数数组)。通过将单个键映射到行 ID (TID) 列表,GIN 索引在许多工作负载下可与 MongoDB 等专用 NoSQL 文档存储相媲美甚至更优,使 PostgreSQL 能够充当统一的混合数据平台 27。在 PostgreSQL 17 之前的各个版本中针对 GIN 引入的专门优化,例如更好地处理高频键以及倒排列表 (posting list) 压缩,印证了此功能的重要地位 29。
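The posting-list idea behind GIN can be sketched in a few lines (a toy model that indexes only top-level JSON keys; names like ToyGin are invented for illustration):

```python
from collections import defaultdict

# A toy inverted index: each key maps to the set of row IDs (TIDs)
# containing it -- the core structure GIN builds for jsonb lookups.
class ToyGin:
    def __init__(self):
        self.postings = defaultdict(set)

    def insert(self, tid, doc):
        for key in doc:               # index every top-level key
            self.postings[key].add(tid)

    def contains(self, *keys):
        """Row IDs whose documents contain ALL of the given keys
        (roughly the shape of a jsonb ?& / @> key lookup)."""
        sets = [self.postings.get(k, set()) for k in keys]
        return set.intersection(*sets) if sets else set()

idx = ToyGin()
idx.insert(1, {"name": "ada", "tags": ["db"]})
idx.insert(2, {"name": "bob", "city": "oslo"})
idx.insert(3, {"name": "eve", "city": "rome", "tags": []})

assert idx.contains("name") == {1, 2, 3}
assert idx.contains("name", "city") == {2, 3}
assert idx.contains("tags", "city") == {3}
```

A multi-key query becomes an intersection of posting sets rather than a scan of every document, which is why GIN lookups stay fast as documents grow.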

5.2 Partitioning: From Inheritance to Declarative

5.2 分区:从继承到声明式

Table partitioning—the division of large tables into smaller, more manageable pieces—has a long evolutionary history in PostgreSQL. Historically, partitioning was implemented via Table Inheritance, a feature rooted in the system's object-oriented origins. Users had to manually create child tables that inherited from a parent and define complex triggers or rules to route inserted data to the correct child. This process was cumbersome, opaque to the query planner, and prone to significant performance overhead during high-volume ingestion.30

表分区——将大表分割成更小、更易于管理的片段——在 PostgreSQL 中有着漫长的演变历史。历史上,分区是通过表继承来实现的,这一功能植根于系统的面向对象起源。用户必须手动创建继承自父表的子表,并定义复杂的触发器或规则以将插入的数据路由到正确的子表。这一过程繁琐,对查询规划器不透明,并且在大批量摄取期间容易产生巨大的性能开销 30。

Recognizing the need for simpler management in big data scenarios, the community introduced Declarative Partitioning in PostgreSQL 10. This feature allows users to define partitions using standard SQL syntax (PARTITION BY RANGE/LIST/HASH), with the database engine automatically handling tuple routing, input validation, and partition pruning. The shift from an imperative, manual mechanism (inheritance/triggers) to a declarative one represents a move away from academic complexity toward pragmatic, enterprise-grade usability. In recent versions, performance features like Partition Pruning (skipping scans of partitions that cannot contain the requested data) have been optimized to work at execution time as well as planning time, enabling efficient querying of multi-terabyte datasets.30

认识到在大数据场景中简化管理的必要性,社区在 PostgreSQL 10 中引入了声明式分区。此功能允许用户使用标准 SQL 语法(PARTITION BY RANGE/LIST/HASH)定义分区,数据库引擎自动处理元组路由、输入验证和分区修剪。从命令式、手动机制(继承/触发器)向声明式机制的转变,代表了从学术复杂性向务实的企业级可用性的迈进。在最近的版本中,像分区修剪(跳过扫描不可能包含请求数据的分区)这样的性能功能已得到优化,可以在执行时以及规划时工作,从而实现对多 TB 数据集的高效查询 30。
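The tuple routing and partition pruning described above can be illustrated with a toy range-partitioning model (partition names and bounds are invented; the real engine does this from the PARTITION BY RANGE metadata):

```python
# Toy range partitioning: route rows by year, and prune partitions
# whose bounds cannot overlap a query predicate.
partitions = {
    "sales_2023": (2023, 2024),   # half-open [lo, hi) bounds on year
    "sales_2024": (2024, 2025),
    "sales_2025": (2025, 2026),
}

def route(year):
    """Tuple routing: find the one partition whose range holds this row."""
    for name, (lo, hi) in partitions.items():
        if lo <= year < hi:
            return name
    raise ValueError(f"no partition accepts year {year}")

def prune(pred_lo, pred_hi):
    """Partition pruning: keep only partitions overlapping [pred_lo, pred_hi)."""
    return sorted(name for name, (lo, hi) in partitions.items()
                  if lo < pred_hi and pred_lo < hi)

assert route(2024) == "sales_2024"
# WHERE year >= 2024 AND year < 2026 only touches two partitions:
assert prune(2024, 2026) == ["sales_2024", "sales_2025"]
```

Because pruning needs only the bounds, not the data, the planner (and, in recent versions, the executor) can skip entire partitions before reading a single page.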

6. Cloud-Native Evolution: Amazon Aurora and AlloyDB

6. 云原生演进:Amazon Aurora 与 AlloyDB

As PostgreSQL became the standard for cloud deployments, hyperscale cloud providers encountered limitations in its traditional monolithic architecture, particularly regarding storage I/O and recovery times. This led to radical architectural reimaginings of the PostgreSQL engine.

随着 PostgreSQL 成为云部署的标准,超大规模云提供商在其传统的单体架构中遇到了局限性,特别是在存储 I/O 和恢复时间方面。这导致了对 PostgreSQL 引擎的根本性架构重构。

6.1 Amazon Aurora: "The Log is the Database"

6.1 Amazon Aurora:“日志即数据库”

In a seminal 2017 SIGMOD paper, Amazon Web Services (AWS) detailed the architecture of Amazon Aurora. The core innovation was the decoupling of compute and storage to address write amplification. In traditional PostgreSQL, a write operation involves writing data pages to disk (often causing "torn pages" risks requiring full-page writes) and synchronously writing WAL records for crash recovery. This results in excessive network I/O in a replicated cloud environment.33

在 2017 年一篇开创性的 SIGMOD 论文中,亚马逊网络服务 (AWS) 详细介绍了 Amazon Aurora 的架构。核心创新在于计算与存储的解耦,以解决写入放大问题。在传统的 PostgreSQL 中,写操作涉及将数据页写入磁盘(通常会导致“页面撕裂”风险,需要全页写入)并同步写入 WAL 记录以进行崩溃恢复。这会导致在复制的云环境中产生过多的网络 I/O 33。

Aurora eliminates writing database pages from the compute node entirely. Instead, it pushes the redo log (WAL) processing down to a distributed, multi-tenant storage fleet. The compute node writes only the log records to the storage service. The storage nodes then asynchronously apply these log records to generate the data pages on demand. This "log-structured storage" architecture yields two profound benefits:

Reduced Network Traffic: Only the log records traverse the network, not full pages.

Instant Crash Recovery: Since the storage layer constantly replays logs, the compute node does not need to perform a lengthy replay of WAL files upon restart. The database is essentially always "recovered".33

Aurora 完全消除了从计算节点写入数据库页面的操作。相反,它将重做日志 (WAL) 处理下推到分布式的多租户存储机群。计算节点仅将日志记录写入存储服务。然后,存储节点异步应用这些日志记录以按需生成数据页。这种“日志结构化存储”架构产生了两个深远的好处:

减少网络流量:只有日志记录通过网络传输,而不是整页。

即时崩溃恢复:由于存储层不断重放日志,计算节点在重启时不需要对 WAL 文件进行长时间的重放。数据库本质上始终处于“已恢复”状态 33。
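A toy model of the "log is the database" idea (all names invented; real Aurora additionally gossips between storage nodes and caches materialized pages): the compute node ships only redo records, and a storage node materializes a page on demand by replaying them in LSN order:

```python
class StorageNode:
    """Minimal log-structured storage node: pages exist only as the
    result of replaying redo records."""
    def __init__(self):
        self.log = []                 # (lsn, page_id, key, value)

    def append(self, lsn, page_id, key, value):
        # The compute node writes ONLY this record -- never a full page.
        self.log.append((lsn, page_id, key, value))

    def read_page(self, page_id, upto_lsn=None):
        """Materialize a page by applying its records in LSN order."""
        page = {}
        for lsn, pid, key, value in sorted(self.log):
            if pid == page_id and (upto_lsn is None or lsn <= upto_lsn):
                page[key] = value
        return page

node = StorageNode()
node.append(1, "p1", "a", 10)
node.append(2, "p1", "a", 20)   # a later redo record supersedes the first
node.append(3, "p2", "b", 5)

assert node.read_page("p1") == {"a": 20}
assert node.read_page("p1", upto_lsn=1) == {"a": 10}   # point-in-time view
assert node.read_page("p2") == {"b": 5}
```

Since pages are always derivable from the log, a restarted compute node has no WAL replay to perform; that is the "always recovered" property in miniature.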

6.2 Google AlloyDB: Disaggregated Columnar Storage & ML

6.2 Google AlloyDB:存算分离的列式存储与机器学习

Google Cloud's AlloyDB further evolves this paradigm by introducing a disaggregated architecture that separates compute and storage to an even greater degree, leveraging Google's Colossus distributed file system. A key differentiator in AlloyDB is its integration of a columnar engine for hybrid workloads (HTAP).

Google Cloud 的 AlloyDB 通过引入存算分离架构进一步发展了这一范式,该架构在更大程度上分离了计算和存储,并利用了 Google 的 Colossus 分布式文件系统。AlloyDB 的一个关键区别在于其集成了用于混合工作负载 (HTAP) 的列式引擎。

While keeping the standard PostgreSQL interface, AlloyDB utilizes machine learning to analyze workload patterns. It automatically identifies tables suited for analytical queries and converts them into a columnar format in memory. This allows for vectorized execution (SIMD instructions), processing batches of values simultaneously rather than row-by-row. Google claims this architecture is up to 4x faster than standard PostgreSQL for transactional workloads (due to storage offloading) and significantly faster for analytical queries, effectively removing the need to ETL data to a separate data warehouse.35

在保留标准 PostgreSQL 接口的同时,AlloyDB 利用机器学习来分析工作负载模式。它自动识别适合分析查询的表,并将它们转换为内存中的列式格式。这允许进行向量化执行(SIMD 指令),同时处理批量值而不是逐行处理。Google 声称,这种架构在事务工作负载方面比标准 PostgreSQL 快 4 倍(由于存储卸载),在分析查询方面也显著更快,有效地消除了将数据 ETL 到单独数据仓库的需求 35。

7. The AI Convergence: Vector Search and pgvector

7. AI 融合:向量搜索与 pgvector

The explosion of Large Language Models (LLMs) and Generative AI in 2023-2024 created an urgent demand for "Vector Databases"—systems capable of storing and searching high-dimensional embeddings (arrays of floating-point numbers representing semantic meaning). Rather than migrating to niche, specialized vector databases (like Pinecone or Milvus), the PostgreSQL community responded with pgvector, an extension that has rapidly become the de facto standard for vector workloads.

2023-2024 年大型语言模型 (LLM) 和生成式 AI 的爆炸式增长,创造了对“向量数据库”的迫切需求——即能够存储和搜索高维嵌入(表示语义含义的浮点数数组)的系统。PostgreSQL 社区没有迁移到利基的专用向量数据库(如 Pinecone 或 Milvus),而是通过 pgvector 做出了回应,这是一个已迅速成为向量工作负载事实标准的扩展。

7.1 Algorithmic Implementation: IVFFlat vs. HNSW

7.1 算法实现:IVFFlat 与 HNSW

pgvector transforms PostgreSQL into a vector database by introducing the vector data type and implementing specialized indexing algorithms that facilitate Approximate Nearest Neighbor (ANN) search.

pgvector 通过引入 vector 数据类型并实现促进近似最近邻 (ANN) 搜索的专门索引算法,将 PostgreSQL 转变为向量数据库。

Table 1: Comparison of Vector Indexing Algorithms in PostgreSQL

IVFFlat: partitions vectors into lists via k-means clustering. Fast to build and memory-frugal, but recall depends on how many lists are probed, and the index should be built after the data is loaded.

HNSW: a multi-layer navigable small-world graph. Slower to build and more memory-hungry, but it offers a better query speed/recall trade-off and can be built on an empty table and updated incrementally.

表 1:PostgreSQL 中向量索引算法的比较

IVFFlat:通过 k-means 聚类将向量划分为多个列表。构建快、内存占用低,但召回率取决于探查的列表数量,且应在数据载入后再构建索引。

HNSW:多层可导航小世界图。构建较慢、内存占用较高,但提供更优的查询速度/召回率权衡,可在空表上构建并支持增量更新。

The introduction of HNSW support in pgvector was a game-changer. It allows PostgreSQL to perform semantic searches (using Cosine Similarity, Euclidean Distance, or Inner Product) with millisecond latency, enabling features like RAG (Retrieval Augmented Generation) where an LLM retrieves context from the database before generating an answer.37

pgvector 中引入 HNSW 支持是一个改变游戏规则的举措。它允许 PostgreSQL 以毫秒级延迟执行语义搜索(使用余弦相似度、欧几里得距离或内积),从而实现 RAG(检索增强生成)等功能,即 LLM 在生成答案之前从数据库中检索上下文 37。
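The distance math behind such a search is simple; this sketch (document names and vectors are made up) shows the exact scan that an un-indexed ORDER BY embedding <=> $1 LIMIT k performs, which HNSW and IVFFlat then approximate:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, the metric behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def knn(query, embeddings, k=2):
    """Exact nearest-neighbour scan over (name, vector) pairs."""
    return sorted(embeddings, key=lambda item: cosine_distance(query, item[1]))[:k]

docs = [
    ("cats",    [1.0, 0.1, 0.0]),
    ("dogs",    [0.9, 0.2, 0.1]),
    ("finance", [0.0, 0.1, 1.0]),
]
nearest = knn([1.0, 0.0, 0.0], docs, k=2)
assert [name for name, _ in nearest] == ["cats", "dogs"]
```

In a RAG pipeline, the query vector is the embedding of the user's question, and the k nearest rows supply the context passed to the LLM.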

7.2 Cloud Integrations: DiskANN and Scaling

7.2 云集成:DiskANN 与扩展

Microsoft Azure has extended this capability by integrating the DiskANN algorithm into its PostgreSQL offering. Developed by Microsoft Research, DiskANN addresses the primary limitation of HNSW: memory cost. While HNSW requires the entire graph index to reside mostly in RAM for performance, DiskANN leverages the parallelism of modern NVMe SSDs to store the graph on disk while maintaining near-memory performance speeds. This allows for vector indexes that far exceed the size of available RAM, offering a cost-effective solution for billion-scale vector datasets.40

Microsoft Azure 通过在其 PostgreSQL 产品中集成 DiskANN 算法,进一步扩展了这一能力。DiskANN 由微软研究院开发,解决了 HNSW 的主要限制:内存成本。HNSW 需要将整个图索引基本驻留在 RAM 中才能保证性能,而 DiskANN 利用现代 NVMe SSD 的并行性将图存储在磁盘上,同时保持接近内存的性能。这使得向量索引的规模可以远超可用 RAM,为十亿级向量数据集提供了一种具有成本效益的解决方案 40。

8. Future Roadmap: PostgreSQL 17, 18, and Beyond

8. 未来路线图:PostgreSQL 17, 18 及以后

As the platform moves into late 2024 and 2025, the development roadmap is focused on fundamental architectural modernization to match the capabilities of modern hardware.

随着该平台进入 2024 年底和 2025 年,开发路线图的重点是基础架构现代化,以匹配现代硬件的能力。

8.1 The Asynchronous I/O (AIO) Revolution

8.1 异步 I/O (AIO) 革命

Historically, PostgreSQL has relied on synchronous, buffered I/O. When a query needs a page not in memory, the backend process issues a read() system call and blocks (waits) until the OS retrieves the data. This model is inefficient for modern NVMe storage, which can handle thousands of parallel requests, and for cloud storage where latency is higher.

历史上,PostgreSQL 依赖于同步、缓冲 I/O。当查询需要不在内存中的页面时,后端进程发出 read() 系统调用并阻塞(等待),直到操作系统检索到数据。这种模型对于可以处理数千个并行请求的现代 NVMe 存储以及延迟较高的云存储来说效率低下。

PostgreSQL 18 is actively introducing a comprehensive Asynchronous I/O (AIO) subsystem. This represents a massive refactoring of the storage layer. With AIO (utilizing io_uring on Linux), the database can issue multiple I/O requests simultaneously (prefetching) and continue processing other tasks while waiting for them to complete. Early benchmarks for PostgreSQL 18 betas suggest up to a 3x performance improvement for sequential scans and vacuum operations on systems with high storage latency.42 This shift also paves the way for Direct I/O (bypassing the OS kernel cache entirely), giving the database full control over memory management.

PostgreSQL 18 正积极引入全面的异步 I/O (AIO) 子系统。这代表了存储层的巨大重构。通过 AIO(在 Linux 上利用 io_uring),数据库可以同时发出多个 I/O 请求(预取),并在等待它们完成的同时继续处理其他任务。PostgreSQL 18 beta 版的早期基准测试表明,在存储延迟较高的系统上,顺序扫描和 vacuum 操作的性能提高了 3 倍 42。这一转变也为直接 I/O(完全绕过操作系统内核缓存)铺平了道路,使数据库能够完全控制内存管理。
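The concurrency pattern AIO enables can be mimicked with Python's asyncio (a conceptual sketch only, not PostgreSQL's io_uring implementation): reads for a whole batch of pages are in flight at once instead of one blocking read() at a time:

```python
import asyncio
import time

async def read_page(page_id):
    # Stand-in for an asynchronous read: submit, yield, await completion.
    await asyncio.sleep(0.05)
    return f"page-{page_id}"

async def sequential_scan(page_ids, batch=4):
    """Keep `batch` reads in flight at once instead of blocking on each."""
    results = []
    for i in range(0, len(page_ids), batch):
        chunk = page_ids[i:i + batch]
        results += await asyncio.gather(*(read_page(p) for p in chunk))
    return results

start = time.monotonic()
pages = asyncio.run(sequential_scan(list(range(8))))
elapsed = time.monotonic() - start

assert pages == [f"page-{i}" for i in range(8)]
# Two batches of four overlapping reads complete in roughly 2 x 50 ms,
# rather than the ~400 ms that eight serial blocking reads would take.
```

The same overlap is what lets a backend prefetch heap pages for a sequential scan while still processing the tuples it already has.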

8.2 Active-Active Replication and Logical Decoding

8.2 双活复制与逻辑解码

While physical streaming replication is mature, PostgreSQL is steadily moving toward native Active-Active (Multi-Master) replication support. Enhancements in Logical Replication—such as ongoing work toward replicating DDL commands, better conflict-resolution hooks, and the introduction of parameters like max_active_replication_origins in Version 18—indicate a clear trend. The goal is to allow writes to occur on multiple nodes simultaneously with automatic synchronization, reducing the reliance on external tools like Bucardo or proprietary forks for high-availability global deployments.42

虽然物理流复制已经成熟,但 PostgreSQL 正稳步迈向原生的双活(多主)复制支持。逻辑复制的增强——例如面向 DDL 命令复制的持续开发、更完善的冲突解决钩子,以及版本 18 中引入的 max_active_replication_origins 等参数——表明了一个明显的趋势。其目标是允许在多个节点上同时写入并自动同步,从而减少高可用全球部署对 Bucardo 等外部工具或专有分支的依赖 42。

8.3 Security: The Quest for TDE

8.3 安全性:对 TDE 的追求

Transparent Data Encryption (TDE)—encryption of data at rest managed by the database—remains a critical missing feature in the core open-source version compared to Oracle or SQL Server. Implementing TDE in PostgreSQL is technically challenging due to its reliance on the OS page cache (if the DB encrypts a page, but the OS caches the plaintext, security is compromised). The roadmap includes rigorous discussions and patches aimed at implementing a cluster-wide encryption architecture with internal key management, but this remains a complex, ongoing effort likely to land in versions post-18.47

与 Oracle 或 SQL Server 相比,透明数据加密 (TDE)——由数据库管理的静态数据加密——仍然是核心开源版本中缺失的一个关键功能。在 PostgreSQL 中实现 TDE 在技术上具有挑战性,因为它依赖于操作系统页面缓存(如果数据库加密页面,但操作系统缓存明文,则安全性会受到影响)。路线图包括旨在实现具有内部密钥管理的集群范围加密架构的严格讨论和补丁,但这仍然是一项复杂的、持续的工作,可能会在 18 之后的版本中落地 47。

9. Conclusion

9. 结论

The trajectory of PostgreSQL—from a DARPA-funded research prototype at Berkeley to the dominant open-source database of the cloud era—is a testament to the foresight of its original design. Michael Stonebraker’s vision of an extensible, object-relational system proved remarkably prescient. By building a database that could "learn" new types, PostgreSQL was able to absorb the JSON document revolution (via JSONB), the geospatial revolution (via PostGIS), and now the AI vector revolution (via pgvector), effectively rendering many "niche" databases obsolete.

PostgreSQL 的轨迹——从伯克利由 DARPA 资助的研究原型到云时代的主导开源数据库——证明了其原始设计的远见卓识。Michael Stonebraker 关于可扩展、对象-关系系统的愿景被证明是非常有先见之明的。通过构建一个可以“学习”新类型的数据库,PostgreSQL 能够吸收 JSON 文档革命(通过 JSONB)、地理空间革命(通过 PostGIS)以及现在的 AI 向量革命(通过 pgvector),有效地使许多“利基”数据库过时。

As it approaches Version 18, PostgreSQL is shedding its last vestiges of legacy architecture. The move to Asynchronous I/O, the integration of JIT compilation, and the embrace of cloud-native storage patterns ensure that it remains not just a standard, but a high-performance platform capable of powering the most demanding data-intensive applications of the next decade.

随着版本 18 的临近,PostgreSQL 正在摆脱其遗留架构的最后残留。向异步 I/O 的迈进、JIT 编译的集成以及对云原生存储模式的拥抱,确保它不仅仅是一个标准,而且是一个高性能平台,能够为未来十年最苛刻的数据密集型应用提供动力。

Works cited

Michael Stonebraker | EECS at UC Berkeley, accessed November 21, 2025,

Copyright © 1986, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work - UC Berkeley EECS, accessed November 21, 2025,

Documentation: 7.0: A Short History of Postgres - PostgreSQL, accessed November 21, 2025,

Documentation: 18: 2. A Brief History of PostgreSQL - PostgreSQL, accessed November 21, 2025,

Copyright © 1992, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work - UC Berkeley EECS, accessed November 21, 2025,

Michael Stonebraker Bibliography - A.M. Turing Award Winner, accessed November 21, 2025,

managing persistent objects in a multi-level store - UC Berkeley EECS, accessed November 21, 2025,

What Goes Around Comes Around... And Around... - Carnegie Mellon Database Group, accessed November 21, 2025,

The Land Sharks Are on the Squawk Box - Communications of the ACM, accessed November 21, 2025,

PostgreSQL 7.3.2 Administrator's Guide, accessed November 21, 2025,

PostgreSQL 7.3.2 User's Guide, accessed November 21, 2025,

PostgreSQL 7.3.2 Tutorial, accessed November 21, 2025,

Documentation: 18: 13.1. Introduction - PostgreSQL, accessed November 21, 2025,

Documentation: 7.0: Multi-Version Concurrency Control - PostgreSQL, accessed November 21, 2025,

Transaction Processing in PostgreSQL, accessed November 21, 2025,

Operations cheat sheet - PostgreSQL wiki, accessed November 21, 2025,

src/include/utils/snapshot.h File Reference - PostgreSQL Source Code, accessed November 21, 2025,

Documentation: 8.1: Routine Database Maintenance Tasks - PostgreSQL, accessed November 21, 2025,

Documentation: 18: VACUUM - PostgreSQL, accessed November 21, 2025,

E.24. Release 9.3.2 - PostgreSQL, accessed November 21, 2025,

Documentation: 18: 24.1. Routine Vacuuming - PostgreSQL, accessed November 21, 2025,

PostgreSQL 17 Released!, accessed November 21, 2025,

Documentation: 18: 30.1. What Is JIT compilation? - PostgreSQL, accessed November 21, 2025,

Documentation: 18: 30.2. When to JIT? - PostgreSQL, accessed November 21, 2025,

JIT compilation in PostgreSQL, accessed November 21, 2025,

Documentation: 18: 8.14. JSON Types - PostgreSQL, accessed November 21, 2025,

Documentation: 18: 65.4. GIN Indexes - PostgreSQL, accessed November 21, 2025,

Documentation: 9.1: GiST and GIN Index Types - PostgreSQL, accessed November 21, 2025,

9.4.6 - PostgreSQL: Release Notes, accessed November 21, 2025,

Documentation: 18: 5.12. Table Partitioning - PostgreSQL, accessed November 21, 2025,

Documentation: 18: 5.11. Inheritance - PostgreSQL, accessed November 21, 2025,

Feature Matrix - PostgreSQL, accessed November 21, 2025,

Amazon Aurora: Design considerations for high throughput cloud-native relational databases, accessed November 21, 2025,

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases - cs.wisc.edu, accessed November 21, 2025,

AlloyDB for PostgreSQL | Google Cloud, accessed November 21, 2025,

AlloyDB overview | AlloyDB for PostgreSQL - Google Cloud Documentation, accessed November 21, 2025,

How to optimize performance when using pgvector - Azure Cosmos DB for PostgreSQL, accessed November 21, 2025,

Vector relevance and ranking - Azure AI Search - Microsoft Learn, accessed November 21, 2025,

pgvector 0.7.0 Released! - PostgreSQL, accessed November 21, 2025,

What's new with Postgres at Microsoft, 2025 edition, accessed November 21, 2025,

DiskANN on Azure Database for PostgreSQL – Now Generally Available, accessed November 21, 2025,

Documentation: 18: E.2. Release 18 - PostgreSQL, accessed November 21, 2025,

PostgreSQL 18 Beta 1 Released!, accessed November 21, 2025,

PostgreSQL 18 Released!, accessed November 21, 2025,

PostgreSQL 18 Asynchronous I/O - Improve Read Performance - Neon, accessed November 21, 2025,

Documentation: 18: Chapter 29. Logical Replication - PostgreSQL, accessed November 21, 2025,

Transparent Data Encryption - PostgreSQL wiki, accessed November 21, 2025,

TDE (Transparent Data Encryption) supported - PostgreSQL, accessed November 21, 2025,