现在我找到诀窍了,如何让codex写复杂逻辑,就是让他do math,先用形式化数学语言写文档,与人商量敲定各种条件之后,然后让它实现,可一次成型无虫代码。
否则的话,无穷无尽的debug循环,烧多少钱也没有用。
经过与codex几小时的讨论,最后成型的算法文档长这样:
Fast-Path Restart Math
This note captures the exact arithmetic needed to service leaf-prefix insert
and delete without rebuilding the whole page. The fast path only decodes the
local window around the edit, runs the prefix analyzer once, and then evaluates
one of four cases before applying a single in-place mutation.
Shared definitions
interval = mdb_prefix_interval(db)
threshold = mdb_prefix_threshold(db)
RP = {r0 < r1 < … < r_{m-1}}
current restart (anchor) indexes
anchor_left(x)
– largest restart ≤ x
(binary search over RP
)
run_len(x) = x - anchor_left(x)
(0
for a restart slot)
encoded_len(key_len, shared) = sizeof(uint16_t) + key_len - shared
node_size(enc_len, data_len, flags) = EVEN(NODESIZE + enc_len +
(flags & F_BIGDATA ? sizeof(pgno_t) : data_len))
Only two page slots are touched during a fast-path edit: the slot being inserted
or removed (i
) plus its successor (i
for insert, i+1
for delete). All math
is computed before the page is mutated.
Minimal decode window
- Locate
r = anchor_left(target)
where target
is the slot being inspected
(i-1
for insert, i
for delete). We walk RP
linearly while target
remains within the first handful of slots (≈8); otherwise we fall back to
the binary search over the restart list.
- Decode keys starting at slot
r
until the predecessor (K_prev
), the key
of interest, and the successor (K_succ
) are fully materialised.
Analyzer usage
Insert (slot i
): run mdb_leaf_prefix_analyze(db, run_prev, K_prev?, K_new)
where run_prev = i ? (i-1) - r : 0
. The result yields:
is_restart_new
shared_new
(0
if is_restart_new
)
Delete (slot i
): run mdb_leaf_prefix_analyze(db, run_prev, K_prev?, K_succ)
with the same run_prev
definition. The result yields:
is_restart_succ_new
shared_succ
(0
if is_restart_succ_new
)
The analyzer is only invoked once per edit (incoming key for insert,
successor for delete).
Insert math
Let i
be the insertion index, K_prev
the predecessor key (if any),
K_new
the incoming full key, and K_succ
the current key at slot i
.
New-key encoding
代码: 全选
enc_new = encoded_len(|K_new|, is_restart_new ? 0 : shared_new)
data_new = F_BIGDATA ? sizeof(pgno_t) : data_len_new
sz_new = node_size(enc_new, data_new, flags_new)
Successor decision (pure math)
代码: 全选
run_succ_old = i - r
shared_succ = lcp(K_new, K_succ) // 0 when there's no successor
wants_anchor = is_restart_new ||
run_succ_old + 1 >= interval ||
shared_succ <= threshold
is_restart_succ_new = wants_anchor ? 1 : 0
enc_succ_new = encoded_len(|K_succ|,
is_restart_succ_new ? 0 : shared_succ)
data_succ_old = flags_succ & F_BIGDATA ? sizeof(pgno_t)
: NODEDSZ_old(succ)
sz_succ_old = node_size(NODEKSZ_old(succ), data_succ_old, flags_succ)
sz_succ_new = node_size(enc_succ_new, data_succ_old, flags_succ)
Space check
代码: 全选
delta_payload = sz_new + sz_succ_new - sz_succ_old
lower_after = MP_LOWER
+ sizeof(indx_t) // new pointer
+ sizeof(indx_t) * (is_restart_new
+ is_restart_succ_new
- was_restart_succ)
upper_after = MP_UPPER - delta_payload
require upper_after ≥ lower_after
Pointer adjustments (math only)
ptr_new = upper_after
ptr_succ_new = ptr_new + sz_new
- Any pointer whose payload offset is below the old successor block shifts
downward by delta_payload
- Shift pointer indices ≥
i
right by one; write ptr_new
at i
Restart table math
Write the current restart array as RP = (r_0, r_1, …, r_{m-1})
with
r_j < r_{j+1}
. Let
代码: 全选
t = #{ r_j | r_j < i } // anchors strictly left of i
succ_o = 1 if any r_j = i, else 0 // old successor anchor at i
tail = m - t - succ_o // anchors strictly right of i
The new restart indexes are:
For 0 ≤ j < t
: r'_j = r_j
.
Initialize write = t
. If is_restart_new = 1
, store r'_write = i
and increment write
.
If is_restart_succ_new = 1
, store r'_write = i+1
and increment write
.
For q = 0 … tail - 1
: r'_{write + q} = r_{t + succ_o + q} + 1
.
When the old successor was a restart (succ_o = 1
) and the new decision keeps
it anchored, the slot is re-emitted at i+1
; otherwise it is dropped.
Only after these values are known do we memmove the payload block and write the
two encoded nodes, followed by header, pointer table, and restart array updates.
Delete math
Let slot i
be removed. K_prev
is the predecessor (if any), K_succ
is
slot i+1
(if any), and flags_i
/ flags_succ
are the node flags.
Deleted node payload
代码: 全选
data_del = flags_i & F_BIGDATA ? sizeof(pgno_t) : NODEDSZ_old(i)
sz_del = node_size(NODEKSZ_old(i), data_del, flags_i)
del_is_anchor = (i ∈ RP)
Successor decision
If there is a successor:
Analyzer already provided is_restart_succ_new
and shared_succ
enc_succ_new = encoded_len(|K_succ|,
is_restart_succ_new ? 0 : shared_succ)
data_succ_old = flags_succ & F_BIGDATA ? sizeof(pgno_t) : NODEDSZ_old(succ)
sz_succ_old = node_size(NODEKSZ_old(succ), data_succ_old, flags_succ)
sz_succ_new = node_size(enc_succ_new, data_succ_old, flags_succ)
Without a successor the adjustments for that slot vanish.
Space check
代码: 全选
delta_payload = sz_del + (sz_succ_new - sz_succ_old) // 0 if no successor
lower_after = MP_LOWER
- sizeof(indx_t) // pointer removed
+ sizeof(indx_t) * (is_restart_succ_new
- was_restart_succ
- (del_is_anchor ? 1 : 0))
upper_after = MP_UPPER + delta_payload
require upper_after ≥ lower_after
Pointer adjustments
- Remove pointer entry
i
, shift later entries left
- Any pointer whose payload offset lay below the successor block shifts
upward by delta_payload
Restart table math
Partition as before:
RP_< = { r ∈ RP | r < i }
RP_> = { r ∈ RP | r > i }
Because RP
is sorted by slot index, describe it as
RP = (r_0, r_1, …, r_{m-1})
with r_j < r_{j+1}
. Let
代码: 全选
t = #{ r_j | r_j < i } // anchors strictly left of i
del = 1 if any r_j = i, else 0 // deleted key was anchor
succ_o = 1 if any r_j = i+1, else 0 // old successor anchor
tail = m - t - del // anchors strictly right of i
skip = succ_o // entries to discard from tail
After deletion all tail anchors shift left by one. The new restart array is
still sorted and is given component-wise by:
For 0 ≤ j < t
: r'_j = r_j
.
If the successor must anchor (is_restart_succ_new = 1
):
代码: 全选
r'_t = i
for q = 0 … tail - skip - 1:
r'_{t + 1 + q} = r_{t + del + skip + q} - 1
If the successor must not anchor (is_restart_succ_new = 0
):
代码: 全选
for q = 0 … tail - skip - 1:
r'_{t + q} = r_{t + del + skip + q} - 1
The skip
term ensures that when the old successor was an anchor (i+1
),
its shifted index i
is retained only if the new decision demands it.
With all numbers established, the mutation consists of a single memmove of the
payload block, writing the successor’s new encoding (if present), and updating
header fields, pointer table, and restart array.
Notes
- The restart array used here does not need to match the builder’s output
byte-for-byte; it only has to be consistent with the encoded keys we store.
- We only invoke the prefix analyzer once per edit (incoming key on insert,
successor on delete) and only decode the small segment from the controlling
restart through the successor.
- All space and pointer arithmetic is resolved before touching the real page,
so the page update is a single, deterministic write sequence.
Bailout conditions
Fallback to the full-page builder whenever any of these checks fail:
- Page not eligible: the leaf is not prefix-enabled (no
MDB_PREFIX_COMPRESSION
),
is a LEAF2
page, or is a sub-page. The fast path assumes the restart table
is present.
- Decode/analyze failure:
mdb_cursor_decode_leaf_key
or
mdb_leaf_prefix_analyze
returns an error for any slot in the minimal window.
- Space exhaustion: the computed
upper_after
is < lower_after
, meaning
the node payloads, pointer table, and restart array would overlap. This is the
fast-path equivalent of MDB_PAGE_FULL
.
- Interval overrun beyond the successor: when inserting a non-anchor,
if keeping the successor non-anchor would stretch the run past the next
restart (i.e. anchor_right(i) - r >= interval
after accounting for the new
slot), we would need to re-encode more than one successor – bail out.
- Unsupported flag mix: any key in the window carries flags the fast path
doesn’t handle (e.g. sub-databases that require special handling). Until that
support exists, fall back.
- Reserve bookkeeping issue: if an insert with
MDB_RESERVE
or bigdata
cannot produce the required payload pointer in-place (e.g. overflow page not
available yet), defer to the builder.
These guardrails keep the optimized path constrained to “two-key” surgery. The
moment a wider transformation is required, the code should bail early and let
the general rebuild logic take over.
hci 写了: 2025年 10月 15日 17:43
用Codex太贵了,Vide coding一个星期就花了400美刀,还搞不定稍微比较复杂的问题,关键还慢得要死。
现在换成中国公司z.ai的开源模型GLM4.6,用在Claude Code里面,速度快太多了,爽。还便宜,一个季度随便用也就180美刀。
Codex完全没法比。
美帝AI完蛋了,泡泡绝对爆。