Two Sum Problem : aiming at (log n) square complexity

heteroclinic

https://github.com/wangzhikai/TwoSumProblem/blob/main/README.md

TwoSumProblem

Two Sum Problem - - the well-known tech interview question and proposed solution aiming at (log n) square time complexity and O(1) space complexity.

The high level algorithm design is documented here. text

The original work of mine, POC completed is as this file. text

I asked Chatgpt to review the code. It provided a more concise version plus more test cases. Again, it is still POC protype and I don't aim it having the industry level robustness. text

With the assistance of Chatgpt, we are able to construct worst case test data. Which makes time complexity O(n). text

I have been thinking of the worst case for days. According to the analysis in the pdf file, I decided to add one step per loop in the algorithm "opportunistic guessing". It is not purely opportunistic as discussed in the pdf file. Chatgpt quickly implemented the idea as a Python file. text It shows case we can beat the O(n) case with a O( log^2 n) approach. I have not got the chance to thouroughly review the Python file. Any comments will be welcome!

Further Research An immediate application of this problem can be for Goldbach’s conjecture as computer validation in limited scope. A large enough even number is the sum of two prime numbers. Given a large even number t and a complete list of prime numbers in scope, we can search for two prime numbers one is less than t/2 and the other is greater than t/2.

Acknowledgement Thanks to Leetcode presenting the problem to the ardent CS students and programmers! I used Node JS, VS Code + Copilot for the POC. Chatgpt greatly speeds up orginal research. Python is also used in the POC of code and visualizing important clues. Here are some chat histories with Chatgpt: https://chatgpt.com/share/682cc20d-57d0-8013-a706-0c460c77d0a2 https://chatgpt.com/share/682cc226-2e38-8013-9a4f-2260ee0c65e8 https://chatgpt.com/share/682cc258-324c-8013-bcdb-7d471a70e78e

heteroclinic

哪位将军,女侠有兴趣给看看有没有什么深挖的价值. 本人有有限的预算.
多谢!

eflame99 · 帖子由 **eflame99** » 2025年 5月 21日 06:28

https://leetcode.com/problems/two-sum-i ... s-beat-98/

苍井吱

eflame99 写了： 2025年 5月 21日 06:28 https://leetcode.com/problems/two-sum-i ... s-beat-98/

不大可能，lz那个文章里面用到了sorted array这个信息才提供了二分法的可能

乱序array不大可能比O(n)还快

eflame99 · 帖子由 **eflame99** » 2025年 5月 21日 13:21

苍井吱写了： 2025年 5月 21日 12:01 不大可能，lz那个文章里面用到了sorted array这个信息才提供了二分法的可能

乱序array不大可能比O(n)还快

你去读一读楼主的paper先，看看讲的是不是sorted array，用的是不是一模一样的二分法

苍井吱

eflame99 写了： 2025年 5月 21日 13:21 你去读一读楼主的paper先，看看讲的是不是sorted array，用的是不是一模一样的二分法

我当然是看了才知道的

你自己没看

heteroclinic

eflame99 写了： 2025年 5月 21日 06:28 https://leetcode.com/problems/two-sum-i ... s-beat-98/

i did not check the solutions in leetcode, though I am premium member.
I ask chatgpt, buddy says, still the binary search worst case is O n log n.

So there is analysis in my writings. it is not really O n log n. It is O k log n. k is the min (a_i, l -b_i).
From pytthon visualization k maximize when a_i /approx b_i /approx l/2
Here, we see when binary search degenerates, a_i and b_i are pushed nearer almost together.

So the point I trying to make is not binary search. It has been there. But, when we found binary search moves slow, we make a new guess range then confirm it with binary search. In program chatgpt wrote, it says beat the worst case 500 loop count to below 10.

Thanks all for reading.

heteroclinic

帖到买卖提大的目的首先,这两天脑子比较乱.另外就是我不太习惯看别人的程序,you尤其genai生成二程序太精练,大量fp 模式你想加入传统循环变量调试code非常困难

qxc · 帖子由 **qxc** » 2025年 5月 21日 18:14

heteroclinic 写了： 2025年 5月 21日 15:13 帖到买卖提大的目的首先,这两天脑子比较乱.另外就是我不太习惯看别人的程序,you尤其genai生成二程序太精练,大量fp 模式你想加入传统循环变量调试code非常困难

你不习惯看别人的程序，但习惯于让别人帮你看程序？

得先裸奔才有一点希望做到吧。

eflame99 · 帖子由 **eflame99** » 2025年 5月 21日 18:59

苍井吱写了： 2025年 5月 21日 13:32 我当然是看了才知道的

你自己没看

我看了才贴的这个link, 同样sorted array, 同样从两端开始做binary search。有什么问题呢？不太明白为什么要提乱序。

苍井吱

eflame99 写了： 2025年 5月 21日 18:59 我看了才贴的这个link, 同样sorted array, 同样从两端开始做binary search。有什么问题呢？不太明白为什么要提乱序。

所有的O(log n)搜索算法都要求pre-sorted array吧，不然在任何搜索目标下，最差情况都需要把所有的元素检查一遍。

当然比如我可以证明只有Pr = O(log n) / n的情况下才会需要O(log n)时间复杂度，其他时候都是O(log n)这样也行。需要更多分析了

heteroclinic

qxc 写了： 2025年 5月 21日 18:14 你不习惯看别人的程序，但习惯于让别人帮你看程序？

得先裸奔才有一点希望做到吧。

如何裸奔?一般欧洲有狂欢节,性工作着打扮成政治家,vice versa.

这个看程序就是要看成本收益.比如编辑能发好的稿子,以后在委员会或业界会有更高威望.

kiwi

肏尼玛都是深井冰吗

哪怕数组每个元素读一遍也要O(n)啊

现在本科教育怎么了？

heteroclinic

kiwi 写了： 2025年 5月 22日 10:18 肏尼玛都是深井冰吗

哪怕数组每个元素读一遍也要O(n)啊

现在本科教育怎么了？

问题问的是sorted array,你再看看.

heteroclinic

找到一个bug, guess 只对特定的worst case 有用.下面这个就是没有处理好的guess

import csv
from sympy import primerange
import math
import random

# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
half = target // 2
spread = n // 10 # small spread to cluster around target/2
data = sorted(random.randint(half - spread, half + spread) for _ in range(n))
return data

# Binary search to find the smallest index such that arr >= x
def binary_search_closest_ge(arr, low, high, x):
while low < high:
mid = (low + high) // 2
if arr[mid] < x:
low = mid + 1
else:
high = mid
return low

# Binary search to find the largest index such that arr <= x
def binary_search_closest_le(arr, low, high, x):
while low < high:
mid = (low + high + 1) // 2
if arr[mid] > x:
high = mid - 1
else:
low = mid
return low

# Find the left pointer such that arr + arr[j] >= target
def findLeft(arr, i, j, target):
while i < j:
if arr + arr[j] < target:
i += 1
else:
break
return i

# Find the right pointer such that arr + arr[j] <= target
def findRight(arr, i, j, target):
while i < j:
if arr + arr[j] > target:
j -= 1
else:
break
return j

# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
return 0 <= i < j < len(arr) and arr + arr[j] == target

# Main algorithm with log^2 strategy
def two_sum_log_squared(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
max_loops = int(math.log2(len(arr))) + 1

for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if i == -1 or j == -1 or i >= j:
return None, steps

pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)

stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)

guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)

gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps

i, j = newI, newJ

return None, steps

def two_sum_log_squared_guess_improved(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
#max_loops = int(math.log2(len(arr))) + 1
max_loops = j + 1

for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if is_valid_pair(arr, i,j, target):
return (i,j), steps
elif i == -1 or j == -1 or i >= j:
return None, steps

pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)

stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)

guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)

gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
else:
print(f'gi {gi} arr[gi] {arr[gi]} gj {gj} arr[gj] {arr[gj]}')
i, j = gi, gj
# else:
# i, j = newI, newJ

return None, steps

# # Example usage
# if __name__ == "__main__":
# n = 10000
# target = 1000
# arr = generate_worst_case_data(n, target)
# result, steps = two_sum_log_squared(arr, target)
# print(f"Result: {result}, Steps: {steps}")
# if result:
# i, j = result
# print(f"arr[{i}] = {arr}, arr[{j}] = {arr[j]}, sum = {arr + arr[j]}")

n = 1000
# Generate list of primes less than 10000
primes = list(primerange(2, n))

# Goldbach conjecture: even numbers ≥ 6
even_numbers = list(range(6, n, 2))

# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
# writer = csv.writer(csvfile)
# writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])

# for even in even_numbers:
# p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
# if p1 is not None:
# writer.writerow([even, p1, p2, i1, i2, loops])
# else:
# writer.writerow([even, "", "", "", "", loops])
for even in even_numbers:
target = even
result, steps = two_sum_log_squared_guess_improved(primes, target)
#print(f"Result: {result}, Steps: {steps}")
arr = primes
if result:
i, j = result
print(f"arr[{i}] = {arr}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
else:
i, j = -1,-1
print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")

heteroclinic

如果直用binary search. 还是可以用的. 好像是guess 把答案跳过去然后就,超出循环的max 设置.这是 working binary search.我还要再想想如何修复.或者就是没法修复.

import csv
from sympy import primerange
import math
import random

# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
half = target // 2
spread = n // 10 # small spread to cluster around target/2
data = sorted(random.randint(half - spread, half + spread) for _ in range(n))
return data

# Binary search to find the smallest index such that arr >= x
def binary_search_closest_ge(arr, low, high, x):
while low < high:
mid = (low + high) // 2
if arr[mid] < x:
low = mid + 1
else:
high = mid
return low

# Binary search to find the largest index such that arr <= x
def binary_search_closest_le(arr, low, high, x):
while low < high:
mid = (low + high + 1) // 2
if arr[mid] > x:
high = mid - 1
else:
low = mid
return low

# Find the left pointer such that arr + arr[j] >= target
def findLeft(arr, i, j, target):
while i < j:
if arr + arr[j] < target:
i += 1
else:
break
return i

# Find the right pointer such that arr + arr[j] <= target
def findRight(arr, i, j, target):
while i < j:
if arr + arr[j] > target:
j -= 1
else:
break
return j

# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
return 0 <= i < j < len(arr) and arr + arr[j] == target

# Main algorithm with log^2 strategy
def two_sum_log_squared(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
max_loops = int(math.log2(len(arr))) + 1

for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if i == -1 or j == -1 or i >= j:
return None, steps

pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)

stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)

guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)

gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps

i, j = newI, newJ

return None, steps

def two_sum_log_squared_guess_improved(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
#max_loops = int(math.log2(len(arr))) + 1
max_loops = j + 1

for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if is_valid_pair(arr, i,j, target):
return (i,j), steps
elif i == -1 or j == -1 or i >= j:
return None, steps

# pivot = target // 2
# newI = binary_search_closest_ge(arr, i, j, pivot)
# newJ = binary_search_closest_le(arr, i, j, pivot)

# stepI = int((newI - i) * percent)
# stepJ = int((j - newJ) * percent)

# guessI = max(i, newI - stepI)
# guessJ = min(j, newJ + stepJ)

# gi = findLeft(arr, guessI, guessJ, target)
# gj = findRight(arr, guessI, guessJ, target)
# if is_valid_pair(arr, gi, gj, target):
# return (gi, gj), steps
# #i, j = gi, gj
# else:
# i, j = newI, newJ

return None, steps

# # Example usage
# if __name__ == "__main__":
# n = 10000
# target = 1000
# arr = generate_worst_case_data(n, target)
# result, steps = two_sum_log_squared(arr, target)
# print(f"Result: {result}, Steps: {steps}")
# if result:
# i, j = result
# print(f"arr[{i}] = {arr}, arr[{j}] = {arr[j]}, sum = {arr + arr[j]}")

n = 1000
# Generate list of primes less than 10000
primes = list(primerange(2, n))

# Goldbach conjecture: even numbers ≥ 6
even_numbers = list(range(6, n, 2))

# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
# writer = csv.writer(csvfile)
# writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])

# for even in even_numbers:
# p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
# if p1 is not None:
# writer.writerow([even, p1, p2, i1, i2, loops])
# else:
# writer.writerow([even, "", "", "", "", loops])
for even in even_numbers:
target = even
result, steps = two_sum_log_squared_guess_improved(primes, target)
print(f"Result: {result}, Steps: {steps}")
arr = primes
if result:
i, j = result
print(f"arr[{i}] = {arr}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")

heteroclinic

你使用"引用回复"就可以复制python 格式.

heteroclinic

OK 把 BUG 修上了, 你可以猜小一点的歌德巴赫.
比较有意思的是992这个数.不猜用binary search 走了11 步
arr[20] = 73, arr[156] = 919, sum = 992 Result: (20, 156), Steps: 11
猜了,走了6步
arr[60] = 283, arr[126] = 709, sum = 992 Result: (60, 126), Steps: 6 even 992
猜的方法还应该能够改进,没有完全达到设计的初衷
python3 Two_Sum_with_opportunistic_guess-gbc-03.py
992 这个数有两个或以上答案,这个题的前提条件排好序的数列,而且知道答案一定存在.我忘了答案是不是必须唯一.
另外这个程序大部分是chatgpt 写的,我提供个思路,然后加了一些调试.

import csv
from sympy import primerange
import math
import random

# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
half = target // 2
spread = n // 10 # small spread to cluster around target/2
data = sorted(random.randint(half - spread, half + spread) for _ in range(n))
return data

# Binary search to find the smallest index such that arr >= x
def binary_search_closest_ge(arr, low, high, x):
while low < high:
mid = (low + high) // 2
if arr[mid] < x:
low = mid + 1
else:
high = mid
return low

# Binary search to find the largest index such that arr <= x
def binary_search_closest_le(arr, low, high, x):
while low < high:
mid = (low + high + 1) // 2
if arr[mid] > x:
high = mid - 1
else:
low = mid
return low

# Find the left pointer such that arr + arr[j] >= target
def findLeft(arr, i, j, target):
while i < j:
if arr + arr[j] < target:
i += 1
else:
break
return i

# Find the right pointer such that arr + arr[j] <= target
def findRight(arr, i, j, target):
while i < j:
if arr + arr[j] > target:
j -= 1
else:
break
return j

# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
return 0 <= i < j < len(arr) and arr + arr[j] == target

# Main algorithm with log^2 strategy
def two_sum_log_squared(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
max_loops = int(math.log2(len(arr))) + 1

for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if i == -1 or j == -1 or i >= j:
return None, steps

pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)

stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)

guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)

gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps

i, j = newI, newJ

return None, steps

def two_sum_log_squared_guess_improved(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
#max_loops = int(math.log2(len(arr))) + 1
max_loops = (j + 1)*(j+1)

for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if is_valid_pair(arr, i,j, target):
return (i,j), steps
elif i == -1 or j == -1 or i >= j:
return None, steps

pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)

stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)

guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)

gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
#else:
#print(f'gi {gi} arr[gi] {arr[gi]} gj {gj} arr[gj] {arr[gj]}')
#i, j = gi, gj
# else:
# i, j = newI, newJ

return None, steps

# # Example usage
# if __name__ == "__main__":
# n = 10000
# target = 1000
# arr = generate_worst_case_data(n, target)
# result, steps = two_sum_log_squared(arr, target)
# print(f"Result: {result}, Steps: {steps}")
# if result:
# i, j = result
# print(f"arr[{i}] = {arr}, arr[{j}] = {arr[j]}, sum = {arr + arr[j]}")

n = 1000
# Generate list of primes less than 10000
primes = list(primerange(2, n))

# Goldbach conjecture: even numbers ≥ 6
even_numbers = list(range(6, n, 2))

# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
# writer = csv.writer(csvfile)
# writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])

# for even in even_numbers:
# p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
# if p1 is not None:
# writer.writerow([even, p1, p2, i1, i2, loops])
# else:
# writer.writerow([even, "", "", "", "", loops])
for even in even_numbers:
target = even
result, steps = two_sum_log_squared_guess_improved(primes, target)
#print(f"Result: {result}, Steps: {steps}")
arr = primes
if result:
i, j = result
print(f"arr[{i}] = {arr}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
else:
i, j = -1,-1
print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")

heteroclinic

今天在积木蹬自行车的时候,想通了多个解怎么处理.
现在猜的时候只有一个pivot,可以选left pivot, right pivot, 扫到一个新的就换上去直到两个指针会合,指针会合之后才返回结果

heteroclinic

continue cosa nostra

Postulate:
For each even number n≥4n≥4, let pp and qq be the smallest pair of prime numbers such that p+q=np+q=n (with p≤qp≤q), and let i(p)i(p) and i(q)i(q) denote their respective indices in the ascending sequence of all prime numbers starting from index 0. Then the sequence of values Δ(n)=i(q)−i(p)Δ(n)=i(q)−i(p) grows gradually and is constrained by a smooth, slowly increasing bound. When sorted, the sequence of Δ(n)Δ(n) values for n≤Nn≤N (e.g., N=500N=500) exhibits sublinear acceleration with no abrupt deviations.

Zhikai–ChatGPT Team Postulate
https://chatgpt.com/share/68311b9c-d370 ... 7029e1f157

新未名空间

Two Sum Problem : aiming at (log n) square complexity

#1 Two Sum Problem : aiming at (log n) square complexity

#2 Re: Two Sum Problem : aiming at (log n) square complexity

#3 Re: Two Sum Problem : aiming at (log n) square complexity

#4 Re: Two Sum Problem : aiming at (log n) square complexity

#5 Re: Two Sum Problem : aiming at (log n) square complexity

#6 Re: Two Sum Problem : aiming at (log n) square complexity

#7 Re: Two Sum Problem : aiming at (log n) square complexity

#8 Re: Two Sum Problem : aiming at (log n) square complexity

#9 Re: Two Sum Problem : aiming at (log n) square complexity

#10 Re: Two Sum Problem : aiming at (log n) square complexity

#11 Re: Two Sum Problem : aiming at (log n) square complexity

#12 Re: Two Sum Problem : aiming at (log n) square complexity

#13 Re: Two Sum Problem : aiming at (log n) square complexity

#14 Re: Two Sum Problem : aiming at (log n) square complexity

#15 Re: Two Sum Problem : aiming at (log n) square complexity

#16 Re: Two Sum Problem : aiming at (log n) square complexity

#17 Re: Two Sum Problem : aiming at (log n) square complexity

#18 Re: Two Sum Problem : aiming at (log n) square complexity

#20 Re: Two Sum Problem : aiming at (log n) square complexity

#21 Re: Two Sum Problem : aiming at (log n) square complexity