https://github.com/wangzhikai/TwoSumProblem/blob/main/README.md
TwoSumProblem
Two Sum Problem: the well-known tech interview question, with a proposed solution aiming at O((log n)^2) time complexity and O(1) space complexity.
The high-level algorithm design is documented here.
My original work, with the POC completed, is in this file.
I asked ChatGPT to review the code. It provided a more concise version plus more test cases. Again, it is still a POC prototype, and I do not aim for industry-level robustness.
With the assistance of ChatGPT, we were able to construct worst-case test data, which drives the time complexity to O(n).
I had been thinking about the worst case for days. Following the analysis in the PDF file, I decided to add one step per loop to the algorithm: "opportunistic guessing". It is not purely opportunistic, as discussed in the PDF file. ChatGPT quickly implemented the idea as a Python file. It shows we can beat the O(n) case with an O((log n)^2) approach. I have not had the chance to thoroughly review the Python file. Any comments are welcome!
Further Research: An immediate application of this problem is computer validation of Goldbach's conjecture in a limited scope: a large enough even number is the sum of two prime numbers. Given a large even number t and a complete list of prime numbers in scope, we can search for two primes, one less than t/2 and the other greater than t/2.
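For concreteness, here is a minimal sketch (my own illustration, not the repository's algorithm) of a Goldbach pair search expressed as a two-sum over a sorted prime list, using the classic two-pointer scan:

```python
from sympy import primerange

def goldbach_pair(t):
    """Find primes p <= q with p + q = t via a two-pointer two-sum scan."""
    primes = list(primerange(2, t))
    i, j = 0, len(primes) - 1
    while i <= j:
        s = primes[i] + primes[j]
        if s == t:
            return primes[i], primes[j]
        elif s < t:
            i += 1   # sum too small: move left pointer right
        else:
            j -= 1   # sum too large: move right pointer left
    return None

print(goldbach_pair(100))  # → (3, 97)
```

The O((log n)^2) approach discussed in this repository replaces the linear pointer movement with binary-search jumps; this sketch only shows the problem mapping.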
Acknowledgement: Thanks to LeetCode for presenting the problem to ardent CS students and programmers! I used Node.js and VS Code + Copilot for the POC. ChatGPT greatly sped up the original research. Python was also used in the POC and for visualizing important clues. Here are some chat histories with ChatGPT: https://chatgpt.com/share/682cc20d-57d0-8013-a706-0c460c77d0a2 https://chatgpt.com/share/682cc226-2e38-8013-9a4f-2260ee0c65e8 https://chatgpt.com/share/682cc258-324c-8013-bcdb-7d471a70e78e
Two Sum Problem: aiming at (log n)^2 complexity
This blog post comes from the forum board: STEM
See something, say something!
Anything to say about this sequence? Try relating it to 2^1 through 2^20.
0, 0, 2, 5, 14, 33, 74, 159, 340, 715, 1484, 3068, 6292, 12872, 26226, 53285, 108072, 218754, 442263
Storage Method                   Estimated Disk Space
Raw 64-bit integers              ~750 petabytes
Efficient delta + compressed     ~60 petabytes (or even less with better compression)
OK, for very large primes, most of the gaps are very small, with many twin primes. So given a start prime and an end prime, we just compute the gaps and record them. I believe this will make a list of the primes up to 10^18 take far less disk space.
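A toy sketch of this gap-encoding idea (the range and encoding choices are mine, for illustration): store the first prime plus the gap to each successor. Below 100,000 every gap fits in a single byte, an 8x saving over raw 64-bit integers even before any compression:

```python
from sympy import primerange

primes = list(primerange(2, 100_000))

# Delta-encode: keep the first prime, then the gap to each successor.
gaps = [primes[0]] + [b - a for a, b in zip(primes, primes[1:])]

# Every gap below 100,000 fits in one unsigned byte (all gaps here are < 256),
# versus 8 bytes per raw 64-bit integer.
encoded = bytes(gaps)

# Decode by cumulative sum to verify the round trip.
decoded = []
total = 0
for g in gaps:
    total += g
    decoded.append(total)
assert decoded == primes
print(len(encoded), "bytes vs", 8 * len(primes), "bytes raw")
```

For larger ranges the gaps eventually exceed 255, so a real format would need a variable-length or escape encoding, but the gaps stay tiny relative to the primes themselves.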
Prediction for N=10^18:
R(10^18)≈0.133
My personal estimate: the primes smaller than 4*10^18 need less than 10 PB of disk space.
https://chatgpt.com/share/68394e58-64bc ... 7170b2e7f4
Let's say p_i, i <= n, are the first n primes. By testing p_n + 2 modulo each p_i, you can tell whether it is an odd composite or a prime. This could be called a reverse sieve.
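A minimal sketch of this reverse-sieve test (the helper name is mine): a candidate c is composite iff some known prime p with p*p <= c divides it, so reducing c modulo the small prime list settles primality:

```python
from sympy import primerange

small_primes = list(primerange(2, 100))  # the known prime list, up to 97

def reverse_sieve_is_prime(c):
    """c is prime iff no known prime p with p*p <= c divides it.

    Valid for c < 101**2, the square of the first prime missing from the list.
    """
    for p in small_primes:
        if p * p > c:
            return True
        if c % p == 0:
            return False
    return True

# Check the successors of 97 for primes
print([c for c in range(98, 120) if reverse_sieve_is_prime(c)])  # → [101, 103, 107, 109, 113]
```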
ChatGPT said: pretty good; similar ideas exist, and there are more advanced papers with even lower algorithmic complexity. But to some extent, putting it this way is logically more concise.
"Prime kernel" is something ChatGPT brought up in an earlier chat. The core idea: given a small list of primes, it can be used to rule out all even numbers and odd composites, and what remains are primes. In other words, the space of natural numbers can be constructed entirely from primes. Admittedly, the sieve algorithm already expresses this idea.
Pushing further:
Let's say Q_m is a non-empty subset of P_n, where P_n is the first n primes excluding 1 and 2. Suppose p_n + 2 is not a prime. Computationally you can keep checking whether p_n + 2i is a prime, but we know p_n + 2 can be expressed as q1^{e1} * q2^{e2} * ... * ql^{el}. So our goal is to find such a Q_m that q1^{e1} * q2^{e2} * ... * ql^{el} is smallest, and there is a gap in the sequence p_n + 2i.
In other words, this turns the prime gap problem into finding the smallest hole between odd composites. My buddy (ChatGPT) said it may be a good idea; similar viewpoints exist, though not identical ones. Even so, this is a hard combinatorics problem, and there may also be progress in related directions.
Finally, we discussed that q1^{e1} * q2^{e2} * ... * ql^{el} is a kind of regular expression, so a graph-search algorithm could be used to find the gap. Some Python programs were generated. No time left today.
The old man said this thing is indeed a "poisonous weed".
For an even number n, all the primes participating in the construction must pair with 3 at their first appearance. Why?
Anyway, here are some insights from the past two days:
1. Constructing a complete prime list uses the sieve and has nothing to do with any conjecture; with enough computing power it can be made as large as needed.
2. Searching for GBC (Goldbach) pairs with two-sum is also a computational problem; it should be doable in (log n)^2 steps, where n is the size of the prime list, which is smaller than 2m.
3. The small prime 3 is used heavily, and it is hard to summarize a pattern. Searching from the left is far cheaper than log n; for example, below 100,000 only the first 61 primes are used. I don't know what explanation or insight this might yield.
I have to set this aside for now.
I think anyone with at least a CS background can understand this easily; I hope it is of some help to those working in mathematics.
Postulate:
For each even number n >= 4, let p and q be the smallest pair of prime numbers such that p + q = n (with p <= q), and let i(p) and i(q) denote their respective indices in the ascending sequence of all prime numbers starting from index 0. Then the sequence of values Δ(n) = i(q) − i(p) grows gradually and is constrained by a smooth, slowly increasing bound. When sorted, the sequence of Δ(n) values for n <= N (e.g., N = 500) exhibits sublinear acceleration with no abrupt deviations.
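The postulate can be checked empirically with a short script (my sketch; I read "smallest pair" as the pair with the smallest p, which matches the heavy use of 3 noted elsewhere in this thread):

```python
from sympy import primerange

N = 500
primes = list(primerange(2, N))
index = {p: k for k, p in enumerate(primes)}  # 0-based prime indices

def delta(n):
    """i(q) - i(p) for the first pair p <= q with p + q = n (smallest p)."""
    for p in primes:
        if p > n // 2:
            break
        q = n - p
        if q in index:
            return index[q] - index[p]
    return None

# Sorted Δ(n) values for all even n <= N, as in the postulate
deltas = sorted(d for n in range(4, N + 1, 2) if (d := delta(n)) is not None)
print(deltas[:5], deltas[-5:])
```

Plotting the sorted values (or Δ(n) against n) would show whether the claimed smooth, slowly increasing bound holds over larger N.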
Zhikai–ChatGPT Team Postulate
https://chatgpt.com/share/68311b9c-d370 ... 7029e1f157
Currently there is only one pivot when guessing. One could choose a left pivot and a right pivot, swapping in each new one as it is scanned, until the two pointers meet; only after the pointers meet is the result returned.
An interesting case is the number 992. Without guessing, binary search took 11 steps:
arr[20] = 73, arr[156] = 919, sum = 992 Result: (20, 156), Steps: 11
With guessing, it took 6 steps:
arr[60] = 283, arr[126] = 709, sum = 992 Result: (60, 126), Steps: 6 even 992
The guessing method can still be improved; it has not fully achieved the original design intent.
python3 Two_Sum_with_opportunistic_guess-gbc-03.py
The number 992 has two or more answers. A precondition of this problem is a sorted array, and it is known that an answer definitely exists. I forget whether the answer must be unique.
Also, most of this program was written by ChatGPT; I provided the idea and then added some debugging.
import csv
from sympy import primerange
import math
import random
# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
half = target // 2
spread = n // 10 # small spread to cluster around target/2
data = sorted(random.randint(half - spread, half + spread) for _ in range(n))
return data
# Binary search to find the smallest index such that arr[mid] >= x
def binary_search_closest_ge(arr, low, high, x):
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < x:
            low = mid + 1
        else:
            high = mid
    return low

# Binary search to find the largest index such that arr[mid] <= x
def binary_search_closest_le(arr, low, high, x):
    while low < high:
        mid = (low + high + 1) // 2
        if arr[mid] > x:
            high = mid - 1
        else:
            low = mid
    return low

# Advance the left pointer while arr[i] + arr[j] < target
def findLeft(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] < target:
            i += 1
        else:
            break
    return i

# Retreat the right pointer while arr[i] + arr[j] > target
def findRight(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] > target:
            j -= 1
        else:
            break
    return j

# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
    return 0 <= i < j < len(arr) and arr[i] + arr[j] == target
# Main algorithm with log^2 strategy
def two_sum_log_squared(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
max_loops = int(math.log2(len(arr))) + 1
for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if i == -1 or j == -1 or i >= j:
return None, steps
pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)
stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)
guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)
gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
i, j = newI, newJ
return None, steps
def two_sum_log_squared_guess_improved(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
#max_loops = int(math.log2(len(arr))) + 1
max_loops = (j + 1)*(j+1)
for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if is_valid_pair(arr, i,j, target):
return (i,j), steps
elif i == -1 or j == -1 or i >= j:
return None, steps
pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)
stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)
guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)
gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
#else:
#print(f'gi {gi} arr[gi] {arr[gi]} gj {gj} arr[gj] {arr[gj]}')
#i, j = gi, gj
# else:
# i, j = newI, newJ
return None, steps
# # Example usage
# if __name__ == "__main__":
# n = 10000
# target = 1000
# arr = generate_worst_case_data(n, target)
# result, steps = two_sum_log_squared(arr, target)
# print(f"Result: {result}, Steps: {steps}")
# if result:
# i, j = result
#         print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")
n = 1000
# Generate list of primes less than n
primes = list(primerange(2, n))
# Goldbach conjecture: even numbers ≥ 6
even_numbers = list(range(6, n, 2))
# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
# writer = csv.writer(csvfile)
# writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])
# for even in even_numbers:
# p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
# if p1 is not None:
# writer.writerow([even, p1, p2, i1, i2, loops])
# else:
# writer.writerow([even, "", "", "", "", loops])
for even in even_numbers:
target = even
result, steps = two_sum_log_squared_guess_improved(primes, target)
#print(f"Result: {result}, Steps: {steps}")
arr = primes
if result:
i, j = result
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
    else:
        print(f"No pair found for even {even}. Result: {result}, Steps: {steps}")
import csv
from sympy import primerange
import math
import random
# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
half = target // 2
spread = n // 10 # small spread to cluster around target/2
data = sorted(random.randint(half - spread, half + spread) for _ in range(n))
return data
# Binary search to find the smallest index such that arr[mid] >= x
def binary_search_closest_ge(arr, low, high, x):
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < x:
            low = mid + 1
        else:
            high = mid
    return low

# Binary search to find the largest index such that arr[mid] <= x
def binary_search_closest_le(arr, low, high, x):
    while low < high:
        mid = (low + high + 1) // 2
        if arr[mid] > x:
            high = mid - 1
        else:
            low = mid
    return low

# Advance the left pointer while arr[i] + arr[j] < target
def findLeft(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] < target:
            i += 1
        else:
            break
    return i

# Retreat the right pointer while arr[i] + arr[j] > target
def findRight(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] > target:
            j -= 1
        else:
            break
    return j

# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
    return 0 <= i < j < len(arr) and arr[i] + arr[j] == target
# Main algorithm with log^2 strategy
def two_sum_log_squared(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
max_loops = int(math.log2(len(arr))) + 1
for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if i == -1 or j == -1 or i >= j:
return None, steps
pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)
stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)
guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)
gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
i, j = newI, newJ
return None, steps
def two_sum_log_squared_guess_improved(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
#max_loops = int(math.log2(len(arr))) + 1
max_loops = j + 1
for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if is_valid_pair(arr, i,j, target):
return (i,j), steps
elif i == -1 or j == -1 or i >= j:
return None, steps
# pivot = target // 2
# newI = binary_search_closest_ge(arr, i, j, pivot)
# newJ = binary_search_closest_le(arr, i, j, pivot)
# stepI = int((newI - i) * percent)
# stepJ = int((j - newJ) * percent)
# guessI = max(i, newI - stepI)
# guessJ = min(j, newJ + stepJ)
# gi = findLeft(arr, guessI, guessJ, target)
# gj = findRight(arr, guessI, guessJ, target)
# if is_valid_pair(arr, gi, gj, target):
# return (gi, gj), steps
# #i, j = gi, gj
# else:
# i, j = newI, newJ
return None, steps
# # Example usage
# if __name__ == "__main__":
# n = 10000
# target = 1000
# arr = generate_worst_case_data(n, target)
# result, steps = two_sum_log_squared(arr, target)
# print(f"Result: {result}, Steps: {steps}")
# if result:
# i, j = result
#         print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")
n = 1000
# Generate list of primes less than n
primes = list(primerange(2, n))
# Goldbach conjecture: even numbers ≥ 6
even_numbers = list(range(6, n, 2))
# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
# writer = csv.writer(csvfile)
# writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])
# for even in even_numbers:
# p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
# if p1 is not None:
# writer.writerow([even, p1, p2, i1, i2, loops])
# else:
# writer.writerow([even, "", "", "", "", loops])
for even in even_numbers:
target = even
result, steps = two_sum_log_squared_guess_improved(primes, target)
print(f"Result: {result}, Steps: {steps}")
arr = primes
if result:
i, j = result
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")
import csv
from sympy import primerange
import math
import random
# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
half = target // 2
spread = n // 10 # small spread to cluster around target/2
data = sorted(random.randint(half - spread, half + spread) for _ in range(n))
return data
# Binary search to find the smallest index such that arr[mid] >= x
def binary_search_closest_ge(arr, low, high, x):
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < x:
            low = mid + 1
        else:
            high = mid
    return low

# Binary search to find the largest index such that arr[mid] <= x
def binary_search_closest_le(arr, low, high, x):
    while low < high:
        mid = (low + high + 1) // 2
        if arr[mid] > x:
            high = mid - 1
        else:
            low = mid
    return low

# Advance the left pointer while arr[i] + arr[j] < target
def findLeft(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] < target:
            i += 1
        else:
            break
    return i

# Retreat the right pointer while arr[i] + arr[j] > target
def findRight(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] > target:
            j -= 1
        else:
            break
    return j

# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
    return 0 <= i < j < len(arr) and arr[i] + arr[j] == target
# Main algorithm with log^2 strategy
def two_sum_log_squared(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
max_loops = int(math.log2(len(arr))) + 1
for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if i == -1 or j == -1 or i >= j:
return None, steps
pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)
stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)
guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)
gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
i, j = newI, newJ
return None, steps
def two_sum_log_squared_guess_improved(arr, target):
i, j = 0, len(arr) - 1
percent = 0.5
steps = 0
#max_loops = int(math.log2(len(arr))) + 1
max_loops = j + 1
for _ in range(max_loops):
steps += 1
i = findLeft(arr, i, j, target)
j = findRight(arr, i, j, target)
if is_valid_pair(arr, i,j, target):
return (i,j), steps
elif i == -1 or j == -1 or i >= j:
return None, steps
pivot = target // 2
newI = binary_search_closest_ge(arr, i, j, pivot)
newJ = binary_search_closest_le(arr, i, j, pivot)
stepI = int((newI - i) * percent)
stepJ = int((j - newJ) * percent)
guessI = max(i, newI - stepI)
guessJ = min(j, newJ + stepJ)
gi = findLeft(arr, guessI, guessJ, target)
gj = findRight(arr, guessI, guessJ, target)
if is_valid_pair(arr, gi, gj, target):
return (gi, gj), steps
else:
print(f'gi {gi} arr[gi] {arr[gi]} gj {gj} arr[gj] {arr[gj]}')
i, j = gi, gj
# else:
# i, j = newI, newJ
return None, steps
# # Example usage
# if __name__ == "__main__":
# n = 10000
# target = 1000
# arr = generate_worst_case_data(n, target)
# result, steps = two_sum_log_squared(arr, target)
# print(f"Result: {result}, Steps: {steps}")
# if result:
# i, j = result
#         print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")
n = 1000
# Generate list of primes less than n
primes = list(primerange(2, n))
# Goldbach conjecture: even numbers ≥ 6
even_numbers = list(range(6, n, 2))
# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
# writer = csv.writer(csvfile)
# writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])
# for even in even_numbers:
# p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
# if p1 is not None:
# writer.writerow([even, p1, p2, i1, i2, loops])
# else:
# writer.writerow([even, "", "", "", "", loops])
for even in even_numbers:
target = even
result, steps = two_sum_log_squared_guess_improved(primes, target)
#print(f"Result: {result}, Steps: {steps}")
arr = primes
if result:
i, j = result
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
    else:
        print(f"No pair found for even {even}. Result: {result}, Steps: {steps}")
Even reading each element of the array once already takes O(n)!
What is wrong with undergraduate education these days?
Judging a program comes down to cost versus benefit. For example, an editor who can publish good manuscripts will later have higher prestige on committees or in the industry.