https://github.com/wangzhikai/TwoSumProblem/blob/main/README.md
TwoSumProblem
Two Sum Problem: the well-known tech interview question, with a proposed solution aiming at O(log² n) time complexity and O(1) space complexity.
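For context, the classic baseline this project tries to beat is the O(n)-time, O(1)-space two-pointer scan over the sorted array (a minimal sketch for reference, not the repo's solution):

```python
# Classic two-pointer baseline for Two Sum on a sorted array:
# O(n) time, O(1) space; returns None when no pair exists.
def two_sum_sorted(arr, target):
    i, j = 0, len(arr) - 1
    while i < j:
        s = arr[i] + arr[j]
        if s == target:
            return i, j
        if s < target:
            i += 1   # sum too small: move the left pointer right
        else:
            j -= 1   # sum too large: move the right pointer left
    return None

print(two_sum_sorted([2, 7, 11, 15], 9))  # -> (0, 1)
```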
The high-level algorithm design is documented here.
My original work, with the POC completed, is in this file.
I asked ChatGPT to review the code. It provided a more concise version plus more test cases. Again, it is still a POC prototype, and I am not aiming for industry-level robustness.
With the assistance of ChatGPT, we were able to construct worst-case test data, which pushes the time complexity to O(n).
I had been thinking about the worst case for days. Based on the analysis in the PDF file, I decided to add one step per loop to the algorithm: "opportunistic guessing". It is not purely opportunistic, as discussed in the PDF file. ChatGPT quickly implemented the idea as a Python file. It shows we can beat the O(n) case with an O(log² n) approach. I have not yet had the chance to thoroughly review the Python file. Any comments are welcome!
Further Research

An immediate application of this problem is computer validation of Goldbach's conjecture in a limited scope: every large enough even number is the sum of two prime numbers. Given a large even number t and a complete list of prime numbers in scope, we can search for two prime numbers, one less than t/2 and the other greater than t/2.
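A minimal sketch of that application (my illustration, not the repo's code; `primes_below` and `goldbach_pair` are hypothetical helper names): for an even t, scan primes p ≤ t/2 and test whether q = t − p is also prime.

```python
import math

def primes_below(limit):
    # Simple sieve of Eratosthenes
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for k in range(2, math.isqrt(limit) + 1):
        if sieve[k]:
            sieve[k * k::k] = [False] * len(sieve[k * k::k])
    return [k for k, is_p in enumerate(sieve) if is_p]

def goldbach_pair(t, primes):
    """Return (p, q) with p + q == t and p <= t/2 <= q, or None."""
    prime_set = set(primes)
    for p in primes:
        if p > t // 2:
            break
        if t - p in prime_set:
            return p, t - p
    return None

print(goldbach_pair(992, primes_below(1000)))  # -> (73, 919)
```

The pair (73, 919) matches the 992 example discussed later in this thread.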
Acknowledgement

Thanks to LeetCode for presenting the problem to ardent CS students and programmers! I used Node.js and VS Code + Copilot for the POC. ChatGPT greatly sped up the original research. Python was also used in the POC code and for visualizing important clues. Here are some chat histories with ChatGPT:
https://chatgpt.com/share/682cc20d-57d0-8013-a706-0c460c77d0a2
https://chatgpt.com/share/682cc226-2e38-8013-9a4f-2260ee0c65e8
https://chatgpt.com/share/682cc258-324c-8013-bcdb-7d471a70e78e
Two Sum Problem: aiming at O(log² n) complexity
This blog post is from the forum board: STEM
Postulate:
For each even number n ≥ 4, let p and q be the smallest pair of prime numbers such that p + q = n (with p ≤ q), and let i(p) and i(q) denote their respective indices in the ascending sequence of all prime numbers, starting from index 0. Then the sequence of values Δ(n) = i(q) − i(p) grows gradually and is constrained by a smooth, slowly increasing bound. When sorted, the sequence of Δ(n) values for n ≤ N (e.g., N = 500) exhibits sublinear acceleration with no abrupt deviations.
Zhikai–ChatGPT Team Postulate
https://chatgpt.com/share/68311b9c-d370 ... 7029e1f157
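A small sketch of how one might check the postulate numerically (my reading of it, not code from the repo; `delta` and `primes_below` are hypothetical helpers, using 0-based indices into the ascending primes):

```python
import math

def primes_below(limit):
    # Simple sieve of Eratosthenes
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for k in range(2, math.isqrt(limit) + 1):
        if sieve[k]:
            sieve[k * k::k] = [False] * len(sieve[k * k::k])
    return [k for k, is_p in enumerate(sieve) if is_p]

primes = primes_below(510)                    # covers q = n - p for all n <= 500
index = {p: k for k, p in enumerate(primes)}  # 0-based prime indices

def delta(n):
    # Smallest-p Goldbach pair for even n, then the index gap i(q) - i(p)
    for p in primes:
        if p > n // 2:
            break
        q = n - p
        if q in index:
            return index[q] - index[p]
    return None

deltas = [delta(n) for n in range(4, 501, 2)]
print(deltas[:5])  # -> [0, 0, 1, 2, 1]   (n = 4, 6, 8, 10, 12)
```

Sorting `deltas` and plotting would then show the slow, sublinear growth the postulate describes.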
Right now the guess uses only one pivot. We could choose a left pivot and a right pivot, swapping in a new one whenever we scan one, until the two pointers meet; only after the pointers meet do we return the result.
The number 992 is quite interesting. Without guessing, binary search took 11 steps:
arr[20] = 73, arr[156] = 919, sum = 992 Result: (20, 156), Steps: 11
With guessing, it took 6 steps:
arr[60] = 283, arr[126] = 709, sum = 992 Result: (60, 126), Steps: 6 even 992
The guessing method should still be improvable; it has not fully achieved the original design intent.
python3 Two_Sum_with_opportunistic_guess-gbc-03.py
The number 992 has two or more answers. The premise of this problem is a sorted array, and it is known that an answer definitely exists. I forget whether the answer has to be unique.
Also, most of this program was written by ChatGPT; I provided the idea and then added some debugging.
```python
import csv
import math
import random

from sympy import primerange


# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
    half = target // 2
    spread = n // 10  # small spread to cluster around target/2
    return sorted(random.randint(half - spread, half + spread) for _ in range(n))


# Binary search for the smallest index idx in [low, high] such that arr[idx] >= x
def binary_search_closest_ge(arr, low, high, x):
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < x:
            low = mid + 1
        else:
            high = mid
    return low


# Binary search for the largest index idx in [low, high] such that arr[idx] <= x
def binary_search_closest_le(arr, low, high, x):
    while low < high:
        mid = (low + high + 1) // 2
        if arr[mid] > x:
            high = mid - 1
        else:
            low = mid
    return low


# Advance the left pointer i until arr[i] + arr[j] >= target
def findLeft(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] < target:
            i += 1
        else:
            break
    return i


# Retreat the right pointer j until arr[i] + arr[j] <= target
def findRight(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] > target:
            j -= 1
        else:
            break
    return j


# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
    return 0 <= i < j < len(arr) and arr[i] + arr[j] == target


# Main algorithm with the log^2 strategy
def two_sum_log_squared(arr, target):
    i, j = 0, len(arr) - 1
    percent = 0.5
    steps = 0
    max_loops = int(math.log2(len(arr))) + 1
    for _ in range(max_loops):
        steps += 1
        i = findLeft(arr, i, j, target)
        j = findRight(arr, i, j, target)
        if i == -1 or j == -1 or i >= j:
            return None, steps
        pivot = target // 2
        newI = binary_search_closest_ge(arr, i, j, pivot)
        newJ = binary_search_closest_le(arr, i, j, pivot)
        stepI = int((newI - i) * percent)
        stepJ = int((j - newJ) * percent)
        guessI = max(i, newI - stepI)
        guessJ = min(j, newJ + stepJ)
        gi = findLeft(arr, guessI, guessJ, target)
        gj = findRight(arr, guessI, guessJ, target)
        if is_valid_pair(arr, gi, gj, target):
            return (gi, gj), steps
        i, j = newI, newJ
    return None, steps


def two_sum_log_squared_guess_improved(arr, target):
    i, j = 0, len(arr) - 1
    percent = 0.5
    steps = 0
    # max_loops = int(math.log2(len(arr))) + 1
    max_loops = (j + 1) * (j + 1)
    for _ in range(max_loops):
        steps += 1
        i = findLeft(arr, i, j, target)
        j = findRight(arr, i, j, target)
        if is_valid_pair(arr, i, j, target):
            return (i, j), steps
        elif i == -1 or j == -1 or i >= j:
            return None, steps
        pivot = target // 2
        newI = binary_search_closest_ge(arr, i, j, pivot)
        newJ = binary_search_closest_le(arr, i, j, pivot)
        stepI = int((newI - i) * percent)
        stepJ = int((j - newJ) * percent)
        guessI = max(i, newI - stepI)
        guessJ = min(j, newJ + stepJ)
        gi = findLeft(arr, guessI, guessJ, target)
        gj = findRight(arr, guessI, guessJ, target)
        if is_valid_pair(arr, gi, gj, target):
            return (gi, gj), steps
        # else:
        #     print(f'gi {gi} arr[gi] {arr[gi]} gj {gj} arr[gj] {arr[gj]}')
        #     i, j = gi, gj
        # else:
        #     i, j = newI, newJ
    return None, steps


# # Example usage
# if __name__ == "__main__":
#     n = 10000
#     target = 1000
#     arr = generate_worst_case_data(n, target)
#     result, steps = two_sum_log_squared(arr, target)
#     print(f"Result: {result}, Steps: {steps}")
#     if result:
#         i, j = result
#         print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")

n = 1000

# Generate the list of primes less than n
primes = list(primerange(2, n))

# Goldbach conjecture: even numbers >= 6
even_numbers = list(range(6, n, 2))

# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
#     writer = csv.writer(csvfile)
#     writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])
#     for even in even_numbers:
#         p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
#         if p1 is not None:
#             writer.writerow([even, p1, p2, i1, i2, loops])
#         else:
#             writer.writerow([even, "", "", "", "", loops])

for even in even_numbers:
    target = even
    result, steps = two_sum_log_squared_guess_improved(primes, target)
    # print(f"Result: {result}, Steps: {steps}")
    arr = primes
    if result:
        i, j = result
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
    else:
        i, j = -1, -1
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
```
```python
import csv
import math
import random

from sympy import primerange


# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
    half = target // 2
    spread = n // 10  # small spread to cluster around target/2
    return sorted(random.randint(half - spread, half + spread) for _ in range(n))


# Binary search for the smallest index idx in [low, high] such that arr[idx] >= x
def binary_search_closest_ge(arr, low, high, x):
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < x:
            low = mid + 1
        else:
            high = mid
    return low


# Binary search for the largest index idx in [low, high] such that arr[idx] <= x
def binary_search_closest_le(arr, low, high, x):
    while low < high:
        mid = (low + high + 1) // 2
        if arr[mid] > x:
            high = mid - 1
        else:
            low = mid
    return low


# Advance the left pointer i until arr[i] + arr[j] >= target
def findLeft(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] < target:
            i += 1
        else:
            break
    return i


# Retreat the right pointer j until arr[i] + arr[j] <= target
def findRight(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] > target:
            j -= 1
        else:
            break
    return j


# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
    return 0 <= i < j < len(arr) and arr[i] + arr[j] == target


# Main algorithm with the log^2 strategy
def two_sum_log_squared(arr, target):
    i, j = 0, len(arr) - 1
    percent = 0.5
    steps = 0
    max_loops = int(math.log2(len(arr))) + 1
    for _ in range(max_loops):
        steps += 1
        i = findLeft(arr, i, j, target)
        j = findRight(arr, i, j, target)
        if i == -1 or j == -1 or i >= j:
            return None, steps
        pivot = target // 2
        newI = binary_search_closest_ge(arr, i, j, pivot)
        newJ = binary_search_closest_le(arr, i, j, pivot)
        stepI = int((newI - i) * percent)
        stepJ = int((j - newJ) * percent)
        guessI = max(i, newI - stepI)
        guessJ = min(j, newJ + stepJ)
        gi = findLeft(arr, guessI, guessJ, target)
        gj = findRight(arr, guessI, guessJ, target)
        if is_valid_pair(arr, gi, gj, target):
            return (gi, gj), steps
        i, j = newI, newJ
    return None, steps


def two_sum_log_squared_guess_improved(arr, target):
    i, j = 0, len(arr) - 1
    percent = 0.5
    steps = 0
    # max_loops = int(math.log2(len(arr))) + 1
    max_loops = j + 1
    for _ in range(max_loops):
        steps += 1
        i = findLeft(arr, i, j, target)
        j = findRight(arr, i, j, target)
        if is_valid_pair(arr, i, j, target):
            return (i, j), steps
        elif i == -1 or j == -1 or i >= j:
            return None, steps
        # pivot = target // 2
        # newI = binary_search_closest_ge(arr, i, j, pivot)
        # newJ = binary_search_closest_le(arr, i, j, pivot)
        # stepI = int((newI - i) * percent)
        # stepJ = int((j - newJ) * percent)
        # guessI = max(i, newI - stepI)
        # guessJ = min(j, newJ + stepJ)
        # gi = findLeft(arr, guessI, guessJ, target)
        # gj = findRight(arr, guessI, guessJ, target)
        # if is_valid_pair(arr, gi, gj, target):
        #     return (gi, gj), steps
        #     # i, j = gi, gj
        # else:
        #     i, j = newI, newJ
    return None, steps


# # Example usage
# if __name__ == "__main__":
#     n = 10000
#     target = 1000
#     arr = generate_worst_case_data(n, target)
#     result, steps = two_sum_log_squared(arr, target)
#     print(f"Result: {result}, Steps: {steps}")
#     if result:
#         i, j = result
#         print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")

n = 1000

# Generate the list of primes less than n
primes = list(primerange(2, n))

# Goldbach conjecture: even numbers >= 6
even_numbers = list(range(6, n, 2))

# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
#     writer = csv.writer(csvfile)
#     writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])
#     for even in even_numbers:
#         p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
#         if p1 is not None:
#             writer.writerow([even, p1, p2, i1, i2, loops])
#         else:
#             writer.writerow([even, "", "", "", "", loops])

for even in even_numbers:
    target = even
    result, steps = two_sum_log_squared_guess_improved(primes, target)
    print(f"Result: {result}, Steps: {steps}")
    arr = primes
    if result:
        i, j = result
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")
```
```python
import csv
import math
import random

from sympy import primerange


# Generate worst-case data where a_i ≈ b_i, clustering around target/2
def generate_worst_case_data(n, target):
    half = target // 2
    spread = n // 10  # small spread to cluster around target/2
    return sorted(random.randint(half - spread, half + spread) for _ in range(n))


# Binary search for the smallest index idx in [low, high] such that arr[idx] >= x
def binary_search_closest_ge(arr, low, high, x):
    while low < high:
        mid = (low + high) // 2
        if arr[mid] < x:
            low = mid + 1
        else:
            high = mid
    return low


# Binary search for the largest index idx in [low, high] such that arr[idx] <= x
def binary_search_closest_le(arr, low, high, x):
    while low < high:
        mid = (low + high + 1) // 2
        if arr[mid] > x:
            high = mid - 1
        else:
            low = mid
    return low


# Advance the left pointer i until arr[i] + arr[j] >= target
def findLeft(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] < target:
            i += 1
        else:
            break
    return i


# Retreat the right pointer j until arr[i] + arr[j] <= target
def findRight(arr, i, j, target):
    while i < j:
        if arr[i] + arr[j] > target:
            j -= 1
        else:
            break
    return j


# Check if a valid pair exists at (i, j)
def is_valid_pair(arr, i, j, target):
    return 0 <= i < j < len(arr) and arr[i] + arr[j] == target


# Main algorithm with the log^2 strategy
def two_sum_log_squared(arr, target):
    i, j = 0, len(arr) - 1
    percent = 0.5
    steps = 0
    max_loops = int(math.log2(len(arr))) + 1
    for _ in range(max_loops):
        steps += 1
        i = findLeft(arr, i, j, target)
        j = findRight(arr, i, j, target)
        if i == -1 or j == -1 or i >= j:
            return None, steps
        pivot = target // 2
        newI = binary_search_closest_ge(arr, i, j, pivot)
        newJ = binary_search_closest_le(arr, i, j, pivot)
        stepI = int((newI - i) * percent)
        stepJ = int((j - newJ) * percent)
        guessI = max(i, newI - stepI)
        guessJ = min(j, newJ + stepJ)
        gi = findLeft(arr, guessI, guessJ, target)
        gj = findRight(arr, guessI, guessJ, target)
        if is_valid_pair(arr, gi, gj, target):
            return (gi, gj), steps
        i, j = newI, newJ
    return None, steps


def two_sum_log_squared_guess_improved(arr, target):
    i, j = 0, len(arr) - 1
    percent = 0.5
    steps = 0
    # max_loops = int(math.log2(len(arr))) + 1
    max_loops = j + 1
    for _ in range(max_loops):
        steps += 1
        i = findLeft(arr, i, j, target)
        j = findRight(arr, i, j, target)
        if is_valid_pair(arr, i, j, target):
            return (i, j), steps
        elif i == -1 or j == -1 or i >= j:
            return None, steps
        pivot = target // 2
        newI = binary_search_closest_ge(arr, i, j, pivot)
        newJ = binary_search_closest_le(arr, i, j, pivot)
        stepI = int((newI - i) * percent)
        stepJ = int((j - newJ) * percent)
        guessI = max(i, newI - stepI)
        guessJ = min(j, newJ + stepJ)
        gi = findLeft(arr, guessI, guessJ, target)
        gj = findRight(arr, guessI, guessJ, target)
        if is_valid_pair(arr, gi, gj, target):
            return (gi, gj), steps
        else:
            print(f'gi {gi} arr[gi] {arr[gi]} gj {gj} arr[gj] {arr[gj]}')
            i, j = gi, gj
        # else:
        #     i, j = newI, newJ
    return None, steps


# # Example usage
# if __name__ == "__main__":
#     n = 10000
#     target = 1000
#     arr = generate_worst_case_data(n, target)
#     result, steps = two_sum_log_squared(arr, target)
#     print(f"Result: {result}, Steps: {steps}")
#     if result:
#         i, j = result
#         print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]}")

n = 1000

# Generate the list of primes less than n
primes = list(primerange(2, n))

# Goldbach conjecture: even numbers >= 6
even_numbers = list(range(6, n, 2))

# # Write results to CSV
# with open("goldbach_results.csv", "w", newline="") as csvfile:
#     writer = csv.writer(csvfile)
#     writer.writerow(["even_number", "prime1", "prime2", "index1", "index2", "loop_count"])
#     for even in even_numbers:
#         p1, p2, i1, i2, loops = find_goldbach_pair(even, primes)
#         if p1 is not None:
#             writer.writerow([even, p1, p2, i1, i2, loops])
#         else:
#             writer.writerow([even, "", "", "", "", loops])

for even in even_numbers:
    target = even
    result, steps = two_sum_log_squared_guess_improved(primes, target)
    # print(f"Result: {result}, Steps: {steps}")
    arr = primes
    if result:
        i, j = result
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
    else:
        i, j = -1, -1
        print(f"arr[{i}] = {arr[i]}, arr[{j}] = {arr[j]}, sum = {arr[i] + arr[j]} Result: {result}, Steps: {steps} even {even}")
```
Even just reading every element of the array once takes O(n).
What has happened to undergraduate education these days?
Looking at a program like this is about weighing cost versus benefit. For example, an editor who publishes good papers will later have higher prestige on committees or in industry.
Of course, it would also be acceptable if, for example, I could prove that the expensive case is needed only with probability Pr = O(log n)/n, and that it is O(log n) the rest of the time. That requires more analysis.
You would probably have to strip things down to the bare bones first to have any hope of achieving that.
I asked ChatGPT; it says the binary search approach's worst case is still O(n log n).
There is analysis of this in my write-up: it is not really O(n log n); it is O(k log n), where k is min(a_i, l − b_i).
From the Python visualization, k is maximized when a_i ≈ b_i ≈ l/2.
Here we see that when binary search degenerates, a_i and b_i are pushed almost together.
So the point I am trying to make is not binary search itself; that has been there all along. Rather, when we find binary search moving slowly, we make a new guess range and then confirm it with binary search. In the program ChatGPT wrote, it beats the worst case, bringing the loop count from 500 down to below 10.
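To illustrate the degenerate case being discussed, here is a minimal sketch (my illustration, not the repo's code): when every value sits at target/2 and no pair hits the target, the plain two-pointer scan takes n − 1 steps, which is exactly the behavior the opportunistic guess is meant to beat.

```python
# Plain two-pointer scan with a step counter, to show the O(n) degenerate case.
def two_pointer_steps(arr, target):
    i, j, steps = 0, len(arr) - 1, 0
    while i < j:
        steps += 1
        s = arr[i] + arr[j]
        if s == target:
            return (i, j), steps
        if s < target:
            i += 1
        else:
            j -= 1
    return None, steps

arr = [500] * 1000          # a_i ≈ b_i ≈ target/2: the clustered worst case
result, steps = two_pointer_steps(arr, 1001)
print(result, steps)  # -> None 999
```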
Thanks all for reading.
You did not read it yourself.