Skip to content

CS3016/10. Implementing algorithms from geometry problems and large data sets

 Published: 3.5 hours read

This lab focuses on two major themes in modern computing — computational geometry and algorithms for large data sets. You’ll implement foundational algorithms that are widely used in fields like GIS, computer graphics, spatial databases, and big data analytics.

Table of contents

Open Table of contents

Introduction

Computational geometry provides the algorithmic tools needed to solve spatial problems, while large data sets require scalable, memory-efficient strategies. Mastering these domains prepares students to handle challenges in areas like:

In this lab, we implement:

  1. Convex Hull (2D) — the smallest polygon enclosing a set of points.
  2. Range Search using KD-Tree — efficient spatial querying.
  3. Streaming Algorithms for Large Data — approximate solutions using limited memory.

Geometry and Big Data Algorithms

1. Convex Hull using Graham Scan

Concept: Given a set of 2D points, the convex hull is the minimal convex boundary containing all points.

import matplotlib.pyplot as plt

def graham_scan(points):
    points = sorted(points)
    
    def cross(o, a, b):
        return (a[0] - o[0])*(b[1] - o[1]) - (a[1] - o[1])*(b[0] - o[0])

    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    
    return lower[:-1] + upper[:-1]

2. 2D Range Search with KD-Tree

from scipy.spatial import KDTree

def build_kd_tree(points):
    return KDTree(points)

def range_query(kd_tree, query_point, radius):
    return kd_tree.query_ball_point(query_point, radius)

3. Count Distinct Elements in Streaming Data (using HyperLogLog-style estimation)

import hashlib
import random

def simulate_stream(n=100000):
    return [random.randint(0, 10**6) for _ in range(n)]

def hash_value(x):
    return int(hashlib.md5(str(x).encode()).hexdigest(), 16)

def estimate_cardinality(stream, num_buckets=64):
    max_trailing_zeros = [0]*num_buckets
    for val in stream:
        h = hash_value(val)
        bucket = h % num_buckets
        zeros = len(bin(h >> 6)) - len(bin((h >> 6).rstrip('0')))
        max_trailing_zeros[bucket] = max(max_trailing_zeros[bucket], zeros)
    avg = sum(max_trailing_zeros) / len(max_trailing_zeros)
    return int(2**avg * num_buckets * 0.77351)

Example Usage

Click to expand
# 1. Convex Hull
points = [(random.randint(0, 100), random.randint(0, 100)) for _ in range(30)]
hull = graham_scan(points)

# 2. KD-Tree Range Query
tree = build_kd_tree(points)
neighbors = range_query(tree, (50, 50), 20)

# 3. Streaming Approximation
stream_data = simulate_stream()
approx_unique = estimate_cardinality(stream_data)

print("Convex Hull Points:", hull)
print("Neighbors in range:", neighbors)
print("Estimated Unique Elements:", approx_unique)

Visualization

Click to expand
def plot_convex_hull(points, hull):
    xs, ys = zip(*points)
    hx, hy = zip(*(hull + [hull[0]]))  # Close the polygon

    plt.figure(figsize=(7, 7))
    plt.scatter(xs, ys, label="Points")
    plt.plot(hx, hy, 'r-', label="Convex Hull")
    plt.title("Convex Hull (Graham Scan)")
    plt.legend()
    plt.show()

plot_convex_hull(points, hull)

Time and Space Complexity

AlgorithmTime Complexity
Convex Hull (Graham Scan)\(O(n \log n)\)
KD-Tree Construction\(O(n \log n)\)
KD-Tree Range Query\(O(\log n + k)\)
Cardinality Estimation\(O(n)\)

Applications

  1. Convex Hull:

    • Geographic boundary detection
    • Collision detection in games
    • Object shape recognition
  2. KD-Tree Search:

    • Nearest neighbor in spatial datasets
    • Robotics and motion planning
    • Location-based services
  3. Streaming Algorithms:

    • Real-time analytics on sensor or clickstream data
    • Network traffic analysis
    • Social media trend monitoring

Conclusion

This lab bridges classic geometry algorithms with scalable data techniques. From determining geometric boundaries to handling millions of data points efficiently, these tools are foundational to both theoretical and applied computing.

References

  1. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C.
    Introduction to Algorithms, 3rd/4th Edition, MIT Press.
  1. Scalable Data Mining – Stanford CS246

Previous Post
CS3016/09. Implementing the Forward Algorithm
Next Post
Mapping of GECA's academic resources