COMPUTATIONAL ARCHITECTURE FOR AI
The Physicality of Computation
- "The Cloud" is a marketing abstraction for remote data centers. All software runs on physical hardware.
- Computation is a physical process requiring energy, matter, and time.
- AI is particularly resource-intensive, pushing the limits of modern semiconductor physics.
Intent: Dispel the "magic" of software. AI performance is limited by physics, not just code.
COMPUTATIONAL ARCHITECTURE FOR AI
The Computational Stack: An Overview
- Logic (CPU): General sequential processing.
- Memory (RAM): Volatile system workspace.
- Acceleration (GPU): Parallel processing units.
- Throughput (VRAM Bandwidth): Speed of data movement.
- Storage (SSD): Non-volatile persistence.
- Bottleneck Principle: The system is only as fast as its slowest component.
Intent: Introduce the system architecture as a series of constraints and bottlenecks.
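A back-of-envelope sketch of the bottleneck principle, in Python: for a square matrix multiply, compare how long the arithmetic would take against how long the data movement would take. The peak-FLOPS and bandwidth figures are illustrative assumptions, not measurements of any specific chip.

```python
# Roofline-style estimate: is an n x n matrix multiply limited by compute
# or by memory bandwidth? Hardware numbers below are assumed for illustration.

def limiting_factor(n, peak_flops=1e13, mem_bandwidth=5e11, bytes_per_elem=2):
    flops = 2 * n**3                          # multiply-adds in the matmul
    bytes_moved = 3 * n**2 * bytes_per_elem   # read A and B, write C once
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    return "compute-bound" if compute_time > memory_time else "memory-bound"

for n in (32, 4096):
    print(n, limiting_factor(n))   # small matmuls starve on data movement
```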
COMPUTATIONAL ARCHITECTURE FOR AI
CPU (Central Processing Unit): Sequential Logic
- Optimized for Sequential Processing (complex logic, one step at a time).
- Role: The "Conductor" — handles OS tasks, data preprocessing, and logic flow.
- Limitation: Low core count makes it inefficient for the massive matrix multiplication required by AI.
Intent: Define the CPU's role. It coordinates but lacks the parallelism needed for AI at scale.
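To make the limitation concrete, a small comparison (an illustration, not a benchmark of course hardware): a pure-Python triple loop performs one multiply-add at a time, while NumPy hands the same matrix product to optimized, parallel BLAS routines.

```python
import time
import numpy as np

def matmul_sequential(a, b):
    """Naive matrix multiply: one scalar multiply-add at a time."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out

n = 200
a, b = np.random.rand(n, n), np.random.rand(n, n)

t0 = time.perf_counter(); matmul_sequential(a.tolist(), b.tolist()); t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); a @ b; t_vec = time.perf_counter() - t0
print(f"sequential loop: {t_seq:.3f}s   vectorized BLAS: {t_vec:.5f}s")
```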
COMPUTATIONAL ARCHITECTURE FOR AI
RAM (System Memory): Volatile Workspace
- Volatile memory that stores data currently in use by the CPU.
- Role: Buffering data before it is sent to the GPU.
- Constraint: Insufficient RAM leads to "swapping" (spilling memory to the much slower SSD), which cripples performance.
Intent: Explain the role of system memory in the pipeline and the penalty of running out.
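A minimal sketch of guarding against swapping, assuming the psutil library is available; the 20% headroom factor and the example model size are illustrative assumptions.

```python
import psutil  # cross-platform system-memory queries

def fits_in_ram(model_bytes, headroom=1.2):
    """Return True if the model plus some headroom fits in free RAM."""
    return model_bytes * headroom <= psutil.virtual_memory().available

model_bytes = 7_000_000_000 * 2   # e.g. 7B parameters stored at 16-bit
print("safe to load" if fits_in_ram(model_bytes) else "risk of swapping")
```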
COMPUTATIONAL ARCHITECTURE FOR AI
GPU (Graphics Processing Unit): Parallel Acceleration
- Optimized for Parallel Processing (thousands of simple operations simultaneously).
- SIMD Architecture: Single Instruction, Multiple Data.
- Originally built for rendering pixels, now the primary engine for the matrix operations at the heart of neural networks.
Intent: Explain why AI requires GPUs. Their architecture is mathematically aligned with matrix operations.
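A minimal data-parallel sketch using PyTorch (an assumed tool choice, not part of the slide): one matrix product issued from Python, executed as the same multiply-add across thousands of GPU cores; it falls back to the CPU if no CUDA device is present.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Weights and activations of a single neural-network layer.
W = torch.randn(4096, 4096, device=device)
x = torch.randn(4096, 4096, device=device)

y = W @ x   # one instruction stream, applied to millions of elements at once
print(y.shape, "on", device)
```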
COMPUTATIONAL ARCHITECTURE FOR AI
VRAM (Video Random Access Memory): The Critical Bottleneck
- High-bandwidth memory located on the graphics card, directly alongside the GPU.
- Constraint: The entire active model must usually fit into VRAM for efficient inference.
- If the model size exceeds VRAM capacity, inference either fails outright or falls back to offloading with extreme latency.
Intent: Identify VRAM as the single most critical factor for offline AI performance.
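A back-of-envelope check of the constraint; the 20% overhead allowance for activations and the KV cache is a rough assumption, not a measured value.

```python
def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead=1.2):
    """Estimate weight footprint and compare it to the card's VRAM."""
    weight_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1e9
    return weight_gb * overhead <= vram_gb, weight_gb

for params in (7, 70):
    ok, gb = fits_in_vram(params, bits_per_weight=16, vram_gb=24)
    print(f"{params}B model at 16-bit: ~{gb:.0f} GB of weights, fits in 24 GB: {ok}")
```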
COMPUTATIONAL ARCHITECTURE FOR AI
Storage (NVMe SSD): Throughput & Latency
- Non-volatile storage for model weights when not in use.
- Metric: Read/Write Speed (Throughput).
- Relevance: Affects Model Loading Time. Slow drives result in long wait times before inference begins.
Intent: Distinguish between "storage" and "memory." SSD speed impacts the "wake up" time of a model.
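A rough estimate of that "wake up" time: model size on disk divided by sustained read throughput. The drive speeds below are typical published figures, used here only as assumptions.

```python
def load_time_seconds(model_gb, read_gb_per_s):
    """Ideal-case load time: file size over sustained read speed."""
    return model_gb / read_gb_per_s

model_gb = 4.5  # e.g. a 7B model quantized to ~4 bits per weight
for drive, speed in [("SATA SSD (~0.55 GB/s)", 0.55), ("NVMe Gen4 (~7 GB/s)", 7.0)]:
    print(f"{drive}: ~{load_time_seconds(model_gb, speed):.1f} s to load")
```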
COMPUTATIONAL ARCHITECTURE FOR AI
Compute Constraints: Parameter Count vs. Hardware
- Larger Models (More Parameters) = Higher VRAM requirement.
- Quantization: Reducing the precision of weights (e.g., 16-bit to 4-bit) to fit larger models into smaller VRAM.
- Engineering Trade-off: Precision vs. Performance.
Intent: Link software complexity to hardware requirements. Introduce quantization as an optimization technique.
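The arithmetic behind the trade-off, for a 7-billion-parameter model; real quantized formats add small per-block scale factors, so treat these numbers as lower bounds.

```python
PARAMS = 7e9  # 7-billion-parameter model

for bits in (32, 16, 8, 4):
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{bits:>2}-bit weights: ~{gigabytes:4.1f} GB")   # 28 / 14 / 7 / 3.5 GB
```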
COMPUTATIONAL ARCHITECTURE FOR AI
Lab Infrastructure: The "Local-First" Spec
- Student Nodes: Ryzen 7 / 32GB RAM. Optimized for CPU-based inference and efficient quantized models.
- Objective: Zero-latency, privacy-preserving, offline capability.
- Democratization: Access to professional-grade compute without recurring subscription barriers.
Intent: Justify the specific hardware chosen for the course in terms of access and capability.
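One possible way to exercise such a node, sketched here: CPU-only inference on a quantized GGUF model via the llama-cpp-python bindings. The model file, context size, and thread count are placeholders rather than the course's mandated setup; any comparable local runner (e.g. Ollama) would serve the same purpose.

```python
from llama_cpp import Llama  # Python bindings for llama.cpp

llm = Llama(
    model_path="models/llama-7b-q4.gguf",  # hypothetical quantized model file
    n_ctx=2048,    # context window
    n_threads=8,   # roughly matches an 8-core Ryzen 7
)

result = llm("Explain the bottleneck principle in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])
```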
COMPUTATIONAL ARCHITECTURE FOR AI
Server Infrastructure: High-Performance Compute (HPC)
- Teacher Core: NVIDIA RTX 4090 / 5070 Ti.
- Role: Handling "Heavy" Inference (large parameter models) and Batch Processing.
- Networked Inference: Offloading tasks from student nodes to the central core when local hardware is insufficient.
Intent: Explain the tiered architecture of the lab and the role of the central server.
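A minimal sketch of offloading, assuming the teacher core exposes an OpenAI-compatible completions endpoint (as llama.cpp's built-in server and similar tools can); the address, port, and model name are placeholders.

```python
import requests

SERVER = "http://192.168.1.10:8080/v1/completions"  # hypothetical teacher core

def remote_complete(prompt, max_tokens=128):
    """Send a prompt to the central server when local hardware is insufficient."""
    payload = {"model": "local-70b", "prompt": prompt, "max_tokens": max_tokens}
    response = requests.post(SERVER, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["choices"][0]["text"]

print(remote_complete("Summarise why VRAM is the critical constraint for local AI."))
```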
COMPUTATIONAL ARCHITECTURE FOR AI
Distributed vs. Local Compute
- Distributed (Cloud): Massive GPU clusters, high scalability, but high cost and privacy risk.
- Local (Edge): Fixed hardware constraints, zero variable cost, total data sovereignty.
- The "Edge AI" movement: Bringing intelligence to the device rather than the data center.
Intent: Contrast the lab model with the industry standard cloud model.
COMPUTATIONAL ARCHITECTURE FOR AI
Data Sovereignty & Privacy
- Sovereignty: You own the input, the model processing, and the output.
- No telemetry, no data scraping, no "training on user data."
- Critical for sensitive fields (medical, legal, proprietary research).
Intent: Explain the "Why" of offline AI beyond just cost. It's about ownership.
COMPUTATIONAL ARCHITECTURE FOR AI
The Trade-offs of Local AI
- Hardware Constraints: You cannot run massive models (like GPT-4) locally.
- Maintenance: You are the sysadmin. Drivers, environments, and updates are your responsibility.
- Thermodynamics: High-performance compute draws significant power and dissipates it as heat.
Intent: Maintain intellectual honesty. Great power comes with great maintenance responsibility.
COMPUTATIONAL ARCHITECTURE FOR AI
Computer Science Alignment
- Understanding Hardware/Software Integration.
- Resource Management & Optimization.
- System Architecture Design.
Intent: Map these concepts to the curriculum standards.
COMPUTATIONAL ARCHITECTURE FOR AI
Transition to Prompt Engineering
- "We have the theory and the machine. Now, how do we program the probabilistic engine?"
Intent: Set up the next module on prompting and spec-driven development.