January 23, 2026

From Perceptrons to Universal Approximators: Why We Stack Layers

This is Day 2 of my Machine Learning 10-Day Sprint Series.
Today’s topic: Neural Networks

“A neural network isn’t just a complex formula;
it’s a flexible mathematical fabric that reshapes itself to fit the contours of your data.”

Today’s Reading Material:

But what is a neural network? by 3Blue1Brown

Why?

This is a pivotal question in Computer Science: how do we write a program that mimics how the human brain works?

In this post, we’ll use one of the most fundamental problems in Machine Learning, recognizing handwritten digits, as an example to walk you through how Neural Networks (NN) work.

Goal: Explain the transition from a simple “math formula” to a “learning machine.”

In Day 1, we looked at models that find a single pattern.

Today, we’ll learn how stacking many simple “neurons” creates a system capable of learning complex, abstract hierarchies.

This post aims to answer three things:

1. The “What”: Defining the Architecture

The Goal: Explain that a “Neuron” is just a mathematical container for Weights (importance), a Bias (threshold), and an Activation Function (the “spark”).
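To make the “container” idea concrete, here is a minimal sketch of one neuron in plain Python. The sigmoid activation and all the numbers (`pixels`, `weights`, `bias`) are illustrative assumptions, not values from the video.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, shifted by the bias, then passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical values: three pixel brightnesses and hand-picked parameters.
pixels = [0.0, 0.5, 1.0]
weights = [0.2, -0.4, 0.9]   # importance of each input
bias = -0.3                  # shifts the point at which the neuron "fires"

activation = neuron(pixels, weights, bias)
print(activation)            # a value strictly between 0 and 1
```

A layer is just many of these neurons reading the same inputs, each with its own weights and bias.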

2. The “How”: Functional Approximation

This is the intellectual “meat” of the post: a Neural Network is a Universal Function Approximator.

The Goal: Explain that by combining many small linear pieces and “bending” them with non-linearity, the network can approximate any continuous function (and so a very wide range of shapes and decision rules) as closely as we like, given enough neurons.
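One way to see “many small linear pieces” at work is to build a one-hidden-layer ReLU network by hand. In the sketch below the weights are constructed directly rather than learned, and the target function `x * x`, the interval, and the helper names (`build_relu_net`, `max_error`) are assumptions chosen for illustration. Each hidden neuron adds one “bend”, and the error shrinks as neurons are added.

```python
def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, n_hidden):
    """Return a one-hidden-layer ReLU network that matches f at n_hidden + 1 evenly spaced knots."""
    step = (hi - lo) / n_hidden
    knots = [lo + i * step for i in range(n_hidden + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(n_hidden)]
    # Each hidden neuron switches on at its knot and changes the slope from there on.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    out_bias = f(lo)

    def net(x):
        hidden = [relu(x - knots[i]) for i in range(n_hidden)]  # input weight 1, bias -knot
        return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

    return net

def max_error(f, net, lo, hi, samples=1000):
    """Largest gap between f and the network on an evenly spaced grid."""
    grid = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)

def target(x):
    return x * x

errors = {n: max_error(target, build_relu_net(target, 0.0, 1.0, n), 0.0, 1.0)
          for n in (2, 8, 32)}
for n, err in errors.items():
    print(n, "hidden neurons -> max error", err)
```

In a real network, training finds weights like these automatically. This sketch only shows that suitable weights exist.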

3. The “Why”: The Magic of Backpropagation

You’ll learn the Feedback Loop.

The Goal: Describe how the model realizes it made a mistake at the end (Loss) and “whispers” back through the layers (Backpropagation) to tell each individual neuron how much it needs to change its weights to be more accurate next time.
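The feedback loop can be sketched on the smallest network that still has a “hidden” part: one sigmoid neuron feeding one linear output. Everything here (the starting parameters, the learning rate `lr`, the single training example) is an assumption made for illustration. The backward pass is the chain rule applied step by step from the loss back to each parameter.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, target, lr):
    w1, b1, w2, b2 = params
    # Forward pass: input -> hidden neuron -> output -> loss
    a1 = sigmoid(w1 * x + b1)
    y = w2 * a1 + b2
    loss = 0.5 * (y - target) ** 2
    # Backward pass: how much did each parameter contribute to the mistake?
    dy = y - target                   # dLoss/dy
    dw2, db2 = dy * a1, dy            # output layer gradients
    da1 = dy * w2                     # the error "whispered" back to the hidden neuron
    dz1 = da1 * a1 * (1.0 - a1)       # sigmoid'(z) = a * (1 - a)
    dw1, db1 = dz1 * x, dz1           # hidden layer gradients
    # Nudge every parameter a little against its gradient.
    new_params = (w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2)
    return new_params, loss

params = (0.5, 0.0, -0.5, 0.0)        # hypothetical starting weights and biases
losses = []
for _ in range(200):
    params, loss = train_step(params, x=1.0, target=0.8, lr=0.5)
    losses.append(loss)

print("first loss:", losses[0], "last loss:", losses[-1])
```

Real training repeats this over many examples and many neurons, but each neuron receives the same kind of signal: its own share of the blame.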

Deep Dive Questions

While 3Blue1Brown makes the math look beautiful, the “Aha!” moment for a developer is understanding Signal Transformation.

The Question: “In a multi-layer network, we often say the early layers ‘see’ edges and the later layers ‘see’ complex objects. How does the Activation Function (the non-linearity) act as a gatekeeper for this information, and what would happen to the network’s ‘intelligence’ if we replaced every activation function with a simple linear one ($y = x$)?”
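A small experiment hints at the answer. In this sketch (the 2x2 matrices and the input are arbitrary values chosen for illustration), two layers with the linear activation $y = x$ give exactly the same output as one layer whose weights are `W2 * W1`. Putting a ReLU between them breaks that equivalence.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical weights and biases for two layers.
W1, b1 = [[1.0, 2.0], [3.0, -1.0]], [0.5, -0.5]
W2, b2 = [[2.0, 0.0], [-1.0, 1.0]], [1.0, 0.0]

def two_linear_layers(x):
    hidden = vadd(matvec(W1, x), b1)            # activation is y = x, so nothing is bent
    return vadd(matvec(W2, hidden), b2)

def two_relu_layers(x):
    hidden = [max(0.0, h) for h in vadd(matvec(W1, x), b1)]  # ReLU gate between layers
    return vadd(matvec(W2, hidden), b2)

# Collapse the two linear layers into a single one.
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

def one_layer(x):
    return vadd(matvec(W, x), b)

x = [3.0, -2.0]
print(two_linear_layers(x))   # same as one_layer(x)
print(one_layer(x))
print(two_relu_layers(x))     # different: the gate zeroed a negative hidden value
```

However many linear layers you stack, the same collapse applies, so depth adds nothing without a non-linear activation.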

Notes from the post

Takeaway from the post

How do we bridge the gap between a “biological neuron” and a “mathematical function”?
Use this mental framework:

The “Neuron” as a Weighted Sum: Think of a single neuron not as a brain cell, but as a coordinate transformer. It takes a bunch of inputs, scales each by its importance (weights), adds them up, and shifts the result (bias).
The “Squish” (Activation): This is the most critical part. Without the activation function, no matter how many layers you stack, the whole network collapses into a single linear transformation. The activation function allows the network to “bend” the coordinate space.
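The “Universal Approximation Theorem”: This is the “holy grail” of Day 2. It states that a feed-forward network with even a single hidden layer can approximate any continuous function on a bounded input region to any desired accuracy, provided it has enough neurons and a suitable non-linear activation. It guarantees that such a network exists; it does not tell us how to find its weights.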
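For readers who like symbols, the three ideas above can be written compactly. The notation is my own assumption for this post: $N$ hidden neurons, input weights $w_i$, biases $b_i$, output weights $v_i$, activation $\sigma$, target function $f$, and a closed, bounded input region $K$.

```latex
$$
F(x) = \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| F(x) - f(x) \right| < \varepsilon
$$
```

The left side is a single hidden layer of weighted sums passed through the “squish”. The right side is the theorem’s promise: for any tolerance $\varepsilon > 0$ there is some $N$ and some choice of weights that keeps the network within $\varepsilon$ of $f$ everywhere on $K$.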