Chapter 2

RNNs: Processing Sequences

How recurrent neural networks introduced the concept of memory for processing sequential data.


The Problem: Data That Flows in Order

Early neural networks had a fundamental limitation—they treated each input independently. Feed them a sentence word by word, and they'd forget the previous word by the time they saw the next one. This made them useless for language, speech, stock prices, or anything where order matters.

Why Order Matters

Consider translating "The cat sat on the mat." To understand "sat," you need to remember "cat." To understand the sentence, you need context from every word before it. Standard neural networks couldn't do this—they had no memory.

The RNN Solution: Loops and Memory

Recurrent Neural Networks introduced a simple but powerful idea: loops. Instead of processing inputs in isolation, RNNs pass information from one step to the next. When processing the word "sat," the network also receives information about "The cat" from previous steps.

This creates a form of memory. The network maintains a hidden state that accumulates information as it processes the sequence. Each new input updates this hidden state, which then influences how the next input is processed.

How RNN Memory Works

Step 1: Process "The" → Store in hidden state
Step 2: Process "cat" + hidden state → Update hidden state
Step 3: Process "sat" + hidden state → Now knows about "The cat"
Result: Each word has context from all previous words
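A minimal sketch of this recurrence in NumPy (assuming a vanilla RNN cell with a tanh activation; the dimensions and toy word vectors are illustrative, not taken from the text):

```python
import numpy as np

# Toy sizes (illustrative assumption): 4-dim word vectors, 8-dim hidden state
input_size, hidden_size = 4, 8
rng = np.random.default_rng(0)

# Weights of a vanilla RNN cell
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "loop")
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# "The cat sat" as three placeholder word vectors
sequence = [rng.normal(size=input_size) for _ in range(3)]

h = np.zeros(hidden_size)      # empty memory before the first word
for x_t in sequence:
    h = rnn_step(x_t, h)       # each word updates the running hidden state

print(h)  # the final state carries information from every word seen so far
```

The key design point is that the same weights and the same update rule are reused at every step; only the hidden state changes as the sequence flows through.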

Where RNNs Work Well

Use Case             | Example               | Why RNN Works
Character prediction | Autocomplete          | Short sequences, simple patterns
Sensor data          | IoT anomaly detection | Time-ordered readings
Simple speech        | Basic voice commands  | Short utterances

The Critical Limitation: Vanishing Gradients

RNNs have a fatal flaw. As sequences get longer, the network struggles to remember information from early steps. During training, the error signals that help the network learn become weaker and weaker as they travel back through time—a problem called vanishing gradients.

The Vanishing Gradient Problem

In practice, RNNs work for sequences of maybe 10-20 steps. Ask them to remember something from 100 steps ago, and they fail. It's like trying to pass a message through 100 people—by the end, the original message is lost.

Technical Note

Backpropagation through time (BPTT) is how RNNs learn. Gradients must flow backward through every time step, and at each step they are multiplied by the same recurrent weights (and the activation's derivative). When those factors are smaller than 1 in magnitude, the gradient shrinks exponentially with sequence length. This mathematical reality is the vanishing gradient problem.
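A quick numerical sketch of that shrinkage (the 0.9 scaling factor and the 100-step depth are illustrative assumptions, not values from the text):

```python
# If each backward step scales the gradient by a factor of ~0.9,
# the signal from the earliest input is almost gone after 100 steps.
factor = 0.9
gradient = 1.0
for step in range(100):
    gradient *= factor

print(f"{gradient:.2e}")  # ~2.66e-05 -- effectively no learning signal left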

This limitation directly led to the next evolution: LSTMs.
