
Understanding RNN: Sequence Learning and Memory

December 18, 2025 By Sangeeth Kariyapperuma
AI · RNN · LSTM · Sequence Learning · Deep Learning · NLP

What is RNN?

RNN = Recurrent Neural Network

A neural network designed to process sequential data — text, time series, speech, any data where order matters.

Key difference from CNN/DNN: RNN has memory — it processes data one step at a time and remembers what it saw before.


Main Things I Learned

1. Sequential Data Has Order

Unlike images (the domain of CNNs), sequences have temporal order.

An RNN processes the sequence left to right, building up context as it goes. Order matters. Context matters.


2. What is Hidden State (Memory)?

The RNN’s secret sauce is its hidden state.

At each time step, the network:

  • Takes the current input (the current character)
  • Takes the previous hidden state (memory from before)
  • Produces a new hidden state (updated memory)

The hidden state is computed as:

$$h_t = \tanh(W_x \cdot x_t + W_h \cdot h_{t-1} + b)$$

This is memory: The network carries information from previous steps forward. Each step captures more context as it processes the sequence.
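
Here is a minimal NumPy sketch of that update (the sizes are toy values I picked for illustration, not the project's actual dimensions):

```python
import numpy as np

hidden_size, input_size = 8, 5                           # toy sizes for illustration
W_x = np.random.randn(hidden_size, input_size) * 0.1     # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size) * 0.1    # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous memory."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

x_t = np.zeros(input_size)
x_t[2] = 1.0                      # e.g. a one-hot encoded character
h_0 = np.zeros(hidden_size)       # memory starts empty
h_1 = rnn_step(x_t, h_0)          # updated memory after one step
```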


3. What Are Time Steps?

Time step = processing one element of the sequence

Each step processes one character (or word) and passes its hidden state to the next step.

This is why it’s called “recurrent”: the same computation happens at each step, with a recurrent connection to the previous hidden state.
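
Continuing the sketch above, “recurrent” just means the same step function runs at every position, carrying the hidden state forward:

```python
def run_rnn(sequence):
    """Apply the same rnn_step at every time step, left to right."""
    h = np.zeros(hidden_size)     # initial memory
    for x_t in sequence:
        h = rnn_step(x_t, h)      # same weights every step, new memory each time
    return h                      # final hidden state summarizes the whole sequence

seq = [np.eye(input_size)[i] for i in [0, 2, 1, 3]]   # four one-hot vectors
final_h = run_rnn(seq)
```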


4. Input Sequences and Targets

The model learns patterns from sequences and their targets.

Training: For each sequence, the model learns “if you see this sequence, predict that next element”
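
A small sketch of how those pairs might be built for next-character prediction (the window length of 4 is just an assumption):

```python
text = "hello world"     # toy corpus
seq_len = 4              # assumed context window

pairs = []
for i in range(len(text) - seq_len):
    context = text[i:i + seq_len]    # "if you see this sequence..."
    target = text[i + seq_len]       # "...predict this next character"
    pairs.append((context, target))

# pairs[0] == ("hell", "o"), pairs[1] == ("ello", " "), ...
```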


5. One-Hot Encoding

RNN doesn’t understand characters — only numbers.

Characters are mapped to integers, then converted to one-hot vectors for processing.

Why? Neural networks work with numerical representations. This encoding makes the input clear and uniform.
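
A minimal sketch of the mapping, assuming a tiny toy corpus:

```python
import numpy as np

text = "hello world"
chars = sorted(set(text))                           # unique characters = vocabulary
char_to_idx = {c: i for i, c in enumerate(chars)}   # character -> integer

def one_hot(ch):
    """A vector of zeros with a single 1 at the character's index."""
    vec = np.zeros(len(chars))
    vec[char_to_idx[ch]] = 1.0
    return vec

encoded = np.stack([one_hot(c) for c in "hello"])   # shape: (5, vocab_size)
```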


6. The Vanishing Gradient Problem (CRITICAL)

Problem: When an RNN processes long sequences, the gradients get multiplied many times during backpropagation and become very small.

When computing gradients backwards through time:

$$\frac{\partial L}{\partial h_0} = \frac{\partial L}{\partial h_T} \prod_{t=1}^{T} \frac{\partial h_t}{\partial h_{t-1}}$$

If each factor $\frac{\partial h_t}{\partial h_{t-1}}$ has norm less than 1, the product becomes exponentially small.

Result: Network forgets early elements. Long-range dependencies break.

Symptoms:

  • Can’t remember context from many steps ago
  • Fails on long sequences
  • Only learns local patterns
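
A quick numerical illustration, using a scalar factor of 0.9 to stand in for the Jacobians:

```python
# If each backward factor is roughly 0.9, the gradient reaching step 0
# shrinks exponentially with the sequence length T.
factor = 0.9
for T in [10, 50, 100]:
    print(T, factor ** T)   # ~0.35, ~0.005, ~0.00003
```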

7. Why LSTM and GRU Exist

Solution: Add gates to control information flow.

LSTM (Long Short-Term Memory) adds a cell state plus forget, input, and output gates that control what gets forgotten, remembered, and output.

GRU (Gated Recurrent Unit):

  • Simpler than LSTM
  • Fewer gates, faster training
  • Similar performance

Result: Gradients flow better. Network remembers longer context.
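
As a sketch in PyTorch (the linked repository may use a different framework), LSTM and GRU are drop-in replacements for a plain RNN layer:

```python
import torch
import torch.nn as nn

embed_dim, hidden_size = 32, 128   # toy sizes

rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)    # vanilla RNN
lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)  # gates + cell state
gru = nn.GRU(embed_dim, hidden_size, batch_first=True)    # fewer gates than LSTM

x = torch.randn(2, 10, embed_dim)   # (batch, time steps, features)
out_lstm, (h_n, c_n) = lstm(x)      # LSTM also carries a cell state c_n
out_gru, h_gru = gru(x)             # GRU keeps just the hidden state
```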


8. How Embedding Layer Works

Instead of one-hot encoding (sparse and inefficient), use an Embedding layer.

Character indices are converted to dense vector representations learned during training.

Why better?

  • Compact representation
  • Learned during training
  • Captures semantic meaning
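
A minimal PyTorch sketch, with sizes chosen purely for illustration:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 60, 32                  # assumed sizes
embedding = nn.Embedding(vocab_size, embed_dim)

char_indices = torch.tensor([[3, 7, 7, 12]])    # e.g. integer indices for 4 characters
dense = embedding(char_indices)                 # shape (1, 4, 32): dense, learned vectors
# A one-hot version would be (1, 4, 60) and mostly zeros.
```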

9. Sequence Padding

Different sequences have different lengths.

Solution: Pad shorter sequences to the same length so all sequences in a batch are uniform. This enables batch processing.
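
A minimal PyTorch sketch of padding a batch of variable-length index sequences:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three index sequences of different lengths
seqs = [torch.tensor([3, 7, 7]),
        torch.tensor([3, 7]),
        torch.tensor([3, 7, 7, 12, 4])]

batch = pad_sequence(seqs, batch_first=True, padding_value=0)
# batch.shape == (3, 5): shorter rows are padded with 0 so they stack into one tensor
```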


10. Training on Sequences

Model learns from many examples: “Given this context, predict that next element”

Over many examples, it learns language patterns and builds understanding of the data.
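
A compressed PyTorch sketch of that training idea; the CharRNN class, the sizes, and the random toy batch are assumptions, not the repository's code:

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Embed characters, run an LSTM, predict the next character."""
    def __init__(self, vocab_size=60, embed_dim=32, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):                 # x: (batch, seq_len) of character indices
        out, _ = self.lstm(self.embed(x))
        return self.head(out[:, -1, :])   # logits for the next character

model = CharRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy step: given a batch of contexts, predict the next character index
contexts = torch.randint(0, 60, (8, 4))   # (batch, seq_len)
targets = torch.randint(0, 60, (8,))      # next-character indices
optimizer.zero_grad()
loss = loss_fn(model(contexts), targets)
loss.backward()
optimizer.step()
```

In practice this runs over many batches built from the real context/target pairs, not random tensors.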


11. Text Generation

After training, you can generate new sequences (see the sketch after this list) by:

  • Starting with seed text
  • Predicting next element
  • Feeding prediction back as input
  • Repeating to generate long sequences
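
Continuing the PyTorch sketch from the training section (CharRNN and the window length of 4 are assumed), the generation loop might look like this:

```python
import torch

def generate(model, seed_indices, steps=50, seq_len=4):
    """Feed each prediction back in as input, one character at a time."""
    model.eval()
    context = list(seed_indices)                # integer indices of the seed text
    with torch.no_grad():
        for _ in range(steps):
            x = torch.tensor([context[-seq_len:]])          # last few characters
            probs = torch.softmax(model(x), dim=-1)         # next-character distribution
            next_idx = torch.multinomial(probs, 1).item()   # sample one character
            context.append(next_idx)                        # prediction becomes new input
    return context
```

Mapping the returned indices back to characters uses the same vocabulary built during preprocessing.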

12. Why RNNs Are Limited Today

RNNs have problems:

  • Vanishing gradients (mitigated, but not fully eliminated, by LSTM)
  • Can’t parallelize across time steps (must process sequentially)
  • Slow training on large datasets
  • Long-range dependencies are still hard

Solution: Transformers (newer, better architecture)

  • No recurrence, processes entire sequence at once
  • Attention mechanism > RNN memory
  • Parallelizable = much faster

Key Takeaways

  • RNN has memory (hidden state) and carries context forward
  • Time steps = processing one element at a time
  • Sequential data has order, and order matters
  • The hidden state is updated at each time step through recurrence
  • The vanishing gradient limits long-range memory
  • LSTM/GRU mitigate the vanishing gradient with gates
  • Embedding layers replace sparse one-hot encodings with dense learned vectors
  • Padding makes sequences a uniform length
  • RNNs learn language patterns through training
  • Text generation works by predicting one step at a time
  • Transformers have replaced RNNs for most NLP tasks today


Full Implementation

🔗 GitHub: RNN-Project-Next-Character-Prediction

See the repository for implementation details.


RNNs taught me that neural networks can have memory. That’s powerful. 🚀

"Exploring technology through creative projects"

— K.M.N.Sangeeth Kariyapperuma
