
Understanding CNN: What I Learned

December 18, 2025 By Sangeeth Kariyapperuma
AI CNN Deep Learning Computer Vision

What is a CNN?

CNN = Convolutional Neural Network

A neural network designed specifically for images.

Core idea: Instead of treating an image as one long list of numbers (like a traditional fully connected network does), a CNN uses filters that scan the image in small regions, extracting visual patterns like edges, textures, and shapes.


Main Points I Learned

1. Why CNN Works Better Than DNN for Images

DNN (a plain, fully connected Deep Neural Network) approach:

  • Flattens image into 1D vector
  • Treats each pixel independently
  • Loses spatial information (doesn’t know pixels are neighbors)
  • Result: poor performance on images

CNN approach:

  • Uses filters to scan locally (3×3, 5×5 regions)
  • Respects that neighboring pixels matter
  • Learns what edges, textures, shapes look like
  • Result: much better performance on images

Key insight: Images have spatial structure. CNN preserves it. DNN destroys it.
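
To make the gap concrete, here is a minimal Keras sketch (my own toy example, assuming a 64×64 RGB input and 10 classes, not taken from any particular project) comparing the two approaches:

```python
import tensorflow as tf
from tensorflow.keras import layers

# DNN approach: flatten the image, so every pixel connects to every unit.
dnn = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Flatten(),                        # 64*64*3 = 12,288 independent inputs
    layers.Dense(128, activation="relu"),    # ~1.57M weights, no notion of neighboring pixels
    layers.Dense(10, activation="softmax"),
])

# CNN approach: small 3x3 filters scan local neighborhoods instead.
cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),    # 32 filters of size 3x3 (~900 weights)
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])

dnn.summary()   # roughly 1.6M parameters
cnn.summary()   # roughly 20k parameters, and the 2D structure is preserved
```

Same task, far fewer parameters, and the convolutional version never loses track of which pixels sit next to each other.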


2. Hierarchical Feature Learning

CNN learns in layers — each layer extracts increasingly complex features:

Early layers:

  • Detects simple features
  • Edges (vertical, horizontal, diagonal lines)

Middle layers:

  • Detects complex features
  • Textures, corners, curves

Deep layers:

  • Detects object parts
  • Eyes, ears, nose, whiskers

Output layer:

  • Combines all features
  • Makes final classification

This hierarchical approach is powerful; it loosely mirrors how human vision builds up from simple features to complex objects.
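
As a rough sketch of that hierarchy (the layer names, sizes, and two-class output below are my own illustrative choices), you can even tap an intermediate layer and look at the feature maps it produces:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy CNN; the comments mark the rough role each stage plays in the hierarchy.
inputs = tf.keras.Input(shape=(128, 128, 3))
x = layers.Conv2D(16, 3, activation="relu", name="early")(inputs)   # edges
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu", name="middle")(x)       # textures, corners
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", name="deep")(x)         # object parts
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation="softmax")(x)                  # final classification
model = tf.keras.Model(inputs, outputs)

# Peek at what the "middle" stage produces for one random image.
feature_tap = tf.keras.Model(inputs, model.get_layer("middle").output)
fmap = feature_tap(tf.random.uniform((1, 128, 128, 3)))
print(fmap.shape)   # (1, 61, 61, 32): 32 feature maps from the middle stage
```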


3. What Convolutional Filters Do

A convolutional filter is a small grid of weights (typically 3×3 or 5×5) that slides across the image.

Purpose: Detect specific features (edge detector, texture detector, etc.)

How it works:

  • Slide the filter across the image
  • At each position, compute a weighted sum of the local patch (a rough "match score" for the feature)
  • Collect the results into a feature map showing where that feature appears
  • Stack many filters → learn many different features (see the NumPy sketch below)

Why it’s efficient:

  • Same filter used everywhere in image (parameter sharing)
  • Only looks at local neighborhoods (respects structure)
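
Here is a minimal NumPy sketch of that sliding-window idea (a toy example of my own; real frameworks compute this far more efficiently, and the filter weights are normally learned rather than hand-written):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid padding) and return the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)   # weighted sum at this position
    return out

# A tiny image with a vertical boundary: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

# A hand-made vertical-edge filter (a real CNN learns these weights instead).
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

print(convolve2d(image, vertical_edge))
# Strong response (3.0) around the dark-to-bright transition, 0 in the flat regions.
```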

4. Transfer Learning

Problem: Training from scratch needs lots of data and time.

Solution: Transfer learning — reuse weights from models already trained on millions of images.

How it works:

  1. Load pre-trained model (trained on ImageNet or similar)
  2. Freeze those weights (don’t change them)
  3. Add new layers for your specific task (cats vs dogs)
  4. Train only the new layers

Why it works: The pre-trained model has already learned what edges, textures, and shapes look like. We reuse that knowledge instead of learning it from scratch.
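
Here is roughly what those four steps look like in Keras. The backbone choice (MobileNetV2), the 160×160 input size, and the single sigmoid output for cats vs dogs are my own assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

# 1. Load a pre-trained backbone (ImageNet weights) without its original head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")

# 2. Freeze the pre-trained weights so training doesn't change them.
base.trainable = False

# 3. Add new layers for the specific task (cats vs dogs -> one sigmoid unit).
inputs = tf.keras.Input(shape=(160, 160, 3))
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x, training=False)              # keep batch-norm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

# 4. Train only the new layers.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # with your own datasets
```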


5. Feature Extraction vs Fine-tuning

Phase 1 — Feature Extraction:

  • Keep pre-trained weights locked
  • Train only the new classification layers
  • Fast and efficient

Phase 2 — Fine-tuning (optional):

  • Unlock some of the pre-trained layers (usually the top ones)
  • Continue training with a very small learning rate
  • Adapts the pre-trained features to your specific problem
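
Continuing the hypothetical sketch from the transfer-learning section, phase 2 might look like this; how many layers to unfreeze and how small to make the learning rate are judgment calls, not fixed rules:

```python
# Phase 2 (optional): unfreeze the top of the backbone and retrain gently.
base.trainable = True
for layer in base.layers[:-30]:      # keep all but the last ~30 layers frozen
    layer.trainable = False

# Recompile with a much smaller learning rate so the pre-trained features
# are only nudged, not overwritten.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```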

6. Why Deep Networks Work

The hierarchy of features (edges → textures → shapes → objects) is the reason deep learning is so powerful.

Each layer learns something more abstract and complex than the previous layer. By combining all these levels, the network can understand complicated patterns.


7. Activation Functions Matter

ReLU (Rectified Linear Unit) replaces negative values with zero and keeps positive values unchanged:

  • Simple but effective
  • Lets the network learn non-linear patterns
  • Avoids the saturation and vanishing-gradient issues of sigmoid and tanh

Without activation functions, stacking layers wouldn’t help — the network would still just be linear.
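
A tiny NumPy check of that last point (my own example, not from the post): two linear layers with nothing in between collapse into one linear layer, while a ReLU in the middle breaks that collapse.

```python
import numpy as np

def relu(v):
    return np.maximum(0, v)   # ReLU: keep positives, zero out negatives

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
x = rng.normal(size=4)

# Two linear layers with no activation are just one bigger linear layer:
two_linear = W2 @ (W1 @ x)
one_linear = (W2 @ W1) @ x
print(np.allclose(two_linear, one_linear))   # True: the extra depth bought nothing

# With ReLU in between, the result is no longer a single matrix multiply:
with_relu = W2 @ relu(W1 @ x)
print(np.allclose(with_relu, one_linear))    # False (in general)
```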


8. Regularization Prevents Overfitting

Dropout: Randomly ignore some neurons during training

  • Prevents network from relying on specific neurons
  • Makes network more robust
  • Improves generalization

Why it matters: A network trained on limited data can memorize the training examples instead of learning general patterns. Dropout helps prevent this.
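
A minimal sketch of the idea, using the standard "inverted dropout" formulation (in Keras this is just the built-in Dropout layer):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    """Randomly zero a fraction `rate` of units during training (inverted dropout)."""
    if not training:
        return activations                       # at test time, keep everything
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob        # rescale so the expected value is unchanged

acts = np.ones(10)
print(dropout(acts, rate=0.5))   # roughly half the units silenced on this pass
print(dropout(acts, rate=0.5))   # a different random half on the next pass
```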


9. Batch Size and Learning Rate Matter

Batch size: How many examples the network sees before each weight update

  • Larger batches: smoother gradient estimates, but fewer updates per epoch and more memory
  • Smaller batches: noisier gradients, but more frequent updates and often better generalization

Learning rate: How big a step to take when updating weights

  • Too high: overshoots, training unstable
  • Too low: converges slowly or gets stuck
  • Right amount: fast and stable convergence
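
In Keras, both knobs show up in one place: the learning rate goes into the optimizer and the batch size into fit(). The tiny random dataset and the specific values below are placeholders, not recommendations:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Tiny synthetic dataset and model, just to show where the two knobs live.
x_train = np.random.rand(512, 28, 28, 1).astype("float32")
y_train = np.random.randint(0, 10, size=512)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),   # step size per weight update
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    x_train, y_train,
    batch_size=32,          # how many examples per gradient update
    epochs=3,
    validation_split=0.2,   # hold out 20% to watch generalization
)
```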

10. Monitoring Training is Critical

Plot these:

  • Training accuracy vs validation accuracy
  • Training loss vs validation loss

What to look for:

  • Underfitting: Both low (model too simple)
  • Overfitting: Training high, validation low (model memorized)
  • Good fit: Both high and close together
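
A quick way to draw those curves from the history object that Keras fit() returns (continuing the placeholder model from the previous section):

```python
import matplotlib.pyplot as plt

# `history` comes from model.fit(), e.g. the sketch in the previous section.
acc, val_acc = history.history["accuracy"], history.history["val_accuracy"]
loss, val_loss = history.history["loss"], history.history["val_loss"]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(acc, label="train accuracy")
ax1.plot(val_acc, label="val accuracy")
ax1.set_xlabel("epoch")
ax1.legend()

ax2.plot(loss, label="train loss")
ax2.plot(val_loss, label="val loss")
ax2.set_xlabel("epoch")
ax2.legend()

plt.show()
# A widening gap between the train and validation curves is the classic
# overfitting signature; both curves staying poor suggests underfitting.
```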

Key Takeaways

CNNs use filters to respect image spatial structure

Hierarchical learning: edges → textures → shapes → objects

Transfer learning reuses pre-trained knowledge

Feature extraction phase is fast, fine-tuning adapts to your task

Activation functions enable non-linear learning

Regularization (dropout) prevents overfitting

Learning rate and batch size significantly affect training

Always monitor training curves

Deep networks work because of hierarchical features

Deep learning today = transfer learning + smart engineering


Understanding CNN and transfer learning is fundamental to modern computer vision. 🚀

"Exploring technology through creative projects"

— K.M.N.Sangeeth Kariyapperuma
