Back to Blog

From ANN to DNN: Classifying Handwritten Digits with Deep Neural Networks

December 16, 2025 By Sangeeth Kariyapperuma
AI Machine Learning Deep Learning Neural Networks TensorFlow MNIST Tutorial
From ANN to DNN: Classifying Handwritten Digits with Deep Neural Networks

After building my first ANN to predict exam scores, I was ready for the next challenge: Deep Neural Networks. This time, instead of predicting one number, the network would recognize handwritten digits (0-9) from images!

🎯 What Makes This Different from ANN?

ANN (Previous Project)DNN (This Project)
Input: 1 number (hours)Input: 784 numbers (28×28 image)
Output: 1 number (score)Output: 10 classes (digits 0-9)
1 layerMultiple hidden layers
Linear problemNon-linear, complex patterns

Key insight: When problems get complex, we need depth — multiple layers that learn hierarchical features.


🖼 What is MNIST?

MNIST is the “Hello World” of machine learning — a dataset of 70,000 handwritten digit images:

  • Image size: 28 × 28 pixels
  • Color: Grayscale (0-255)
  • Labels: Digits 0 through 9
  • Training set: 60,000 images
  • Test set: 10,000 images

🧠 The DNN Architecture

Here’s the model that achieves 97%+ accuracy:

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

Let me break down each layer:


🔹 Layer 1: Flatten()

What it does: Converts the 2D image into a 1D vector.

Before: 28 × 28 image (matrix)
After:  784 numbers → [x1, x2, x3, ... x784]

Why needed? Dense layers expect a flat vector, not a 2D grid.


🔹 Layer 2: Dense(128, activation=‘relu’)

What it does:

  • 128 neurons, each connected to ALL 784 input pixels
  • Learns simple patterns (edges, curves)

ReLU activation:

output = max(0, x)

Why ReLU? Adds non-linearity. Without it, stacking layers would just be linear math — no real “depth.”


🔹 Layer 3: Dense(64, activation=‘relu’)

What it does:

  • 64 neurons receiving input from the 128 neurons above
  • Combines simple features into complex patterns
  • Learns digit-specific shapes

This is hierarchical learning — building complex understanding from simple parts!


🔹 Layer 4: Dense(10, activation=‘softmax’)

What it does:

  • 10 neurons — one for each digit (0-9)
  • Outputs probabilities that sum to 1.0

Example output:

[0.01, 0.02, 0.90, 0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.00]

    Digit 2 has 90% probability → Prediction: 2

📊 Complete Training Code

import tensorflow as tf

# 1. Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# 2. Normalize pixel values (VERY IMPORTANT!)
x_train = x_train / 255.0
x_test = x_test / 255.0

# 3. Build the DNN model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# 4. Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# 5. Train
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# 6. Evaluate
model.evaluate(x_test, y_test)

🤔 Questions I Asked (And Finally Understood)

Q: Why normalize pixels by dividing by 255?

Answer: Raw pixel values (0-255) are too large and vary too much. Normalizing to 0-1:

  • Makes training more stable
  • Helps gradients flow better
  • Speeds up convergence
x_train = x_train / 255.0  # Now values are 0.0 to 1.0

Q: Why use Adam instead of SGD?

Answer: Adam is “smarter” than basic SGD:

  • Adaptive Moment Estimation
  • Automatically adjusts learning rate per parameter
  • Works well out-of-the-box

For most deep learning, Adam is the default choice!


Q: What is sparse_categorical_crossentropy?

Answer: It’s the loss function for multi-class classification with integer labels.

  • Sparse = labels are integers (0, 1, 2, … 9)
  • Categorical = multiple classes
  • Crossentropy = measures how wrong the probability distribution is

If labels were one-hot encoded, we’d use categorical_crossentropy instead.


Q: What’s the difference between ANN and DNN?

Answer: DNN = ANN with multiple hidden layers.

FeatureANNDNN
Hidden layers0-12+ ✅
NeuronsFewMany ✅
LearningSimple patternsComplex, hierarchical ✅
Use caseLinear problemsImages, text, complex data

Key insight: All DNNs are ANNs, but not all ANNs are “deep.”


🧠 What Each Layer Learns (Hierarchical Features)

LayerWhat It Learns
FlattenRaw pixels
Dense 128Edges, simple curves
Dense 64Digit parts (loops, lines)
Dense 10Final digit classification

This is why deep learning works — it builds understanding layer by layer!


⚠️ Overfitting: The DNN Trap

Problem: If you add too many layers/neurons:

  • Training accuracy goes UP ↑
  • Test accuracy goes DOWN ↓

The model memorizes training data instead of learning patterns!

Solutions:

  • Dropout: Randomly disable neurons during training
  • Regularization: Penalize large weights
  • Early stopping: Stop training when validation accuracy plateaus
  • Less depth: Sometimes simpler is better

📈 Results

After just 5 epochs:

  • Training accuracy: ~98%
  • Validation accuracy: ~97%
  • Test accuracy: ~97-98%

The DNN correctly classifies handwritten digits with near-human accuracy! 🎉


🚀 What I Learned (Key Takeaways)

Depth matters — multiple layers learn complex patterns
Flatten converts images to vectors for Dense layers
ReLU adds non-linearity (essential for learning)
Softmax outputs probabilities for multi-class problems
Normalization is crucial for stable training
Adam optimizer works better than basic SGD
Overfitting is the enemy — regularize!


📁 Try It Yourself

I’ve open-sourced this project with full code and explanations:

🔗 GitHub: DNN Handwritten Digit Classification


🎯 What’s Next?

This DNN works great for simple images, but for complex images (cats, dogs, faces), we need something better: Convolutional Neural Networks (CNNs).

Stay tuned for my next project where I build a CNN to classify cats vs dogs! 🐱🐶


Questions about DNNs? Feel free to reach out — explaining concepts helps me learn too! 🧠✨

"Exploring technology through creative projects"

— K.M.N.Sangeeth Kariyapperuma

Navigation
HomeProjectsBlog
Connect

© 2026 NipunSGeeTH. All rights reserved.

Crafted with Love ❤️