Charlie Piper

Researcher / Deep Learning Engineer

Automated Flower Classification using Deep CNNs

Introduction

This project develops and evaluates a deep convolutional neural network (CNN) for automated classification of flower species from images using the Oxford 102 Category Flower Dataset [3]. The network was implemented in PyTorch 2.2.2, trained on HPC resources (Viking cluster with H100 GPU), and achieved a final accuracy of 75.12% on the test set after extensive hyperparameter optimization.

Image classification represents a fundamental task in computer vision with applications spanning from medical image analysis [1] to botanical research [3]. Convolutional Neural Networks (CNNs) have emerged as the dominant approach for image classification since their breakthrough in 2012 [2].


Difficulties

The task of flower species classification presents unique challenges compared to simpler classification problems such as license plate recognition or manufacturing defect detection. The primary difficulties include:

  • High intra-class variability: the same species can look dramatically different across lighting conditions, growth stages, and environments
  • High inter-class similarity: Different species sharing similar visual characteristics (color, shape, texture patterns)
  • Complex backgrounds: Natural environments with varying lighting and occlusion patterns

Applications

Automated flower classification has significant practical applications:

  1. Botanical research: Accelerating species identification processes [3]
  2. Plant breeding: Facilitating trait selection and genetic studies [4]
  3. Ecological conservation: Rapid identification and monitoring of invasive species [5]
  4. Agricultural optimization: Supporting crop management and yield prediction

Methodology

Dataset: Oxford 102 Category Flower Dataset

The Oxford 102 Category Flower Dataset contains images of flowers commonly occurring in the United Kingdom, with each class consisting of between 40 and 258 images. The dataset uses predefined splits:

  • Training set: 12.5% (1,020 images)
  • Validation set: 12.5% (1,020 images)
  • Test set: 75% (6,149 images)

Each image was preprocessed by resizing to 224 × 224 pixels to match standard CNN input dimensions.
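The write-up does not show its data-loading code; the sketch below uses torchvision's built-in Flowers102 dataset, which ships exactly these predefined splits. The root path and the eval_transform name are illustrative, and training_transform refers to the augmentation pipeline defined in the next section.

import torchvision.transforms as transforms
from torchvision.datasets import Flowers102

# Predefined splits: 1,020 train / 1,020 val / 6,149 test images.
# Validation and test use deterministic resizing and normalization only.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# training_transform is the augmentation pipeline shown in the next section.
train_set = Flowers102(root="data", split="train", transform=training_transform, download=True)
val_set = Flowers102(root="data", split="val", transform=eval_transform, download=True)
test_set = Flowers102(root="data", split="test", transform=eval_transform, download=True)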


Data Augmentation Strategy

To address the limited training data and class imbalance issues [13], extensive data augmentation was implemented:

training_transform = transforms.Compose([
    transforms.Resize((256, 256)),                 # upscale before random cropping
    transforms.RandomRotation(30),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.1,
                           saturation=0.1, hue=0.1),
    transforms.RandomAffine(degrees=20, translate=(0.1, 0.1),
                            scale=(0.8, 1.2)),
    transforms.RandomResizedCrop(224),             # final 224 x 224 training crop
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],    # ImageNet channel means
                         [0.229, 0.224, 0.225])    # ImageNet channel stds
])

By pairing each original image with an augmented copy, this strategy effectively doubled the training set from 1,020 to 2,040 images, improving model generalization [12].


Network Architecture

The CNN architecture draws inspiration from VGG [8] and ResNet [9] designs, incorporating modern deep learning principles [6]:

[Figure: CNN architecture diagram]

Convolutional Stack (9 layers):

  • Progressive filter expansion: 3 → 32 → 32 → 64 → 64 → 128 → 128 → 256 → 256 → 512
  • Batch Normalization after each convolutional layer for training stability
  • ReLU activation [7]: f(x) = max(0, x)
  • MaxPooling layers with 2 × 2 kernels for spatial downsampling
  • Dropout (p = 0.5) for regularization

Fully Connected Classifier:

  • Flattened feature maps: 512 × 7 × 7 = 25,088 dimensions
  • FC1: 25,088 → 2,048 with BatchNorm1d and ReLU
  • FC2: 2,048 → 1,024 with Dropout (0.5) and ReLU
  • FC3: 1,024 → 512 with BatchNorm1d, Dropout (0.5), and ReLU
  • Output: 512 → 102 classes
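The listing above leaves the exact ordering of pooling and dropout unspecified; the following PyTorch sketch is one arrangement consistent with the nine-layer filter progression, the five 2 × 2 poolings implied by the 7 × 7 output (224 → 112 → 56 → 28 → 14 → 7), and the classifier dimensions. The class name and layer placement are assumptions, not the project's exact code.

import torch.nn as nn

def conv_block(in_ch, out_ch, pool=False):
    # 3x3 convolution -> BatchNorm -> ReLU, optionally followed by 2x2 max-pooling.
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

class FlowerCNN(nn.Module):
    def __init__(self, num_classes=102):
        super().__init__()
        # Nine conv layers; five poolings reduce 224x224 inputs to 7x7 maps.
        self.features = nn.Sequential(
            *conv_block(3, 32),
            *conv_block(32, 32, pool=True),    # 224 -> 112
            *conv_block(32, 64),
            *conv_block(64, 64, pool=True),    # 112 -> 56
            *conv_block(64, 128),
            *conv_block(128, 128, pool=True),  # 56 -> 28
            *conv_block(128, 256),
            *conv_block(256, 256, pool=True),  # 28 -> 14
            *conv_block(256, 512, pool=True),  # 14 -> 7
            nn.Dropout(p=0.5),                 # placement of dropout is an assumption
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 512 * 7 * 7 = 25,088
            nn.Linear(25088, 2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
            nn.Linear(2048, 1024), nn.Dropout(0.5), nn.ReLU(inplace=True),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.Dropout(0.5), nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),       # raw logits for CrossEntropyLoss
        )

    def forward(self, x):
        return self.classifier(self.features(x))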

Loss Function and Optimization

The model optimization employs cross-entropy loss [10], mathematically defined as:

\mathcal{L} = -\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})

where:

  • M = 102 — number of flower classes
  • y_{o,c} — ground-truth one-hot label for observation o and class c
  • p_{o,c} — predicted probability that observation o belongs to class c
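Since y_{o,c} is one-hot, the sum collapses to the negative log-probability assigned to the true class c*:

\mathcal{L} = -\log(p_{o,c^*})

so, for example, predicting the correct class with probability 0.5 costs log 2 ≈ 0.693 per sample, while probability 0.99 costs only about 0.01.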

The Adam optimizer [11] was selected for parameter updates:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t

Here g_t is the gradient at step t, and \hat{m}_t, \hat{v}_t are the bias-corrected moment estimates.

where:

  • Learning rate: α = 0.0001
  • Weight decay: λ = 0.0001
  • Adam defaults: β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸
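In PyTorch, this configuration reduces to the following sketch (model is the network defined above; names are illustrative):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()      # cross-entropy over the 102 logits
optimizer = optim.Adam(
    model.parameters(),
    lr=1e-4,                           # alpha = 0.0001
    weight_decay=1e-4,                 # lambda = 0.0001
    betas=(0.9, 0.999),                # Adam defaults
    eps=1e-8,
)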

Training Procedure

Hyperparameter Configuration:

Parameter     | Tested Values           | Selected Value
------------- | ----------------------- | -------------------------
Learning rate | 0.001, 0.0001, 0.00001  | 0.0001
Batch size    | 8, 16, 32, 64           | 8 (train), 64 (val/test)
Epochs        | 1–1200                  | 1000
Image size    | Various                 | 224 × 224
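A minimal sketch of the corresponding data loaders, assuming the dataset objects from the Methodology section:

from torch.utils.data import DataLoader

# Small training batches (8) with shuffling; larger batches (64) for evaluation.
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)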

Learning Rate Scheduling:

\text{lr}_{\text{epoch}} = \text{lr}_0 \times \gamma^{\lfloor \text{epoch} / \text{step\_size} \rfloor}

where lr₀ is the initial learning rate, γ = 0.99, and step_size = 500 epochs, so the rate is reduced by 1% halfway through the 1000-epoch run.
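This schedule matches PyTorch's StepLR; a sketch, assuming the optimizer defined earlier:

from torch.optim.lr_scheduler import StepLR

# Multiply the learning rate by 0.99 every 500 epochs.
scheduler = StepLR(optimizer, step_size=500, gamma=0.99)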

The training process involved:

  1. Forward propagation through the 9-layer convolutional stack
  2. Flattening of the final 512 × 7 × 7 feature maps into a 25,088-dimensional vector
  3. Classification through the fully connected head (three hidden layers plus the 102-way output)
  4. Backpropagation with the Adam optimizer for weight updates, as sketched below
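A minimal sketch of the resulting training loop, assuming the model, criterion, optimizer, scheduler, train_loader, and device defined above:

num_epochs = 1000
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()            # clear gradients from the previous step
        outputs = model(images)          # forward pass through conv stack + classifier
        loss = criterion(outputs, labels)
        loss.backward()                  # backpropagation
        optimizer.step()                 # Adam parameter update
    scheduler.step()                     # per-epoch learning-rate decay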

Implementation Details

Hardware & Software:

  • Cluster: Viking HPC (University of York)
  • GPU: 1× NVIDIA H100 (80GB HBM3)
  • Framework: PyTorch 2.2.2
  • Training Time: 5.5 hours
  • Language: Python 3.11

Results & Evaluation

Model Performance

The final trained model achieved a test accuracy of 75.12% across all 102 flower categories, a substantial improvement over the initial baseline accuracy of 4%.
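The reported figure is presumably top-1 accuracy over the 6,149 test images; a minimal sketch of how it can be computed, assuming the test_loader and device defined earlier:

import torch

model.eval()
correct, total = 0, 0
with torch.no_grad():                    # no gradients needed at evaluation time
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)   # most probable class per image
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100.0 * correct / total:.2f}%")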

Training Progress

[Figure: training progress curves]

Class-wise Performance Analysis

A detailed examination of the confusion matrix reveals that the model's performance varies significantly across different flower species, largely due to visual similarities and dataset imbalances.

Notably, flowers with similar appearances—such as certain lilies, orchids, and daisies—were often confused with one another, resulting in lower per-class accuracy. This trend highlights the limitations of the model when distinguishing between species with overlapping visual features, particularly in the presence of complex backgrounds or occlusions. The confusion matrix further indicates that classes with fewer training samples are more susceptible to misclassification, underscoring the importance of balanced datasets and targeted data augmentation for underrepresented categories.

Overall, while the model demonstrates strong performance on visually distinctive and well-represented classes, it struggles with species that are either underrepresented or visually similar to others. Future work could address these challenges by incorporating advanced augmentation techniques, leveraging transfer learning from larger botanical datasets, or integrating attention mechanisms to help the model focus on subtle discriminative features.


Conclusion & Future Work

This project successfully demonstrated the effectiveness of deep CNNs for automated flower species classification, achieving 75.12% accuracy on the challenging Oxford 102 dataset. The performance is particularly noteworthy given the small dataset size and high inter-class similarity.

Acknowledgments

This research was conducted using the Viking High-Performance Computing cluster provided by the University of York. I also thank the IT Services and Research IT team for computational support and infrastructure maintenance.

References

  [1] Q. Li, W. Cai, X. Wang, Y. Zhou, D. D. Feng, and M. Chen. "Medical Image Classification with Convolutional Neural Network". 2014.
  [2] L. Chen, S. Li, Q. Bai, J. Yang, S. Jiang, and Y. Miao. "Review of Image Classification Algorithms Based on Convolutional Neural Networks". 2021.
  [3] M.-E. Nilsback and A. Zisserman. "Automated Flower Classification over a Large Number of Classes". 2008.
  [4]
  [5] O. Iancu, K. Yang, H. Man, and T. Menard. "An Automated and Scalable ML Solution for Mapping Invasive Species". 2024.
  [6]
  [7]
  [8] K. Simonyan and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)". 2015.
  [9] K. He, X. Zhang, S. Ren, and J. Sun. "Deep Residual Learning for Image Recognition (ResNet)". 2016.
  [10]
  [11] D. P. Kingma and J. Ba. "Adam: A Method for Stochastic Optimization". 2017.
  [12] C. Shorten and T. M. Khoshgoftaar. "A Survey on Image Data Augmentation for Deep Learning". 2019.
  [13]
  [14] J. M. Johnson and T. M. Khoshgoftaar. "Survey on Deep Learning with Class Imbalance". 2019.