Building a Plant Disease Classification Web App using Keras and Tensorflow.js (PART 1)

Similoluwa Okunowo
10 min read · Aug 2, 2020

Plant diseases pose a major threat to local and national economies that depend largely on agriculture. They challenge food security by reducing crop yield, and they affect the livelihood of farmers and agricultural practitioners. Conventional methods for identifying plant diseases, such as visual inspection by humans, have proven ineffective, so it is important to develop improved techniques for plant disease identification and classification to prevent potential crop losses.

In this tutorial, I will teach you how to approach this problem end to end, employing deep learning methods for multi-class image classification, and Tensorflow.js to deploy the built model and make inferences in a browser. All the essential steps are discussed in detail, from downloading the datasets via the Kaggle API to building a model via transfer learning with Keras (using the MobileNet architecture), and finally deploying the model for inference on the web using Tensorflow.js.

Requirements / Prerequisites

This is a beginner-friendly guide; however, you are expected to have basic knowledge of Python and Javascript, of working with a Jupyter notebook, and of building machine learning or deep learning models. Please do not fret if you don’t meet these requirements: the tutorial is explained in simplified steps, so you should at least gain fresh insights. Other necessary requirements are:-

  • A Google account to access Google Drive and a Google Colab notebook (we will use the Google Colab environment and notebook to streamline the model building/training process, and to access a free GPU)
  • A Kaggle account to download the required datasets for training our model via Kaggle API credentials. Kindly check https://www.kaggle.com/ to get started
  • An IDE (e.g. VSCode) for writing the Javascript code
  • And most importantly, the willingness to learn.

Getting started

The first step to getting started is to set up your environment.

  1. Access your Google Drive at https://drive.google.com/ and create a new folder, e.g. ‘PLANT DISEASE RECOGNITION’
  2. Create a new Google Colab file in that folder to set up the environment, e.g. ‘plantdiseaserecognition.ipynb’
  3. Change your Google Colab runtime to a GPU to speed up the model training process: Runtime > Change runtime type > select GPU as your preferred hardware accelerator.

So folks, let’s get started!

Code and Implementation

The implementation section is structured as a sequence of detailed steps. I know you are very prepared for this.

STEP 1:- Setting up the environment and Connecting Google Colab to our Google Drive account

This step involves connecting to our Google Drive account and setting up the directories where we will save the datasets used for training our model.

Firstly, you have to connect your Google Colab environment with your Google Drive account, and change your working directory to the folder you created previously on Google Drive ‘PLANT DISEASE RECOGNITION’. Run this cell in your notebook and authorize it as required.

# Connecting Google drive to Google colab environment
from google.colab import drive
drive.mount('/content/drive')
# Change working directory to folder created previously
%cd '/content/drive/My Drive/PLANT DISEASE RECOGNITION'

The next step is to create the folders we need to structure the project well. These folders are ‘config’ (for saving our configuration files), ‘models’ (for saving our trained models and weights), ‘datasets’ (for saving our downloaded datasets), and ‘checkpoints’ (for saving our model training checkpoints). You can easily do that by running this command in another code cell:-

# Creating necessary directories
!mkdir config datasets checkpoints models
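
If you rerun that cell, `mkdir` will complain that the folders already exist. As a minimal alternative sketch, assuming you would rather have a cell that is safe to run repeatedly, the same folders can be created from Python (the helper name `create_project_dirs` is hypothetical):

```python
import os

# Folder names taken from the step above.
PROJECT_DIRS = ("config", "datasets", "checkpoints", "models")

def create_project_dirs(root="."):
    """Create the project folders under `root`, skipping any that already exist."""
    created = []
    for name in PROJECT_DIRS:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder is already there
        created.append(path)
    return created
```

Calling `create_project_dirs()` from inside the project folder should have the same effect as the shell command.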

STEP 2:- Downloading the required datasets from Kaggle using Kaggle API

This step entails downloading the required datasets for training from Kaggle. We will use the New plant diseases dataset publicly available on Kaggle, and download it via a generated API Token.

To get your API Key, sign in to your Kaggle account on https://kaggle.com and navigate to your account section:-

image from StackOverflow by Bob Smith

Scroll down to the API section on your Account page, and click ‘Create New API Token’. A ‘kaggle.json’ file containing your API credentials will be downloaded to your local machine.

Navigate to your Google Drive, and upload the downloaded JSON file to your ‘config’ directory. Alternatively, you can run the following code in Google Colab to automate the process:-

# Change directory to the previously created 'config' folder
%cd config
# Upload the downloaded json file from your computer to Google drive
from google.colab import files
files.upload()

The ‘kaggle.json’ file has now been uploaded successfully to the ‘config’ folder.

Inside your Colab notebook, run this code cell to give your notebook access to the ‘kaggle.json’ file:-

import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/My Drive/PLANT DISEASE RECOGNITION/config"
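
Before calling the Kaggle CLI, it can help to confirm that the credentials file really is where `KAGGLE_CONFIG_DIR` points. The sketch below assumes the usual `kaggle.json` layout with `username` and `key` fields; the helper name `check_kaggle_config` is hypothetical and not part of the Kaggle API.

```python
import json
import os

def check_kaggle_config(config_dir):
    """Return the Kaggle username if `kaggle.json` in `config_dir` looks valid."""
    path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"kaggle.json not found in {config_dir!r}")
    with open(path) as f:
        creds = json.load(f)
    # Both fields are needed for the CLI to authenticate.
    missing = {"username", "key"} - set(creds)
    if missing:
        raise KeyError(f"kaggle.json is missing fields: {sorted(missing)}")
    return creds["username"]
```

In the notebook you could call `check_kaggle_config(os.environ['KAGGLE_CONFIG_DIR'])` and expect your own username back.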

Great work so far! In a new code cell, change the working directory to ‘datasets’, where we will download the datasets.

%cd '/content/drive/My Drive/PLANT DISEASE RECOGNITION/datasets'

We will be using the New Plant Diseases Dataset on Kaggle, which contains about 87k images of healthy and infected crop leaves categorized into 38 distinct classes. Go to https://www.kaggle.com/vipoooool/new-plant-diseases-dataset in your browser to access the dataset. Click the three-dots icon > Copy the API Command, and paste it into a new code cell to download the zipped dataset into the current directory ‘datasets’.

!kaggle datasets download -d vipoooool/new-plant-diseases-dataset

After downloading, unzip the downloaded datasets using this command in a new code cell:-

# Unzipping the zip files to extract the dataset folder and deleting the zip files
!unzip \*.zip && rm *.zip

Wonderful! After unzipping, assign the base directory for the datasets to a variable ‘base_dir’

base_dir = './New Plant Diseases Dataset(Augmented)/New Plant Diseases Dataset(Augmented)'
# Check the directories in the base_dir, OUTPUT = ['train', 'valid']
os.listdir(base_dir)

End of STEP 2, Let’s move to the next section

STEP 3:- Importing required libraries and Loading the Training and Validation datasets using ImageDataGenerator (for Data Augmentation)

This section entails loading the datasets required for training our model.

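The first step is to import the necessary libraries in a new code cell. The later cells in this tutorial rely on os, json, NumPy (as np), Matplotlib’s pyplot (as plt), and, from Keras, the ImageDataGenerator class, the MobileNet application, and the Adam optimizer.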

As previously shown, the total dataset is divided into an 80/20 ratio of training and validation sets, saved in separate directories that preserve the directory structure. The ‘train’ folder contains the training dataset and the ‘valid’ folder contains the validation set.
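
To check the split yourself, you could count the image files under each folder. This is a minimal sketch that assumes the `train/<class>/<image>` layout described above; `count_images` and `split_ratio` are hypothetical helper names.

```python
import os

def count_images(split_dir):
    """Map each class sub-directory of `split_dir` to its number of files."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_path = os.path.join(split_dir, class_name)
        if os.path.isdir(class_path):
            counts[class_name] = len(os.listdir(class_path))
    return counts

def split_ratio(train_counts, valid_counts):
    """Fraction of all images that ended up in the training set."""
    n_train = sum(train_counts.values())
    n_valid = sum(valid_counts.values())
    return n_train / (n_train + n_valid)
```

With `train_counts = count_images(os.path.join(base_dir, 'train'))` and the same for `'valid'`, `len(train_counts)` should be 38 and `split_ratio(train_counts, valid_counts)` should come out close to 0.8 if the download went as described.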

We will use the ImageDataGenerator class imported from ‘keras.preprocessing.image’ to generate batches of tensor image data and to perform real-time data augmentation on them. With data augmentation, we can apply rescaling and random transformations (such as rotations, shifts, and flips) to our dataset to reduce overfitting and help our model generalize properly.

The ImageDataGenerator also provides methods to load augmented images from dataset directories using the ‘flow_from_directory()’ method, and from pandas data frames using the ‘flow_from_dataframe()’ method. The resulting generators can be passed to Keras model methods that accept generator inputs, such as ‘fit_generator()’, to train our model. Note that ‘fit_generator()’ is deprecated in recent TensorFlow releases, where ‘fit()’ accepts generators directly.

Run this code in a new code cell to perform data augmentation and transformations for the train and validation dataset:-
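
The original cell is not reproduced in this text, so the following is only a sketch of what such a cell might look like, assuming TensorFlow 2.x. The augmentation values are illustrative assumptions rather than the author’s settings, and `make_generators` is a hypothetical helper. The validation generator only rescales pixel values, so validation images are never randomly distorted.

```python
# Illustrative augmentation settings (assumed values, not the author's).
TRAIN_AUGMENTATION = {
    "rescale": 1.0 / 255,       # scale pixel values from [0, 255] to [0, 1]
    "rotation_range": 20,       # random rotations of up to 20 degrees
    "width_shift_range": 0.2,   # random horizontal shifts (fraction of width)
    "height_shift_range": 0.2,  # random vertical shifts (fraction of height)
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "fill_mode": "nearest",     # how to fill pixels exposed by a transform
}

# Validation data is only rescaled, never randomly transformed.
VALID_AUGMENTATION = {"rescale": 1.0 / 255}

def make_generators():
    """Build the train and validation ImageDataGenerator objects."""
    # Imported inside the function so the settings above can be inspected
    # without loading TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    train_gen = ImageDataGenerator(**TRAIN_AUGMENTATION)
    valid_gen = ImageDataGenerator(**VALID_AUGMENTATION)
    return train_gen, valid_gen
```

Keeping the settings in plain dictionaries makes it easy to tweak one value and rebuild both generators with `train_gen, valid_gen = make_generators()`.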

The next step is to load the images (using the flow_from_directory() method on the generators) from the parent directories containing the folders for each distinct category/class. The parent directories in our case are ‘train’ (for the train dataset) and ‘valid’ (for the validation dataset). The class/label names will be generated automatically from the names of the sub-directories, hence we do not need to define them explicitly.

Run this in a new code cell to perform that operation:-
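
The original cell is not reproduced in this text. A minimal sketch of the loading step, assuming `train_gen` and `valid_gen` are ImageDataGenerator instances, might look like this. The batch size of 32 and the name `train_set_from_dir` come from the tutorial; the 224×224 image size, `shuffle=False` for validation, and the names `load_image_sets` and `valid_set_from_dir` are assumptions made for illustration.

```python
import os

def load_image_sets(base_dir, train_gen, valid_gen,
                    image_size=(224, 224), batch_size=32):
    """Create directory iterators for the 'train' and 'valid' folders."""
    train_set_from_dir = train_gen.flow_from_directory(
        os.path.join(base_dir, "train"),
        target_size=image_size,    # every image is resized to this shape
        batch_size=batch_size,
        class_mode="categorical",  # one-hot labels for multi-class training
    )
    valid_set_from_dir = valid_gen.flow_from_directory(
        os.path.join(base_dir, "valid"),
        target_size=image_size,
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False,             # keep a fixed order for evaluation
    )
    return train_set_from_dir, valid_set_from_dir
```

Each call to `flow_from_directory()` prints how many images and classes it found, which is a quick way to confirm that the paths are right.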

Amazing! You have successfully loaded the images from their respective directories. The next step is to save the class_indices attribute, which is a dictionary with the label name as the key and the encoded index as the value. This will be saved as a JSON file for future reference.

import json
classes_dict = train_set_from_dir.class_indices
with open('/content/drive/My Drive/PLANT DISEASE RECOGNITION/class_indices.json', 'w') as f:
    json.dump(classes_dict, f)
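
Because `class_indices` maps label names to indices, decoding a prediction later needs the reverse lookup. Here is a small sketch that uses a made-up three-class dictionary in place of the real 38 classes (the class names and the helper `invert_class_indices` are illustrative only):

```python
import json

def invert_class_indices(class_indices):
    """Turn {label: index} into {index: label} for decoding predictions."""
    return {index: label for label, index in class_indices.items()}

# Made-up example; the real dictionary has 38 entries.
example = {"Apple___healthy": 0, "Corn___Common_rust": 1, "Tomato___Late_blight": 2}

# JSON object keys are always strings, so saving the label -> index
# direction survives a round trip through a file unchanged.
restored = json.loads(json.dumps(example))
index_to_label = invert_class_indices(restored)
print(index_to_label[1])  # Corn___Common_rust
```

Saving the dictionary in the label-to-index direction is the safer choice: if the integer indices were the keys, they would come back from JSON as strings.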

Great work so far! We can now load a random sample from the loaded images and plot it using matplotlib.
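
The plotting cell is not reproduced in this text, so here is one possible sketch. It assumes a batch of images with one-hot labels, such as the pair returned by `next(train_set_from_dir)`, and an index-to-label dictionary; the helper names `pick_random_sample` and `show_sample` are hypothetical.

```python
import random

def pick_random_sample(images, labels, index_to_label, rng=random):
    """Pick one (image, label name) pair from a batch with one-hot labels."""
    i = rng.randrange(len(images))
    one_hot = list(labels[i])
    class_index = one_hot.index(max(one_hot))  # position of the 1 in the row
    return images[i], index_to_label[class_index]

def show_sample(image, label_name):
    """Display a single image with its label as the title."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    plt.imshow(image)
    plt.title(label_name)
    plt.axis("off")
    plt.show()
```

In the notebook, this could be used as `images, labels = next(train_set_from_dir)`, followed by `show_sample(*pick_random_sample(images, labels, index_to_label))`, where `index_to_label` is the inverse of `train_set_from_dir.class_indices`.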

STEP 4:- Building and Training the MobileNet V2 model via Transfer learning

Transfer learning is a model-building strategy in machine learning which involves ‘recycling’ a model pre-trained on a specific task to improve performance on a similar task (i.e. ‘transferring’ or ‘re-using’ a model trained for one task on another task). We are employing transfer learning with ImageNet weights (instead of building from scratch) for this task because it shortens training time, speeds up convergence, and enables us to leverage advanced models developed by other deep learning experts.

The MobileNet model will be used specifically for this task because of its lightweight architecture, speed, and compatibility with Tensorflow.js. You are however encouraged to try out other transfer learning models like ResNet, InceptionV3, DenseNet, and VGG to evaluate their respective performance. Please refer to the references section to gain more theoretical knowledge about the MobileNet architecture (Layers and the convolutions/computations used).

So our task now is to re-use the MobileNet model, freeze the base layers and add a few necessary top layers to train our classifier.

Run the following code in a new code cell (PS: you can refactor this section if required):-
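
The original cell is not reproduced in this text. As a hedged sketch of the idea, assuming TensorFlow 2.x and the original MobileNet from `keras.applications`, one could drop the ImageNet classification head, add a small dense classifier for the 38 classes, and freeze the first 20 layers as the tutorial describes. The head size of 512 units, the 224×224×3 input shape, and both function names are assumptions for illustration.

```python
def freeze_first_layers(model, n_frozen=20):
    """Make the first `n_frozen` layers non-trainable and the rest trainable."""
    for i, layer in enumerate(model.layers):
        layer.trainable = i >= n_frozen
    return model

def build_mobilenet_classifier(num_classes=38, input_shape=(224, 224, 3),
                               weights="imagenet", n_frozen=20):
    """MobileNet base without its ImageNet head, plus a small dense classifier."""
    # Imported inside the function so the helper above can be used on its own.
    from tensorflow.keras import Model, layers
    from tensorflow.keras.applications import MobileNet

    base = MobileNet(input_shape=input_shape, include_top=False, weights=weights)
    x = layers.GlobalAveragePooling2D()(base.output)  # feature map -> vector
    x = layers.Dense(512, activation="relu")(x)       # assumed head size
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)
    return freeze_first_layers(model, n_frozen)
```

With this sketch, `mobilenet_model = build_mobilenet_classifier()` gives a model whose softmax output has one probability per class; passing `weights=None` skips the ImageNet download if you only want to inspect the architecture.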

We have successfully built the model architecture using pre-trained weights from the ImageNet dataset, MobileNet layers, and additional dense layers for our problem. Next, we freeze only the first 20 layers and ensure their weights are non-trainable. You are however encouraged to tweak this model further.

PS:- Please note that this is not the best model for this case; it is only a basic architecture for the purpose of this tutorial. You are advised to build a CNN model from scratch, or tweak this model via fine-tuning, to get better performance.

You can then get a summary of the model structure and parameters:-

Summary of MobileNet model layers and parameters

# Compiling the model with the optimizer and loss function
# categorical_crossentropy is used as the loss function because it is a multi-class classification task
mobilenet_model.compile(optimizer = Adam(),
                        loss = 'categorical_crossentropy',
                        metrics = ['accuracy']
                        )
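
For intuition about this loss: with a one-hot target, categorical cross-entropy reduces to the negative log of the probability the model assigns to the true class. Below is a tiny worked sketch in plain Python. It is not the Keras implementation, which also averages over the batch and clips probabilities for numerical stability.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t > 0)

target = [0, 1, 0]              # the true class is index 1
confident = [0.05, 0.90, 0.05]  # the model is right and confident
unsure = [0.40, 0.30, 0.30]     # the true class only gets 0.30

print(round(categorical_crossentropy(target, confident), 3))  # 0.105
print(round(categorical_crossentropy(target, unsure), 3))     # 1.204
```

The loss shrinks towards zero as the probability of the correct class approaches 1, which is what the optimizer pushes the model towards during training.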

Great work so far! The next step is to set up callbacks for our built model and train the model on our generated dataset.

Run the following code in a new code cell:-
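
The original training cell is not reproduced in this text. The sketch below shows one way it could be written, assuming TensorFlow 2.1 or later, where `fit()` accepts generators. The step counts come from the tutorial, and 6 epochs matches the number of epochs plotted later. The choice of callbacks, the checkpoint file name `best_model.h5`, and the helper names are assumptions.

```python
import os

# Values taken from the tutorial text: 128 training steps, 100 validation
# steps, and 6 epochs (the number of epochs plotted later on).
FIT_SETTINGS = {"steps_per_epoch": 128, "validation_steps": 100, "epochs": 6}

def make_callbacks(checkpoint_dir):
    """Save the best model so far and lower the learning rate on plateaus."""
    # Imported inside the function so FIT_SETTINGS and train() stay usable
    # without loading TensorFlow.
    from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
    checkpoint = ModelCheckpoint(
        os.path.join(checkpoint_dir, "best_model.h5"),  # hypothetical file name
        monitor="val_loss",
        save_best_only=True,
    )
    reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2)
    return [checkpoint, reduce_lr]

def train(model, train_set, valid_set, callbacks):
    """Fit the model on the generators and return the Keras History object."""
    return model.fit(train_set, validation_data=valid_set,
                     callbacks=callbacks, **FIT_SETTINGS)
```

In the notebook this might be called as `history = train(mobilenet_model, train_set_from_dir, valid_set_from_dir, make_callbacks('/content/drive/My Drive/PLANT DISEASE RECOGNITION/checkpoints'))`, where `valid_set_from_dir` stands for whatever you named the validation iterator.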

The number of steps is set to 128 for the training set (the steps_per_epoch argument) and 100 for the validation set (the corresponding Keras argument is validation_steps); these define the number of batches drawn in each epoch. Since we defined our batch size as 32 for the train and validation data generators, this implies that we are training with 128 * 32 = 4096 (2¹²) samples in each training epoch, and validating with 100 * 32 = 3200 samples in each validation pass.
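
The arithmetic can be checked in a couple of lines. The sketch below also estimates how many steps a full pass over the training data would need, using a rough figure of 80% of the ~87k images quoted earlier; that figure is an approximation, not an exact count.

```python
import math

def samples_per_epoch(steps, batch_size):
    """Number of samples drawn when `steps` batches of `batch_size` are used."""
    return steps * batch_size

def steps_for_full_pass(num_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

print(samples_per_epoch(128, 32))  # 4096 training samples per epoch
print(samples_per_epoch(100, 32))  # 3200 validation samples per epoch

# Rough figure only: 80% of about 87,000 images is about 69,600.
print(steps_for_full_pass(69_600, 32))  # 2175
```

So with 128 steps, each epoch sees only a small slice of the training set. That keeps epochs short, at the cost of the model not seeing every image in every epoch.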

Wonderful! If you have made it this far, get a cold bottle of your favorite drink and take a break while your model trains.

Our model is done training; we can now evaluate its performance on the validation dataset.
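
The evaluation cell is not reproduced in this text. A minimal sketch, assuming the model was compiled with a single `accuracy` metric so that `evaluate()` returns the loss followed by the accuracy, could look like this (both helper names are hypothetical):

```python
def evaluate_on_validation(model, valid_set, steps=100):
    """Return (loss, accuracy) measured on `steps` validation batches."""
    loss, accuracy = model.evaluate(valid_set, steps=steps)
    return loss, accuracy

def format_report(loss, accuracy):
    """Human-readable one-line summary of the evaluation result."""
    return f"val_loss = {loss:.4f}, val_accuracy = {accuracy:.2%}"
```

For example, `print(format_report(*evaluate_on_validation(mobilenet_model, valid_set_from_dir)))`, where `valid_set_from_dir` stands for your validation iterator.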

Good performance! We can easily plot the performance per epoch using matplotlib to get a more visual view.

n = 6  # number of epochs the model was trained for
plt.figure(figsize = (8,5))
plt.plot(np.arange(1,n+1), history.history['loss'], label = 'train_loss')
plt.plot(np.arange(1,n+1), history.history['val_loss'], label = 'val_loss')
plt.plot(np.arange(1,n+1), history.history['accuracy'], label = 'train_accuracy')
plt.plot(np.arange(1,n+1), history.history['val_accuracy'], label = 'val_accuracy')
plt.grid(True)
plt.legend(loc = "best")
plt.savefig('/content/drive/My Drive/PLANT DISEASE RECOGNITION/performance.jpg')
plt.show()

Great! You can easily test the performance of the model with random images from the test set (kindly refer to the notebook). The next step is to save the model in the ‘models’ directory created earlier (for re-usability). The model is saved in HDF5 (.h5) format (an open-source file format that supports storage of complex/heterogeneous data).

# Save model in HDF5 format
mobilenet_model.save('/content/drive/My Drive/PLANT DISEASE RECOGNITION/models/mobilenet_model.h5')

The next step involves converting the model built in Keras (Python) to a Tensorflow.js model, so we can embed it in a web application for browser-based inference.

These steps and more will be discussed in PART 2 of this series. I really hope you enjoyed the tutorial, and I encourage you to read PART 2 as well (linked at the end of this article). The Jupyter Notebook for this tutorial can be accessed here:- https://colab.research.google.com/drive/1_VBVthqVSvj8QqSvlfm2k1ZviOlcSK2o?usp=sharing

And all the files used are also available on this GitHub repository:-

References

  1. https://keras.io/api/applications/mobilenet/
  2. https://towardsdatascience.com/transfer-learning-using-mobilenet-and-keras-c75daf7ff299

Kindly access the second part of the article below, where we will deploy the MobileNet model on a browser using Tensorflow.js.

https://medium.com/@rexsimiloluwa/building-a-plant-disease-classification-web-app-in-keras-and-tensorflow-js-part-2-deee91b91ce4

Kindly connect with me on LinkedIn if you have any questions or contributions.

Stay safe folks, and keep harnessing useful knowledge. Thanks for reading!
