Experiment Tracking with TensorBoard
Neu.ro includes TensorBoard, which lets you track and visualize ML training experiments. If you're a beginner, you can also use TensorBoard via Jupyter Notebooks without installing any additional components. You can run TensorFlow training processes using either the CLI or JupyterLab. This guide takes you through a sample ML training task using TensorFlow and shows how to view the experiment in TensorBoard.
In this example, we will create a model, train it, and review the results. Note that the project logs are saved on the platform storage, which lets you start or stop TensorBoard whenever required. When you're done with the experiment, terminate the job to limit the amount of consumed GPU hours. Our example is based on the Displaying image data in TensorBoard guide.
This training lets you log tensors and arbitrary images and view them in TensorBoard. We will train a simple classifier on the public Fashion MNIST dataset, render its confusion matrix as an image, and visualize it in TensorBoard.
To create the training:
(base) C:\Projects>cookiecutter gh:neuro-inc/cookiecutter-neuro-project --checkout release
project_name [Neuro Project]: imagesummary
preserve Neuro Flow template hints [yes]:
Once the project is initialized, we will build the code to run our model. The next steps will guide you through creating the train.py file that will include the code.
- In the <project directory>/modules directory (image/modules in our example), add the following lines to the train.py file:
from datetime import datetime
import io
import itertools
from six.moves import range
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics
# Download the data. The data is already divided into train and test.
# The labels are integers representing classes.
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Names of the integer classes, i.e., 0 -> T-shirt/top, 1 -> Trouser, etc.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def plot_to_image(figure):
    """Converts the matplotlib plot specified by 'figure' to a PNG image and
    returns it. The supplied figure is closed and inaccessible after this call."""
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image
- To build the classifier, add the following code:
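# NOTE: the layer list and compile step were truncated in the source. The
# layers below are an assumed minimal classifier, modelled on the TensorBoard
# "Displaying image data" guide; adjust them to your needs.
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

def plot_confusion_matrix(cm, class_names):
    """Returns a matplotlib figure containing the plotted confusion matrix.

    Args:
      cm (array, shape = [n, n]): a confusion matrix of integer classes
      class_names (array, shape = [n]): String names of the integer classes
    """
    figure = plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=45)
    # Normalize the confusion matrix.
    cm = np.around(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], decimals=2)
    # Use white text if squares are dark; otherwise black.
    threshold = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        color = "white" if cm[i, j] > threshold else "black"
        plt.text(j, i, cm[i, j], horizontalalignment="center", color=color)
    return figure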
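To make the normalization step concrete, here is a small self-contained sketch in plain Python (no NumPy or scikit-learn) that counts a confusion matrix and divides each row by its total, which is what the cm.sum(axis=1) expression above does. The labels and predictions are made-up illustrative values, and the helper names confusion_matrix and normalize_rows are hypothetical, not part of the project template.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm


def normalize_rows(cm, decimals=2):
    """Divide every row by its sum so each row holds per-class fractions."""
    normalized = []
    for row in cm:
        total = sum(row)
        # Guard against classes that never occur in the true labels.
        normalized.append([round(v / total, decimals) if total else 0.0 for v in row])
    return normalized


# Illustrative data only (an assumption for this sketch): 3 classes, 6 samples.
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
cm_normalized = normalize_rows(cm)
print(cm)
print(cm_normalized)
```

After normalization, each diagonal entry is the fraction of samples of that class that were classified correctly, which is why a strong diagonal in the plotted matrix indicates a well-performing classifier.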
- Now that we have created the classifier and its confusion matrix, we need to log the basic metrics and the confusion matrix at the end of every epoch. Note that we have selected results as the log directory. You can select another directory if required.
logdir = "results/image/" + datetime.now().strftime("%Y%m%d-%H%M%S")
# Define the basic TensorBoard callback.
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
file_writer_cm = tf.summary.create_file_writer(logdir + '/cm')

def log_confusion_matrix(epoch, logs):
    # Use the model to predict the values from the validation dataset.
    test_pred_raw = model.predict(test_images)
    test_pred = np.argmax(test_pred_raw, axis=1)
    # Calculate the confusion matrix.
    cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)
    # Render the confusion matrix as a figure, then convert it to an image tensor.
    figure = plot_confusion_matrix(cm, class_names=class_names)
    cm_image = plot_to_image(figure)
    # Log the confusion matrix as an image summary.
    with file_writer_cm.as_default():
        tf.summary.image("Confusion Matrix", cm_image, step=epoch)
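The timestamp in logdir gives every training run its own subdirectory, so TensorBoard can list runs separately instead of mixing their events. A minimal sketch of that naming scheme follows; the helper name make_logdir and the fixed example timestamps are illustrative assumptions, not part of the project template.

```python
from datetime import datetime
import posixpath


def make_logdir(root="results/image", now=None):
    """Build a per-run log directory name from a timestamp (hypothetical helper)."""
    now = now or datetime.now()
    return posixpath.join(root, now.strftime("%Y%m%d-%H%M%S"))


# Fixed timestamps are used here only to make the output reproducible.
run_a = make_logdir(now=datetime(2020, 5, 1, 9, 30, 0))
run_b = make_logdir(now=datetime(2020, 5, 1, 10, 15, 42))

# The confusion-matrix writer logs into a "cm" subdirectory of the run.
cm_dir = posixpath.join(run_a, "cm")

print(run_a)
print(run_b)
print(cm_dir)
```

Because the format puts the year first and pads every field, sorting the directory names alphabetically also sorts the runs chronologically.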
- Finally, let's train the classifier:
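# Define the per-epoch callback.
cm_callback = keras.callbacks.LambdaCallback(on_epoch_end=log_confusion_matrix)
# Train the classifier. The fit() arguments other than verbose were truncated
# in the source; epochs=5 and the validation data are assumptions.
model.fit(train_images, train_labels, epochs=5,
          verbose=0,  # Suppress chatty output
          callbacks=[tensorboard_callback, cm_callback],
          validation_data=(test_images, test_labels))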
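Keras calls the function passed as on_epoch_end with two arguments, the epoch index and a dictionary of metrics, once after each epoch. The sketch below imitates that contract in plain Python so it can be run without TensorFlow. It is a simplified stand-in rather than Keras code: run_epochs, record_epoch, and the placeholder loss values are all hypothetical.

```python
def run_epochs(num_epochs, on_epoch_end_hooks):
    """Stand-in for a training loop: call every hook once per finished epoch."""
    for epoch in range(num_epochs):
        # A real loop would train here; this metric value is a placeholder.
        logs = {"loss": 1.0 / (epoch + 1)}
        for hook in on_epoch_end_hooks:
            hook(epoch, logs)


seen = []


def record_epoch(epoch, logs):
    # Same (epoch, logs) signature as log_confusion_matrix in this guide.
    seen.append((epoch, round(logs["loss"], 2)))


run_epochs(3, [record_epoch])
print(seen)
```

Any function with this two-argument signature can be plugged in the same way, which is how the confusion matrix gets logged once per epoch alongside the standard TensorBoard callback.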
Now that you have created the training code, use the following commands to run the classifier:
- make setup. This creates the required framework for the experiment before the job is executed.
- make train. This starts the job required for training. Press CTRL+C to detach from the process.
- make tensorboard. This starts a TensorBoard instance that visualizes the experiment.
(base) C:\Projects\image>make tensorboard
neuro run \
--name tensorboard-image \
--preset cpu-small \
--tag "target:tensorboard" --tag "kind:project" --tag "project:image" --tag "project-id:neuro-project-d2c1fffe" \
--http 6006 \
--volume storage:image/results://project/results:ro \
tensorboard --host=0.0.0.0 --logdir=//project/results
Job ID: job-650959b2-3f85-41fc-b423-07d48cf460c2 Status: pending
Http URL: https://tensorboard-image--clarytyllc.jobs.neuro-public.org.neu.ro
neuro status tensorboard-image # check job status
neuro logs tensorboard-image # monitor job stdout
neuro top tensorboard-image # display real-time job telemetry
neuro exec tensorboard-image bash # execute bash shell to the job
neuro kill tensorboard-image # kill job
Status: pending Creating
Status: pending Scheduling
Status: pending ContainerCreating
Terminal is attached to the remote job, so you receive the job's output.
Use 'Ctrl-C' to detach (it will NOT terminate the job), or restart the
job with `--detach` option.
TensorBoard 2.2.1 at http://0.0.0.0:6006/ (Press CTRL+C to quit)
TensorBoard automatically updates every 30 seconds, or you can manually refresh the page to view the latest results. The results subfolder of a project is saved on the platform storage. This lets you run and stop TensorBoard as often as you want.
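Since the logs persist in the results folder, it can be handy to check which runs it contains before starting TensorBoard. The sketch below assumes the results folder has been copied to a local directory, and it relies on TensorBoard event files having names that start with events.out.tfevents. The helper name find_runs and the demo directory layout are hypothetical.

```python
import os
import tempfile

EVENT_PREFIX = "events.out.tfevents."  # prefix of TensorBoard event file names


def find_runs(logdir):
    """Return sorted run directories (relative to logdir) containing event files."""
    runs = []
    for dirpath, _dirnames, filenames in os.walk(logdir):
        if any(name.startswith(EVENT_PREFIX) for name in filenames):
            runs.append(os.path.relpath(dirpath, logdir).replace(os.sep, "/"))
    return sorted(runs)


# Demo on a throwaway directory; the run names and empty files are placeholders.
with tempfile.TemporaryDirectory() as root:
    for run in ("image/20200501-093000/train", "image/20200501-093000/cm"):
        run_dir = os.path.join(root, *run.split("/"))
        os.makedirs(run_dir)
        with open(os.path.join(run_dir, EVENT_PREFIX + "placeholder"), "w"):
            pass
    demo_runs = find_runs(root)

print(demo_runs)
```

Each directory returned corresponds to one entry in TensorBoard's run selector when the same folder is passed as --logdir.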
The TensorBoard interface includes the following tabs:
The Scalars dashboard shows how the accuracy and loss change with each epoch. You can use it to track training speed, learning rate, and other metrics. You can move your mouse over the graph to view more details.
You can download the scalar information as a CSV or JSON file. To download, select Show data download links and then select the required file format.
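Once downloaded, the CSV file can be analyzed outside TensorBoard. The sketch below assumes the export uses the column headers Wall time, Step, and Value; check the header row of your own file. The sample rows are made-up placeholder values, not results from this experiment, and load_scalars is a hypothetical helper.

```python
import csv
import io

# Placeholder rows standing in for a downloaded file (not real results).
SAMPLE_CSV = """Wall time,Step,Value
1588325400.0,0,0.62
1588325410.0,1,0.48
1588325420.0,2,0.45
"""


def load_scalars(csv_file):
    """Read (step, value) pairs from an open scalar CSV export."""
    reader = csv.DictReader(csv_file)
    return [(int(row["Step"]), float(row["Value"])) for row in reader]


points = load_scalars(io.StringIO(SAMPLE_CSV))

# For a loss curve, the best epoch is the one with the smallest value.
best_step, best_value = min(points, key=lambda p: p[1])
print(points)
print(best_step, best_value)
```

To read a real export, replace io.StringIO(SAMPLE_CSV) with open(path, newline="") for the downloaded file.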
The Images tab displays the confusion matrix for the current training. For our current training (in which we are classifying images into categories of clothing), the Images tab shows the confusion matrix for various clothing types.
The Graphs tab visualizes the computation graph of your model, such as a neural network. The graph visualization lets you see what's happening in your model and detect any issues.
You can double-click a node in the graph to expand it and view its details.