    Manually Set Validation Data While Training a Keras Model

    Chris Achard
    python
    ^3.0.0

    There are some cases where you don’t want an automatic validation sample - but you want to be able to provide your own validation data set. For example, with time series data, you may want to use a sequential set of data to do validation. We’ll manually split our training data into training and validation sets, and then train the model with that split.

    Transcript

    Instructor: Create a manual validation set by defining two NumPy arrays, x_val and y_val, which will hold the inputs and the correct outputs. Validation sets need to match the format of the training data exactly, which in this case means four numeric inputs and a single numeric output, which is the mean of the four inputs.
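As a minimal sketch of what those two arrays might look like: the specific numbers below are made up for illustration and are not the values used in the lesson. Only the shape (four inputs per row, one output per row) follows the description above.

```python
import numpy as np

# Hypothetical validation inputs: three rows of four numeric values each.
x_val = np.array([
    [1.5, 4.0, 3.0, 2.5],
    [10.0, 14.0, 11.5, 12.0],
    [111.0, 99.0, 105.0, 107.0],
])

# Correct outputs: the mean of the four inputs in each row.
y_val = np.array([2.75, 11.875, 105.5])

# Sanity checks: same layout as the training data (4 inputs -> 1 output),
# and every target really is the mean of its row.
assert x_val.shape == (3, 4)
assert y_val.shape == (3,)
assert np.allclose(y_val, x_val.mean(axis=1))
```

Checking shapes and targets up front is cheap, and it catches a mismatched validation set before any training time is spent.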

    Instead of using the automatic validation split, we can supply those x and y validation sets to the model using the validation_data parameter. When we run that, we see the network training on all six input data points, and a validation loss being calculated on the three new validation points that we just entered.
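The call described above could look roughly like the following sketch. The data values, layer sizes, optimizer, and epoch count are all assumptions chosen to keep the example small; only the use of `validation_data=(x_val, y_val)` in `fit` comes from the lesson. It also assumes a Keras installation that exposes `keras.models` and `keras.layers`.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input

# Hypothetical training data: six rows of four inputs; target is the row mean.
x_train = np.array([
    [1, 2, 3, 4],
    [4, 6, 1, 2],
    [10, 9, 10, 11],
    [10, 12, 9, 13],
    [99, 100, 101, 102],
    [105, 111, 109, 102],
], dtype="float32")
y_train = x_train.mean(axis=1)

# Hypothetical validation data: three new rows in exactly the same format.
x_val = np.array([
    [1.5, 4, 3, 2.5],
    [10, 14, 11.5, 12],
    [111, 99, 105, 107],
], dtype="float32")
y_val = x_val.mean(axis=1)

# A small regression network (architecture is an illustrative assumption).
model = Sequential([
    Input(shape=(4,)),
    Dense(8, activation="relu"),
    Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Pass the manual validation set as a tuple instead of using validation_split.
history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=2,
    validation_data=(x_val, y_val),
    verbose=0,
)

# One validation loss value is recorded per epoch.
print(history.history["val_loss"])
```

With `verbose` left at its default, the training log would show both `loss` and `val_loss` for each epoch.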

    When manually separating validation data, it is important to get a representative sample of the data that you have; otherwise, your validation loss may not be a valid measure of how the model performs on your data.

    When using the validation_split parameter, we achieve that by first shuffling the data. However, every time you train the network, it will shuffle the data, so it may be difficult to get repeatable results.

    This is one reason that it may be better to manually set the validation data parameter. Another reason to manually set the validation data is if you have data where it doesn't make sense to pick a random validation set.

    For example, with some time series data sets, you may want to select a contiguous chunk of time to validate on, instead of just random data points across all of the data.
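Both reasons for a manual split can be sketched with plain NumPy. The helper names `shuffled_split` and `contiguous_split`, the 25% validation fraction, and the toy data are hypothetical and not part of the lesson. The first helper makes a random split repeatable by fixing the seed; the second holds out the last contiguous chunk, which is one reasonable choice for ordered data such as a time series.

```python
import numpy as np

def shuffled_split(x, y, val_fraction=0.25, seed=42):
    """Random split that is repeatable because the seed is fixed."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

def contiguous_split(x, y, val_fraction=0.25):
    """Hold out the last contiguous chunk, e.g. the most recent time steps."""
    n_val = int(len(x) * val_fraction)
    split = len(x) - n_val
    return x[:split], y[:split], x[split:], y[split:]

# Hypothetical ordered series: 12 rows of four inputs; target is the row mean.
x = np.arange(48, dtype="float32").reshape(12, 4)
y = x.mean(axis=1)

x_train, y_train, x_val, y_val = contiguous_split(x, y)
print(x_train.shape, x_val.shape)  # (9, 4) (3, 4)
```

Either result can then be handed to training as `validation_data=(x_val, y_val)`.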