There are some cases where you don’t want an automatic validation sample - but you want to be able to provide your own validation data set. For example, with time series data, you may want to use a sequential set of data to do validation. We’ll manually split our training data into training and validation sets, and then train the model with that split.
Instructor: [00:00] Create a manual validation set by defining two NumPy arrays -- x-val and y-val -- which will hold the inputs and the correct outputs. Validation sets need to match the format of the training data exactly, which in this case means four numeric inputs and a single numeric output, which is the mean of the four inputs.
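A minimal sketch of what those two arrays might look like. The specific values are hypothetical -- the only requirement is that they match the training format: four numeric inputs per sample and one output equal to their mean.

```python
import numpy as np

# Hypothetical validation samples in the same format as the
# training data: four numeric inputs per row.
x_val = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [2.0, 4.0, 6.0, 8.0],
])

# The correct output for each sample is the mean of its four inputs.
y_val = x_val.mean(axis=1)
```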
[00:24] Instead of using the automatic validation split, we can supply those x- and y-validation sets to the model using the validation data parameter. When we run that, we see the network training on all six input data points and a validation loss being calculated on the three new validation points that we just entered.
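The training call described above can be sketched as follows. The model architecture and the exact data values here are assumptions, not the lesson's actual code; the point is passing `validation_data=(x_val, y_val)` to `fit` instead of `validation_split`.

```python
import numpy as np
from tensorflow import keras

# Six training samples of four inputs each; the target is the mean.
x_train = np.array([[0., 1., 2., 3.],
                    [1., 2., 3., 4.],
                    [2., 3., 4., 5.],
                    [3., 4., 5., 6.],
                    [4., 5., 6., 7.],
                    [5., 6., 7., 8.]])
y_train = x_train.mean(axis=1)

# Three new validation samples in the same format.
x_val = np.array([[6., 7., 8., 9.],
                  [7., 8., 9., 10.],
                  [8., 9., 10., 11.]])
y_val = x_val.mean(axis=1)

# A minimal stand-in model; the lesson's network may differ.
model = keras.Sequential([keras.Input(shape=(4,)),
                          keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')

# validation_data replaces validation_split: each epoch trains on all
# six points and computes val_loss on exactly these three points.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=5, verbose=0)
```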
[00:47] When manually separating validation data, it's important to get a representative sample of the data that you have; otherwise, your validation loss may not be a valid representation of your data.
[00:57] When using the validation split parameter, we achieve that by first shuffling the data. However, every time you train the network, it will shuffle the data, so it may be difficult to get repeatable results.
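To make the repeatability problem concrete, here is a small NumPy sketch (the array is a hypothetical stand-in for the dataset): an unseeded shuffle produces a different ordering, and therefore a different validation slice, on every run, while seeding the generator is one way to make the shuffle repeatable.

```python
import numpy as np

data = np.arange(9)  # stand-in for the dataset

# An unseeded shuffle: a (likely) different ordering every run,
# so the validation slice taken from it also changes.
shuffled = np.random.default_rng().permutation(data)

# Seeding the generator reproduces the same ordering each time.
repeat_1 = np.random.default_rng(42).permutation(data)
repeat_2 = np.random.default_rng(42).permutation(data)
```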
[01:12] This is one reason that it may be better to manually set the validation data parameter. Another reason to manually set the validation data is if you have data where it doesn't make sense to pick a random validation set.
[01:24] For example, with some time series data sets, you may want to select a contiguous chunk of time to validate on, instead of just random data points across all of the data.
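A short sketch of that time series idea, using a hypothetical series of 100 sequential observations: hold out the final contiguous chunk for validation rather than sampling random points from across the whole series.

```python
import numpy as np

# Hypothetical time series: 100 sequential observations.
series = np.arange(100, dtype=float)

# Take the last contiguous 20% as the validation chunk.
split = int(len(series) * 0.8)
train, val = series[:split], series[split:]
```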