What is shuffle in TensorFlow?

The tf.random.shuffle function in TensorFlow does one thing: it randomly shuffles a tensor along its first dimension. For a data set, that means shuffling the order of the samples. But why would we need to do that?

There are good reasons to shuffle the order of the data, especially when we train a model in TensorFlow.

For example, suppose we are working with 10,000 images of lions and tigers. In that collection, 7,000 images are of lions, and the remaining 3,000 are of tigers.

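When working with TensorFlow, we don't hand-craft the patterns that tell a lion from a tiger. The neural network learns them from the data.

If the images stay in a fixed order (say, all 7,000 lions followed by all 3,000 tigers), the network sees only lions for a long stretch of training and then only tigers, so the order of the data can affect what it learns. Shuffling mixes the two classes so that every part of the training data is representative of the whole set. That is the main reason to shuffle.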
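To make the lion and tiger example concrete, here is a minimal sketch that uses only labels (0 for lion, 1 for tiger) instead of real images. The batch size of 32 and the variable names are assumptions for illustration. Instead of shuffling the labels directly, it shuffles a tensor of indices with tf.random.shuffle and reorders the labels with tf.gather; the same indices could reorder the images, so each image keeps its label.

```python
import tensorflow as tf

# Hypothetical labels for the example: 0 = lion, 1 = tiger.
# In this unshuffled data set, all 7000 lions come first.
labels = tf.concat([tf.zeros([7000], dtype=tf.int32),
                    tf.ones([3000], dtype=tf.int32)], axis=0)

batch_size = 32  # assumed batch size, for illustration only

# Without shuffling, the first batch contains only lions.
first_batch = labels[:batch_size]
print("tigers in first unshuffled batch:", int(tf.reduce_sum(first_batch)))  # 0

# Shuffle the indices 0..9999, then reorder the labels with them.
# The same indices could reorder the images, so each image keeps its label.
indices = tf.random.shuffle(tf.range(labels.shape[0]))
shuffled_labels = tf.gather(labels, indices)
print("tigers in first shuffled batch:",
      int(tf.reduce_sum(shuffled_labels[:batch_size])))
```

After shuffling, the first batch will almost always contain a mix of lions and tigers, while the total number of each class stays the same.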

Let’s see some code.

import tensorflow as tf

not_shuffled_one = tf.constant([[10, 7],
                            [3, 4],
                            [2, 5]])
# We will get different results each time
tf.random.shuffle(not_shuffled_one)

# output
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 2,  5],
       [ 3,  4],
       [10,  7]], dtype=int32)>

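Because no seed is set, the shuffle is random, so each run can produce a different order. Notice that only the order of the rows changes: tf.random.shuffle shuffles along the first dimension, and each row ([10, 7], [3, 4], [2, 5]) stays intact.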
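A quick way to check that tf.random.shuffle only reorders rows is to compare the sorted rows before and after shuffling. The sketch below also shows one way to shuffle columns instead (transpose, shuffle, transpose back); that transpose trick is a common workaround, not a separate TensorFlow option.

```python
import tensorflow as tf

original = tf.constant([[10, 7],
                        [3, 4],
                        [2, 5]])
shuffled = tf.random.shuffle(original)

# tf.random.shuffle only reorders along the first dimension, so every
# row of the result is one of the original rows, left intact.
original_rows = sorted(tuple(row) for row in original.numpy().tolist())
shuffled_rows = sorted(tuple(row) for row in shuffled.numpy().tolist())
print(shuffled.shape)                  # (3, 2)
print(original_rows == shuffled_rows)  # True

# To shuffle columns instead, transpose, shuffle the rows, and transpose back.
columns_shuffled = tf.transpose(tf.random.shuffle(tf.transpose(original)))
print(columns_shuffled)
```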

For example, let's assign the same values to another variable and shuffle it:

not_shuffled_two = tf.constant([[10, 7],
                            [3, 4],
                            [2, 5]])
# We will get different results each time
tf.random.shuffle(not_shuffled_two)

# output
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 3,  4],
       [ 2,  5],
       [10,  7]], dtype=int32)>

What happens when we pass the seed parameter?

You might expect the seed parameter to reproduce the same order each time.

# let's pass the seed parameter
tf.random.shuffle(not_shuffled_two, seed=42)

# output
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 2,  5],
       [ 3,  4],
       [10,  7]], dtype=int32)>

However, in our case, that does not happen: if you run the cell again, the order changes even though seed=42 stays the same.
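The same effect shows up within a single script. This sketch calls tf.random.shuffle twice with the same operation-level seed and no global seed. A 10-element tensor is an arbitrary choice that makes an identical order by chance unlikely. The two printed orders will most likely differ; per the tf.random.set_seed documentation, restarting Python and running the script again should print the same pair of orders.

```python
import tensorflow as tf

values = tf.range(10)

# Only the operation-level seed is set (no tf.random.set_seed call).
# TensorFlow pairs it with a default global seed, and the op keeps an
# internal counter that advances on every call.
first = tf.random.shuffle(values, seed=42)
second = tf.random.shuffle(values, seed=42)

# Most likely two different orders, even though seed=42 is the same.
print(first.numpy())
print(second.numpy())
```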

Before looking at why, let's look at an example where a seed does reproduce the same values each time. Here we create two tf.random.Generator objects from the same seed:

generator_one = tf.random.Generator.from_seed(42)
random_example_one = generator_one.normal(shape=(3, 2))
generator_two = tf.random.Generator.from_seed(42)
random_example_two = generator_two.normal(shape=(3, 2))

random_example_one, random_example_two, random_example_one == random_example_two

# each time they will reproduce the same values
(<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[-0.7565803 , -0.06854702],
        [ 0.07595026, -1.2573844 ],
        [-0.23193763, -1.8107855 ]], dtype=float32)>,
 <tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[-0.7565803 , -0.06854702],
        [ 0.07595026, -1.2573844 ],
        [-0.23193763, -1.8107855 ]], dtype=float32)>,
 <tf.Tensor: shape=(3, 2), dtype=bool, numpy=
 array([[ True,  True],
        [ True,  True],
        [ True,  True]])>)
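A generator produces random numbers rather than permutations, but those reproducible numbers can drive a shuffle. The sketch below is an alternative technique, not how tf.random.shuffle works internally: it draws one random key per row from a generator and reorders the rows by sorting the keys with tf.argsort. The helper name generator_shuffle is hypothetical.

```python
import tensorflow as tf

def generator_shuffle(values, generator):
    """Hypothetical helper: shuffle `values` along the first dimension.

    Draws one uniform random key per row from `generator`, then reorders
    the rows by sorting those keys.
    """
    keys = generator.uniform(shape=[values.shape[0]])
    permutation = tf.argsort(keys)
    return tf.gather(values, permutation)

data = tf.constant([[10, 7],
                    [3, 4],
                    [2, 5]])

# Two generators created from the same seed produce the same keys,
# so they produce the same order.
result_one = generator_shuffle(data, tf.random.Generator.from_seed(42))
result_two = generator_shuffle(data, tf.random.Generator.from_seed(42))
print(bool(tf.reduce_all(result_one == result_two)))  # True
```

Because each generator carries its own state, this approach does not depend on the global seed.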

You can run the code in Google Colab and test it yourself.

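So why does the operation-level seed alone not reproduce the order? According to the tf.random.set_seed documentation, when only the operation seed is set, TensorFlow combines it with a default global seed, and the op keeps an internal counter that advances every time it runs. The sequence of results is the same each time the program starts, but every call within the same session returns the next result in that sequence, so re-running the cell gives a new order. Calling tf.random.set_seed sets the global seed and also resets those counters, so we can set the global seed together with the operation seed to stop this behavior.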

Let’s see the code.

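# Set the global seed, then shuffle with an operation seed.
# Run both lines together so the global seed is reset before every shuffle.
tf.random.set_seed(42)

tf.random.shuffle(not_shuffled_one, seed=42)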

# output
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[10,  7],
       [ 3,  4],
       [ 2,  5]], dtype=int32)>
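Because tf.random.set_seed resets the internal counters, running the two lines above together gives the same order every time. The sketch below checks this within one program by wrapping both calls in a hypothetical helper, seeded_shuffle, and comparing two results.

```python
import tensorflow as tf

not_shuffled_one = tf.constant([[10, 7],
                                [3, 4],
                                [2, 5]])

def seeded_shuffle(values):
    """Hypothetical helper: reset the global seed, then shuffle."""
    # Resetting the global seed also resets the counters of random ops,
    # so the seeded shuffle starts from the same state on every call.
    tf.random.set_seed(42)
    return tf.random.shuffle(values, seed=42)

first = seeded_shuffle(not_shuffled_one)
second = seeded_shuffle(not_shuffled_one)
print(bool(tf.reduce_all(first == second)))  # True
```

Without the tf.random.set_seed call inside the helper, the second call would continue the op's sequence and could return a different order.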