Introduction: Batch Normalization is a technique used in deep learning to improve the training of neural networks. In this article, we will explore the parameters of Batch Normalization in PyTorch and how they are used.
Batch Normalization (BN) normalizes the activations within each batch during training and applies a learned scale and shift so that the distribution of activations remains consistent across batches. In PyTorch, BN is implemented in the torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, and torch.nn.BatchNorm3d modules, for 1D, 2D, and 3D inputs respectively.
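As a quick sketch of what BN computes, the following checks that BatchNorm2d in training mode matches a manual per-channel normalization (x - mean) / sqrt(var + eps). At initialization gamma is 1 and beta is 0, so the affine step is the identity; the tensor shapes here are arbitrary illustration values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4, 5, 5)   # (batch, channels, height, width)

bn = nn.BatchNorm2d(4)        # gamma initialized to 1, beta to 0
bn.train()
y = bn(x)

# Manual normalization: per-channel mean and (biased) variance
# over the batch and spatial dimensions
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(y, manual, atol=1e-5))  # True
```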
These modules share several parameters that control the behavior of BN. Here's a breakdown of the parameters, using torch.nn.BatchNorm2d as the reference:

- num_features: the number of channels (features) expected in the input.
- eps: a small value added to the denominator for numerical stability (default 1e-5).
- momentum: the factor used to update the running mean and running variance (default 0.1).
- affine: if True, the BN module will also learn a set of affine parameters (gamma and beta), which are scale and shift factors applied to the normalized activations. These parameters allow for learned scaling and shifting of the normalized activations; setting affine to False disables the transformation.
- track_running_stats: if True, the module keeps running estimates of the mean and variance, and these statistics are updated during forward passes in training mode. If set to False, the running statistics are not tracked, and the module relies only on the batch statistics for normalization.

Typically, the activations are normalized by the BatchNorm module, and the normalized activations are then passed through an activation function such as ReLU or Tanh. In the following example, we define a simple neural network with two convolutional layers, each followed by a BN layer. The BN layers are applied to the output of each convolutional layer using
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a simple neural network with BN
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(128)
        self.fc1 = nn.Linear(128 * 7 * 7, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 128 * 7 * 7)  # flatten; assumes 7x7 spatial input
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
bn1 and bn2. The input tensor x is passed through each convolutional layer, then through the corresponding BN layer, and finally through the ReLU activation function (F.relu). The BN-normalized activations are then flattened using view() before being passed through the fully connected layers (fc1 and fc2). Note that the convolutions use padding=1 with a 3x3 kernel and stride 1, so they preserve spatial size; the view() reshape therefore expects a 7x7 spatial input.
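To tie this together, here is a minimal sketch that instantiates the network above and runs a forward pass. It assumes a 7x7 spatial input so the view() reshape matches fc1, and it also illustrates the track_running_stats behavior described earlier: running statistics are updated in training mode but frozen in eval mode.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):  # same architecture as above
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(128)
        self.fc1 = nn.Linear(128 * 7 * 7, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 128 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

torch.manual_seed(0)
net = MyNet()
x = torch.randn(2, 3, 7, 7)   # padding=1 convs preserve the 7x7 spatial size

net.train()
out = net(x)                  # training mode: batch stats used, running stats updated
print(out.shape)              # torch.Size([2, 10])

net.eval()
frozen = net.bn1.running_mean.clone()
net(x)                        # eval mode: running stats used, not updated
print(torch.equal(net.bn1.running_mean, frozen))  # True
```

Switching between train() and eval() is the usual source of BN-related bugs: forgetting eval() at inference time makes the model normalize with the statistics of the incoming batch instead of the accumulated running statistics.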