Regularization with Augmentation, Batch Normalization and Dropout

10 min read · Oct 7, 2020

There are two main levers for improving a machine learning model:

Reducing generalization error with regularization
Reducing the cost value with optimization, that is, searching for the parameter values that minimize the cost

Regularization

Regularization means tuning a model so that it predicts well on data it has never seen, not only on the training dataset it learned from. It is mainly a remedy for overfitting, and relaxing it is one way to move a model away from underfitting.

Underfitting in a neural network can be addressed by increasing the model's capacity, that is, by adding layers and adding nodes per layer. Doing so, however, may lead to overfitting.

Overfitting is easy to diagnose by inspecting the model's learning curve: training loss keeps falling while validation loss stalls or starts to rise.
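As a rough illustration of reading a learning curve, the sketch below looks for the epoch with the lowest validation loss and flags overfitting when training loss keeps falling after that point. The function name `diagnose_learning_curve` and the loss values are hypothetical, chosen only for this example; they are not results from the workshop.

```python
def diagnose_learning_curve(train_loss, val_loss):
    """Heuristic check of per-epoch loss lists (hypothetical helper)."""
    # Index of the epoch where validation loss is lowest
    best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_after_best = len(val_loss) - 1 - best_index
    # Training loss still improving after validation loss bottomed out
    train_kept_falling = train_loss[-1] < train_loss[best_index]
    return {
        "best_epoch": best_index + 1,  # report epochs as 1-based
        "overfitting_suspected": epochs_after_best > 0 and train_kept_falling,
        "final_gap": val_loss[-1] - train_loss[-1],
    }


# Made-up losses for illustration only
train_loss = [0.50, 0.35, 0.28, 0.23, 0.19, 0.15, 0.12, 0.09]
val_loss = [0.40, 0.32, 0.28, 0.26, 0.25, 0.24, 0.25, 0.27]

result = diagnose_learning_curve(train_loss, val_loss)
print(result)
```

With Keras, the two lists would typically come from `history.history['loss']` and `history.history['val_loss']`.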

Overfitting can be countered with techniques such as augmentation, batch normalization, dropout, L1/L2 regularization, weight decay, weight constraints, early stopping, and others.

Workshop

In this workshop we tackle overfitting in a neural network classification model trained on the Fashion-MNIST dataset, using three techniques as examples (augmentation, batch normalization, and dropout), and measure how much they improve the model in the end.

Fashion-MNIST Dataset

Fashion-MNIST is a dataset of 28x28-pixel grayscale images of clothing, bags, and shoes. It contains 60,000 training images and 10,000 test images across 10 classes, labelled 0–9 as follows:

0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot

We need to load the dataset, expand its dimensions, scale the pixel values to the range 0–1, one-hot encode the labels, and split the data for training and validation, as shown below.

Readers with a GPU can configure it with the following commands:

!nvidia-smi -L

import tensorflow as tf
tf.__version__

# Let TensorFlow grow GPU memory on demand instead of reserving it all up front
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)

print('TensorFlow version:', tf.__version__)
print('GPU available:', tf.config.list_physical_devices('GPU'))

Import the libraries and define the required parameters

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization
from tensorflow.keras.optimizers import RMSprop,Adam
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from plotly.subplots import make_subplots
from matplotlib import pyplot
from tensorflow.keras.datasets import fashion_mnist
import plotly.graph_objs as go
from plotly.offline import iplot
import matplotlib.pyplot as plt
import numpy
import cv2

Set the constants

IMG_ROWS = 28
IMG_COLS = 28
NUM_CLASSES = 10
TEST_SIZE = 0.2
RANDOM_STATE = 99

NO_EPOCHS = 10
BATCH_SIZE = 128

Load Dataset

(train_data, y), (test_data, y_test) = fashion_mnist.load_data()

# Shape is (number of images, rows, columns)
print("Fashion MNIST train - images:", train_data.shape[0], " rows:", train_data.shape[1], " columns:", train_data.shape[2])
print("Fashion MNIST test - images:", test_data.shape[0], " rows:", test_data.shape[1], " columns:", test_data.shape[2])

for i in range(9):
    pyplot.subplot(330 + 1 + i)    
    pyplot.imshow(train_data[i], cmap=pyplot.get_cmap('gray'))

pyplot.show()

Expand the dimensions of the dataset

print(train_data.shape, test_data.shape)

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
train_data = train_data.reshape((train_data.shape[0], 28, 28, 1))
test_data = test_data.reshape((test_data.shape[0], 28, 28, 1))

print(train_data.shape, test_data.shape)

Scale the pixel values

train_data = train_data / 255.0
test_data = test_data / 255.0

One-hot encode the labels

print(y.shape, y_test.shape)
print(y[:10])

y = to_categorical(y)
y_test = to_categorical(y_test)

print(y.shape, y_test.shape)
y[:10]
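To make the encoding step concrete, here is a minimal NumPy sketch of what one-hot encoding does to integer labels. The helper `one_hot` is a hypothetical stand-in written for this example, not the Keras `to_categorical` implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (illustrative helper)."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1 in the column that matches each label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


labels = np.array([9, 0, 3])  # e.g. ankle boot, T-shirt/top, dress
encoded = one_hot(labels, num_classes=10)
print(encoded.shape)
print(encoded)
```

Each row contains a single 1, and `encoded.argmax(axis=1)` recovers the original labels.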

Split the data randomly into training and validation sets at a ratio of 80:20

X_train, X_val, y_train, y_val = train_test_split(train_data, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)

X_train.shape, X_val.shape, y_train.shape, y_val.shape
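The sketch below shows the idea behind a shuffled 80:20 split using only NumPy, so the resulting sizes are easy to check. `shuffled_split` is a hypothetical helper; it will not reproduce the exact shuffle order that scikit-learn's `train_test_split` produces for the same seed.

```python
import numpy as np


def shuffled_split(n_samples, val_fraction, seed):
    """Return (train_indices, val_indices) from a random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]


# 60,000 Fashion-MNIST training images, 20% held out for validation
train_idx, val_idx = shuffled_split(60_000, 0.2, seed=99)
print(len(train_idx), len(val_idx))
```

No index appears in both sets, so the validation images are never seen during training.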

Baseline Model

Define, compile, and train the model without any regularization technique yet, as follows.

Define the model

model = Sequential()

#1. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', input_shape=(28, 28, 1)))
model.add(Activation("relu"))

#2. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same'))
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))

#3. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(Activation("relu"))

#4. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))

#FULLY CONNECTED LAYER
model.add(Flatten())
model.add(Dense(256))
model.add(Activation("relu"))

#OUTPUT LAYER
model.add(Dense(10, activation='softmax'))
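To get a feel for the capacity of this baseline, the parameter count can be worked out by hand. The sketch below assumes the standard formulas for convolution and dense layers with biases, and that 'same' padding keeps the 28x28 size so the two 2x2 poolings reduce it to 7x7. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


layer_params = {
    "conv1": conv2d_params(3, 3, 1, 32),
    "conv2": conv2d_params(3, 3, 32, 32),
    "conv3": conv2d_params(3, 3, 32, 64),
    "conv4": conv2d_params(3, 3, 64, 64),
    # 28 -> 14 -> 7 after two 2x2 poolings, 64 channels when flattened
    "dense": dense_params(7 * 7 * 64, 256),
    "output": dense_params(256, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:>6}: {count:,}")
print(f" total: {total_params:,}")
```

Most of the parameters sit in the first dense layer, which is one reason a network like this can memorize the training set.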

Compile Model

optimizer = Adam()
model.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics=["accuracy"])

model.summary()

Train Model

train_model = model.fit(X_train, y_train,
                  batch_size=BATCH_SIZE,
                  epochs=NO_EPOCHS,
                  verbose=1,
                  validation_data=(X_val, y_val))

Evaluation

def create_trace(x,y,ylabel,color):
        trace = go.Scatter(
            x = x,y = y,
            name=ylabel,
            marker=dict(color=color),
            mode = "markers+lines",
            text=x
        )
        return trace
    
def plot_accuracy_and_loss(train_model):
    hist = train_model.history
    acc = hist['accuracy']
    val_acc = hist['val_accuracy']
    loss = hist['loss']
    val_loss = hist['val_loss']
    epochs = list(range(1,len(acc)+1))
    
    trace_ta = create_trace(epochs,acc,"Training accuracy", "Green")
    trace_va = create_trace(epochs,val_acc,"Validation accuracy", "Red")
    trace_tl = create_trace(epochs,loss,"Training loss", "Blue")
    trace_vl = create_trace(epochs,val_loss,"Validation loss", "Magenta")
   
    # make_subplots is imported directly from plotly.subplots above
    fig = make_subplots(rows=1,cols=2, subplot_titles=('Training and validation accuracy',
                                                       'Training and validation loss'))
    fig.append_trace(trace_ta,1,1)
    fig.append_trace(trace_va,1,1)
    fig.append_trace(trace_tl,1,2)
    fig.append_trace(trace_vl,1,2)
    fig['layout']['xaxis'].update(title = 'Epoch')
    fig['layout']['xaxis2'].update(title = 'Epoch')
    fig['layout']['yaxis'].update(title = 'Accuracy', range=[0,1])
    fig['layout']['yaxis2'].update(title = 'Loss', range=[0,1])

    # iplot is imported directly from plotly.offline above
    iplot(fig, filename='accuracy-loss')

Plot accuracy and loss

plot_accuracy_and_loss(train_model)

score = model.evaluate(test_data, y_test,verbose=0)
print("Test Loss:",score[0])
print("Test Accuracy:",score[1])

The loss plot above shows the model starting to overfit from epoch 7. Evaluated on the test dataset, it reaches an accuracy of 92.04%.

Image Augmentation

Overfitting can be reduced by training on more data, but datasets are limited in size, so in some cases we have to synthesize data ourselves. For image data we can apply transformations such as rotating, shifting, and flipping the image, among others. Besides enlarging the dataset, image augmentation also increases the variety of the images used for training.
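At the pixel level, these transformations are simple array operations. The sketch below shows a horizontal flip and a one-pixel downward shift on a tiny 3x3 array using NumPy only. `shift_with_zero_fill` is a hypothetical helper for illustration, not how `ImageDataGenerator` is implemented.

```python
import numpy as np


def shift_with_zero_fill(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels; uncovered pixels become 0."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Source and destination windows that overlap after the shift
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted


image = np.arange(9).reshape(3, 3)
flipped = np.fliplr(image)                       # horizontal flip
shifted = shift_with_zero_fill(image, dy=1, dx=0)  # move down by one row

print(image)
print(flipped)
print(shifted)
```

The label of the image stays the same after either operation, which is what makes the transformed copy usable as an extra training example.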

Examples of image augmentation include:

1. Vertical Shift
2. Horizontal Shift
3. Shear
4. Zoom
5. Vertical Flip
6. Horizontal Flip
7. Rotate
8. Fill Mode
8.1 Constant Values
8.2 Nearest Neighbor
8.3 Reflect Values

First, read the hamster image file (the image at the top of the article) and try out image augmentation with the following steps.

Read the image file

hamster = cv2.imread('hamster.jpg')
hamster.shape

Convert the color order from BGR, the default in the OpenCV library, to RGB

hamster = cv2.cvtColor(hamster, cv2.COLOR_BGR2RGB)

Plot the image

plt.figure(dpi=100)
plt.imshow(hamster)

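Expand the image from 3 to 4 dimensions so it can be passed to the image augmentation function

print(hamster.shape)
# Add a leading batch axis: (height, width, channels) -> (1, height, width, channels)
hamster = hamster.reshape(1, hamster.shape[0], hamster.shape[1], hamster.shape[2])
print(hamster.shape)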

Try a vertical shift, moving the image up or down at random by at most 20%

datagen = ImageDataGenerator(height_shift_range=0.2)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Try a horizontal shift, moving the image left or right at random by at most 20%

datagen = ImageDataGenerator(width_shift_range=0.2)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Try shearing the image at random by up to 20 degrees

datagen = ImageDataGenerator(shear_range=20)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Try zooming the image at random by up to 30%

datagen = ImageDataGenerator(zoom_range=0.3)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Try a random vertical flip

datagen = ImageDataGenerator(vertical_flip=True)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')
    
fig.savefig('cat.jpeg', dpi=300)

Try a random horizontal flip

datagen = ImageDataGenerator(horizontal_flip=True)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Try rotating the image at random by up to 30 degrees

datagen = ImageDataGenerator(rotation_range=30)
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Fill Mode

By default, shifting, shearing, or rotating an image leaves empty areas along the edges and corners. These are filled using the nearest-neighbor technique, which repeats the nearest pixel colors. We can choose a different fill method with the fill_mode parameter, as follows.

Fill with black (Constant Values)

datagen = ImageDataGenerator(rotation_range=30, fill_mode = 'constant')
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Fill with neighboring colors (Nearest Neighbor)

datagen = ImageDataGenerator(rotation_range=30, fill_mode = 'nearest')
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Fill with a mirror reflection (Reflect Values)

datagen = ImageDataGenerator(rotation_range=50, fill_mode = 'reflect')
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')

Fill by tiling the image (Wrap Values)

datagen = ImageDataGenerator(rotation_range=30, fill_mode = 'wrap')
aug_iter = datagen.flow(hamster, batch_size=1)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,15))
for i in range(3):
    image = next(aug_iter)[0].astype('uint8')
    ax[i].imshow(image)
    ax[i].axis('off')
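The four fill modes above are easier to compare in one dimension. The sketch below uses NumPy's `np.pad` as an analogy: it pads a short row of "pixels" by two positions on each side. The mapping between Keras fill modes and NumPy pad modes is an assumption made for this illustration; in particular, the Keras 'reflect' pattern (which repeats the edge pixel) corresponds to NumPy's 'symmetric' mode, not NumPy's 'reflect'.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Keras fill_mode -> closest NumPy pad mode (illustrative mapping)
padded = {
    "constant": np.pad(row, 2, mode="constant", constant_values=0),
    "nearest": np.pad(row, 2, mode="edge"),
    "reflect": np.pad(row, 2, mode="symmetric"),
    "wrap": np.pad(row, 2, mode="wrap"),
}

for fill_mode, values in padded.items():
    print(f"{fill_mode:>8}: {values.tolist()}")
```

The middle four values are always the original row; only the two values added on each side differ between modes.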

We now increase the variety of the training images to counter overfitting, with the following steps.

Define the image augmentation

datagen = ImageDataGenerator(
        rotation_range=0.05,  # Randomly rotate images (range in degrees)
        zoom_range = 0.2, # Randomly zoom image
        width_shift_range=0.1,  # Randomly shift images horizontally (fraction of width)
        height_shift_range=0.1,  # Randomly shift images vertically (fraction of height)
        shear_range=0.05 # Randomly shear images (angle in degrees)
)

datagen.fit(X_train)

Define the model

model = Sequential()

#1. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', input_shape=(28, 28, 1)))
model.add(Activation("relu"))

#2. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same'))
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))

#3. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(Activation("relu"))

#4. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))

#FULLY CONNECTED LAYER
model.add(Flatten())
model.add(Dense(256))
model.add(Activation("relu"))

#OUTPUT LAYER
model.add(Dense(10, activation='softmax'))

Compile Model

optimizer = Adam()
model.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics=["accuracy"])

model.summary()

Train Model

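NO_EPOCHS = 30

# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
# Shuffling is handled by datagen.flow, which shuffles by default.
history = model.fit(datagen.flow(X_train, y_train, batch_size=BATCH_SIZE),
                    epochs=NO_EPOCHS, validation_data = (X_val, y_val),
                    verbose = 1, steps_per_epoch=X_train.shape[0] // BATCH_SIZE)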
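The `steps_per_epoch` value is plain integer arithmetic. The sketch below works it out for this workshop's numbers, assuming the 80:20 split leaves 48,000 of the 60,000 training images for training; the variable names are local to this example.

```python
total_images = 60_000
val_fraction = 0.2
batch_size = 128
epochs = 30

train_samples = int(total_images * (1 - val_fraction))  # images left for training
steps_per_epoch = train_samples // batch_size            # full batches per epoch
leftover = train_samples % batch_size                    # images that do not fill a batch
total_updates = steps_per_epoch * epochs                 # weight updates over the run

print(train_samples, steps_per_epoch, leftover, total_updates)
```

Because 48,000 divides evenly by 128, every augmented batch is full and no training image is skipped within an epoch.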

Plot the curves

plot_accuracy_and_loss(history)

Measure accuracy on the test dataset

score = model.evaluate(test_data, y_test,verbose=0)
print("Test Loss:",score[0])
print("Test Accuracy:",score[1])

The loss plot above shows that over 30 epochs of training, validation loss does not shoot up in the early epochs as it did when training without image augmentation. Evaluated on the test dataset, the model reaches an accuracy of 92.25%. In the later epochs, however, validation loss still tends to climb, and the model overfits.

Batch Normalization

Batch normalization is a technique used while training a machine learning model to shift and scale the activations inside the hidden layers of a deep neural network so that they are neither too small nor too large. It does so using the mean and standard deviation of each activation across the current batch. This is similar to feature scaling of the input, for example normalizing a grayscale image by dividing pixel values in 0–255 by 255 to map them to 0–1. Batch normalization additionally has learnable parameters, so the model can learn to adjust the activations to whatever scale suits it. It also tends to speed up training and to lower the loss compared with training without normalization, because the values flowing through the network stay in a small range.

Data can be normalized in several ways, for example min-max normalization or standardization.

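Min-Max Normalization: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$

Standardization: $x' = \frac{x - \bar{x}}{\sigma}$

where $\bar{x}$ is the mean and $\sigma$ is the standard deviation (the square root of the variance $\sigma^2$).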
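The two normalization schemes can be checked on a few grayscale pixel values. This is a small NumPy sketch with made-up pixel values; standardization here divides by the standard deviation, the square root of the variance.

```python
import numpy as np

pixels = np.array([0.0, 51.0, 102.0, 255.0])  # illustrative grayscale values

# Min-max normalization: maps the smallest value to 0 and the largest to 1
min_max = (pixels - pixels.min()) / (pixels.max() - pixels.min())

# Standardization: zero mean and unit standard deviation
standardized = (pixels - pixels.mean()) / pixels.std()

print(min_max)
print(standardized)
print(standardized.mean(), standardized.std())
```

Since these values already span 0 to 255, min-max normalization gives the same result as dividing by 255.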

Batch normalization uses the standardization approach. The mean and variance are estimated from each mini-batch fed to the model during training, by a dedicated layer inside the neural network itself.

We try batch normalization together with image augmentation to counter overfitting, with the following steps.

Define the image augmentation
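The training-time computation of a batch normalization layer can be sketched in a few lines of NumPy. This is a simplified illustration, not the Keras implementation: it omits the moving averages used at inference time, and the names `batch_norm_forward`, `gamma`, and `beta` are chosen for this example (the default epsilon of the Keras layer is 1e-3, which is reused here).

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Standardize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta              # learnable scale and shift


rng = np.random.default_rng(0)
# A batch of 128 samples with 4 features, deliberately off-center and wide
activations = rng.normal(loc=5.0, scale=3.0, size=(128, 4))

normalized = batch_norm_forward(activations, gamma=np.ones(4), beta=np.zeros(4))
print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```

With `gamma` at 1 and `beta` at 0 the output is approximately zero-mean and unit-variance per feature; during training the network is free to move both away from those starting values.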

datagen = ImageDataGenerator(
        rotation_range=0.05,  # Randomly rotate images (range in degrees)
        zoom_range = 0.2, # Randomly zoom image
        width_shift_range=0.1,  # Randomly shift images horizontally (fraction of width)
        height_shift_range=0.1,  # Randomly shift images vertically (fraction of height)
        shear_range=0.05 # Randomly shear images (angle in degrees)
)

datagen.fit(X_train)

Define the model

model = Sequential()

#1. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(Activation("relu"))

#2. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same'))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))

#3. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(BatchNormalization())
model.add(Activation("relu"))

#4. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))

#FULLY CONNECTED LAYER
model.add(Flatten())
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation("relu"))

#OUTPUT LAYER
model.add(Dense(10, activation='softmax'))

Compile Model

optimizer = Adam()
model.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics=["accuracy"])

model.summary()

The model summary shows that each batch normalization layer carries parameters for the mean and variance, which are estimated from the mini-batches fed to the model during training.
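The parameter counts that appear for these layers can be reproduced by hand. The sketch below assumes the usual layout of a Keras batch normalization layer: four values per channel (gamma, beta, moving mean, moving variance), of which only gamma and beta are updated by gradient descent. The channel list matches the five batch normalization layers in the model above.

```python
# Channels (or units) seen by each BatchNormalization layer in the model
bn_channels = [32, 32, 64, 64, 256]

per_layer_total = [4 * c for c in bn_channels]      # gamma, beta, moving mean, moving variance
per_layer_trainable = [2 * c for c in bn_channels]  # gamma and beta only

total_bn_params = sum(per_layer_total)
trainable_bn_params = sum(per_layer_trainable)
non_trainable_bn_params = total_bn_params - trainable_bn_params

print(per_layer_total)
print(total_bn_params, trainable_bn_params, non_trainable_bn_params)
```

The non-trainable half holds the running mean and variance that the layer uses in place of batch statistics at inference time.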

Train Model

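NO_EPOCHS = 60

# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
# Shuffling is handled by datagen.flow, which shuffles by default.
history = model.fit(datagen.flow(X_train, y_train, batch_size=BATCH_SIZE),
                    epochs=NO_EPOCHS, validation_data = (X_val, y_val),
                    verbose = 1, steps_per_epoch=X_train.shape[0] // BATCH_SIZE)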

Plot the curves

plot_accuracy_and_loss(history)

Measure accuracy on the test dataset

score = model.evaluate(test_data, y_test,verbose=0)
print("Test Loss:",score[0])
print("Test Accuracy:",score[1])

The loss plot shows training loss trending down, but validation loss fluctuates quite a bit, so another technique may be needed to help with overfitting. One technique commonly used together with batch normalization is dropout.

Dropout

A single model can simulate many models by randomly removing some nodes during training. This technique is called dropout. It is a clever approach that saves both time and resources and, importantly, spares us from maintaining several models.

Dropout is a form of regularization. It discourages the model from memorizing the training data, reduces overfitting, and helps deep neural networks across a wide range of architectures generalize better.
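The mechanics of dropout fit in a short NumPy function. This is a sketch of the common "inverted dropout" formulation, in which surviving activations are scaled up during training so that nothing needs to change at inference time. The function name `dropout` and its arguments are chosen for this example and are not the Keras layer's API.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Zero a random fraction `rate` of activations while training."""
    if not training or rate == 0.0:
        return x  # inference: pass activations through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where a unit survives
    return x * mask / keep_prob             # rescale to keep the expected value


rng = np.random.default_rng(0)
activations = np.ones((1000, 256))

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
eval_out = dropout(activations, rate=0.3, training=False, rng=rng)

print(np.unique(train_out))          # dropped units are 0, kept units are 1 / 0.7
print((train_out == 0).mean())       # fraction dropped, close to the rate
print(train_out.mean())              # expected value stays close to 1
```

Each training step draws a new mask, so the network effectively trains a different thinned sub-network every time.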

We try dropout together with batch normalization and image augmentation to counter overfitting, with the following steps.

Define the image augmentation

datagen = ImageDataGenerator(
        rotation_range=0.05,  # Randomly rotate images (range in degrees)
        zoom_range = 0.2, # Randomly zoom image
        width_shift_range=0.1,  # Randomly shift images horizontally (fraction of width)
        height_shift_range=0.1,  # Randomly shift images vertically (fraction of height)
        shear_range=0.05 # Randomly shear images (angle in degrees)
)

datagen.fit(X_train)

Define the model

model = Sequential()

#1. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(Activation("relu"))
model.add(Dropout(0.3))

#2. CNN LAYER
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same'))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.3))

#3. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(BatchNormalization())
model.add(Activation("relu"))
model.add(Dropout(0.3))

#4. CNN LAYER
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same'))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.3))

#FULLY CONNECTED LAYER
model.add(Flatten())
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation("relu"))
model.add(Dropout(0.30))

#OUTPUT LAYER
model.add(Dense(10, activation='softmax'))

Compile Model

# Smaller learning rate than the Adam default of 0.001
optimizer = Adam(learning_rate=0.0001)
model.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics=["accuracy"])

model.summary()

Train Model

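NO_EPOCHS = 200

# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
# Shuffling is handled by datagen.flow, which shuffles by default.
history = model.fit(datagen.flow(X_train, y_train, batch_size=BATCH_SIZE),
                    epochs=NO_EPOCHS, validation_data = (X_val, y_val),
                    verbose = 1, steps_per_epoch=X_train.shape[0] // BATCH_SIZE)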

The number of training epochs is raised to 200 because validation loss is expected to keep falling for longer than it did before dropout was added.

Plot the curves

plot_accuracy_and_loss(history)

Measure accuracy on the test dataset

score = model.evaluate(test_data, y_test,verbose=0)
print("Test Loss:",score[0])
print("Test Accuracy:",score[1])

The loss plot above shows that after 200 epochs of training with augmentation, batch normalization, and dropout, validation loss does not shoot up into overfitting as it did when training without dropout. Evaluated on the test dataset, the model reaches an accuracy of 94.24%.
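To put the reported test accuracies side by side, the short calculation below compares the baseline (92.04%) with the final model (94.24%) in terms of accuracy gain and error-rate reduction. The accuracy figures are the ones reported in this article; the variable names are local to this example, and results from a fresh run will differ somewhat.

```python
baseline_acc = 92.04      # baseline CNN, 10 epochs
augmented_acc = 92.25     # with image augmentation, 30 epochs
final_acc = 94.24         # augmentation + batch normalization + dropout, 200 epochs

gain_points = final_acc - baseline_acc        # absolute gain in percentage points
baseline_error = 100.0 - baseline_acc         # share of test images misclassified
final_error = 100.0 - final_acc
relative_error_reduction = (baseline_error - final_error) / baseline_error * 100.0

print(f"Accuracy gain: {gain_points:.2f} percentage points")
print(f"Error rate: {baseline_error:.2f}% -> {final_error:.2f}%")
print(f"Relative error reduction: {relative_error_reduction:.1f}%")
```

Seen as an error rate, the change from about 8% to under 6% is a larger relative improvement than the two-point accuracy gain might suggest.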

Reference

Modern regularization with augmentation, batch normalization, and dropout

Article by Dr. Nuttachot Promrit (อ.ดร.ณัฐโชติ พรหมฤทธิ์), Department of Computing [https://www.cp.su.ac.th], Faculty of Science, Silpakorn University…

blog.pjjop.org


Written by Chawala Pancharoen
