How to quantize (16-bit, 8-bit) a model when saving it as TFLite

This is a summary of how to save a trained TensorFlow model in the tflite format while quantizing it to 16-bit or 8-bit.
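To make the two modes concrete, here is a minimal, standard-library-only sketch of the arithmetic involved. float16 rounds each weight to the nearest half-precision value. int8 maps a float range onto 256 integer levels using a scale and a zero point, following the affine scheme (real = scale * (q - zero_point)) described in TFLite's quantization spec. The function names are hypothetical and this is a simplification, not the converter's actual implementation; TFLite, for example, uses symmetric per-channel scales for weights.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # 'e' is the struct format code for a 16-bit half-precision float
    return struct.unpack('e', struct.pack('e', x))[0]


def quantize_int8(values, lo, hi):
    """Affine-quantize floats in [lo, hi] to int8; return (codes, scale, zero_point)."""
    if not lo <= 0.0 <= hi:
        raise ValueError("the range must contain 0.0 so that zero is exactly representable")
    scale = (hi - lo) / 255.0                 # 256 int8 levels span the float range
    zero_point = round(-128 - lo / scale)     # int8 code that represents 0.0
    # Round to the nearest level, shift by the zero point, clamp to [-128, 127]
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Map int8 codes back to floats: real = scale * (q - zero_point)."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.5, 0.0, 0.3, 0.9]
q, s, zp = quantize_int8(weights, -1.0, 1.0)
print("int8 codes:", q)
print("dequantized:", dequantize_int8(q, s, zp))
print("float16 of 0.1:", to_float16(0.1))
```

Dequantized values land within about one step (`scale`) of the originals, and 0.0 maps back exactly, which is why the sketch insists that the range contain zero.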

The conversion uses tf.lite.TFLiteConverter, but you have to pick the conversion method that matches the format of your trained model.

In most cases you will be working with either a TensorFlow SavedModel or a tf.keras model.
Use the corresponding converter for each:

SavedModel: TFLiteConverter.from_saved_model() (takes the path to the SavedModel directory)
tf.keras model: TFLiteConverter.from_keras_model() (takes a loaded tf.keras model object, not a path)

・Requirements
Tensorflow==2.3.0

・Code

import tensorflow as tf

weights = 'hoge/saved_model'  # path to the SavedModel directory
output = 'hoge/hoge.tflite'   # output path for the .tflite file
quantize_mode = 'int8'        # quantization mode: 'int8', 'float16', or 'float32' (no quantization)

def save_tflite():
    converter = tf.lite.TFLiteConverter.from_saved_model(weights)  # when using a SavedModel
    # When using a tf.keras model, load it first; from_keras_model() takes a model object, not a path:
    # model = tf.keras.models.load_model(weights)
    # converter = tf.lite.TFLiteConverter.from_keras_model(model)

    if quantize_mode == 'float16':
        # float16 quantization: weights are stored as 16-bit floats
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_types = [tf.float16]
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
        converter.allow_custom_ops = True

    elif quantize_mode == 'int8':
        # Without a representative dataset, Optimize.DEFAULT performs dynamic range
        # quantization: weights are stored as int8 while activations stay in float.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
        converter.allow_custom_ops = True

    # 'float32': no optimization settings, so the model is converted without quantization

    tflite_model = converter.convert()
    with open(output, 'wb') as f:
        f.write(tflite_model)

    print("Done")

if __name__ == '__main__':
    save_tflite()
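Quantizing activations to int8 as well as weights (full integer quantization) additionally requires a representative dataset, which the converter runs through the model to calibrate activation ranges. Below is a minimal sketch of such a generator, assuming a single-input model. `INPUT_SHAPE` and `NUM_CALIBRATION_STEPS` are hypothetical placeholders, and the random data only stands in for real preprocessed samples. The converter settings are shown as comments because they require TensorFlow.

```python
import numpy as np

INPUT_SHAPE = (1, 224, 224, 3)  # hypothetical: replace with your model's input shape
NUM_CALIBRATION_STEPS = 100     # hypothetical: number of calibration samples


def representative_dataset():
    """Yield calibration samples, each a list holding one array per model input."""
    for _ in range(NUM_CALIBRATION_STEPS):
        # Placeholder data; real preprocessed samples give meaningful activation ranges.
        yield [np.random.rand(*INPUT_SHAPE).astype(np.float32)]


# Attaching it to the converter (requires TensorFlow, not executed here):
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
# converter.representative_dataset = representative_dataset
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# converter.inference_input_type = tf.int8    # optional: make the model's input int8
# converter.inference_output_type = tf.int8   # optional: make the model's output int8
```

Restricting `supported_ops` to `TFLITE_BUILTINS_INT8` makes conversion fail if some op has no int8 kernel, instead of silently leaving it in float.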

・References
Post-training quantization (TensorFlow Lite documentation)