Real-Time Hardware Acceleration of Low Precision Quantized Custom Neural Network Model on ZYNQ SoC
Abstract
Reducing the memory footprint and computational density of neural network models requires low-precision arithmetic. However, existing techniques typically rely on floating-point arithmetic to preserve accuracy, which is problematic for convolutional neural networks (CNNs), whose memory requirements grow substantially when parameters are stored as floating-point numbers. Additionally, larger bit widths lead to higher computational density in hardware architectures. Meanwhile, contemporary problems have pushed models to become deeper, sometimes with billions of parameters, increasing computational complexity and straining memory allocation. These challenges render existing hardware accelerators insufficient. Where a small loss of accuracy is acceptable in exchange for reduced hardware complexity, model quantization enables neural networks to be implemented with limited hardware resources. From a hardware design standpoint, quantized models offer notable advantages in speed, memory utilization, and power consumption compared to traditional floating-point arithmetic. To this end, we propose a network intrusion detection method in which the weights and activations of a custom multi-layer detector are quantized using the Brevitas library. We evaluated the technique in real time on a ZYNQ System-on-Chip (SoC) using the FINN framework, which compiles quantized deep neural networks into Field Programmable Gate Array (FPGA) implementations, achieving an accuracy of approximately 92%. The experiments use the UNSW-NB15 dataset, generated by the Australian Centre for Cyber Security (ACCS).
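For illustration, a minimal Brevitas sketch of the kind of quantized multi-layer detector described above is shown below. The bit width, layer sizes, and input dimension are assumptions made for the example (the 593-feature input width follows the public FINN cybersecurity tutorial for UNSW-NB15), not the exact architecture evaluated in this work.

import torch.nn as nn
from brevitas.nn import QuantLinear, QuantReLU

BIT_WIDTH = 2      # assumed precision for both weights and activations
IN_FEATURES = 593  # assumed one-hot-encoded UNSW-NB15 feature width
HIDDEN = 64        # assumed hidden-layer size

# Three QuantLinear layers hold low-precision weights; QuantReLU quantizes
# the activations to the same bit width. BatchNorm helps keep training
# stable at such low precision.
model = nn.Sequential(
    QuantLinear(IN_FEATURES, HIDDEN, bias=True, weight_bit_width=BIT_WIDTH),
    nn.BatchNorm1d(HIDDEN),
    QuantReLU(bit_width=BIT_WIDTH),
    QuantLinear(HIDDEN, HIDDEN, bias=True, weight_bit_width=BIT_WIDTH),
    nn.BatchNorm1d(HIDDEN),
    QuantReLU(bit_width=BIT_WIDTH),
    QuantLinear(HIDDEN, 1, bias=True, weight_bit_width=BIT_WIDTH),  # binary attack/normal score
)
# After training, the model can be exported in a FINN-compatible ONNX form
# (e.g. via brevitas.export) and compiled by FINN into an FPGA dataflow
# accelerator targeting the ZYNQ SoC.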