Designing Trojan Detectors in Neural Networks Using Interactive Simulations

Open Access

20 February 2021

journal article
research article
Published by MDPI AG in Applied Sciences

Vol. 11 (4), 1865
https://doi.org/10.3390/app11041865

Abstract

This paper addresses the problem of designing trojan detectors in neural networks (NNs) using interactive simulations. Trojans in NNs are defined as triggers in inputs that cause misclassification of such inputs into a class (or classes) unintended by the design of an NN-based model. The goal of our work is to understand the encodings of a variety of trojan types in the fully connected layers of neural networks. Our approach is: (1) to simulate nine types of trojan embeddings into dot patterns; (2) to devise measurements of NN states; and (3) to design trojan detectors in NN-based classification models. The interactive simulations are built on top of TensorFlow Playground with in-memory storage of data and NN coefficients. The simulations provide analytical, visualization, and output operations performed on training datasets and NN architectures. The measurements of an NN include: (a) model inefficiency using a modified Kullback–Leibler (KL) divergence from uniformly distributed states; and (b) model sensitivity to variables related to data and NNs. Using the KL divergence measurements at each NN layer and for each predicted class label, a trojan detector is devised to discriminate between NN models with and without trojans. To document the robustness of such a trojan detector with respect to NN architectures, dataset perturbations, and trojan types, several properties of the KL divergence measurement are presented.
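
As a rough illustration of the kind of measurement the abstract describes, the sketch below computes the standard KL divergence between the observed distribution of one layer's neuron states and a uniform distribution over all possible states. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not specify how the KL divergence is modified, so the unmodified form is used here, and the binarization of activations by a threshold as well as the names `layer_states`, `kl_from_uniform`, and `threshold` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def layer_states(activations, threshold=0.0):
    """Binarize each activation vector into a tuple of 0/1 neuron states.

    Assumption: a neuron counts as "on" when its output exceeds `threshold`.
    """
    return [tuple(int(a > threshold) for a in vec) for vec in activations]


def kl_from_uniform(states, num_neurons):
    """Return KL(p || u) in bits.

    p is the empirical distribution of the observed states and u is the
    uniform distribution over all 2**num_neurons possible states.
    States that never occur contribute zero to the sum.
    """
    counts = Counter(states)
    total = sum(counts.values())
    num_possible = 2 ** num_neurons
    kl = 0.0
    for count in counts.values():
        p = count / total
        # p * log2(p / (1 / num_possible)) == p * log2(p * num_possible)
        kl += p * math.log2(p * num_possible)
    return kl


# Hypothetical activations of a two-neuron layer for four inputs.
evenly_used = layer_states([[-1, -1], [-1, 1], [1, -1], [1, 1]])
collapsed = layer_states([[1, 1], [1, 1]])

print(kl_from_uniform(evenly_used, num_neurons=2))
print(kl_from_uniform(collapsed, num_neurons=2))
```

Under these assumptions, a layer that uses every state equally often has a divergence of zero, and a layer that always lands in the same state reaches the maximum of `num_neurons` bits. To follow the abstract's per-class measurement, the same function could be applied separately to the states of the inputs assigned to each predicted class label.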

