Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error

Celia Cintas; Skyler Speakman; Victor Akinwande; William Ogallo; Komminist Weldemariam; Srihari Sridharan; Edward McFowland

Information and Decision Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

Reliably detecting attacks in a given set of inputs is of high practical relevance because of the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics and financial services. We propose an unsupervised method for detecting adversarial attacks in inner layers of autoencoder (AE) networks by maximizing a non-parametric measure of anomalous node activations. Previous work in this space has shown AE networks can detect anomalous images by thresholding the reconstruction error produced by the final layer. Furthermore, other detection methods rely on data augmentation or specialized training techniques which must be asserted before training time. In contrast, we use subset scanning methods from the anomalous pattern detection domain to enhance detection power without labeled examples of the noise, retraining or data augmentation methods. In addition to an anomalous “score” our proposed method also returns the subset of nodes within the AE network that contributed to that score. This will allow future work to pivot from detection to visualisation and explainability. Our scanning approach shows consistently higher detection power than existing detection methods across several adversarial noise models and a wide range of perturbation strengths.
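The scanning step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the scoring function, so the Berk-Jones statistic is an assumption here, as are the empirical p-value construction, the function names (`empirical_pvalues`, `berk_jones`, `scan_nodes`), the `alpha_max` parameter, and the synthetic data. Activations of one input at one layer are converted to p-values against activations recorded from clean inputs, and the scan then returns both a score and the subset of nodes that produced it.

```python
import numpy as np


def empirical_pvalues(background, activations):
    """Right-tailed empirical p-value for each node.

    background:  (n_clean, n_nodes) activations recorded on clean inputs.
    activations: (n_nodes,) activations of the input under test.
    """
    n_clean = background.shape[0]
    at_least_as_large = (background >= activations).sum(axis=0)
    # The +1 terms keep every p-value strictly positive.
    return (at_least_as_large + 1) / (n_clean + 1)


def berk_jones(n_alpha, n, alpha):
    """Berk-Jones score (assumed choice): n * KL(n_alpha / n || alpha)."""
    frac = n_alpha / n
    if frac <= alpha:
        return 0.0  # no excess of significant p-values
    if frac >= 1.0:
        return n * np.log(1.0 / alpha)
    return n * (frac * np.log(frac / alpha)
                + (1 - frac) * np.log((1 - frac) / (1 - alpha)))


def scan_nodes(pvalues, alpha_max=0.5):
    """Return (best score, indices of the highest-scoring node subset).

    For a fixed threshold alpha, adding a node whose p-value exceeds alpha
    can only lower the Berk-Jones score, so the best subset at that alpha
    is exactly the nodes with p-value <= alpha. Only the observed p-values
    therefore need to be tried as thresholds.
    """
    sorted_p = np.sort(pvalues)
    best_score, best_alpha = 0.0, None
    for alpha in np.unique(sorted_p[sorted_p <= alpha_max]):
        k = int(np.searchsorted(sorted_p, alpha, side="right"))
        score = berk_jones(k, k, alpha)
        if score > best_score:
            best_score, best_alpha = score, alpha
    if best_alpha is None:
        return 0.0, np.array([], dtype=int)
    return best_score, np.flatnonzero(pvalues <= best_alpha)


# Synthetic stand-in for one AE layer with 50 nodes (hypothetical data).
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 50))
clean_input = rng.normal(size=50)
perturbed_input = clean_input.copy()
perturbed_input[:10] += 6.0  # ten nodes pushed far above their usual range

for name, acts in [("clean", clean_input), ("perturbed", perturbed_input)]:
    score, nodes = scan_nodes(empirical_pvalues(background, acts))
    print(name, round(float(score), 2), nodes.tolist())
```

Because the score is computed from p-values rather than raw activations, the same scan could in principle be run on per-pixel reconstruction error instead of inner-layer activations; that reuse is likewise an assumption of this sketch.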

Original language	English (US)
Title of host publication	Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
Editors	Christian Bessiere
Publisher	International Joint Conferences on Artificial Intelligence
Pages	876-882
Number of pages	7
ISBN (Electronic)	9780999241165
State	Published - 2020
Event	29th International Joint Conference on Artificial Intelligence, IJCAI 2020 - Yokohama, Japan
Duration	Jan 1 2021 → …

Publication series

Name	IJCAI International Joint Conference on Artificial Intelligence
Volume	2021-January
ISSN (Print)	1045-0823

Conference

Conference	29th International Joint Conference on Artificial Intelligence, IJCAI 2020
Country/Territory	Japan
City	Yokohama
Period	1/1/21 → …

Bibliographical note

Publisher Copyright:
© 2020 Inst. Sci. inf., Univ. Defence in Belgrade. All rights reserved.

Cite this

Cintas, C., Speakman, S., Akinwande, V., Ogallo, W., Weldemariam, K., Sridharan, S., & McFowland, E. (2020). Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error. In C. Bessiere (Ed.), Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020 (pp. 876-882). (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2021-January). International Joint Conferences on Artificial Intelligence.

Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error. / Cintas, Celia; Speakman, Skyler; Akinwande, Victor et al.
Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020. ed. / Christian Bessiere. International Joint Conferences on Artificial Intelligence, 2020. p. 876-882 (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2021-January).

Cintas, C, Speakman, S, Akinwande, V, Ogallo, W, Weldemariam, K, Sridharan, S & McFowland, E 2020, Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error. in C Bessiere (ed.), Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020. IJCAI International Joint Conference on Artificial Intelligence, vol. 2021-January, International Joint Conferences on Artificial Intelligence, pp. 876-882, 29th International Joint Conference on Artificial Intelligence, IJCAI 2020, Yokohama, Japan, 1/1/21.

Cintas C, Speakman S, Akinwande V, Ogallo W, Weldemariam K, Sridharan S et al. Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error. In Bessiere C, editor, Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020. International Joint Conferences on Artificial Intelligence. 2020. p. 876-882. (IJCAI International Joint Conference on Artificial Intelligence).

Cintas, Celia ; Speakman, Skyler ; Akinwande, Victor et al. / Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error. Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020. editor / Christian Bessiere. International Joint Conferences on Artificial Intelligence, 2020. pp. 876-882 (IJCAI International Joint Conference on Artificial Intelligence).

@inproceedings{3d3f40177c2e4eca95a4b604ba582d70,

title = "Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error",

abstract = "Reliably detecting attacks in a given set of inputs is of high practical relevance because of the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics and financial services. We propose an unsupervised method for detecting adversarial attacks in inner layers of autoencoder (AE) networks by maximizing a non-parametric measure of anomalous node activations. Previous work in this space has shown AE networks can detect anomalous images by thresholding the reconstruction error produced by the final layer. Furthermore, other detection methods rely on data augmentation or specialized training techniques which must be asserted before training time. In contrast, we use subset scanning methods from the anomalous pattern detection domain to enhance detection power without labeled examples of the noise, retraining or data augmentation methods. In addition to an anomalous “score” our proposed method also returns the subset of nodes within the AE network that contributed to that score. This will allow future work to pivot from detection to visualisation and explainability. Our scanning approach shows consistently higher detection power than existing detection methods across several adversarial noise models and a wide range of perturbation strengths.",

author = "Celia Cintas and Skyler Speakman and Victor Akinwande and William Ogallo and Komminist Weldemariam and Srihari Sridharan and Edward McFowland",

note = "Publisher Copyright: {\textcopyright} 2020 Inst. Sci. inf., Univ. Defence in Belgrade. All rights reserved.; 29th International Joint Conference on Artificial Intelligence, IJCAI 2020 ; Conference date: 01-01-2021",

year = "2020",

language = "English (US)",

series = "IJCAI International Joint Conference on Artificial Intelligence",

publisher = "International Joint Conferences on Artificial Intelligence",

pages = "876--882",

editor = "Christian Bessiere",

booktitle = "Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020",

}

TY - GEN

T1 - Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error

AU - Cintas, Celia

AU - Speakman, Skyler

AU - Akinwande, Victor

AU - Ogallo, William

AU - Weldemariam, Komminist

AU - Sridharan, Srihari

AU - McFowland, Edward

PY - 2020

Y1 - 2020

N2 - Reliably detecting attacks in a given set of inputs is of high practical relevance because of the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics and financial services. We propose an unsupervised method for detecting adversarial attacks in inner layers of autoencoder (AE) networks by maximizing a non-parametric measure of anomalous node activations. Previous work in this space has shown AE networks can detect anomalous images by thresholding the reconstruction error produced by the final layer. Furthermore, other detection methods rely on data augmentation or specialized training techniques which must be asserted before training time. In contrast, we use subset scanning methods from the anomalous pattern detection domain to enhance detection power without labeled examples of the noise, retraining or data augmentation methods. In addition to an anomalous “score” our proposed method also returns the subset of nodes within the AE network that contributed to that score. This will allow future work to pivot from detection to visualisation and explainability. Our scanning approach shows consistently higher detection power than existing detection methods across several adversarial noise models and a wide range of perturbation strengths.

UR - http://www.scopus.com/inward/record.url?scp=85097330111&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85097330111&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85097330111

T3 - IJCAI International Joint Conference on Artificial Intelligence

SP - 876

EP - 882

BT - Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020

A2 - Bessiere, Christian

PB - International Joint Conferences on Artificial Intelligence

T2 - 29th International Joint Conference on Artificial Intelligence, IJCAI 2020

Y2 - 1 January 2021

ER -
