When AI meets Cybersecurity

by Alfonso Muñoz, Raphael Labaca-Castro. Posted on Sep 14, 2023
“Recreate the Pink Floyd ‘burning man’ photo but with a scientist and AI shaking hands in the style of Magritte” by DALL-E.

The intersection of artificial intelligence (AI) and cybersecurity has ushered in a new era of both promise and peril. As organizations embrace AI solutions to improve their defenses, malicious actors are uncovering novel techniques to exploit the vulnerabilities of such systems. In this article, we delve into the intricate relationship between AI and cybersecurity, analyzing the benefits and drawbacks that arise when these two domains converge: from the potential of AI to improve threat detection and response to the increasing challenges posed by adversarial attacks. Join us as we navigate this complex relationship, analyze the threat models targeting AI systems, and explore the Achilles’ heel of machine learning: adversarial attacks.

How AI meets cybersecurity: Benefits and drawbacks

Artificial intelligence is a discipline that has been with us for decades. Its beginnings can be traced back to the 1950s, when Alan Turing published the article “Computing Machinery and Intelligence” in the philosophical journal Mind. The simple question posed there has rumbled in the minds of the brightest scientists ever since: Can machines think?

Artificial intelligence (AI), in the context of computer science, is the discipline concerned with building computer systems and combinations of algorithms that mimic human intelligence to perform tasks, and that can improve as they gather information. To achieve this complex objective, there are specific techniques and algorithms. This is where we enter the fascinating world of machine learning.

Machine learning has different mechanisms to address or solve problems. The main ones, which you will find in any reference to this discipline, are:

  1. Supervised learning (classification/regression): This is defined by the use of labeled data sets to train algorithms that accurately classify data or predict outcomes. As data are fed into the model, it adjusts its weights until it has been properly fit, which is verified as part of the cross-validation process [1]. This mechanism is very accurate if the data are more or less static (not modified over time) and the (human) supervisor has adequate knowledge of the context of the data. Examples of use: fraud detection, image classification, customer retention, diagnostics, forecasting, predictions, process optimization, etc.

  2. Unsupervised learning: This is defined by the use of algorithms that extract knowledge from unlabeled and mainly dynamic data (created or changed over time). The techniques are usually based on grouping and clustering the data to be analyzed into different sets. Examples of use: structure discovery, big data analysis, recommender systems, targeted marketing, customer segmentation, etc.

  3. Reinforcement learning: Strategies that guide their own learning through rewards and punishments. In other words, it is an autonomous instruction system whose path is shaped by its successes and errors: the computer agent constantly searches for decisions that reward it in some way while avoiding paths that, in its own experience, are penalized. You will usually see this in the form of optimization functions, and especially in their use in different neural network modalities [2]. In the last decade, some of the most significant milestones in machine learning have been achieved through reward-driven training and related adversarial setups such as GANs (Generative Adversarial Nets) [3]. Examples of use: real-time decisions, game AI, learning tasks, robot navigation, etc.
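
As a concrete, deliberately tiny illustration of the supervised paradigm above, the sketch below fits a nearest-centroid classifier to labeled 2-D points. The data, class names, and helper functions are all invented for this example; real systems would use a proper library and far richer features.

```python
# Minimal supervised-learning sketch: fit a nearest-centroid classifier
# on labeled 2-D points, then predict labels for unseen points.

def fit_centroids(samples, labels):
    """Compute the mean point (centroid) of each labeled class."""
    sums, counts = {}, {}
    for (x, y), label in zip(samples, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (sx / counts[lbl], sy / counts[lbl])
            for lbl, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Assign the label of the closest centroid (squared distance)."""
    px, py = point
    return min(centroids,
               key=lambda lbl: (centroids[lbl][0] - px) ** 2 +
                               (centroids[lbl][1] - py) ** 2)

# Labeled training set: two clusters standing in for "benign"/"malicious".
X = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.8, 5.0)]
y = ["benign", "benign", "malicious", "malicious"]

model = fit_centroids(X, y)
print(predict(model, (0.1, 0.2)))   # near the "benign" cluster
print(predict(model, (5.2, 4.9)))   # near the "malicious" cluster
```

The same data without the label list is the unsupervised setting: a clustering algorithm would have to discover the two groups on its own.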

Logically, in recent decades, cybersecurity has also tried to use machine learning and its different approaches to solve, or at least to discover new approaches to, classic cybersecurity problems. This is especially accentuated in the field of defensive cybersecurity. Some of the most important efforts have concentrated on data security (data classification, labeling and inventorying of data, combating information leakage), access control (authentication and authorization), security architecture & design, security operations (automation of system configuration management, intelligent resource allocation, technical verification, etc.), software development security, information security governance & risk management/compliance, network security (IDS, IPS, UEBA, etc.), and fraud detection and malware prevention.

Unfortunately, the use of machine learning in defensive cyber security has certain limitations or at least certain restrictions that need to be worked around.

  1. Quantity versus quality: Cybersecurity offers situations that are difficult to address from an artificial intelligence point of view. In many cases we have samples of “good” behavior but no information with which to model malicious behavior; common examples are 0-days or certain APTs (Advanced Persistent Threats). How do we detect them? There is no definitive answer. Many current proposals analyze rare behaviors and infer results from there. This is undoubtedly an area for improvement.

  2. Static training and limitations of supervised learning: Supervised learning can be very accurate but may not be very useful in many cybersecurity scenarios where the data needed to train a model vary rapidly. Attackers, like defenders, learn from their mistakes and improve their attack techniques. Security measures that rely exclusively on static training will likely see their protection degrade in a short space of time. Procedures based on continuous and unsupervised learning are the appropriate, although difficult, strategy to pursue.

  3. Machine learning as a defensive mechanism: Artificial intelligence itself can introduce cybersecurity problems through poor design or implementation, for example in the privacy of the data it handles or in its internal functioning. Machine learning should be audited from a cybersecurity point of view like any other technology in our organization.

At the same time that artificial intelligence has been used in defensive security, the possibility of using it in offensive security has also been explored. There are three main areas of work in offensive security, one of which we will devote more attention to because of its current impact:

  1. Classical cybersecurity attacks improved with machine learning: A good example of this is any technique or tool that uses ML to enhance a phase of an attack, with ML commonly applied to fingerprinting, footprinting, or detecting useful patterns for a specific attack. ML is useful in fuzzing techniques to discover bugs [4], password guessing (PassGAN: A Deep Learning Approach for Password Guessing [5]), network enumeration [6], phishing [7], patterns and exploits [8], etc.

  2. Synthetic content generation: Using machine learning to bypass ML-based security solutions or classical (non-machine-learning) security mechanisms. The generation of synthetic content has an impact on cybersecurity, especially in the creation of fake profiles and misinformation, as well as in the circumvention of authentication and authorization systems. An excellent example of the possibilities can be seen in the following reference [9].

  3. Machine learning attacks on machine learning systems: Adversarial machine learning is a discipline within machine learning that studies the attacks a machine learning algorithm, and the model it generates, can suffer in the presence of a malicious adversary. We will spend more time on it below, but first it is important to understand the threat model that arises when we introduce artificial intelligence into a technological solution, whether it is a security tool or not.

Attacks on AI: Understanding the threat model

Any use of machine learning technology involves at least the following elements: the input data, the output data (result), the machine learning algorithm(s) used, and the training model generated. Any of these elements can be used by an attacker to force the artificial intelligence to behave improperly.

  1. Attacking the algorithm: Each machine learning algorithm has a specific way of working, and manipulating it, typically through poisoned data, can lead to misbehavior. In addition, sometimes these algorithms may be based on third-party algorithms or libraries, which introduces the classic security problem of supply chain attacks.

  2. Input data: Manipulation of input data can affect not only the final result but also have effects on the machine learning model (deception, theft, etc.).

  3. Output data: Massive output data collection from a classification process may allow an attacker to infer information from the training model and even from the training data. This, if it occurs, has serious security and privacy implications.

  4. Training model: The attacker’s target will be manipulation or theft.

  5. Training location: The training of machine learning models usually requires large computational capacities. It is common to use third-party services, typically in the cloud, for these operations. The attacker will target these sites, physical or virtual, for the manipulation or theft of data or models.
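
To make the training-data surface concrete, here is a minimal sketch of how an attacker who can inject mislabeled samples shifts a model's decision boundary. The 1-D scores, the class names, and the midpoint-threshold rule are all invented for this toy.

```python
# Minimal sketch of how manipulated training data (one of the attack
# surfaces above) shifts a learned model's behavior.

def train_threshold(benign_scores, malicious_scores):
    """Learn a decision threshold halfway between the two class means."""
    mean_b = sum(benign_scores) / len(benign_scores)
    mean_m = sum(malicious_scores) / len(malicious_scores)
    return (mean_b + mean_m) / 2

def classify(threshold, score):
    return "malicious" if score > threshold else "benign"

benign = [0.1, 0.2, 0.15]
malicious = [0.9, 0.85, 0.95]

clean_t = train_threshold(benign, malicious)
print(classify(clean_t, 0.6))        # flagged as malicious

# The attacker injects high-scoring samples mislabeled as benign,
# dragging the learned threshold upward.
poisoned_t = train_threshold(benign + [0.9] * 4, malicious)
print(classify(poisoned_t, 0.6))     # the same sample now slips through
```

Real poisoning attacks are subtler than this, but the mechanism is the same: a small amount of attacker-controlled training data moves the boundary in the attacker's favor.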

With these characteristics in mind, an attacker will attempt one of the following four types of attacks: poisoning attacks, evasion attacks, model extraction attacks, or model inversion attacks. Let’s take a brief look at what each one consists of to better understand the importance of analyzing the security of machine learning and the problems that adversarial machine learning introduces.

  1. Poisoning attacks: The adversary tries to corrupt the training set so that the learned model produces a misclassification that benefits the adversary. They can be performed in white box (the attacker knows a lot of detail about the technology to be attacked) and black box (the information known to the attacker is minimal, typically the input and output data) scenarios. Among the objectives to be pursued are destroying availability by producing incorrect predictions or creating backdoors. A classic example of this is BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain [10].

  2. Model extraction attacks: Extraction attacks consist of an adversary attempting to steal the configuration parameters of a machine learning model [11]. This type of attack compromises the intellectual property and confidentiality of a model, and subsequently enables evasion and/or inversion attacks. It can be carried out in both white-box and black-box scenarios.

  3. Model inversion attacks: Model inversion attacks consist of an adversary attempting to exploit model predictions to compromise the user’s privacy or to infer whether or not particular data was used in the training set. They can be carried out in both white-box and black-box settings. This type of attack is particularly relevant for models that have been trained with sensitive data, such as clinical data, which require special protection.

  4. Evasion attacks: The adversary’s goal is to inject a small amount of noise into the input so that a classifier predicts the output (or label) incorrectly. This type of attack works in both white-box and black-box modes. These malicious inputs are called adversarial examples. They can be created on different types of data, although the most widespread and well-known adversarial samples are on images [12].
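
The evasion idea can be sketched with a toy linear classifier: shift each input feature slightly against the sign of its weight so the decision score crosses zero, in the spirit of gradient-sign methods [24]. The weights, inputs, and perturbation budget below are invented for illustration.

```python
# Toy evasion attack on a linear classifier: a small per-feature
# perturbation against the sign of each weight flips the prediction.

def score(weights, bias, x):
    """Linear decision score: positive means the original class."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def evade(weights, x, eps):
    """Shift every feature by eps against the sign of its weight."""
    return [xi - eps if w > 0 else xi + eps for w, xi in zip(weights, x)]

w, b = [0.6, -0.4, 0.8], -1.2
x = [1.0, 0.2, 0.9]

adv = evade(w, x, eps=0.05)          # a small per-feature change
print(score(w, b, x))                # positive: original label kept
print(score(w, b, adv))              # negative: label flipped
```

In a deep network the attack uses the gradient of the loss instead of raw weights, but the principle is identical: tiny, targeted input changes accumulate into a wrong decision.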

Adversarial ML: The Achilles’ Heel

In recent years, machine learning has made remarkable advancements, enabling computers to perform tasks once thought to be exclusive to human intelligence including natural language generation [13,14,15]. From image generation [16] and autonomous vehicles [17] to computer security [18], machine learning has revolutionized various industries. However, these impressive achievements are not without their shortcomings. One of the most pressing challenges facing the field is the emergence of Adversarial Machine Learning (AML) [19].

Imagine a malicious binary file tricking a sophisticated malware classifier by subtly altering its content [20] or an autonomous vehicle, relying on machine learning algorithms to navigate safely, suddenly misinterpreting a stop sign as a speed limit sign [21]. These real-world scenarios illustrate the impact of adversarial attacks on machine learning algorithms.

AML refers to a field that explores the security and robustness of machine learning models against maliciously crafted inputs, known as adversarial examples. These adversarial examples are specifically designed to mislead and manipulate machine learning algorithms, causing them to make incorrect predictions or classifications [22]. To comprehend adversarial ML, it is crucial to grasp the concept of adversarial examples. In computer vision, for example, these are inputs created by introducing subtle perturbations to the original data in a way that they appear almost identical to the human eye. However, these seemingly innocuous changes lead the model to output incorrect labels (Fig. 1). These perturbations are carefully designed to exploit the vulnerabilities of the model and cause it to make incorrect predictions [23].

Figure 1: Traffic sign showing real-life example of a perturbation misleading the model to classify a stop sign as a speed limit with over 94% confidence [10].

The vulnerability to adversarial examples is not limited to a specific machine learning algorithm; rather, it is a fundamental characteristic of many architectures including neural networks [24]. Even so-called black-box models, where the internal variables (e.g., architecture and weights) are not known, can fall prey to adversarial attacks [25].

Challenges in Defending Against Adversarial Attacks

Defending against adversarial attacks is a complex task and an ongoing area of research in machine learning. Depending on the approach, defenses can be grouped under gradient masking, robust optimization, and detection [26]. Some of the major challenges include:

  1. Dealing with Unknowns: Adversarial attacks can arise from various data distributions. Defending exhaustively against all possible attacks is unmanageable, especially considering that we may not know all the unknowns [27].
  2. Transferable Attacks: Adversarial examples designed to fool one model can often be used to attack other similar models partly due to generalization, making it crucial to address universal adversarial perturbations [28] and transferability issues [29] across multiple models.
  3. Trade-offs between Robustness and Performance: Enhancing a model’s robustness against adversarial attacks might lead to a decline in overall accuracy, requiring a careful balance between robustness and performance [30].

In fact, many proposed defenses often fail against adaptive attacks when adversaries properly implement their strategies [31].
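
One widely studied robust-optimization defense, adversarial training, can be sketched on the same kind of toy detector: train on worst-case shifted copies of the malicious samples so that an attacker's small score reduction no longer crosses the threshold. The scores, the midpoint rule, and the perturbation budget EPS are invented for this sketch.

```python
# Sketch of adversarial training (a robust-optimization defense):
# train against worst-case perturbed malicious samples so a small
# score reduction by the attacker no longer evades detection.

EPS = 0.4  # assumed attacker perturbation budget (generous, for the toy)

def train(benign, malicious):
    """Decision threshold halfway between the two class means."""
    return (sum(benign) / len(benign) + sum(malicious) / len(malicious)) / 2

def adversarial_train(benign, malicious, eps):
    """Train on malicious samples shifted down by eps: the worst case
    for the defender under this threat model."""
    return train(benign, [m - eps for m in malicious])

def is_malicious(threshold, score):
    return score > threshold

benign, malicious = [0.1, 0.2, 0.15], [0.9, 0.85, 0.95]
plain_t = train(benign, malicious)
robust_t = adversarial_train(benign, malicious, EPS)

evasive = 0.85 - EPS                    # attacker shaves a sample's score
print(is_malicious(plain_t, evasive))   # evades the plain model
print(is_malicious(robust_t, evasive))  # caught by the robust model
```

Note that the robust threshold is lower and would also flag more benign traffic, a toy-scale glimpse of the robustness-versus-accuracy trade-off mentioned above [30].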

Solving robustness

Training robust classifiers that resist adversarial attacks remains a far more challenging task than simply searching the vast multi-dimensional input space for weaknesses. On the one hand, the distribution of the classes is not properly concentrated and can hence be exploited by adversarial examples [32]. While collecting more data [33] and using data augmentation techniques [34] may improve robustness, the effectiveness of these approaches varies across domains. For example, in computer vision, data augmentation can help mitigate unexpected behavior, but aligning large language models to be robust exhibits fundamental limitations [35], including hallucinations [36] and further safety concerns [37].


Artificial intelligence can be successfully used in solutions that provide defensive or offensive security, but artificial intelligence itself must be analyzed from multiple perspectives. As machine learning continues to evolve and permeate various aspects of our lives, understanding and addressing its limitations becomes increasingly critical. While the presence of adversarial attacks and the AI threat model highlights the vulnerabilities of current machine learning techniques, it also serves as a catalyst for further research and innovation in developing more robust models. As the field progresses, the collaboration between machine learning experts, cybersecurity specialists, and domain-specific practitioners will be instrumental in building a safer and more reliable AI-powered future.


[1] Cross-validation (statistics) - https://en.wikipedia.org/wiki/Cross-validation_(statistics). Last access 09/2023

[2] Van Veen, Fjodor. The neural network Zoo - https://www.asimovinstitute.org/neural-network-zoo. Last access 09/2023

[3] Goodfellow, Ian, et al. Generative Adversarial Nets - https://arxiv.org/abs/1406.2661. Last access 09/2023

[4] American fuzzy lop - https://lcamtuf.coredump.cx/afl. Last access 09/2023

[5] Hitaj, Briland, et al. PassGAN: A Deep Learning Approach for Password Guessing - https://arxiv.org/pdf/1709.00440.pdf. Last access 09/2023

[6] NMAP network enumeration - https://nmap.org/book/osdetect-guess.html#osdetect-guess-ipv6. Last access 09/2023

[7] Munoz, Alfonso. Urideep tool - https://github.com/mindcrypt/uriDeep. Last access 09/2023

[8] Darpa - https://www.darpa.mil/program/cyber-grand-challenge. Last access 09/2023

[9] Fake websites - https://thisxdoesnotexist.com. Last access 09/2023

[10] Gu, Tianyu, Brendan Dolan-Gavitt, and Siddharth Garg. “Badnets: Identifying vulnerabilities in the machine learning model supply chain.” arXiv preprint arXiv:1708.06733 (2017) - https://arxiv.org/abs/1708.06733. Last access 09/2023

[11] Tramer, Florian, et al. Stealing Machine Learning Models via Prediction APIs - https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer. Last access 09/2023

[12] Chen, Jiefeng, et al. Robust Attribution Regularization - https://www.altacognita.com/robust-attribution. Last access 09/2023

[13] Radford, Alec, et al. “Improving language understanding by generative pre-training.” (2018).

[14] Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.

[15] Brown, Tom, et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.

[16] Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.

[17] Yurtsever, Ekim, et al. “A survey of autonomous driving: Common practices and emerging technologies.” IEEE access 8 (2020): 58443-58469.

[18] Alazab, Mamoun, and MingJian Tang, eds. Deep learning applications for cyber security. Springer, 2019.

[19] Joseph, Anthony D., et al. Adversarial machine learning. Cambridge University Press, 2018.

[20] Labaca-Castro, Raphael. Machine Learning Under Malware Attack. Springer Nature, 2023.

[21] Biggio, Battista, and Fabio Roli. “Wild patterns: Ten years after the rise of adversarial machine learning.” Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 2018.

[22] Huang, Ling, et al. “Adversarial machine learning.” Proceedings of the 4th ACM workshop on Security and artificial intelligence. 2011.

[23] Szegedy, Christian, et al. “Intriguing properties of neural networks.” arXiv preprint arXiv:1312.6199 (2013).

[24] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples.” arXiv preprint arXiv:1412.6572 (2014).

[25] Papernot, Nicolas, et al. “Practical black-box attacks against machine learning.” Proceedings of the 2017 ACM on Asia conference on computer and communications security. 2017.

[26] Xu, Han, et al. “Adversarial attacks and defenses in images, graphs and text: A review.” International Journal of Automation and Computing 17 (2020): 151-178.

[27] Rumsfeld, Donald. Known and unknown: a memoir. Penguin, 2011.

[28] Moosavi-Dezfooli, Seyed-Mohsen, et al. “Universal adversarial perturbations.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

[29] Tramèr, Florian, et al. “The space of transferable adversarial examples.” arXiv preprint arXiv:1704.03453 (2017).

[30] Zhang, Hongyang, et al. “Theoretically principled trade-off between robustness and accuracy.” International conference on machine learning. PMLR, 2019.

[31] Tramer, Florian, et al. “On adaptive attacks to adversarial example defenses.” Advances in neural information processing systems 33 (2020): 1633-1645.

[32] Shafahi, Ali, et al. “Are adversarial examples inevitable?.” arXiv preprint arXiv:1809.02104 (2018).

[33] Schmidt, Ludwig, et al. “Adversarially robust generalization requires more data.” Advances in neural information processing systems 31 (2018).

[34] Li, Lin, and Michael Spratling. “Data augmentation alone can improve adversarial training.” arXiv preprint arXiv:2301.09879 (2023).

[35] Wolf, Yotam, et al. “Fundamental limitations of alignment in large language models.” arXiv preprint arXiv:2304.11082 (2023).

[36] OpenAI, “GPT-4 technical report,” OpenAI, 2023.

[37] Zhao, Wayne Xin, et al. “A survey of large language models.” arXiv preprint arXiv:2303.18223 (2023).