Meta's Llama Framework Flaw Exposes AI Systems to Remote Code Execution Risks

Age: 6 months ago
Threat Information
Summary

A significant security flaw in Meta's Llama large language model framework has been identified that could allow attackers to execute arbitrary code on the llama-stack inference server. The vulnerability, tracked as CVE-2024-50050, stems from deserialization of untrusted data: the Python Inference API serialized messages with Python's pickle format, which executes attacker-controlled code when a malicious payload is loaded. Meta has since addressed the issue by switching to the JSON format for socket communication, and the unsafe automatic-unpickling behavior has also been addressed in pyzmq, the Python bindings for the ZeroMQ messaging library. The flaw underscores ongoing concerns about AI framework security, as similar issues have been reported in other AI systems such as TensorFlow's Keras framework and OpenAI's ChatGPT. Research has also highlighted how large language models can be integrated into the cyber attack lifecycle, increasing the speed and accuracy of cyber threats. Security experts continue to emphasize the need for robust security measures to manage AI infrastructure and mitigate these risks.
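The root cause class (CWE-502) is easy to demonstrate in isolation. The sketch below is illustrative only, not the actual exploit: a class whose `__reduce__` method tells pickle to invoke an arbitrary callable during deserialization. A real payload would substitute something like `os.system`; here a harmless `eval("1 + 1")` stands in.

```python
import pickle

# Illustrative only: __reduce__ lets an object dictate what pickle
# calls when the bytes are loaded. The victim runs pickle.loads();
# the attacker chooses the callable and its arguments.
class Payload:
    def __reduce__(self):
        # (callable, args): pickle will invoke eval("1 + 1") on load
        return (eval, ("1 + 1",))

blob = pickle.dumps(Payload())

# The "victim" side: merely deserializing runs the attacker's callable.
result = pickle.loads(blob)
print(result)  # 2 -- attacker-chosen code executed during deserialization
```

Note that the `Payload` class itself never travels over the wire; the pickle stream only references the callable, so the receiving process needs no knowledge of the attacker's code for it to run.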

How BlueRock Helps

This security issue gives an attacker the ability to execute arbitrary code on AI inference servers by exploiting insecure deserialization in frameworks like Meta's Llama. The following protection guardrails can prevent the steps such an attacker would take. The attacker first sends crafted malicious data, typically containing a serialized payload, to a vulnerable network service such as the exposed ZeroMQ socket used by the Llama Stack Python Inference API. Upon receiving this data, the application improperly deserializes it using an unsafe method like Python's pickle, triggering remote code execution on the host machine. Should the attacker's code attempt to establish interactive command-line access back to their own machine by binding shell streams to the network socket, Reverse Shell Protection detects and blocks this common post-exploitation technique. And if the attacker, having gained initial execution, tries to download or create new malicious tools, scripts, or binaries on the compromised system and then run them to escalate privileges, exfiltrate data, or move laterally, Container Drift Protection (Binaries & Scripts) prevents the execution of these non-original files, effectively neutralizing the payload.
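The vulnerable pattern described above can be sketched in a few lines. This is a hedged simulation, not Llama Stack's actual code: pyzmq's `recv_pyobj()` is essentially `pickle.loads(socket.recv())`, so any peer that can reach the socket decides what gets deserialized. The network hop is simulated here with a stdlib `socketpair`, and the harmless `operator.neg` stands in for something like `os.system` in a real exploit.

```python
import operator
import pickle
import socket

# Attacker-controlled object: __reduce__ makes pickle call
# operator.neg(42) on the receiving side during deserialization.
class Exploit:
    def __reduce__(self):
        return (operator.neg, (42,))

attacker_sock, server_sock = socket.socketpair()
attacker_sock.sendall(pickle.dumps(Exploit()))

# "Vulnerable server" side: the moral equivalent of recv_pyobj().
wire = server_sock.recv(4096)
result = pickle.loads(wire)  # attacker-chosen callable runs here
print(result)  # -42 -- code of the attacker's choosing already ran
attacker_sock.close()
server_sock.close()
```

The point of the sketch is that the server performs no explicit "execute" step; simply receiving and deserializing the bytes is enough to hand control to the attacker.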

MITRE ATT&CK Techniques Inferred
  • T1203: Exploitation for Client Execution: The article describes a vulnerability in Meta's Llama framework that allows an attacker to execute arbitrary code by exploiting deserialization of untrusted data. This aligns with the MITRE ATT&CK technique for Exploitation for Client Execution (T1203), as the attacker can execute code by sending malicious data that is deserialized by the application.
  • T1648: Serverless Execution: The article notes that the vulnerability involves deserialization of untrusted data using Python's pickle library. This is mapped to T1648 (Serverless Execution), though the underlying flaw is classic insecure deserialization (CWE-502), that is, unsafe handling of serialized data.
  • T1021: Remote Services: The use of ZeroMQ sockets over the network, which could be exploited by attackers to send crafted malicious objects, indicates the technique of Remote Services (T1021). This is because the vulnerability allows remote code execution via network-exposed services.
  • T1601: Modify System Image: The article discusses how Meta addressed the issue by switching from the pickle serialization format to JSON for socket communication. This is mapped to Modify System Image (T1601), reflecting that the vulnerability was mitigated by changing the software to use a safer serialization format.
  • T1498: Network Denial of Service: The article also touches on a separate issue where OpenAI's ChatGPT crawler could be manipulated to initiate a distributed denial-of-service (DDoS) attack. This aligns with the MITRE ATT&CK technique for Network Denial of Service (T1498), as the vulnerability can be used to overwhelm a target site's resources.
Fact-Based Attack Chains

F1: Exploitation of CVE-2024-50050 in Meta's Llama framework via unsafe deserialization using pickle over a ZeroMQ socket.

  • Attacker identifies a target system running an affected version of meta-llama (prior to 0.0.41). (Cited from: "Affected versions of meta-llama", "issue was addressed by Meta on October 10 in version 0.0.41")
  • Attacker discovers that the ZeroMQ socket used by the Llama Stack Python Inference API implementation is exposed over the network. (Cited from: "In scenarios where the ZeroMQ socket is exposed over the network")
  • Attacker crafts malicious data in the form of a Python object designed to execute arbitrary code upon deserialization by pickle. (Cited from: "attacker can execute arbitrary code by sending malicious data that is deserialized", "sending crafted malicious objects to the socket", "automatically deserialize Python objects using pickle")
  • Attacker sends the crafted malicious object to the exposed ZeroMQ socket of the target inference server. (Cited from: "sending crafted malicious objects to the socket")
  • The recv_pyobj function within the server's Python Inference API implementation receives the malicious object. (Cited from: "Since recv_pyobj will unpickle these objects")
  • The server automatically deserializes the received object using the unsafe pickle library, triggering the vulnerability. (Cited from: "vulnerable to deserialization of untrusted data", "automatically deserialize Python objects using pickle", "recv_pyobj will unpickle these objects")
    • BR-76: Python Deserialization Protection - This mechanism is applicable because it intercepts the Python deserialization process for objects originating from the network, matching the vulnerability description (Python + deserialization/CWE-502). It can block subsequent OS command execution attempts by the deserialized object.
  • The embedded malicious code within the crafted object is executed on the host machine running the llama-stack inference server, achieving Remote Code Execution (RCE). (Cited from: "attacker could achieve arbitrary code execution (RCE) on the host machine")
    • BR-76: Python Deserialization Protection - This mechanism is applicable because it prevents deserialized Python objects (originating from the network) from executing system-native binaries, directly blocking the RCE attempt described. The LLM correlation rule (Python + deserialization/CWE-502) is met.
    • BR-77: Python OS Command Injection Prevention - This mechanism is applicable because RCE achieved through Python deserialization often involves executing OS-level commands from within the Python process. This mechanism detects and blocks such attempts. The LLM correlation rule (Python + OS command injection/CWE-78) is met.
    • BR-54: Container Drift Protection (Binaries & Scripts) - If this software ran inside a container, then this mechanism applies because it prevents the execution of any new executable binaries or scripts dropped onto the container filesystem by the attacker using the RCE. The LLM correlation rule (RCE achieved) is met.
    • BR-55: Reverse Shell Protection - This mechanism is applicable because attackers commonly establish reverse shells after achieving RCE. This mechanism blocks the binding of shell I/O to network sockets. The LLM correlation rule (RCE achieved) is met.
    • BR-90: Process Exec Deny - This mechanism is applicable because the RCE payload might attempt to execute common tools such as nc, wget, or curl for subsequent actions (e.g., downloading more tools, C2), which this mechanism blocks by default based on path suffixes such as /nc, /wget, and /curl.
    • BR-88: Process Path Exec Allow - This mechanism is applicable because if the attacker uses the RCE to download and execute a tool from a non-standard, non-allowlisted path (like /tmp), this mechanism would block the exec() call. The LLM correlation rule (execute from non-standard path) is met.
    • BR-87: Process Socket Deny - This mechanism is applicable because the RCE payload might run as, or spawn, a specific unauthorized process that then attempts to initiate network connections (e.g., for C2 or data exfiltration). This mechanism can block socket operations for denied processes. The LLM correlation rule (unauthorized process performs network activity) is met.
    • BR-82: Process Runtime Execution Guardrails - This mechanism is applicable because the RCE payload might attempt to start new, unauthorized processes on the system to perform further actions. This mechanism prevents unauthorized process creation. The LLM correlation rule (unauthorized process execution) is met.
    • BR-80: Tainted File Download Protection - This mechanism is applicable if the RCE payload specifically uses wget or curl to download a file containing code and then attempts to execute that downloaded file. This mechanism monitors these specific processes and blocks subsequent execution. The LLM correlation rule (file download + execute code) is met.
    • BR-62: Linux/Host Drift Protection - This mechanism is applicable if the RCE occurs directly on the host (not container) and the attacker attempts to execute code (binary or script) that was added to the filesystem outside of a trusted package manager post-boot. The LLM correlation rule (RCE allows adding/executing unauthorized files) is met.
    • BR-65: Container Host Drift Prevention - If this vulnerability affects a process running directly on the container host, then this mechanism applies because it prevents unauthorized processes (not on allow list) from executing new or modified files added to the host filesystem post-boot. The LLM correlation rule (unauthorized file execution) is met.
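The remediation that closed this chain can also be sketched. In pyzmq terms, Meta's fix amounts to replacing `recv_pyobj()`/`send_pyobj()` with `recv_json()`/`send_json()`; the minimal sketch below uses the stdlib `json` module to show why that substitution eliminates the RCE primitive. JSON decoding can only reconstruct passive data (dicts, lists, strings, numbers), never live objects or callables, so a crafted payload cannot trigger code execution during decoding. The function name `safe_decode` is illustrative, not from the patch.

```python
import json

def safe_decode(wire_bytes: bytes) -> object:
    """Decode a JSON message; malformed input raises, it never executes."""
    # json.loads builds only passive data structures -- there is no
    # equivalent of pickle's __reduce__ hook for smuggling callables.
    return json.loads(wire_bytes.decode("utf-8"))

msg = safe_decode(b'{"op": "generate", "prompt": "hello"}')
print(msg["op"])  # generate
```

The trade-off is that JSON carries only plain data, so any richer Python objects must be explicitly reconstructed by the receiver from validated fields rather than materialized automatically off the wire.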
See BlueRock In Action