Critical Remote Code Execution Vulnerability in vLLM via Mooncake Integration

Age
5 months ago
Summary

A critical remote code execution vulnerability, identified as CVE-2025-29783, has been discovered in vLLM, a widely used library for Large Language Model inference and serving, when it is integrated with Mooncake for distributed deployments. The vulnerability, which carries the maximum CVSS score of 10, stems from unsafe deserialization with pickle.loads() over ZMQ/TCP, allowing attackers to execute code remotely on distributed hosts. The Mooncake integration's network exposure and lack of access controls exacerbate the issue, because arbitrary users can send payloads to the affected service. Affected vLLM versions are 0.6.5 and later, up to but not including 0.8.0; the fix ships in version 0.8.0 and was made in pull request PR #14228. Users are advised to upgrade immediately to mitigate the risk.
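
As a quick way to check a deployment against the affected range (greater than or equal to 0.6.5 and less than 0.8.0), the comparison can be sketched as follows. This is a minimal illustration that assumes plain `X.Y.Z` version strings; the helper names `parse_version` and `is_affected` are hypothetical and not part of vLLM, and suffixed versions such as release candidates would need a proper version parser.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '0.7.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str) -> bool:
    """Return True if the version is >= 0.6.5 and < 0.8.0 (the affected range)."""
    return (0, 6, 5) <= parse_version(version) < (0, 8, 0)


if __name__ == "__main__":
    for candidate in ("0.6.4", "0.6.5", "0.7.3", "0.8.0"):
        status = "affected" if is_affected(candidate) else "not affected"
        print(f"vLLM {candidate}: {status}")
```

Tuple comparison keeps the lower bound inclusive and the upper bound exclusive, which matches the range stated in the advisory.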

How BlueRock Helps

This security issue lets an attacker remotely execute code on vLLM deployments that use the Mooncake feature by sending a specially crafted payload that exploits an unsafe deserialization process. The following guardrails can block the steps an attacker would take.

First, the attacker sends malicious data to be processed by Python's pickle.loads() function. Python Deserialization Protection counters this directly by intercepting the deserialization attempt and applying security policies that block the harmful function calls embedded in the attacker's payload, preventing the initial remote code execution.

If the attacker still achieves code execution and the malicious code tries to run operating system commands for reconnaissance (such as gathering system information with uname or whoami) or to manipulate files (such as compressing model data with tar for exfiltration), Python OS Command Injection Prevention monitors the Python runtime and blocks these unauthorized OS command execution attempts.

To establish persistent control, the attacker might then try to create a reverse shell that connects the compromised system back to a command-and-control server. Reverse Shell Protection is designed to detect and prevent this by blocking attempts to bind shell file descriptors to network sockets.

If the vLLM service runs in a container and the attacker, having gained code execution, attempts to download and run tools that are not part of the original container image, such as a more robust backdoor, a network scanning utility to find other vulnerable hosts, or a data exfiltration tool like rclone, Container Drift Protection (Binaries & Scripts) blocks the execution of these unauthorized binaries or scripts.

Finally, if the attacker places a malicious script or executable in a non-standard directory such as /tmp and then attempts to run it, Process Path Exec Allow intercepts the execution attempt and blocks it if the path is not on an approved allowlist, preventing the attacker from running tools from unexpected locations.
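
To make the idea of restricting what deserialized objects may do more concrete, here is an application-level analogue using the allowlist pattern from the Python standard library documentation (overriding `Unpickler.find_class`). This is only a sketch and not BlueRock's implementation; the `ALLOWED_GLOBALS` set, the `restricted_loads` helper, and the `_CallsPrint` test object are hypothetical names chosen for illustration.

```python
import io
import pickle

# Hypothetical allowlist: only these (module, name) globals may be resolved.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in stand-in for pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class _CallsPrint:
    """Harmless stand-in for a payload: asks the unpickler to call print()."""

    def __reduce__(self):
        return (print, ("this should never run",))


# Plain containers need no globals, so they load normally.
safe = restricted_loads(pickle.dumps({"layer": [1, 2, 3]}))

# The payload references builtins.print, which is not on the allowlist.
try:
    restricted_loads(pickle.dumps(_CallsPrint()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print("safe data:", safe, "| payload blocked:", blocked)
```

An allowlist of this kind narrows what a pickle stream can reference, but it does not make pickle a safe format for untrusted network input; upgrading to the patched release remains the primary fix.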

MITRE ATT&CK Techniques Inferred
  • T1059.006: Command and Scripting Interpreter: Python: The vulnerability in vLLM via the Mooncake integration allows attackers to execute code remotely by exploiting an unsafe deserialization process. The core issue is the use of pickle.loads() to deserialize network data: the attacker's payload runs inside the Python interpreter of the vLLM process, which is why the resulting arbitrary code execution is mapped to T1059.006.
  • T1590: Gather Victim Network Information: The article mentions that the Mooncake integration exposes sockets on all interfaces without network controls, allowing arbitrary users to send payloads to the affected service. This indicates a lack of proper network segmentation and filtering, which is relevant to the MITRE ATT&CK technique T1590 (Gather Victim Network Information) as attackers can exploit network configurations to identify vulnerable services.
  • T1040: Network Sniffing: The network exposure of the Mooncake pipe using ZMQ over TCP suggests that attackers could potentially perform network sniffing or traffic analysis to gather information about the communication protocols and data being transmitted. This aligns with MITRE ATT&CK technique T1040 (Network Sniffing).
  • T1570: Lateral Tool Transfer: The vulnerability allows for remote code execution on distributed hosts, indicating that attackers could use this to move laterally across the network. This aligns with the MITRE ATT&CK technique T1570 (Lateral Tool Transfer), as attackers could transfer tools or payloads across the network using the compromised service.
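
The network-exposure point in the list above (sockets opened on all interfaces) can be illustrated with standard-library TCP sockets. This is a sketch only: it uses plain `socket` objects rather than ZMQ, does not reproduce the Mooncake code, and the `bind_listener` helper is a hypothetical name.

```python
import socket


def bind_listener(host: str) -> socket.socket:
    """Open a TCP listener on the given host; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    return sock


# "0.0.0.0" accepts connections arriving on every network interface.
exposed = bind_listener("0.0.0.0")
# "127.0.0.1" accepts connections from the local host only.
local_only = bind_listener("127.0.0.1")

exposed_addr = exposed.getsockname()
local_addr = local_only.getsockname()
print("all interfaces:", exposed_addr)
print("loopback only: ", local_addr)

exposed.close()
local_only.close()
```

A listener bound to all interfaces is reachable by any host that can route to the machine, so without authentication or network filtering any such host can deliver bytes to the deserializer.
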
Fact-Based Attack Chains

F1: Exploitation of CVE-2025-29783 in vLLM by sending a crafted pickled payload over ZMQ/TCP to the Mooncake integration, leading to remote code execution due to unsafe deserialization.

  • Attacker identifies a vLLM deployment using the Mooncake feature for distributed LLM deployments and running an affected version (vLLM >= 0.6.5 and < 0.8.0). (Cited from: "vulnerability impacts vLLM versions greater than or equal to 0.6.5 and less than 0.8.0", "vulnerability lies in vLLM’s integration with Mooncake, a feature used for distributed LLM deployments")
  • Attacker confirms that the Mooncake integration is active, exposing a ZMQ/TCP service on the network. (Cited from: "When vLLM is configured to use Mooncake, it exposes an unsafe deserialization process directly over ZMQ/TCP.", "The mooncake pipe is exposed over the network by design, using ZMQ over TCP.")
  • Attacker notes the lack of network or authentication controls on the Mooncake service. (Cited from: "it does not appear that there are any controls (network, authentication, etc) to prevent arbitrary users from sending this payload to the affected service.", "The mooncake integration opens sockets on all interfaces")
  • Attacker crafts a malicious payload. This payload is a Python object that, when serialized using pickle and then deserialized using pickle.loads(), will execute arbitrary code. (Cited from: "The root problem is recv_tensor() calls recv_impl which passes the raw network bytes to pickle.loads().", "This exposure allows attackers to execute remote code on distributed hosts.")
  • Attacker connects to the exposed ZMQ/TCP socket used by the vLLM Mooncake integration. (Cited from: "The mooncake integration opens sockets on all interfaces")
  • Attacker sends the raw bytes of the serialized malicious pickle payload over the network to the target vLLM service's Mooncake endpoint. (Cited from: "sending this payload to the affected service")
  • On the vulnerable vLLM host, the recv_tensor() function (via recv_impl) receives these raw network bytes. (Cited from: "The root problem is recv_tensor() calls recv_impl")
  • The received bytes are passed directly to pickle.loads(). (Cited from: "which passes the raw network bytes to pickle.loads()")
    • BR-76: Python Deserialization Protection - This mechanism is applicable because it is designed to limit the actions that Python deserialized objects are allowed to take by intercepting the deserialization process (like pickle.loads()) and applying security policies to restrict the execution of function calls originating from deserialized objects. The article explicitly states the vulnerability involves pickle.loads() in Python.
  • The pickle.loads() function deserializes the malicious object, causing the embedded arbitrary code to execute on the vLLM distributed host with the privileges of the vLLM process. (Cited from: "This exposure allows attackers to execute remote code on distributed hosts.", "allows for remote code execution on any deployments using Mooncake to distribute KV across distributed hosts.")
    • BR-76: Python Deserialization Protection - This mechanism is applicable because it would attempt to block the malicious function calls originating from the deserialized Python object, thereby preventing or limiting the arbitrary code execution. The vulnerability is centered around Python deserialization with pickle.loads().
    • BR-77: Python OS Command Injection Prevention - This mechanism is applicable because if the arbitrary code executed via pickle deserialization attempts to run OS-level commands or system-native binaries, this mechanism would monitor the Python runtime for such patterns and block unauthorized command execution attempts.
    • BR-54: Container Drift Protection (Binaries & Scripts) - This mechanism is applicable because if the vulnerable software ran inside a container, and the arbitrary code execution attempts to run any new executable binary or script not present in the original container image, this mechanism would block that execution.
    • BR-82: Process Runtime Execution Guardrails - This mechanism is applicable because if the arbitrary code execution attempts to start any new unauthorized processes within a Linux-based container environment, this mechanism, leveraging NSJail, would prevent such processes from starting based on predefined rules.
    • BR-88: Process Path Exec Allow - This mechanism is applicable because if the arbitrary code execution involves placing an executable or script in a non-allowed filesystem path (e.g., /tmp) and then attempting to run it, this mechanism would intercept the exec() call and block it if the path is not on the allowlist.
    • BR-90: Process Exec Deny - This mechanism is applicable because if the arbitrary code execution attempts to run a process whose final path component matches predefined suffixes (like '/nc', '/wget', '/curl' by default), this mechanism would block that execution.
    • BR-55: Reverse Shell Protection - This mechanism is applicable because the vulnerability allows for remote code execution. Once RCE is achieved, the attacker can be assumed capable of establishing a reverse shell, which this mechanism aims to prevent by blocking the binding of shell file descriptors to network sockets.
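
The core of the chain above, raw bytes handed straight to pickle.loads(), can be demonstrated with a deliberately harmless payload. The sketch below is not the vLLM or Mooncake code: `simulate_receiver` is a hypothetical stand-in for the vulnerable receive path, and the payload only calls `os.getpid()` to show that a callable chosen by the sender runs inside the receiving process.

```python
import os
import pickle


class HarmlessPayload:
    """Stand-in for an attacker object; __reduce__ names a callable to invoke on load."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call os.getpid() with no arguments".
        return (os.getpid, ())


def simulate_receiver(raw_bytes: bytes):
    """Mirrors the vulnerable pattern: untrusted bytes go directly to pickle.loads()."""
    return pickle.loads(raw_bytes)


# Sender side: serialize the payload into the bytes that would travel over the wire.
wire_bytes = pickle.dumps(HarmlessPayload())

# Receiver side: deserializing runs the sender-chosen callable.
result = simulate_receiver(wire_bytes)

print("type returned by loads():", type(result).__name__)
print("callable ran in receiver process:", result == os.getpid())
```

The receiver gets back the return value of the callable rather than a `HarmlessPayload` instance, which shows that deserialization itself performed the call. A real attacker would substitute a callable that runs commands, which is the step the guardrails listed above are meant to interrupt.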