CVE-2025-32444 (CVSS 10): Critical RCE Flaw in vLLM’s Mooncake Integration Exposes AI Infrastructure
CVE-2025-32444 is a critical security vulnerability in vLLM, an open-source library for serving large language models, that specifically affects the Mooncake integration component. The flaw, with a CVSS score of 10.0, poses a severe risk of Remote Code Execution (RCE) due to insecure handling of serialized data within the recv_pyobj() function, which uses Python's pickle.loads() on untrusted data received over unsecured ZeroMQ sockets. The vulnerability impacts vLLM deployments that use the Mooncake integration in versions 0.6.5 and above, prior to the patched release; deployments not using this integration remain unaffected. To address the issue, the vLLM team has released a patched version, v0.8.5, and users of affected versions are strongly urged to upgrade to mitigate the RCE risk. No specific Indicators of Compromise (IOCs) such as malicious file hashes or IP addresses were provided.
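The affected range described above can be expressed as a small triage check. This is a minimal sketch, not an official tool: the helper names `parse_version` and `is_affected` are hypothetical, the version bounds are taken from the text above, and whether a deployment uses the Mooncake integration is assumed to be known to the operator.

```python
import re

AFFECTED_MIN = (0, 6, 5)  # first affected release, per the advisory text
PATCHED = (0, 8, 5)       # first patched release, per the advisory text


def parse_version(version: str) -> tuple:
    """Extract the leading numeric X.Y.Z part; any suffix is ignored."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str, uses_mooncake: bool) -> bool:
    """Return True if the deployment falls inside the affected range."""
    if not uses_mooncake:
        # Deployments without the Mooncake integration are not affected.
        return False
    return AFFECTED_MIN <= parse_version(version) < PATCHED
```

Because the parser ignores suffixes, pre-release or locally patched builds should be reviewed by hand rather than trusted to this check.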
This security issue allows an attacker to execute arbitrary code remotely on vLLM instances through insecure deserialization in the Mooncake integration. The following protection guardrails can interrupt the steps an attacker would take:
- Python Deserialization Protection: When an attacker sends a specially crafted serialized Python object to the vulnerable recv_pyobj() function, which then deserializes it using pickle.loads(), this guardrail helps prevent the initial remote code execution by intercepting the deserialization process and restricting potentially harmful function calls embedded within the payload, such as those designed to initiate system commands or access sensitive files.
- Python OS Command Injection Prevention: If the attacker's code, perhaps through a partially successful deserialization, attempts to execute operating system commands directly from within the Python environment, this guardrail monitors and blocks these unauthorized system-level actions, for instance shell commands intended to download additional malware or exfiltrate data.
- Reverse Shell Protection: To establish persistent control or interactive access after gaining an initial foothold, an attacker might try to set up a reverse shell. This guardrail prevents the compromised process from binding shell input, output, and error streams to a network socket, thereby blocking the creation of such interactive command channels.
- Container Drift Protection (Binaries & Scripts): Where the vulnerable vLLM service runs inside a container and the attacker, having achieved code execution, attempts to introduce and run new binaries or scripts that were not part of the original, trusted container image, this guardrail blocks their execution and preserves the integrity of the containerized environment.
- Process Path Exec Allow: Should an attacker manage to place malicious tools or scripts in non-standard file system locations, such as temporary directories, and then attempt to execute them, this guardrail enforces execution policies based on pre-approved path allowlists, preventing unauthorized programs from running from untrusted locations.
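The idea of restricting what a deserialized payload may call can be illustrated at the application level with the allow-list pattern from the Python `pickle` documentation. This is only a sketch of the general technique, not the guardrail product's mechanism and not the vLLM patch; the names `ALLOWED_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` and the contents of the allow-list are assumptions chosen for illustration.

```python
import io
import pickle

# Hypothetical allow-list: only these (module, name) pairs may be resolved
# while unpickling. Anything else is rejected before it can be called.
ALLOWED_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # find_class is consulted whenever the pickle stream names a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, and integers load without touching `find_class`, so they pass through unchanged, while any stream that names a global outside the allow-list raises `pickle.UnpicklingError`. An allow-list narrows the attack surface but does not make pickle safe for untrusted input in general; a data-only format remains the more robust choice.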
- T1203: Exploitation for Client Execution: The attacker exploits a vulnerability in the vLLM library's Mooncake integration, specifically targeting the insecure handling of serialized data in the recv_pyobj() function. This function uses pickle.loads() on data received over unsecured ZeroMQ sockets, which is a known method for Remote Code Execution (RCE) if the data is untrusted. The attacker can send malicious serialized data to trigger arbitrary code execution on the target system. This attack method aligns with MITRE ATT&CK Technique ID T1203, which covers exploitation for client execution, including exploiting software vulnerabilities to execute code remotely.
- T1059.006: Command and Scripting Interpreter: Python: The vulnerability involves the use of Python's pickle.loads() function on untrusted data, which is inherently insecure as it can deserialize data that leads to code execution. This aligns with MITRE ATT&CK Technique ID T1059.006, which covers the use of Python for execution. The attacker can craft serialized data that, when deserialized using pickle.loads(), executes arbitrary Python code.
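Why pickle.loads() on untrusted input amounts to code execution can be shown with a deliberately harmless object. In this sketch the class `CallsLenOnLoad` is a hypothetical stand-in whose only effect is calling the built-in `len`, and the JSON field names `block_id` and `tokens` are invented for illustration; neither comes from vLLM or Mooncake.

```python
import json
import pickle


class CallsLenOnLoad:
    """Benign stand-in: unpickling this object calls len('abc')."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call when the bytes are loaded.
        return (len, ("abc",))


payload = pickle.dumps(CallsLenOnLoad())

# Loading the bytes invokes the recorded callable during deserialization;
# the receiver gets the call's return value, not a CallsLenOnLoad instance.
result = pickle.loads(payload)

# A data-only format such as JSON can only produce plain values
# (dicts, lists, strings, numbers), never a function call.
decoded = json.loads('{"block_id": 7, "tokens": [1, 2, 3]}')
```

The receiver has no say in which callable runs: the sender chooses it. That is the property that makes unauthenticated pickle traffic over a network socket so dangerous.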
F1: Direct exploitation of CVE-2025-32444 by sending a malicious pickle payload over an unsecured ZeroMQ socket to the recv_pyobj() function in vLLM's Mooncake integration, leading to Remote Code Execution.
- Attacker identifies a vLLM deployment actively utilizing the Mooncake integration for distributed key-value (KV) transfer in an affected version (greater than or equal to 0.6.5). (Cited from: "vLLM deployments that actively utilize the Mooncake integration for distributed key-value (KV) transfer in versions greater than or equal to 0.6.5.")
- Attacker crafts a malicious serialized Python object (pickle payload) designed to execute arbitrary code upon deserialization. (Cited from: "Using pickle.loads() on untrusted data is a known security risk as it can allow an attacker to execute arbitrary code on the system.")
- Attacker establishes a connection and sends the crafted malicious pickle payload to the vulnerable vLLM instance over an unsecured ZeroMQ socket targeting the Mooncake integration. (Cited from: "process incoming data received over unsecured ZeroMQ sockets.")
- The recv_pyobj() function within vllm/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py receives the attacker's data. (Cited from: "The flaw stems from the insecure handling of serialized data within the recv_pyobj() function located in the vllm/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py file.")
- The recv_pyobj() function insecurely uses pickle.loads() to deserialize the received payload. (Cited from: "This function implicitly uses Python's pickle.loads() to process incoming data received over unsecured ZeroMQ sockets.")
- BR-76: Python Deserialization Protection - This mechanism is applicable because the vulnerability explicitly involves Python deserialization (pickle.loads()). BR-76 is designed to limit the actions that Python deserialized objects are allowed to take by intercepting the deserialization process and applying policies to restrict function calls from deserialized objects, potentially preventing the RCE.
- The malicious code embedded within the deserialized pickle payload executes on the target system, resulting in Remote Code Execution (RCE). (Cited from: "Using pickle.loads() on untrusted data is a known security risk as it can allow an attacker to execute arbitrary code on the system.", "indicating a severe risk of Remote Code Execution (RCE).")
- BR-76: Python Deserialization Protection - This mechanism is applicable because it aims to prevent the execution of function calls from deserialized Python objects. By blocking these calls, it can prevent the malicious code within the pickle payload from achieving RCE. The LLM rule is met as the vulnerability involves Python and deserialization.
- BR-77: Python OS Command Injection Prevention - This mechanism is applicable if the RCE achieved through the pickle payload involves executing system-native binaries or shell commands from within the Python application. BR-77 monitors Python runtime environments for such patterns and blocks unauthorized command execution. The LLM rule is met as the vulnerability involves Python and leads to RCE, which often translates to Python command execution or OS command injection.
- BR-55: Reverse Shell Protection - This mechanism is applicable because the vulnerability leads to RCE. According to its LLM Correlation Rule, if RCE is achieved, it's assumed the attacker can establish a reverse shell. BR-55 prevents the binding of shell STDIN/STDOUT/STDERR to network sockets, thus blocking reverse shells.
- BR-54: Container Drift Protection (Binaries & Scripts) - This mechanism is applicable if the vulnerable vLLM software runs inside a container and the RCE involves executing new executable binaries or scripts not present in the original container image. BR-54 prevents such executions. The LLM rule is met due to the RCE.
- BR-88: Process Path Exec Allow - This mechanism is applicable if the RCE involves the attacker placing and then executing a binary or script from a non-standard or temporary path. BR-88 enforces an allowlist for execution paths, blocking exec() calls from disallowed locations. The LLM rule is met if execution from such a path occurs due to RCE.
- BR-90: Process Exec Deny - This mechanism is applicable if the RCE attempts to execute specific denied processes, such as /nc, /wget, or /curl. BR-90 blocks execution of processes whose path ends with these predefined suffixes.
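
The path-based rules described for BR-88 and BR-90 can be pictured as a simple policy function. This is a rough sketch of the matching logic only, under assumed policy values; it is not the product's implementation, and `ALLOWED_DIRS`, `DENIED_SUFFIXES`, and `is_exec_permitted` are hypothetical names. Real enforcement would happen at the kernel or runtime level, not in application code.

```python
import posixpath

# Hypothetical policy values for illustration only.
ALLOWED_DIRS = ("/usr/bin", "/usr/local/bin", "/opt/venv/bin")
DENIED_SUFFIXES = ("/nc", "/wget", "/curl")


def is_exec_permitted(path: str) -> bool:
    """Apply a deny-by-suffix rule, then an allow-by-directory rule."""
    # Collapse "." and ".." segments so "/usr/bin/../../tmp/x" cannot
    # masquerade as living under an allowed directory.
    normalized = posixpath.normpath(path)

    # Deny rule: block any executable whose path ends with a denied suffix.
    if normalized.endswith(DENIED_SUFFIXES):
        return False

    # Allow rule: the executable must live in or below an allowed directory.
    parent = posixpath.dirname(normalized)
    return any(parent == d or parent.startswith(d + "/") for d in ALLOWED_DIRS)
```

Under this assumed policy, `/usr/bin/python3` is permitted, `/tmp/payload.sh` is rejected by the allowlist, and `/usr/bin/curl` is rejected by the deny rule even though its directory is allowed.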