{"id":106095,"date":"2025-12-16T15:00:00","date_gmt":"2025-12-16T13:00:00","guid":{"rendered":"https:\/\/staging.checkmarx.com\/?post_type=zero-post&#038;p=106095"},"modified":"2026-02-27T20:38:44","modified_gmt":"2026-02-27T18:38:44","slug":"turning-ai-safeguards-into-weapons-with-hitl-dialog-forging","status":"publish","type":"zero-post","link":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/","title":{"rendered":"Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging"},"content":{"rendered":"<style type=\"text\/css\">@import url(\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/highlight.js\/11.11.1\/styles\/vs2015.min.css\");@font-face{font-family:'Hack';src:url('https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/hack-font\/3.3.0\/web\/fonts\/hack-regular-subset.woff2') format('woff2')}:root{--code-font:'Hack','Menlo','Consolas',monospace !important;--code-bg:#1e1e1e;--code-color:#0c1;--code-dim:#071;--text-color:#121185;--highlight-color:#f8ff91;--highlight-color-alt:#736ca0}article.content{max-width:100% !important;min-width:80% !important;width:99% !important}.wp-block-code code{text-wrap:nowrap !important}figure{margin-top:1.5rem;margin-bottom:1.5rem}p.caption,figcaption{font-size:1rem !important;font-style:italic !important;color:var(--code-dim) !important}p.caption *,figcaption *{font-size:inherit !important}div.callout{max-width:80% !important;padding-top:.5rem;padding-bottom:.5rem;margin-top:1rem;margin-bottom:1rem;display:block;margin-left:10%;border-top:.3rem solid #121185;border-bottom:.3rem solid #121185}div.callout p{font-size:x-large;text-align:left;font-weight:bold}.cxzero-video-include{display:block;max-width:1920px;width:100%;padding-top:1rem;padding-bottom:1rem}.cxzero-video-include video{display:block;padding:.5rem;background-color:var(--code-bg);width:98%;object-fit:cover}pre.wp-block-code,pre.highlighted-code,pre.sourceCode,pre{border:1px solid 
var(--code-color);width:90%;background-color:var(--code-bg);color:var(--code-color);margin:1em;padding:2em;overflow-x:scroll;font-family:var(--code-font);font-size:10.5pt;line-height:1.1em;text-wrap:nowrap !important;box-shadow:5px 5px 13px 0 var(--code-bg)}* kbd,* code,* tt{font-family:var(--code-font);padding-inline:.5em;color:var(--code-dim);font-size:85%}pre code{color:var(--code-color);font-size:90%}pre.highlighted-code span{font-family:var(--code-font);font-size:10.5pt;color:var(--code-color)}pre.highlighted-code span.comment{font-style:italic;color:var(--code-dim)}pre.highlighted-code span.keyword,pre.highlighted-code span.preproc{font-weight:bold;font-style:oblique}blockquote,blockquote *{font-size:1.375rem !important;font-style:italic !important}blockquote{border-left:.1rem solid;padding-left:1rem}mark,mark *{background-color:var(--highlight-color) !important}mark.ai-content,mark.ai-content *{background-color:var(--highlight-color-alt) !important;color:#fff !important}.cxzero-cve-block{border:1px solid var(--code-color,#0c1);padding:.5rem;p{padding:0;margin:0}span.vulndesc{display:block;font-size:.9rem;font-weight:400;font-style:italic}span.cvss::before{content:\"  \"}span.cvss{background:#fe0}span.cvss.critical{background:#c00;color:#eee}span.cvss.high{background:#ffac1c;color:#0015ff}span.vector::before{content:\"\u25b8\"}span.vector,span.vector *{overflow-wrap:break-word;font-family:var(--code-font);font-size:10pt}.kev{display:block;font-weight:bold}.kev::before{content:\"\u203c\ufe0f\"}}.print-source-info{display:none}@media print{.header,.header *,.article-nav,.article-nav *,.aticle-nav,.aticle-nav *,.section_latest,.section-latest *,footer,footer *,.section-menu-page,.section-menu-page *,.top-menu,.top-menu *,.top-menu__container,.top-menu__container *,.section-zero-article,.section-zero-article *{display:none}@page{margin:13mm !important}.section-aticle-header__image-or-video{max-width:125mm}.print-source-info{display:block;border-left:.2rem solid 
#000;font-style:italic !important;font-size:85%;padding-left:1rem}}<\/style> <script src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/highlight.js\/11.11.1\/highlight.min.js\" integrity=\"sha512-EBLzUL8XLl+va\/zAsmXwS7Z2B1F9HUHkZwyS\/VKwh3S7T\/U0nF4BaU29EP\/ZSf6zgiIxYAnKLu6bJ8dqpmX5uw==\" crossorigin=\"anonymous\" referrerpolicy=\"no-referrer\"><\/script> <script>hljs.highlightAll();<\/script> \n\n\n\n<p class=\"print-source-info\"><script>document.write(\"Copyright Checkmarx, all rights reserved. Retrieved \"+new Date().toLocaleDateString()+\" from<br\/>\"+window.location.href);<\/script><noscript>This document copyright Checkmarx, all rights reserved.<\/noscript><\/p>\n\n\n\n<p>\n  Here\u2019s an interaction with an AI code-assistant agent that flags a command\n  injection vulnerability in a codebase and offers to fix it. It uses a\n  Human-in-the-Loop (HITL) control; that is, it presents an HITL dialog that\n  explains what it intends to do and asks for permission. Helpful, right?\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image1.webp\" alt=\"Screen capture of an HITL dialog from Claude Code\">\n  <figcaption aria-hidden=\"true\">\n    Screen capture of an HITL dialog from Claude Code\n  <\/figcaption>\n<\/figure>\n<p>\n  Unfortunately for you, approving that seemingly benign dialog just executed\n  arbitrary code, supplied by an attacker, on your machine.\n<\/p>\n<p>\n  Skeptical that an AI agent has such a remote code execution flaw? 
Watch this:\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image2_anim.webp\" alt=\"Animation of malicious code that only starts calc.exe; it could be worse\u2026\">\n  <figcaption aria-hidden=\"true\">\n    Animation of malicious code that only starts calc.exe; it could be worse\u2026\n  <\/figcaption>\n<\/figure>\n<p>\n  What you\u2019re seeing is a typical safety pattern for AI agents: an agent asks\n  for user confirmation before performing potentially harmful operations. This\n  Human-in-the-Loop (HITL) dialog acts as a safeguard, intended to ensure that\n  agents don\u2019t execute these potentially harmful actions without explicit\n  approval.\n<\/p>\n<p>\n  The\n  <a href=\"https:\/\/checkmarx.com\/zero-post\/bypassing-ai-agent-defenses-with-lies-in-the-loop\/\">Lies-in-the-Loop (LITL) attack<\/a>\n  exploits the trust users place in these dialogs by forging their content. This\n  technique is also known as <em>HITL Dialog Forging<\/em>.\n<\/p>\n<p>\n  When the LITL attack is not involved, this is how Claude Code\u2019s HITL dialog\n  looks when it runs in VS Code terminal:\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image3.webp\" alt=\"Regular behavior: git init command displayed and HITL dialog asks permission\">\n  <figcaption aria-hidden=\"true\">\n    Regular behavior: git init command displayed and HITL dialog asks permission\n  <\/figcaption>\n<\/figure>\n\n\n\n<h2 id=\"background\" class=\"article-anchor\">Background<\/h2>\n<p>\n  This article provides a deeper technical analysis of the novel agentic AI\n  attack vector:\u200athe LITL attack, which we recently developed and documented in\n  <a href=\"https:\/\/checkmarx.com\/zero-post\/bypassing-ai-agent-defenses-with-lies-in-the-loop\/\"><em>Bypassing AI Agent Defenses With Lies-In-The-Loop<\/em><\/a>. 
The LITL attack directly targets the HITL component, causing the agent to\n  prompt the user with a seemingly benign HITL dialog that can deceive users\n  into approving a remote code execution attack originating from indirect prompt\n  injections.\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image4.webp\" alt=\"LITL attack workflow\">\n  <figcaption aria-hidden=\"true\">LITL attack workflow<\/figcaption>\n<\/figure>\n<p>\n  Why is that an issue? Well, because HITL dialogs are one of the recommended\n  mitigations by OWASP for two vulnerabilities from the OWASP LLM Top 10 list:\n<\/p>\n<ol type=\"1\">\n  <li>\n    <p>\n      <a href=\"https:\/\/genai.owasp.org\/llmrisk\/llm01-prompt-injection\/\">LLM01: Prompt Injection<\/a>\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      <a href=\"https:\/\/genai.owasp.org\/llmrisk\/llm062025-excessive-agency\/\">LLM06: Excessive Agency<\/a>\n    <\/p>\n  <\/li>\n<\/ol>\n<p>\n  Generally speaking, the HITL dialog can be thought of as the last line of\n  defense the agent has before executing a sensitive operation, whether\n  maliciously or unintentionally introduced (e.g., due to an agent&#8217;s mistake).\n<\/p>\n<p>This is Google\u2019s security philosophy for HITL dialogs:<\/p>\n<p>\n  This human-in-the-loop (HITL) approach acts as a final safeguard against\n  unauthorized or unintended actions resulting from a successful prompt\n  injection attack.<br>\n  <strong><em>Source:<\/em><\/strong>\n  <a href=\"https:\/\/support.google.com\/a\/answer\/16479560?hl=en#what-is-the-user-confirmation-framework\"><em><u>Google\u2019s layered defense strategy for Gemini<\/u><\/em><\/a>\n<\/p>\n<p>\n  However, the Lies-in-the-Loop (LITL) attack exploits the trust users place in\n  these approval dialogs. By manipulating what the dialog displays, attackers\n  turn the safeguard into a weapon\u200a\u2014\u200aonce the prompt looks safe, users approve\n  it without question. 
The LITL attack is a particular concern for privileged\n  agents, such as code assistants, which can perform very sensitive actions like\n  running OS commands and, as a result, usually lack other safeguards\n  recommended by OWASP.\n<\/p>\n<p>\n  Today, we expand on additional risks and on the mitigations that both users\n  and developers of agentic AI can apply to reduce them.\n<\/p>\n<h2 id=\"additional-risks\" class=\"article-anchor\">Additional Risks<\/h2>\n<h3 id=\"hitl-dialog-forging-using-padding\">\n  HITL Dialog Forging using\u00a0Padding\n<\/h3>\n<p>\n  The original post primarily discussed how an attacker can tamper\n  with the dialog by <em>appending<\/em> very long text to the malicious command,\n  pushing the payload above the visible part of the terminal. However, in\n  practice, nothing prevents an attacker from also\n  <em>prepending<\/em> benign-looking text to the malicious payload, further\n  concealing its intent.\n<\/p>\n<p>\n  In other words, not only is the payload out of sight, but scrolling up to the\n  start of the dialog also leads the user to benign-looking text, further\n  reducing the victim&#8217;s suspicions.\n<\/p>\n<h3 id=\"metadata-tampering\">Metadata Tampering<\/h3>\n<p>\n  The first blog post also did not mention that metadata is sometimes\n  attached to the dialog.\n<\/p>\n<p>\n  For example, in Claude Code, this is a one-line description that summarizes\n  what the agent is trying to do. 
And yes, a remote attacker can tamper with it\n  as well, via indirect prompt injection:\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image5.webp\" alt=\"HITL dialog in Claude Code, edited to highlight descriptive line\">\n  <figcaption aria-hidden=\"true\">\n    HITL dialog in Claude Code, edited to highlight descriptive line\n  <\/figcaption>\n<\/figure>\n<h3 id=\"markdown-injection\">Markdown Injection<\/h3>\n<p>\n  This concept is not new and has already been discussed in the\n  <a href=\"https:\/\/cheatsheetseries.owasp.org\/cheatsheets\/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html#html-and-markdown-injection\">OWASP Prompt Injection Prevention Cheat Sheet<\/a>.<br>\n  But when it comes to the LITL attack, it is particularly interesting.\n  Exploiting HITL dialog manipulations involves not only the content shown to\n  the user, but also the interface\u2019s design and presentation. Typically, the UI\n  is built with Markdown (or HTML); the fact that attackers can theoretically\n  break out of the Markdown syntax used for the HITL dialog, presenting the user\n  with fake UI, can lead to much more sophisticated LITL attacks that can go\n  practically undetected.\n<\/p>\n<p>\n  Let\u2019s see that in practice. As it turns out, the Copilot Chat VS Code\n  extension fails to properly sanitize Markdown:\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image6_anim.webp\" alt=\"Copilot Chat evaluating Markdown and rendering it\">\n  <figcaption aria-hidden=\"true\">\n    Copilot Chat evaluating Markdown and rendering it\n  <\/figcaption>\n<\/figure>\n<p>\n  This finding deserves a closer look. 
Once Copilot Chat is confirmed to be\n  vulnerable to Markdown injection, two primary attack vectors emerge:\n<\/p>\n<ol type=\"1\">\n  <li>\n    <p>\n      Injecting page elements that could trigger XSS or expose sensitive\n      data\u200a\u2014\u200aa vector unrelated to the LITL attack (see\n      <a href=\"https:\/\/checkmarx.com\/zero-post\/exploiting-markdown-injection-in-ai-agents-microsoft-copilot-chat-and-google-gemini\/\"><em>Exploiting Markdown Injection in Microsoft Copilot Chat and Google\n          Gemini<\/em><\/a>).\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      Manipulating the HITL dialog through Markdown, which is the primary focus\n      of this section.\n    <\/p>\n  <\/li>\n<\/ol>\n<p>\n  Let\u2019s see how Copilot Chat handles the HITL dialog when you combine it with\n  Markdown. Below is a PoC showing Markdown injected into the HITL dialog using\n  <strong>direct prompt injection<\/strong> (just for demonstration). As you can\n  see, when Copilot Chat first displays the dialog, everything shows up properly\n  inside a code block, making it clear what command is about to run\u200a\u2014\u200ano\n  Markdown injection yet. But right after the user approves or skips the\n  command, those same elements suddenly render as Markdown.\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image7_anim.webp\" alt=\"Copilot Chat session rendering injected markdown only after approval\">\n  <figcaption aria-hidden=\"true\">\n    Copilot Chat session rendering injected markdown only after approval\n  <\/figcaption>\n<\/figure>\n<p>\n  In this specific case, there\u2019s no additional risk because the Markdown is\n  rendered only <strong>after<\/strong> the user responds to the dialog. 
However, if\n  attackers can inject Markdown into the HITL dialog through an\n  <strong>indirect<\/strong> prompt injection, and that Markdown is rendered\n  <strong>before<\/strong> the user\u2019s response, they could execute a far more\n  sophisticated LITL attack, for example:\n<\/p>\n<ol type=\"1\">\n  <li>\n    <p>\n      Prematurely closing the code block containing the actual malicious\n      command.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      Inserting lengthy, innocuous explanatory text to push the malicious\n      command out of view.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      Opening a new code block displaying benign commands (like git status or\n      bug fixes).\n    <\/p>\n  <\/li>\n<\/ol>\n<p>\n  With the malicious command hidden, the user reviews only what appear to be\n  safe commands, approves the execution, and unknowingly authorizes malicious\n  code to run.<br>\n  Our guess? An agent that does precisely this either already exists or will in\n  the future.\n<\/p>\n<h2 id=\"defensive-measures-for-agentic-ai-users\" class=\"article-anchor\">\n  Defensive Measures for Agentic AI\u00a0Users\n<\/h2>\n<h3 id=\"awareness-education\">Awareness &amp; Education<\/h3>\n<p>\n  As we highlighted in the first blog post, this is a user-deception attack, and\n  as with similar attacks (for example, phishing), the first step is\n  education and awareness. Spreading the word about this risk helps\n  minimize losses. Organizations should also educate their\n  employees on this threat so that everyone is aware of it.\n<\/p>\n<h3 id=\"pay-attention-to-details\">Pay Attention to\u00a0Details<\/h3>\n<p>\n  HITL dialog components have distinct visual styling, though these differences\n  may be subtle. 
Recognizing these indicators is crucial for identifying\n  tampering attempts (as long as the agent isn\u2019t susceptible to Markdown\n  injection vulnerabilities).\n<\/p>\n<p>\n  It\u2019s also worth considering that agents operating in feature-rich UI\n  environments offer advantages over command-line terminals. For instance, VS\n  Code extensions provide full Markdown rendering capabilities, whereas\n  terminals typically display content using basic ASCII characters. Choosing\n  agents with more sophisticated UIs can make it easier to identify deceptive\n  behavior early.\n<\/p>\n<p>\n  Consider the following example: In Claude Code (which runs in the VS Code\n  <strong>terminal<\/strong> in our setup), the HITL dialog is distinguished only\n  by a thin 1-pixel border. It shows a lengthy shell command where most of the\n  command consists of injected, benign-looking text. The terminal\u2019s limited UI\n  features make the dialog\u2019s boundary remarkably easy to overlook or\n  misinterpret:\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image8.webp\" alt=\"Claude Code HITL dialog edited to highlight the thin border around dialog content\">\n  <figcaption aria-hidden=\"true\">\n    Claude Code HITL dialog edited to highlight the thin border around dialog\n    content\n  <\/figcaption>\n<\/figure>\n<p>\n  In contrast, here\u2019s the presentation in the Copilot Chat VS Code\n  <strong>extension<\/strong>. 
While this approach isn\u2019t foolproof, the visual\n  distinction is considerably more apparent:\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image9.webp\" alt=\"Copilot Chat in VSCode with stronger visual distinction\">\n  <figcaption aria-hidden=\"true\">\n    Copilot Chat in VSCode with stronger visual distinction\n  <\/figcaption>\n<\/figure>\n<h2 id=\"defensive-measures-for-agent-developers\" class=\"article-anchor\">\n  Defensive Measures for Agent Developers\n<\/h2>\n<p>\n  Agents use HITL dialogs to involve users in decision-making, which naturally\n  places some responsibility on the user. That\u2019s understandable and reasonable.\n  However, agent developers can still make this process safer by helping their\n  users distinguish between legitimate and potentially malicious dialogs. This\n  section provides recommendations for doing so.\n<\/p>\n<h3 id=\"dialog-clarity\">Dialog Clarity<\/h3>\n<p>\n  UI matters significantly. Implementing a straightforward, well-designed user\n  interface that enables users to easily distinguish between HITL dialog\n  elements is essential. This visual distinction empowers users to detect\n  suspicious behavior more effectively (as outlined in the \u201cPay Attention to\n  Details\u201d section above).\n<\/p>\n<h3 id=\"classic-appsec-still-matters\">Classic AppSec Still\u00a0Matters<\/h3>\n<p>\n  Don\u2019t forget the basics: Input should always be validated and sanitized. While\n  the following vulnerabilities aren\u2019t the only ones that should be addressed,\n  they\u2019re particularly worth highlighting:\n<\/p>\n<ul>\n  <li>\n    <p>\n      <strong>OS Command Execution:<\/strong> When an agent executes OS commands\n      based on indirect prompt injection, it\u2019s crucial to use safe OS APIs that\n      clearly separate the command from its arguments. 
This approach provides\n      two layers of protection: it reduces the impact if an attacker\n      successfully injects their payload into an argument (rather than the\n      command itself), and in LITL attacks specifically, it constrains the\n      payload in ways that make malicious commands more visible and easier to\n      spot.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      <strong>Markdown Injection:<\/strong> As noted above, any Markdown or HTML\n      from users or external sources must be escaped or sanitized appropriately\n      before being incorporated into the conversation. Otherwise, it can be\n      weaponized in LITL attacks.\n    <\/p>\n  <\/li>\n<\/ul>\n<h3 id=\"metadata-tampering-prevention\">Metadata Tampering Prevention<\/h3>\n<p>\n  To be clear, we don\u2019t know how Claude or any other agent constructs\n  its metadata. Still, we can generally suggest that the description of the\n  HITL dialog (or any other metadata attached to it) should be derived from the\n  dialog only after it has been entirely constructed. That way, the metadata\n  reflects the dialog\u2019s full content, including any risk it carries, which\n  pairs well with the following suggestion.\n<\/p>\n<h3 id=\"guardrails-for-hitl-dialog-validation\">\n  Guardrails for HITL Dialog Validation\n<\/h3>\n<p>\n  Another potential mitigation is to implement guardrails between the agent and\n  the user to validate HITL dialogs before they\u2019re displayed. 
These guardrails\n  could include content validation (checking commands against allowlists and\n  detecting suspicious patterns), metadata consistency checks (ensuring\n  descriptions accurately reflect the actual operations), and prompt injection\n  detection (scanning for known attack patterns or obfuscation techniques).\n  However, this approach faces its own challenges: false positives may block\n  legitimate operations, attackers can evolve to bypass detection, and if the\n  guardrail itself uses an LLM for validation, it is also subject to prompt\n  injection attacks. While guardrails can add a valuable layer of defense, they\n  shouldn\u2019t be relied upon as the sole protection mechanism.\n<\/p>\n<h3 id=\"hitl-dialog-length-restriction\">HITL Dialog Length Restriction<\/h3>\n<p>\n  Restricting dialog length is not a substantial mitigation on its own. However,\n  since part of the risk comes from malicious content being pushed out of sight\n  in overly long HITL dialogs, and because extremely long dialogs rarely provide\n  real benefit, imposing a reasonable length limit can help reduce this risk.\n<\/p>\n<p>\n  When long commands are genuinely required, they can be handled more securely\n  by relying on safe OS APIs (as noted above), which allow commands to be\n  separated into multiple smaller HITL dialogs. This separation not only makes\n  it easier for users to identify commands and their arguments but also reduces\n  the risk of hidden or confusing input.\n<\/p>\n<h3 id=\"sandboxing\">Sandboxing<\/h3>\n<p>\n  This <strong><em>could<\/em><\/strong> be the best solution, but we find it hard\n  to believe that agents (especially code assistants and the like) can be\n  sandboxed appropriately and still be helpful. 
Nevertheless, if sandboxing can\n  be used without affecting the agent&#8217;s usability, it should be employed.\n<\/p>\n<h2 id=\"final-notes-on-hitl-dialog-forging-and-litl\" class=\"article-anchor\">\n  Final Notes on HITL Dialog Forging and LITL\n<\/h2>\n<p>\n  There is no silver bullet against Lies-in-the-Loop attacks. As long as systems\n  rely on human-in-the-loop safeguards, vulnerabilities from over-trust,\n  complacency, and divided attention will persist. The fundamental issue is that\n  <strong>humans can only respond to what the agent presents<\/strong>\u200a\u2014\u200aand that\n  presentation can be manipulated through indirect prompt injections. By\n  poisoning the agent\u2019s context, attackers trick users into believing they\u2019re\n  approving benign actions while actually authorizing malicious ones. Once the\n  HITL dialog itself is compromised, the human safeguard becomes trivially easy\n  to bypass.\n<\/p>\n<p>\n  However, developers adopting a <strong>defense-in-depth strategy<\/strong> with\n  multiple protective layers, as listed above, can significantly reduce the\n  risks for their users. 
At the same time, users can strengthen resilience\n  through greater awareness, attentiveness, and a healthy degree of skepticism.\n<\/p>\n<p>\n  Ultimately, the goal is not to eliminate the risk but to\n  <strong>manage and mitigate it effectively<\/strong>.\n<\/p>\n<h3 id=\"disclosure-timeline\">Disclosure Timeline<\/h3>\n<h4 id=\"anthropic-claude-code\">Anthropic (Claude\u00a0Code)<\/h4>\n<ul>\n  <li>\n    <p>\n      <strong>27 August 2025:<\/strong> Reported arbitrary command injection via\n      the Bash() utility.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      <strong>28 August 2025:<\/strong> Reported the HITL dialog forging issue.\n    <\/p>\n  <\/li>\n<\/ul>\n<p>\n  Both reports were acknowledged by Anthropic and classified as \u201cInformative,\u201d\n  falling outside their current threat model.\n<\/p>\n<h4 id=\"microsoft-copilot-chat\">Microsoft (Copilot\u00a0Chat)<\/h4>\n<ul>\n  <li>\n    <p>\n      <strong>15 October 2025:<\/strong> Report submitted.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      <strong>15 October 2025:<\/strong> Microsoft acknowledged the report.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      <strong>28 October 2025:<\/strong> Microsoft notified us that the\n      engineering team was still working on the issue.\n    <\/p>\n  <\/li>\n  <li>\n    <p>\n      <strong>4 November 2025:<\/strong> Microsoft marked the report as\n      Completed without fixing the issue.\n    <\/p>\n  <\/li>\n<\/ul>\n<p>\n  <strong><u>Final response (text version below)<\/u><\/strong>\n<\/p>\n<figure>\n  <img decoding=\"async\" src=\"\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_image10.webp\" alt=\"Screenshot of response from MSRC; text follows\">\n  <figcaption aria-hidden=\"true\">\n    Screenshot of response from MSRC; text follows\n  <\/figcaption>\n<\/figure>\n<blockquote>\n  <p>Dear Ori,<\/p>\n  <p>Thank you for your submission and for continuing to engage with MSRC.<\/p>\n  <p>\n    After careful review, we\u2019ve determined that the behavior demonstrated does\n    not meet our classification for a security vulnerability. 
It requires\n    multiple non-default user actions, does not reliably reproduce across\n    environments, and includes warnings designed to mitigate risk.\n  <\/p>\n  <p>\n    Our assessment also considers the role of Workplace Trust, which assumes\n    users operate in environments where they review and trust the code they\n    choose to run. This principle is reflected in Microsoft\u2019s AI Vulnerability\n    Severity Classification, which evaluates both impact and exploitability.\n  <\/p>\n  <p>\n    That said, we agree this is a thoughtful observation. While not classified\n    as a vulnerability, we\u2019ve shared it with the engineering team to explore\n    ways we can make this behavior more transparent to users.\n  <\/p>\n  <p>\n    We appreciate your efforts to highlight potential concerns and welcome\n    future submissions that demonstrate broader impact or bypass existing\n    safeguards.\n  <\/p>\n  <p>\n    Sincerely,<br>\n    Justin\n  <\/p>\n  <p>Microsoft Security Response Center<\/p>\n<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Human-in-the-Loop safeguards can be turned against the users of AI agents. 
Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.<\/p>\n","protected":false},"author":121,"featured_media":106101,"template":"","zero-category":[1067,1176,1104],"zero-tag":[1097,1408,1082,1396,1468,1467,1466,1465,1406],"class_list":["post-106095","zero-post","type-zero-post","status-publish","has-post-thumbnail","hentry","zero-category-blog","zero-category-security-blogs","zero-category-technical-blog","zero-tag-ai","zero-tag-ai-agent","zero-tag-ai-security","zero-tag-claude-code","zero-tag-copilot-chat","zero-tag-hitl","zero-tag-human-in-the-loop","zero-tag-lies-in-the-loop","zero-tag-litl"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging - Checkmarx<\/title>\n<meta name=\"description\" content=\"Human-in-the-Loop safeguards can be turned against the users of AI agents. Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging - Checkmarx\" \/>\n<meta property=\"og:description\" content=\"Human-in-the-Loop safeguards can be turned against the users of AI agents. 
Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/\" \/>\n<meta property=\"og:site_name\" content=\"Checkmarx\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-27T18:38:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1280\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@checkmarx\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/\",\"url\":\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/\",\"name\":\"Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging - Checkmarx\",\"isPartOf\":{\"@id\":\"https:\/\/checkmarx.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp\",\"datePublished\":\"2025-12-16T13:00:00+00:00\",\"dateModified\":\"2026-02-27T18:38:44+00:00\",\"description\":\"Human-in-the-Loop safeguards can be turned against the users of AI agents. 
Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/#primaryimage\",\"url\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp\",\"contentUrl\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp\",\"width\":2560,\"height\":1280,\"caption\":\"Street art\u2013style illustration showing a humanoid robot in profile on the left, facing a cluster of floating speech bubbles and a stylized AI panel on the right. The robot\u2019s head and upper torso are rendered with rough, graffiti-like textures in teal, purple, and off-white, suggesting mechanical detail without realism. Between the robot and the AI panel, multiple overlapping speech bubbles imply an exchange of messages. The AI panel contains a simplified brain icon and abstract markings rather than readable text. The background is dark and gritty, fading to near black toward the lower-right corner, emphasizing a tense, mediated dialogue between a humanlike machine and an AI system.\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/checkmarx.com\/#website\",\"url\":\"https:\/\/checkmarx.com\/\",\"name\":\"Checkmarx\",\"description\":\"The world runs on code. 
We secure it.\",\"publisher\":{\"@id\":\"https:\/\/checkmarx.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/checkmarx.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/checkmarx.com\/#organization\",\"name\":\"Checkmarx\",\"url\":\"https:\/\/checkmarx.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg\",\"contentUrl\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg\",\"width\":1,\"height\":1,\"caption\":\"Checkmarx\"},\"image\":{\"@id\":\"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis\",\"https:\/\/x.com\/checkmarx\",\"https:\/\/www.youtube.com\/user\/CheckmarxResearchLab\",\"https:\/\/www.linkedin.com\/company\/checkmarx\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging - Checkmarx","description":"Human-in-the-Loop safeguards can be turned against the users of AI agents. 
Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/","og_locale":"en_US","og_type":"article","og_title":"Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging - Checkmarx","og_description":"Human-in-the-Loop safeguards can be turned against the users of AI agents. Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.","og_url":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/","og_site_name":"Checkmarx","article_publisher":"https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis","article_modified_time":"2026-02-27T18:38:44+00:00","og_image":[{"width":2560,"height":1280,"url":"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp","type":"image\/webp"}],"twitter_card":"summary_large_image","twitter_site":"@checkmarx","twitter_misc":{"Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/","url":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/","name":"Turning AI Safeguards Into\u00a0Weapons with HITL Dialog Forging - Checkmarx","isPartOf":{"@id":"https:\/\/checkmarx.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/#primaryimage"},"image":{"@id":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/#primaryimage"},"thumbnailUrl":"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp","datePublished":"2025-12-16T13:00:00+00:00","dateModified":"2026-02-27T18:38:44+00:00","description":"Human-in-the-Loop safeguards can be turned against the users of AI agents. Learn how the concepts of Lies in the Loop and HITL Dialog Forging can be turned against developers using agentic AI code assistants.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/checkmarx.com\/zero-post\/turning-ai-safeguards-into-weapons-with-hitl-dialog-forging\/#primaryimage","url":"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp","contentUrl":"https:\/\/checkmarx.com\/wp-content\/uploads\/2025\/12\/cxzero-blog_hitl-dialog-forging_feature.webp","width":2560,"height":1280,"caption":"Street art\u2013style illustration showing a humanoid robot in profile on the left, facing a cluster of floating speech bubbles and a stylized AI panel on the right. 
The robot\u2019s head and upper torso are rendered with rough, graffiti-like textures in teal, purple, and off-white, suggesting mechanical detail without realism. Between the robot and the AI panel, multiple overlapping speech bubbles imply an exchange of messages. The AI panel contains a simplified brain icon and abstract markings rather than readable text. The background is dark and gritty, fading to near black toward the lower-right corner, emphasizing a tense, mediated dialogue between a humanlike machine and an AI system."},{"@type":"WebSite","@id":"https:\/\/checkmarx.com\/#website","url":"https:\/\/checkmarx.com\/","name":"Checkmarx","description":"The world runs on code. We secure it.","publisher":{"@id":"https:\/\/checkmarx.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/checkmarx.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/checkmarx.com\/#organization","name":"Checkmarx","url":"https:\/\/checkmarx.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/","url":"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg","contentUrl":"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg","width":1,"height":1,"caption":"Checkmarx"},"image":{"@id":"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis","https:\/\/x.com\/checkmarx","https:\/\/www.youtube.com\/user\/CheckmarxResearchLab","https:\/\/www.linkedin.com\/company\/checkmarx"]}]}},"_links":{"self":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-post\/106095","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-post"}],"about":[{"href":"https:\/\/checkmarx.com\/wp-
json\/wp\/v2\/types\/zero-post"}],"author":[{"embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/users\/121"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/media\/106101"}],"wp:attachment":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/media?parent=106095"}],"wp:term":[{"taxonomy":"zero-category","embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-category?post=106095"},{"taxonomy":"zero-tag","embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-tag?post=106095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}