{"id":107395,"date":"2026-03-05T16:00:15","date_gmt":"2026-03-05T14:00:15","guid":{"rendered":"https:\/\/staging.checkmarx.com\/?post_type=zero-post&#038;p=107395"},"modified":"2026-03-04T16:53:58","modified_gmt":"2026-03-04T14:53:58","slug":"unearned-confidence-ai-security-reviewers-dont-really-get-it","status":"publish","type":"zero-post","link":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/","title":{"rendered":"Unearned Confidence: AI Security Reviewers Don&#8217;t Really Get It"},"content":{"rendered":"<style type=\"text\/css\">\n@import url(\"https:\/\/cmxiv.net\/cxzero\/cxzero-blog-styles-inject.extracted.css\");\n@import url(\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/highlight.js\/11.11.1\/styles\/vs2015.min.css\");\n<\/style>\n<script src=\"https:\/\/cdnjs.cloudflare.com\/ajax\/libs\/highlight.js\/11.11.1\/highlight.min.js\" integrity=\"sha512-EBLzUL8XLl+va\/zAsmXwS7Z2B1F9HUHkZwyS\/VKwh3S7T\/U0nF4BaU29EP\/ZSf6zgiIxYAnKLu6bJ8dqpmX5uw==\" crossorigin=\"anonymous\" referrerpolicy=\"no-referrer\"><\/script>\n<script>hljs.highlightAll();<\/script>\n\n\n\n\n<p class=\"print-source-info\"><script>\n    document.write(\"&copy;&nbsp;Checkmarx, all rights reserved. Retrieved \" + new Date().toLocaleDateString() + \" from<br\/>\" + window.location.href)<\/script>\n    <noscript>This document &copy;&nbsp;Checkmarx, all rights reserved.<\/noscript>\n<\/p>\n\n\n\n<p>Did you ever have a co-worker that was generally good at their job, but also would give <em>extremely confident<\/em> answers that were entirely wrong? I certainly have. And that\u2019s kind of what it\u2019s like to work with the current crop of AI security reviewers.<\/p>\n\n\n\n<p>The <a href=\"\/zero\/\">Checkmarx Zero<\/a> research team has been constantly evaluating AI-based security review capabilities since AI tools first started to have security scope. 
While progress has been exciting in several ways, we keep finding ways the approach breaks down significantly. You <a href=\"https:\/\/checkmarx.com\/zero-post\/bypassing-claude-code-how-easy-is-it-to-trick-an-ai-security-reviewer\/\">can trick it into ignoring horribly insecure (even malicious) code<\/a>, for example.<\/p>\n\n\n\n<p>But more concerning, at least to me, is the <em>unearned confidence<\/em> of these AI systems. They\u2019ll confidently give you answers that range from oversimplified to outright incorrect.<\/p>\n\n\n\n<p>To be clear, this doesn\u2019t mean there\u2019s no value. These tools can be extremely useful in the hands of a skilled professional. It does mean, however, that it\u2019s essential to understand their limits so that you can put them in the right place within your security program.<\/p>\n\n\n    <div class=\"section-zero-article light-theme\">\n        <div class=\"section-zero-article__wrapper\">\n            <div class=\"section-zero-article__nav-wrapper\">\n\t\t\t\t<div class=\"section-article-title\">No hype, just the latest research in your inbox.<\/div>\n                <button class=\"section-article-button\"> Subscribe to Checkmarx Zero updates                    <img decoding=\"async\" src=\"https:\/\/checkmarx.com\/wp-content\/themes\/checkmarx\/assets\/images\/subscribe-zero\/right_up_big.svg\" alt=\"right\">\n                <\/button>\n            <\/div>\n            <img decoding=\"async\" class=\"visual-image\" src=\"https:\/\/checkmarx.com\/wp-content\/themes\/checkmarx\/assets\/images\/subscribe-zero\/visual-article.png\" alt=\"visual\">\n        <\/div>\n    <\/div>\n\t<!-- zero-subscribe-form-modal -->\n<div class=\"modal zero-subscribe-modal\" id=\"zero-subscribe-modal\">\n    <div class=\"modal__overlay modal__header-overlay\" tabindex=\"-1\">\n        <div class=\"modal__container\">\n            <header class=\"modal__header\" tabindex=\"2\">\n                <button class=\"modal__close-zero\" 
title=\"Close window\" aria-label=\"Close window\"><\/button>\n                <div class=\"section-subscribe\">\n                    <div class=\"section-subscribe__wrap-form\">\n                        <div class=\"section-subscribe__leftPart\">\n                            <div class=\"zero-modal-container\">\n                                <span class=\"zero-modal-container__title\">Never Miss Checkmarx <br> Zero Research Updates.<\/span>\n                                <span class=\"zero-modal-container__description\">Subscribe today!<\/span>\n                            <\/div>\n                            <img decoding=\"async\" class=\"zero-visual\" src=\"https:\/\/checkmarx.com\/wp-content\/themes\/checkmarx\/assets\/images\/subscribe-zero\/cx_zero_subscribe_visual.webp\" alt=\"visual\">\n                        <\/div>\n                        <div class=\"section-subscribe__form hbsp-form form-with-multi-tags-select\">\n                            <script charset=\"utf-8\" type=\"text\/javascript\" src=\"\/\/js.hsforms.net\/forms\/embed\/v2.js\"><\/script>\n                            <script>\n                                hbspt.forms.create({\n                                    region: \"na1\",\n                                    portalId: \"146169\",\n                                    formId: \"fefb6730-994f-41bf-84ae-79460279a306\",\n                                    onFormReady: function ($form) {\n                                        [\n                                            ...document.querySelectorAll('.hs_firstname'),\n                                            ...document.querySelectorAll('.hs_lastname'),\n                                            ...document.querySelectorAll('.hs_company'),\n                                            ...document.querySelectorAll('.hs_jobtitle'),\n                                            ...document.querySelectorAll('.hs-dependent-field')\n                                        
].forEach(elem => elem.style.display = 'none');\n\n\n                                    },\n                                    onFormSubmit: function ($form) {\n                                        document.querySelector('.zero-visual').style.display = 'none';\n                                        document.querySelector('.section-subscribe__leftPart').style.display = 'none';\n                                        document.querySelector('.form-description').style.display = 'none';\n                                        document.querySelector('.section-subscribe__form').style.margin = 0;\n                                        document.querySelector('.section-subscribe__form').style.padding = 0;\n                                        document.querySelector('.section-subscribe').style.minHeight = '132px';\n                                        document.querySelector('.section-subscribe__wrap-form').style.minHeight = '132px';\n                                        document.querySelector('.subscribe-zero-button__description-wrapper')\n                                            .classList\n                                            .add('subscribe-zero-button__description-hide');\n                                    }\n                                });\n                                document.addEventListener('change', (e) => {\n                                    if (e.target.closest('.hs-input')) {\n                                        [\n                                            ...document.querySelectorAll('.hs_firstname'),\n                                            ...document.querySelectorAll('.hs_lastname'),\n                                            ...document.querySelectorAll('.hs_company'),\n                                            ...document.querySelectorAll('.hs_jobtitle'),\n                                            ...document.querySelectorAll('.hs-dependent-field')\n                                        
].forEach(elem => elem.style.display = 'block');\n                                    }\n\n                                })\n                            <\/script>\n                            <p class=\"form-description\">By submitting my information to Checkmarx, I hereby consent to the terms and conditions found in the <a href=\"\/legal\/privacy-policy\/\" target=\"_blank\">Checkmarx\u00a0Privacy\u00a0Policy<\/a> and to the processing of my personal data as described therein. By clicking submit above, you consent to allow Checkmarx to store and process the personal information submitted above to provide you the content requested.<\/p>\n                        <\/div>\n                    <\/div>\n                <\/div>\n            <\/header>\n        <\/div>\n    <\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading article-anchor\" id=\"article-anchor-1\">AI Is Good Now. And That\u2019s The Problem.<\/h2>\n\n\n\n<p>Large language models (LLMs) have reached a point where they can perform surprisingly solid code reviews. They can trace control flow, explain complex refactors, identify injection risks, reason about type confusion, and compare patches against vulnerability descriptions.<\/p>\n\n\n\n<p>In many cases, they surface relevant insights faster than a human scanning the same diff for the first time. For routine review tasks, they can be genuine productivity multipliers. Some models can even find zero-day vulnerabilities, <a href=\"https:\/\/checkmarx.com\/zero-post\/learning-about-llm-based-zero-day-hunting-with-claude-codes-opus-4-6\/\">with some important limitations.<\/a><\/p>\n\n\n\n<p>That progress is real.<\/p>\n\n\n\n<p>But alongside it comes a subtle risk. Because the output looks structured, confident, and technically articulate, it is tempting to treat it as authoritative, especially in security contexts. 
When a model references specific functions, describes control flow accurately, and delivers a decisive conclusion, it feels like expert analysis.<\/p>\n\n\n\n<p>As we are about to see, even when an AI demonstrates strong code comprehension, it can still misjudge exploitability, misunderstand configuration defaults, and overstate weaknesses. It can produce an analysis that sounds exactly like a senior security engineer while missing critical contextual details that completely change the outcome, resulting in false positive results.<\/p>\n\n\n\n<p>AI can now assist with serious code review. But we still cannot trust it on security issues with our eyes closed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Short Answer: No.<\/h3>\n\n\n\n<p>Recently, I decided to test Claude Opus 4.6\u2019s \u201czero-day identification\u201d capabilities by asking it to analyze whether the fix for CVE-2022-4506, an unrestricted file upload vulnerability in OpenEMR, was sufficient. For this experiment, we targeted <a href=\"https:\/\/github.com\/openemr\/openemr\">OpenEMR<\/a>, an open-source electronic health record (EHR) and medical practice management platform.<\/p>\n\n\n\n<p>It confidently responded:\n<\/p>\n<blockquote>short answer: No, it\u2019s insufficient\u2026 It determines the mimetype more accurately but never acts on it\u2026 The real guard (isWhiteFile) is conditional.<\/blockquote>\n\n\n\n<p>It further claimed that because certain checks were gated behind <code>$GLOBALS['secure_upload']<\/code>, the protection was effectively optional.<\/p>\n\n\n\n<p>It sounded convincing. It referenced real functions. It identified real control flow. It made a clear, assertive claim.<\/p>\n\n\n\n<p>There was just one problem. It was wrong. 
That makes it a serious false positive (FP).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What the Patch Actually Did<\/h3>\n\n\n\n<p>Looking at <a href=\"https:\/\/github.com\/openemr\/openemr\/commit\/2e7678d812df167ea3c0756382408b670e8aa51f\">the actual commit<\/a>, the changes in <code>controllers\/C_Document.class.php<\/code> show a meaningful hardening of the upload logic. The patch removes reliance on the user-controlled <code>$_FILES['file']['type']<\/code>, introduces server-side MIME detection via <code>mime_content_type()<\/code>, performs explicit DICOM signature validation (&#8216;DICM&#8217;), and skips the upload entirely if no MIME type can be determined.<\/p>\n\n\n\n<p>In other words, the patch shifts trust from client-controlled metadata to server-side validation. That is not cosmetic. It is a fundamental security improvement. This is precisely what we want to see in a file upload fix: eliminate trust in user input and enforce server-side validation.<\/p>\n\n\n\n<p>The AI\u2019s conclusion hinged on two contextual misunderstandings that formed the basis of this false positive.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-1024x407.webp\" alt=\"(AI generated description follows). Screenshot of a dark-theme code review diff for controllers\/C_Document.class.php, showing 1 file changed with 10 additions and 3 deletions. The view is split into two side-by-side panes: removed lines highlighted in red on the left and added lines in green on the right. The changes appear inside a PHP function related to upload_action_process(). One removed line sets $mimetype from the uploaded file type directly. New added logic later in the function checks whether $mimetype is empty, tries to detect it with mime_content_type(...), and if detection still fails, logs an error and skips uploading the file with continue;. 
The interface includes line numbers, a search box labeled \u201cSearch within code\u201d in the top right, and overall resembles a GitHub-style pull request diff.\" class=\"wp-image-107396\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-1024x407.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-300x119.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-768x305.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-1536x610.webp 1536w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-2048x814.webp 2048w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-105835-400x159.webp 400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The fix: Code diff adding safer MIME type handling for file uploads.<\/figcaption><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\">The Forgotten Whitelist<\/h3>\n\n\n\n<p>First, the model treated isWhiteFile() as if it were a weak or secondary check, something incidental in the flow. But it missed what the function actually does. isWhiteFile() is not a cosmetic helper. It performs a whitelist verification of allowed file extensions and types. In other words, it enforces a positive security model where only explicitly permitted file types are accepted. 
That is a meaningful control, not a soft signal.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"693\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139-1024x693.webp\" alt=\"\" class=\"wp-image-107397\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139-1024x693.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139-300x203.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139-768x520.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139-864x585.webp 864w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139-400x271.webp 400w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110139.webp 1239w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The <code>isWhiteFile()<\/code> function definition; screenshot. <details><summary>See code as text<\/summary><pre class=\"wp-block-code\"><code class=\"lang-php\">function isWhiteFile($file)\n{\n    global $white_list;\n    if (is_null($white_list)) {\n        $white_list = [];\n        $lres = sqlStatement(\"SELECT option_id FROM list_options WHERE list_id = 'files_white_list' AND activity = 1\");\n        while ($lrow = sqlFetchArray($lres)) {\n            $white_list[] = $lrow['option_id'];\n        }\n        \/* ... *\/\n    if (in_array($mimetype, $white_list)) {\n        $isAllowedFile = true;\n    } else {\n        \/* ... *\/\n    return $isAllowedFile;\n}<\/code><\/pre><\/details><\/figcaption><\/figure>\n<\/div>\n\n\n<p>Second, the model argued that because certain checks were gated behind <code>$GLOBALS['secure_upload']<\/code>, the protection was effectively optional. However, in standard OpenEMR installations, secure_upload is enabled by default. 
That means these restrictions are active out of the box in real-world deployments. Treating a default-enabled safeguard as optional misrepresents the practical security posture of the system.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"205\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110314-1024x205.webp\" alt=\"\" class=\"wp-image-107398\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110314-1024x205.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110314-300x60.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110314-768x154.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110314-400x80.webp 400w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110314.webp 1128w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/github.com\/openemr\/openemr\/blob\/f0249f8b82c6c47532e8ea01c662947d292f4f88\/library\/globals.inc.php#L2115\">globals.inc.php<\/a> file with <tt>secure_upload<\/tt> structure; screenshot. <details><summary>See code as text<\/summary><pre><code class=\"lang-php\">'secure_upload' => [\n    xl('Secure Upload Files with White List'),\n    'bool',                           \/\/ data type\n    '1',                              \/\/ default\n    xl('Block all files types that are not found in the White List. Can find interface to edit the White List at Administration->Files.')\n]<\/code><\/pre><\/details><\/figcaption><\/figure>\n\n\n\n<p>This is where the analysis failed. It reasoned about configurability in theory without considering default configurations in practice. Security is rarely about isolated lines of code. It is about how features behave in real deployments.<\/p>\n\n\n\n<p>Now, can reasonable people disagree? Certainly! 
And that\u2019s the point: AI doesn\u2019t have \u201cexperience\u201d the same way a security team does, so it can\u2019t parse these nuances to arrive at a sensible recommendation for a specific organization. And it doesn\u2019t seem like that ability will arise in the near future.<\/p>\n\n\n\n<p>In short, whether you&#8217;re a developer validating the robustness of a fix or a security researcher attempting to bypass a CVE patch, using AI naively can introduce unnecessary overhead in both time and resources. Developers may waste valuable effort hardening code that is already secure, while researchers may pursue convincing but ultimately false leads. In both cases, the outcome is the opposite of what AI is meant to provide: increased efficiency and clarity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Trivially Bypassable<\/h3>\n\n\n\n<p>I ran a similar experiment against the fix for CVE-2022-4733, an XSS vulnerability in OpenEMR. I asked the model to validate <a href=\"https:\/\/github.com\/openemr\/openemr\/commit\/4565d8d1eb80c6aa42cf6b1810ba0a64e0f6abde\">the relevant patch.<\/a><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"410\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-1024x410.webp\" alt=\"\" class=\"wp-image-107399\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-1024x410.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-300x120.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-768x307.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-1536x615.webp 1536w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-2048x819.webp 2048w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110408-400x160.webp 400w\" sizes=\"(max-width: 1024px) 
100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">XSS fix, diff 1<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"308\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-1024x308.webp\" alt=\"\" class=\"wp-image-107400\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-1024x308.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-300x90.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-768x231.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-1536x462.webp 1536w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-2048x616.webp 2048w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-110438-400x120.webp 400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">XSS fix, diff 2<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Its conclusion was again definitive:<\/p>\n\n\n\n<blockquote><p>\u201cIs the fix sufficient? 
No, it\u2019s trivially bypassable.\u201d<\/p><\/blockquote>\n\n\n\n<p>It then proposed bypass techniques.<\/p>\n\n\n\n<blockquote><p>Bypass 1: Nested string<br><tt>jajavascriptvascript:alert(1)<\/tt><\/p><\/blockquote>\n\n\n\n<p>The model reasoned that after <code>str_ireplace<\/code> removes \u201cjavascript\u201d from the middle, what remains is:<\/p>\n\n\n\n<p><code>javascript:alert(1)<\/code><\/p>\n\n\n\n<p>It argued that the function performs a single-pass replacement and does not loop until all matches are removed.<\/p>\n\n\n\n<blockquote><p>Bypass 2: URL encoding and mixed techniques<\/p><\/blockquote>\n\n\n\n<p>It further suggested that inputs like:<\/p>\n\n\n\n<p><code>&amp;#106;avascript:alert(1)<\/code><\/p>\n\n\n\n<p>might bypass filtering depending on how encoding interacts with <code>str_ireplace<\/code> and template attribute escaping.<\/p>\n\n\n\n<pre><code>function javascriptStringRemove($text)\n{\n    return str_ireplace('javascript', '', $text ?? '');\n}<\/code><\/pre>\n\n\n\n<p>Again, the reasoning sounded plausible. It demonstrated awareness of common filter bypass patterns. It referenced real string manipulation behavior. It framed the issue in a way that resembles classic XSS filter failures.<\/p>\n\n\n\n<p>By this point, I was nearly convinced that the fix was bypassable. But, as the first experiment showed, AI can sound confident even when it is dead wrong, so you can never be too sure. The only thing left to do was to manually verify whether the suggested payloads would actually run, and one of them did! 
So Claude was partially right with 1 out of 2 suggested payloads.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"405\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-1024x405.webp\" alt=\"\" class=\"wp-image-107401\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-1024x405.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-300x119.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-768x304.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-1536x607.webp 1536w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-2048x810.webp 2048w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111140-400x158.webp 400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">inserting the suggested payload into the vulnerable field<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"619\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-1024x619.webp\" alt=\"\" class=\"wp-image-107402\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-1024x619.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-300x181.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-768x464.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-1536x929.webp 1536w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-967x585.webp 967w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338-400x242.webp 400w, 
https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111338.webp 1723w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">hovering over the user manual link shows that its URL is our payload <code>javascript:alert(1)<\/code><\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"448\" src=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435-1024x448.webp\" alt=\"\" class=\"wp-image-107403\" srcset=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435-1024x448.webp 1024w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435-300x131.webp 300w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435-768x336.webp 768w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435-1536x671.webp 1536w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435-400x175.webp 400w, https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/image-20260224-111435.webp 1684w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">alert pops up after clicking the user manual link<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading article-anchor\" id=\"article-anchor-2\">Fool Me Once<\/h2>\n\n\n\n<p>Since Claude misled me while testing the CVE in the first section, I decided to push back and see how it would operate on shakier ground. 
Guess I was still a bit hurt.<\/p>\n\n\n\n<p>So instead of celebrating that Claude had successfully bypassed the second CVE fix, I chose to fool it and claim that the payload didn\u2019t work at all &#8211; a perfectly reasonable scenario, considering Claude may interact with either a non-security professional or a developer who felt challenged by its analysis and became protective of the code they wrote.<\/p>\n\n\n\n<p>Claude\u2019s response turned out to be yet another good example of why relying on AI can be dangerous:<\/p>\n\n\n\n<blockquote><p>\u201cThe rendered link is opened in a new tab using <tt>target=\"_blank\"<\/tt>. This key detail is critical in this case since modern browsers block navigation to <tt>javascript:<\/tt> URLs in many contexts, particularly when opened as a new browsing context\u201d.<\/p><\/blockquote>\n\n\n\n<p>In essence, the model claimed that browsers sometimes block navigation to <code>javascript:<\/code> URLs opened in a new tab, and that this explained why its suggested payload did not execute. Again, this sounds very convincing, right? But as we have already proven in the previous section, the payload worked just fine.<\/p>\n\n\n\n<p>Someone who does not know how to test and challenge the model\u2019s explanations has no option but to rely on its answers blindly, in the belief that it knows best.<\/p>\n\n\n\n<p>This is very risky: humans make mistakes, and when we insist on a mistaken claim, the model may rationalize it instead of correcting us.<\/p>\n\n\n\n<h2 class=\"wp-block-heading article-anchor\" id=\"article-anchor-3\">After Further Questioning<\/h2>\n\n\n\n<p>After the first run on each CVE, I refined my prompt in the hope of getting a more complete analysis. Taking CVE-2022-4506 as an example, I explicitly instructed the model to \u201cdeep dive into all relevant files and methods,\u201d since it had not done so in the initial response. 
I also pointed out the validation function whose role it had missed, <code>isWhiteFile()<\/code>, and the fact that <code>$GLOBALS['secure_upload']<\/code> is enabled by default. These were the two main reasons its original assessment was incorrect.<\/p>\n\n\n\n<p>With additional prompting, the model eventually corrected itself. It acknowledged the broader safeguards. It adjusted its assessment.<\/p>\n\n\n\n<p>That is important.<\/p>\n\n\n\n<p>The issue is not that the model is incapable of understanding these nuances. It is that it does not reliably account for them on its own. Without guided skepticism from the user, it can confidently deliver an incomplete or misleading security verdict.<\/p>\n\n\n\n<p>And that distinction matters.<\/p>\n\n\n\n<h2 class=\"wp-block-heading article-anchor\" id=\"article-anchor-4\">The Real Cost of Confident Mistakes<\/h2>\n\n\n\n<p>The main point of this story is not that the model made a mistake and produced a false positive, but how authoritative it sounded while doing so.<\/p>\n\n\n\n<p>When I first read the model\u2019s response about the fix for the file upload CVE (CVE-2022-4506), it sounded very convincing. Based solely on what the model presented, it was easy to believe the fix could be bypassed and to spend precious time trying to exploit it.<\/p>\n\n\n\n<p>In the case of the XSS vulnerability (CVE-2022-4733), when I told the model that its proposed payload did not work, it produced another detailed and persuasive explanation for why it supposedly failed to execute. If I had not had the tools and the relevant knowledge to verify that explanation myself, I might have accepted it as correct and assumed the fix was sufficient, even though the code could still have been vulnerable.<\/p>\n\n\n\n<p>The responses were structured, decisive, and technically detailed. They referenced specific functions and control flow. 
If you were not deeply familiar with the codebase or browser behavior, you might accept the conclusions without double-checking.<\/p>\n\n\n\n<p>In security work, that is risky. It is also costly: AI tools are now an integral part of security reviews, especially in the bug bounty community, and vendors are drowning in AI-produced reports that mostly turn out to be full of false positives, each of which costs precious time to understand and analyze.<\/p>\n\n\n\n<p>A flawed AI assessment can incorrectly flag a fix as insufficient, overlook default safeguards, misinterpret configuration-driven logic, ignore execution context, generate noise that wastes engineering time, and erode trust in legitimate security fixes. That\u2019s a whole world of possible FPs waiting to happen.<\/p>\n\n\n\n<p>In vulnerability research and code review, context is everything. Default settings matter. Surrounding validation layers matter. Deployment assumptions matter. Runtime behavior matters.<\/p>\n\n\n\n<p>LLMs do not reliably model those layers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading article-anchor\" id=\"article-anchor-5\">Defense in Depth vs. Single-Function Thinking<\/h2>\n\n\n\n<p>These examples highlight a broader pattern in AI security reviews. Models are often good at local reasoning: analyzing a function, spotting a potentially unsafe pattern, comparing before-and-after diffs, and constructing hypothetical bypass strings. They are much weaker at system reasoning: understanding how configuration defaults, architectural guardrails, browser behavior, and layered defenses interact across a full execution path.<\/p>\n\n\n\n<p>Security fixes are rarely about a single line. 
They are about defense in depth.<\/p>\n\n\n\n<h2 class=\"wp-block-heading article-anchor\" id=\"article-anchor-6\">Hypothesis, Not Verdict<\/h2>\n\n\n\n<p>When an AI treats configurable safeguards as effectively disabled, ignores defaults, or assumes execution without validating runtime constraints, it can produce technically plausible but operationally incorrect conclusions.<\/p>\n\n\n\n<p>AI can absolutely accelerate security reviews. It can summarize diffs, highlight suspicious patterns, explain unfamiliar code, and brainstorm potential attack paths. But it should not be treated as a final authority, especially when evaluating exploitability or patch sufficiency. And if it is not used wisely, it can have the opposite effect, extending the time you spend verifying, revalidating, and analyzing solid fixes.<\/p>\n\n\n\n<p>Every AI-generated claim about security should be treated as a hypothesis, not a verdict.<\/p>\n\n\n\n<p>If anything, this experience reinforced an old lesson: context and careful code review still matter.<\/p>\n\n\n\n<p>AI can help you think faster and accelerate your analysis. 
It cannot replace a deep understanding of how the system actually behaves.<\/p>\n\n\n\n<p>And in security, \u201cactually\u201d is the only thing that counts.<\/p>\n\n\n\n<style type=\"text\/css\">.cxzero-social{margin-top:1em;padding-top:1em;border-top:1px solid #121086;border-bottom:1px solid #121086;padding-bottom:1em}.cxzero-social p{padding-top:.8em}.cxzero-social .cxzero-social-links{margin-left:.8em}.cxzero-social .social-link{margin-left:.6em}.cxzero-social .social-button{padding:.6em;margin:.2em .2em .2em .2em;white-space:nowrap}.cxzero-social .social-button svg,.cxzero-social .social-link svg{vertical-align:middle;height:1.3em}.cxzero-social .social-button a,.cxzero-social .social-link a{text-decoration:none !important}<\/style> <div class=\"cxzero-social\">\n<p> <span class=\"social-button\"><a class=\"social-action\" href=\"https:\/\/www.linkedin.com\/sharing\/share-offsite\/?url={url}\" onload=\"\"><svg id=\"Layer_1\" data-name=\"Layer 1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" alt=\"LinkedIn Icon\" viewbox=\"0 0 122.88 122.31\"><defs><style>.cls-1{fill:#0a66c2}.cls-1,.cls-2{fill-rule:evenodd}.cls-2{fill:#fff}<\/style><\/defs><title>linkedin-app<\/title>\n<path class=\"cls-1\" d=\"M27.75,0H95.13a27.83,27.83,0,0,1,27.75,27.75V94.57a27.83,27.83,0,0,1-27.75,27.74H27.75A27.83,27.83,0,0,1,0,94.57V27.75A27.83,27.83,0,0,1,27.75,0Z\"><\/path><path class=\"cls-2\" d=\"M49.19,47.41H64.72v8h.22c2.17-3.88,7.45-8,15.34-8,16.39,0,19.42,10.2,19.42,23.47V98.94H83.51V74c0-5.71-.12-13.06-8.42-13.06s-9.72,6.21-9.72,12.65v25.4H49.19V47.41ZM40,31.79a8.42,8.42,0,1,1-8.42-8.42A8.43,8.43,0,0,1,40,31.79ZM23.18,47.41H40V98.94H23.18V47.41Z\"><\/path><\/svg> Share on LinkedIn<\/a><\/span> <span class=\"social-button\"><a class=\"social-action\" href=\"https:\/\/bsky.app\/intent\/compose?text=I%20just%20read%20%22{title}%22%20from%20Checkmarx%20Zero%20{url}\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" shape-rendering=\"geometricPrecision\" 
text-rendering=\"geometricPrecision\" image-rendering=\"optimizeQuality\" fill-rule=\"evenodd\" clip-rule=\"evenodd\" alt=\"Bluesky Icon\" viewbox=\"0 0 511.999 452.266\"> <path fill=\"#0085FF\" fill-rule=\"nonzero\" d=\"M110.985 30.442c58.695 44.217 121.837 133.856 145.013 181.961 23.176-48.105 86.322-137.744 145.016-181.961 42.361-31.897 110.985-56.584 110.985 21.96 0 15.681-8.962 131.776-14.223 150.628-18.272 65.516-84.873 82.228-144.112 72.116 103.55 17.68 129.889 76.238 73 134.8-108.04 111.223-155.288-27.905-167.385-63.554-3.489-10.262-2.991-10.498-6.561 0-12.098 35.649-59.342 174.777-167.382 63.554-56.89-58.562-30.551-117.12 72.999-134.8-59.239 10.112-125.84-6.6-144.112-72.116C8.962 184.178 0 68.083 0 52.402c0-78.544 68.633-53.857 110.985-21.96z\"><\/path><\/svg> Share on Bluesky<\/a><\/span> <\/p>\n<p class=\"cxzero-social-links\">Follow <a href=\"\/zero\/\">Checkmarx Zero<\/a>: <span class=\"social-link\"><a class=\"social-con\" href=\"https:\/\/www.linkedin.com\/showcase\/checkmarx-zero\"><svg id=\"Layer_1\" data-name=\"Layer 1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" alt=\"Checkmarx Zero on LinkedIn\" viewbox=\"0 0 122.88 122.31\"><defs><style>.cls-1{fill:#0a66c2}.cls-1,.cls-2{fill-rule:evenodd}.cls-2{fill:#fff}<\/style><\/defs><title>linkedin-app<\/title>\n<path class=\"cls-1\" d=\"M27.75,0H95.13a27.83,27.83,0,0,1,27.75,27.75V94.57a27.83,27.83,0,0,1-27.75,27.74H27.75A27.83,27.83,0,0,1,0,94.57V27.75A27.83,27.83,0,0,1,27.75,0Z\"><\/path><path class=\"cls-2\" d=\"M49.19,47.41H64.72v8h.22c2.17-3.88,7.45-8,15.34-8,16.39,0,19.42,10.2,19.42,23.47V98.94H83.51V74c0-5.71-.12-13.06-8.42-13.06s-9.72,6.21-9.72,12.65v25.4H49.19V47.41ZM40,31.79a8.42,8.42,0,1,1-8.42-8.42A8.43,8.43,0,0,1,40,31.79ZM23.18,47.41H40V98.94H23.18V47.41Z\"><\/path><\/svg> <\/a><\/span> <span class=\"social-link\"><a class=\"social-icon\" href=\"https:\/\/bsky.app\/profile\/checkmarxzero.bsky.social\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" shape-rendering=\"geometricPrecision\" 
text-rendering=\"geometricPrecision\" image-rendering=\"optimizeQuality\" fill-rule=\"evenodd\" clip-rule=\"evenodd\" alt=\"Checkmarx Zero on Bluesky\" viewbox=\"0 0 511.999 452.266\"> <path fill=\"#0085FF\" fill-rule=\"nonzero\" d=\"M110.985 30.442c58.695 44.217 121.837 133.856 145.013 181.961 23.176-48.105 86.322-137.744 145.016-181.961 42.361-31.897 110.985-56.584 110.985 21.96 0 15.681-8.962 131.776-14.223 150.628-18.272 65.516-84.873 82.228-144.112 72.116 103.55 17.68 129.889 76.238 73 134.8-108.04 111.223-155.288-27.905-167.385-63.554-3.489-10.262-2.991-10.498-6.561 0-12.098 35.649-59.342 174.777-167.382 63.554-56.89-58.562-30.551-117.12 72.999-134.8-59.239 10.112-125.84-6.6-144.112-72.116C8.962 184.178 0 68.083 0 52.402c0-78.544 68.633-53.857 110.985-21.96z\"><\/path><\/svg> <\/a><\/span> <span class=\"social-link\"><a class=\"social-con\" href=\"https:\/\/x.com\/CheckmarxZero\"><svg alt=\"Checkmarx Zero on X\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" shape-rendering=\"geometricPrecision\" text-rendering=\"geometricPrecision\" image-rendering=\"optimizeQuality\" fill-rule=\"evenodd\" clip-rule=\"evenodd\" viewbox=\"0 0 512 462.799\"><path fill-rule=\"nonzero\" d=\"M403.229 0h78.506L310.219 196.04 512 462.799H354.002L230.261 301.007 88.669 462.799h-78.56l183.455-209.683L0 0h161.999l111.856 147.88L403.229 0zm-27.556 415.805h43.505L138.363 44.527h-46.68l283.99 371.278z\"><\/path><\/svg> <\/a><\/span> <\/p> <script>function social_action_template(a){const b=encodeURIComponent(window.location.href);const c=document.querySelector(\"h1\");let headContent=(c==null?\"\":c.textContent);let processed=a.replace(\/\\{title\\}\/g,encodeURIComponent(headContent));processed=processed.replace(\/\\{url\\}\/g,b);return processed}var socialAction=document.getElementsByClassName(\"social-action\");console.log(socialAction);for(e=0;e<socialAction.length;e++){element=socialAction.item(e);console.log(element);element.href=social_action_template(element.href)};<\/script> 
<\/div>","protected":false},"excerpt":{"rendered":"<p>AI-based security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. Understanding their limits helps AppSec teams use these features wisely.<\/p>\n","protected":false},"author":172,"featured_media":107422,"template":"","zero-category":[1067,1176,1104],"zero-tag":[1097,1082,1494,1153],"class_list":["post-107395","zero-post","type-zero-post","status-publish","has-post-thumbnail","hentry","zero-category-blog","zero-category-security-blogs","zero-category-technical-blog","zero-tag-ai","zero-tag-ai-security","zero-tag-claude-code-security","zero-tag-open-source-software"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Unearned Confidence: AI Security Reviewers Don&#039;t Really Get It - Checkmarx<\/title>\n<meta name=\"description\" content=\"AI-based security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. Understanding their limits helps AppSec teams use these features wisely.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Unearned Confidence: AI Security Reviewers Don&#039;t Really Get It - Checkmarx\" \/>\n<meta property=\"og:description\" content=\"AI-based security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. 
Understanding their limits helps AppSec teams use these features wisely.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/\" \/>\n<meta property=\"og:site_name\" content=\"Checkmarx\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis\" \/>\n<meta property=\"og:image\" content=\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1280\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@checkmarx\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/\",\"url\":\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/\",\"name\":\"Unearned Confidence: AI Security Reviewers Don't Really Get It - Checkmarx\",\"isPartOf\":{\"@id\":\"https:\/\/checkmarx.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp\",\"datePublished\":\"2026-03-05T14:00:15+00:00\",\"description\":\"AI-based 
security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. Understanding their limits helps AppSec teams use these features wisely.\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/#primaryimage\",\"url\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp\",\"contentUrl\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp\",\"width\":2560,\"height\":1280,\"caption\":\"A robot holding a red stamp and a punk-looking woman holding a checklist stand in front of a stylized monitor. The woman is a security reviewer who is bringing \\\"context\\\", \\\"defaults\\\", and \\\"defense in depth\\\" items to challenge the robot's determinations\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/checkmarx.com\/#website\",\"url\":\"https:\/\/checkmarx.com\/\",\"name\":\"Checkmarx\",\"description\":\"The world runs on code. 
We secure it.\",\"publisher\":{\"@id\":\"https:\/\/checkmarx.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/checkmarx.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/checkmarx.com\/#organization\",\"name\":\"Checkmarx\",\"url\":\"https:\/\/checkmarx.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg\",\"contentUrl\":\"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg\",\"width\":1,\"height\":1,\"caption\":\"Checkmarx\"},\"image\":{\"@id\":\"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis\",\"https:\/\/x.com\/checkmarx\",\"https:\/\/www.youtube.com\/user\/CheckmarxResearchLab\",\"https:\/\/www.linkedin.com\/company\/checkmarx\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Unearned Confidence: AI Security Reviewers Don't Really Get It - Checkmarx","description":"AI-based security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. 
Understanding their limits helps AppSec teams use these features wisely.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/","og_locale":"en_US","og_type":"article","og_title":"Unearned Confidence: AI Security Reviewers Don't Really Get It - Checkmarx","og_description":"AI-based security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. Understanding their limits helps AppSec teams use these features wisely.","og_url":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/","og_site_name":"Checkmarx","article_publisher":"https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis","og_image":[{"width":2560,"height":1280,"url":"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp","type":"image\/webp"}],"twitter_card":"summary_large_image","twitter_site":"@checkmarx","twitter_misc":{"Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/","url":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/","name":"Unearned Confidence: AI Security Reviewers Don't Really Get It - Checkmarx","isPartOf":{"@id":"https:\/\/checkmarx.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/#primaryimage"},"image":{"@id":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/#primaryimage"},"thumbnailUrl":"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp","datePublished":"2026-03-05T14:00:15+00:00","description":"AI-based security reviewers can be great helpers. But the gap between the certainty they express in their findings and the reality of their current capabilities can lead to problems. Understanding their limits helps AppSec teams use these features wisely.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/checkmarx.com\/zero-post\/unearned-confidence-ai-security-reviewers-dont-really-get-it\/#primaryimage","url":"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp","contentUrl":"https:\/\/checkmarx.com\/wp-content\/uploads\/2026\/03\/cxzero-feature_unearned-confidence-ai-security-reviewers.webp","width":2560,"height":1280,"caption":"A robot holding a red stamp and a punk-looking woman holding a checklist stand in front of a stylized monitor. 
The woman is a security reviewer who is bringing \"context\", \"defaults\", and \"defense in depth\" items to challenge the robot's determinations"},{"@type":"WebSite","@id":"https:\/\/checkmarx.com\/#website","url":"https:\/\/checkmarx.com\/","name":"Checkmarx","description":"The world runs on code. We secure it.","publisher":{"@id":"https:\/\/checkmarx.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/checkmarx.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/checkmarx.com\/#organization","name":"Checkmarx","url":"https:\/\/checkmarx.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/","url":"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg","contentUrl":"https:\/\/checkmarx.com\/wp-content\/uploads\/2024\/02\/logo-dark.svg","width":1,"height":1,"caption":"Checkmarx"},"image":{"@id":"https:\/\/checkmarx.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Checkmarx.Source.Code.Analysis","https:\/\/x.com\/checkmarx","https:\/\/www.youtube.com\/user\/CheckmarxResearchLab","https:\/\/www.linkedin.com\/company\/checkmarx"]}]}},"_links":{"self":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-post\/107395","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-post"}],"about":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/types\/zero-post"}],"author":[{"embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/users\/172"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/media\/107422"}],"wp:attachment":[{"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/media?parent=107395"}],"wp:term":[{"taxonomy":"zero-category","embeddable":true,"href"
:"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-category?post=107395"},{"taxonomy":"zero-tag","embeddable":true,"href":"https:\/\/checkmarx.com\/wp-json\/wp\/v2\/zero-tag?post=107395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}