How I Replaced Google Cloud Armor with a Custom $0 WAF (and Saved $3k/mo)

If you host on Google Cloud, the path of least resistance for security is usually Cloud Armor. It’s right there in the console, it integrates with your Load Balancer, and it promises to stop the bad guys.

That was my plan, too. Until I turned it on.

I set up a standard policy using Google’s pre-configured WAF rules. I wanted to be safe, so I enabled the works: xss-v33-stable, sqli-v33-stable, rce-v33-stable, and protocolattack-v33-stable.

I hit save, navigated to my homepage, and… 403 Forbidden.

I wasn’t hacking my own site. I wasn’t sending malicious payloads. I was just visiting the home page. It turns out, Cloud Armor’s standard rules are incredibly aggressive. They don’t just catch attacks; they catch cookies, JSON headers, and sometimes just the wind blowing the wrong way.

To fix this properly (using “tuning mode” and advanced exclusions) or to access dynamic threat intelligence lists (to block known bad IP reputations automatically), Google wants you on Cloud Armor Enterprise.

The price tag? A minimum of $3,000 / month.

I’m building a sophisticated web application, not funding a small nation, LOL. So, I decided to build my own solution. I kept Cloud Armor for what it’s good at (L3/L4 DDoS protection) and moved the heavy lifting (L7 WAF) into my application architecture.

Here is how I built a “Defense in Depth” system using Node.js, Redis, reCAPTCHA Enterprise, and Cloudflare Turnstile, for $0 in added costs.

The Problem with “Enterprise” Rules

Beyond the false positives (and the price), there were two functional deal-breakers.

1. The “Context” Problem
My application deals with AI and coding. Users frequently ask the AI things like:

“How do I fix this SQL Injection vulnerability?”
“Explain how DROP TABLE works.”

To a standard WAF like Cloud Armor (or even Cloudflare), that request looks exactly like an attack. It sees DROP TABLE in the POST body and kills the connection. I needed a WAF that was context-aware. I needed to know who was sending the request and where it was going before deciding to block it.

2. The Arbitrary Limits
Cloud Armor’s Standard tier has baffling limitations. You are limited to 5 items per rule and 10 IPs per rule. If you want to block a list of 50 bad IPs, you have to create 5 separate rules or manage them via disjointed reference lists. It’s a management nightmare.

Why Not Just Use Open Source WAFs?

Before I committed to writing my own security layer, I did what any sane developer would do: I tried to use existing tools. I spun up Docker containers for Safeline, BunkerWeb, and a few others.

I deleted each of them within a couple of hours of getting it up and running, some almost immediately.

  • The “Community” Trap: The free versions often felt artificially throttled or feature-limited to push you toward their paid tiers. In a high-performance app, I can’t have my WAF adding latency just because I’m on the “Community Edition.”
  • The Branding Nightmare: I am obsessed with UX. I want every pixel, even the 403 Forbidden page, to match my app’s branding. With off-the-shelf WAFs, customizing error pages usually involves hacking Nginx configs or replacing HTML templates inside Docker volumes. It’s clunky.
  • Context Switching: Managing a separate WAF means maintaining separate config files, separate logs, and a separate dashboard.

By building the WAF directly into my application logic, I control everything in one place. My “Block” page isn’t a static HTML file; it’s a React component that shares the same theme, fonts, and “Sci-Fi” aesthetic as the rest of the app.

The Architecture

I avoided Cloudflare’s WAF (layer 7) for personal reasons, even though I use them for almost everything, so I needed this to run entirely within my existing infrastructure. I ended up with a 3-layer request funnel, plus two hardening layers behind it, that now handles an alphabet soup of vectors: SQLi, XSS, RCE, CRLF injection, LDAP injection, XPath injection, XXE, SSRF, Path Traversal, Backdoors, Brute Force, and HTTP Floods.

Layer 1: The Atomic IP Blacklist (Redis)

I needed to block 200,000+ known malicious IPs (botnets, spammers). 200k is me being nice, or just lazy, idk.

  • The Cloud Armor Way: Limit of 10 IPs per rule.
  • The Enterprise Way: Pay $3k/mo for Managed Threat Intelligence.
  • My Way: Redis Sets.

I set up a scheduler that downloads open-source bad IP lists (like IPsum) daily. But you can’t just iterate over 200k strings in a Node.js request loop; that would tank latency.

Instead, I load them into Redis. Redis SISMEMBER (Set Is Member) lookups are O(1). Whether I have 10 IPs or 10 million, the lookup time is virtually instant.
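A minimal sketch of what that lookup could look like as Express-style middleware. It assumes a connected node-redis v4 client (whose sIsMember method maps to SISMEMBER); the key name blocked_ips:live, the /challenge path, and the fail-open behaviour are all assumptions for illustration, not details from my actual code.

```javascript
// Hypothetical key name for the live set; only blocked_ips:temp is named in the post.
const LIVE_KEY = 'blocked_ips:live';

// Builds an Express-style middleware. `redis` is assumed to be a connected
// node-redis v4 client; sIsMember(key, member) resolves to a truthy value
// when the member is in the set.
function ipBlacklist(redis, { challengePath = '/challenge' } = {}) {
  return async function ipBlacklistMiddleware(req, res, next) {
    let listed = false;
    try {
      // One O(1) set lookup per request, regardless of list size.
      listed = Boolean(await redis.sIsMember(LIVE_KEY, req.ip));
    } catch (err) {
      // Assumption: fail open if Redis is unreachable, so a cache outage
      // does not take the whole site down.
      listed = false;
    }

    if (listed) {
      // Listed IPs are sent to a challenge page rather than hard-blocked.
      return res.redirect(challengePath);
    }
    return next();
  };
}
```

Failing open is a judgment call: it trades a short gap in blacklist coverage for availability, and the later layers still apply.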

To handle updates without downtime, I use an Atomic Swap strategy:

  1. Download the new list.
  2. Push IPs to a temporary Redis key (blocked_ips:temp).
  3. Use the Redis RENAME command to atomically swap the temp key with the live key.

The firewall never blinks.
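The three steps above can be sketched roughly like this. Again this assumes a node-redis v4 client (del, sAdd, and rename are its command methods); the live key name and the batch size are assumptions, and the download step is left out.

```javascript
// blocked_ips:temp is the key named in the steps above; the live key name is assumed.
const TEMP_KEY = 'blocked_ips:temp';
const LIVE_KEY = 'blocked_ips:live';
const BATCH_SIZE = 10000; // assumed batch size, keeps each SADD command small

// `redis` is assumed to be a connected node-redis v4 client.
// `ips` is the freshly downloaded list as an array of strings.
async function swapInBlocklist(redis, ips) {
  if (ips.length === 0) {
    // RENAME fails when the source key does not exist, and an empty download
    // more likely means a broken feed than a clean internet: keep the old list.
    throw new Error('Refusing to swap in an empty blocklist');
  }

  // Clear leftovers from a previously failed run before refilling.
  await redis.del(TEMP_KEY);

  for (let i = 0; i < ips.length; i += BATCH_SIZE) {
    await redis.sAdd(TEMP_KEY, ips.slice(i, i + BATCH_SIZE));
  }

  // RENAME overwrites the destination in one atomic step, so readers see
  // either the old set or the new one, never a half-filled set.
  await redis.rename(TEMP_KEY, LIVE_KEY);
  return ips.length;
}
```

The empty-list guard matters more than it looks: without it, one bad download would either throw inside RENAME or silently wipe the blacklist, depending on how the temp key was built.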

Layer 2: The “VIP Lane” (A Hybrid Approach)

WAFs burn a lot of CPU regex-matching every single request, but 99% of my traffic comes from legitimate, logged-in users. So I implemented a “VIP” system that prioritizes user experience above all else. IPs on the blacklist are redirected immediately to the challenge page (I did not want to hard-block them). Google, other search crawlers, and known good bots are matched against a GoodBot IP list and get a pass straight to the app, but they are blocked from every API endpoint.

God knows how much I hate solving CAPTCHAs, especially reCAPTCHA and its damn badge; even hCaptcha is a saint in comparison. So I built a hybrid system using both Google and Cloudflare.

The Happy Path: Invisible Scoring

When a user first arrives, we show a brief, stylized “Checking your browser” screen. Behind the scenes, reCAPTCHA Enterprise assesses their “score” (0.0 to 1.0).

The UX: I hid the reCAPTCHA badge. It disrupts the UI, and quite frankly, it’s ugly.

The Pass: If they are human (Score > 0.5), I sign a JWT containing their verification status and IP address.

The Ticket: This JWT is stored as a Secure, HttpOnly cookie with a 40-minute expiry.

Why only 40 minutes? It’s a security trade-off. That is what Cloudflare does. The industry-standard 12 hours is too long; if a session gets hijacked, the attacker has all day. 40 minutes is the sweet spot: long enough for a typical session, but short enough to limit exposure. I admit, I love overdoing security.

The Result: My app checks for this cookie first. If you have a valid token, the app loads instantly. No “Checking your browser” screen. No delay. You are in the Fast Lane.
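As a rough sketch of the pass-and-ticket step: the helper names here are hypothetical, the sameSite value is an assumption, and the signed token is taken as an input rather than minted. The cookie name is the one used in the securityWaf.js snippet further down, and res.cookie is the standard Express response method.

```javascript
const VIP_COOKIE = 'salamgpt_waf_auth'; // cookie name from the securityWaf.js snippet
const VIP_TTL_MS = 40 * 60 * 1000;      // 40 minutes, expressed in milliseconds
const PASS_THRESHOLD = 0.5;             // scores above this count as human

// Hypothetical helper: treats a missing or non-numeric score as a failure.
function isLikelyHuman(score) {
  return typeof score === 'number' && score > PASS_THRESHOLD;
}

// Hypothetical helper: `res` is an Express response, `signedToken` an
// already-signed JWT string.
function issueVipCookie(res, signedToken) {
  res.cookie(VIP_COOKIE, signedToken, {
    httpOnly: true,     // not readable from page JavaScript
    secure: true,       // only sent over HTTPS
    sameSite: 'lax',    // assumption: the post does not say which SameSite mode is used
    maxAge: VIP_TTL_MS, // Express takes maxAge in milliseconds
    path: '/',
  });
}
```

It is worth setting the JWT’s own expiry to the same 40 minutes, so that a copied token stops working even if the cookie lifetime is ignored.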

The Sad Path: The Challenge Page

If the background score drops below 0.5, I don’t show them a “Click the Fire Hydrant” grid. I cannot stress enough how much I hate reCAPTCHA; I cannot imagine putting anyone through that torture. I actually tried using it, and I changed my mind 5 minutes after I got it up and running, when it showed me the “skip” puzzle on “Balanced Mode,” if you can believe that.

Instead, I redirect them immediately to a custom Challenge Page.

  • The Engine: I switch providers here. I use Cloudflare Turnstile because it’s faster, friendlier, and usually non-interactive. It’s the mother of modern bot protection (okay, I exaggerate, but I love it).
  • The Design: I hate those generic, white “Checking Browser” pages. I built a custom, Sci-Fi inspired challenge interface. If I have to stop you, I might as well make it look cool.
  • The Logic: Turnstile starts immediately. If you pass, you are redirected back to the app with that same 40-minute VIP cookie. If you fail, the page reloads.
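On the server side, “if you pass” means the token from the widget has been checked against Cloudflare’s siteverify endpoint. A minimal sketch, assuming Node 18+ for the global fetch; the TURNSTILE_SECRET variable name and the injectable fetchImpl parameter (there to make the function testable) are assumptions.

```javascript
const SITEVERIFY_URL = 'https://challenges.cloudflare.com/turnstile/v0/siteverify';

// Returns true only when Cloudflare confirms the token.
// `secret` defaults to a hypothetical TURNSTILE_SECRET environment variable.
async function verifyTurnstile(
  token,
  remoteIp,
  { secret = process.env.TURNSTILE_SECRET, fetchImpl = fetch } = {},
) {
  if (!token) return false; // nothing to verify

  // siteverify accepts a form-encoded POST with secret, response and
  // (optionally) remoteip.
  const body = new URLSearchParams({ secret, response: token });
  if (remoteIp) body.set('remoteip', remoteIp);

  const res = await fetchImpl(SITEVERIFY_URL, { method: 'POST', body });
  if (!res.ok) return false;

  const data = await res.json();
  return data.success === true;
}
```

Only after this returns true would the handler issue the VIP cookie and redirect back to the app; anything else falls through to the page reload.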

We also use reCAPTCHA Action Tokens to monitor for suspicious behavior during the session, allowing us to seamlessly trigger a re-check if a user starts acting like a bot.

After I was almost done, I could already see bot requests hitting my app.

Logs
1|backend | [Security] Invalid Token for 149.*.180.1: BROWSER_ERROR
1|backend | [Security] Invalid Token for 149.*.180.63: BROWSER_ERROR
1|backend | [Security] Invalid Token for 23.*.145.237: BROWSER_ERROR

Layer 3: The Context-Aware Application WAF

This is where the magic happens. I wrote a custom middleware in Node.js to replace the functionality of those expensive OWASP rules.

Because this code runs in my app, it understands Context.

JavaScript
// Simplified logic
const securityWaf = (req, res, next) => {
    // 1. Sanitize: Decode URL components

    // 2. Context Check: Is this a Chat API endpoint?
    const isChatEndpoint = /^\/api\/(chat|generate)/.test(req.path);

    // 3. Scan for Attacks
    if (!isChatEndpoint) {
        // If it's NOT a chat, be strict. 
        // Block SQLi, XSS, RCE patterns in the body.
        if (hasAttackPattern(req.body, rules.sqli)) {
            return res.status(403).json({ error: 'Malicious payload.' });
        }
    }

    // 4. Parameterization Sanity Check
    next();
};

This solves my “AI Problem.”

  • If a user tries to inject SQL into my Admin Login, the WAF catches it.
  • If a user asks the AI about SQL in the Chat Interface, the WAF knows to relax and let the prompt through.
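The hasAttackPattern helper and the rules object are not shown in the simplified snippet, so here is one way they might look. The three patterns are a deliberately tiny, hypothetical rule set for illustration, nowhere near real OWASP coverage, and the scan cap is counted in characters as an approximation of the 50 KB limit mentioned in the implementation notes.

```javascript
// Cap on how much of a body gets scanned (characters, approximating 50 KB).
const MAX_SCAN_CHARS = 50 * 1024;

// Hypothetical, illustration-only rules. No /g flag, so .test() keeps no
// state between calls.
const rules = {
  sqli: [
    /\bunion\b\s+(?:all\s+)?\bselect\b/i,
    /\bdrop\s+table\b/i,
    /'\s*or\s+'?\d+'?\s*=\s*'?\d+/i,
  ],
};

// Flattens a parsed body (object, array or string) into one string to scan.
function toScanText(body) {
  return typeof body === 'string' ? body : JSON.stringify(body ?? '');
}

// True when any pattern matches within the first MAX_SCAN_CHARS characters.
function hasAttackPattern(body, patterns) {
  const text = toScanText(body).slice(0, MAX_SCAN_CHARS);
  return patterns.some((re) => re.test(text));
}
```

Note that a prompt like “Explain how DROP TABLE works.” matches the second rule, which is exactly why the chat endpoints skip this scan.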

A Note on SQL Injection

It’s worth noting that a WAF is a safety net, not a solution. Regardless of this middleware, every single database query in my application is parameterized.

Using prepared statements is the only true cure for SQL injection. The WAF just helps keep the logs clean and reduces load on the database by rejecting obvious garbage early.
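For anyone who has not seen one, a parameterized query looks roughly like this. The sketch assumes a node-postgres (pg) style client, where query(text, values) sends the SQL and the values separately; the post does not depend on a specific database, and the table and column names are made up.

```javascript
// `db` is assumed to be a node-postgres (pg) Pool or Client.
// The users table and its columns are hypothetical.
async function findUserByEmail(db, email) {
  // The SQL text contains only the $1 placeholder. The user-supplied value
  // travels in the values array, so it is never parsed as SQL.
  const result = await db.query(
    'SELECT id, email FROM users WHERE email = $1',
    [email],
  );
  return result.rows[0] ?? null;
}
```

Even if `email` is `x' OR '1'='1`, the database compares it as a literal string instead of executing it.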

Layer 4: The “Photocopy” Defense (Anti-Replay & Hijacking)

A common criticism of “VIP Cookie” systems is the Replay Attack.

If a hacker sits in a coffee shop and sniffs a user’s traffic, they could theoretically steal that “VIP” cookie. To the server, the hacker now is the user. It’s like photocopying a backstage pass; the bouncer doesn’t know the difference.

To fix this, I implemented three specific checks that make a stolen cookie useless.

1. IP Binding (The “Strict Mode”)
In my securityWaf.js, I don’t just check if the JWT is valid; I check who is holding it. When the token is minted, I embed the user’s IP address into the payload.

JavaScript
// From server.js
const payload = { verified: true, ip: clientIp, ... };

When the WAF verifies the request, it decodes the token and compares the embedded IP against the request IP.

JavaScript
// From securityWaf.js
const decoded = jwt.verify(req.cookies.salamgpt_waf_auth, process.env.JWT_SECRET);

if (decoded.ip && decoded.ip !== req.ip) {
     throw new Error("IP mismatch");
}

If you steal my cookie and try to use it from your own machine, the WAF sees the IP mismatch and kills the session immediately.
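To make the mint-and-verify round trip concrete without pulling in a dependency, here is a self-contained sketch that builds an HS256-signed token by hand with Node’s crypto module. It is a stand-in for the jwt library calls above, not a replacement for them, and the function names are hypothetical. Unlike the snippet above, it also rejects tokens that carry no ip claim at all. One practical caveat: behind a load balancer, Express’s req.ip only reflects the real client when the trust proxy setting is configured.

```javascript
const crypto = require('node:crypto');

const b64url = (input) => Buffer.from(input).toString('base64url');

// HMAC-SHA256 over the "header.payload" string, base64url-encoded.
function sign(data, secret) {
  return crypto.createHmac('sha256', secret).update(data).digest('base64url');
}

// Mints a token that embeds the client's IP and a 40-minute expiry.
function mintVipToken(clientIp, secret, ttlSeconds = 40 * 60, now = Date.now()) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = b64url(JSON.stringify({
    verified: true,
    ip: clientIp,
    exp: Math.floor(now / 1000) + ttlSeconds,
  }));
  return `${header}.${payload}.${sign(`${header}.${payload}`, secret)}`;
}

// Throws unless the signature, expiry and embedded IP all check out.
function verifyVipToken(token, requestIp, secret, now = Date.now()) {
  const parts = String(token).split('.');
  if (parts.length !== 3) throw new Error('Malformed token');
  const [header, payload, signature] = parts;

  // Constant-time comparison of the expected and presented signatures.
  const expected = Buffer.from(sign(`${header}.${payload}`, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !crypto.timingSafeEqual(expected, actual)) {
    throw new Error('Bad signature');
  }

  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  if (claims.exp <= Math.floor(now / 1000)) throw new Error('Token expired');
  if (claims.ip !== requestIp) throw new Error('IP mismatch');
  return claims;
}
```

The trade-off of strict IP binding is that users whose address changes mid-session (mobile networks, VPN hops) get bounced to a re-check, which the 40-minute lifetime already makes fairly routine.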

2. The Double-Zip Verification (CSRF)
Even if you bypass the WAF, you can’t mutate data. I implemented the “Double Submit Cookie” pattern using the csrf-csrf library.

The server generates a cryptographic token that is mathematically bound to the user’s Session ID. This token is sent to the client in a header, while the session ID is in a cookie.

If an attacker tries to trick a user into clicking a malicious link (Cross-Site Request Forgery), the browser will send the cookies automatically, but it won’t send the custom header. The server sees the missing header, realizes the context is wrong, and rejects the request.
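The idea of a token that is “bound” to the session can be illustrated with a plain HMAC. This is a conceptual sketch using Node’s crypto module, not the internals or API of the csrf-csrf library, and the function names are hypothetical.

```javascript
const crypto = require('node:crypto');

// HMAC over "sessionId.nonce": only the server, which holds the secret,
// can produce a matching value for a given session.
function macFor(sessionId, nonce, secret) {
  return crypto.createHmac('sha256', secret).update(`${sessionId}.${nonce}`).digest('hex');
}

// Issues a token for the client to echo back in a custom request header.
function issueCsrfToken(sessionId, secret) {
  const nonce = crypto.randomBytes(16).toString('hex');
  return `${nonce}.${macFor(sessionId, nonce, secret)}`;
}

// Checks the header value against the session ID taken from the cookie.
function isValidCsrfToken(headerToken, sessionId, secret) {
  if (typeof headerToken !== 'string') return false; // header missing
  const [nonce, mac] = headerToken.split('.');
  if (!nonce || !mac) return false;

  const expected = Buffer.from(macFor(sessionId, nonce, secret));
  const actual = Buffer.from(mac);
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}
```

A forged cross-site request arrives with the session cookie but no header, so isValidCsrfToken(undefined, ...) returns false; a token lifted from another session fails the HMAC check.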

3. Short-Lived TTL
As mentioned earlier, the VIP pass expires in 40 minutes. In the world of session hijacking, time is the enemy. By forcing a re-verification (which happens transparently in the background via fetchWithAuth in my frontend code), we minimize the window of opportunity for any attacker.

Layer 5: Protecting the Server from Itself (SSRF)

Finally, there is a vector most people forget: Server-Side Request Forgery (SSRF).

My app deals with images and AI. Sometimes, the app needs to fetch an image from a URL provided by a user. A smart attacker could provide a URL like http://localhost:8080/admin or http://169.254.169.254/latest/meta-data/ (the cloud metadata address; that path is the AWS form, and GCP serves its metadata from the same IP under /computeMetadata/v1/).

If my server blindly fetches that URL, the attacker could map my internal network or steal cloud credentials.

I wrote a utility called securityUtils.js that acts as a DNS firewall for outgoing requests. Before my server fetches anything, it resolves the DNS and checks the IP address against a list of forbidden ranges.

JavaScript
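// From securityUtils.js
const { resolve4 } = require('node:dns/promises'); // promise-based DNS lookups
const ipaddr = require('ipaddr.js');               // classifies IPs into named ranges

const validateSafeUrl = async (inputUrl) => {
    // 0. Parse the URL so only its hostname gets resolved
    const { hostname } = new URL(inputUrl);

    // 1. Resolve DNS
    const ips = await resolve4(hostname);

    // 2. Check against Private Ranges (127.0.0.1, 10.x.x.x, etc.)
    for (const ip of ips) {
        const range = ipaddr.parse(ip).range();
        // 'unicast' is ipaddr.js's label for ordinary public addresses;
        // loopback, private, linkLocal and the rest are rejected.
        if (range !== 'unicast') {
            throw new Error(`Blocked: URL resolves to internal network (${ip})`);
        }
    }
    return true;
}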

This ensures that my server can talk to the outside world, but it can never be tricked into talking to itself or the internal network.

Conclusion

WAF providers want you to believe that protection requires a $3,000/month Enterprise contract, complex VPNs, and a team of SecOps engineers.

But by leveraging Redis for speed, JWTs for stateless verification, and Context-Aware Middleware for logic, you can build a system that is arguably more secure than a generic WAF, because it understands the specific nuances of your application and you can keep improving it over time to fit your app.

I didn’t just save money; I built a system that runs faster, blocks smarter, and (most importantly) doesn’t treat me or my devs like hackers when they ask about “SQL Injection.”

Total added monthly cost: $0.
Sleep lost worrying about bots: 0 hours.

The Results

By rolling my own solution, I achieved:

  1. Massive Cost Savings: I saved ~$3,000/month by avoiding Cloud Armor Enterprise.
  2. Better UX: Legitimate users never see a captcha after the first check, and they don’t get 403 errors when asking technical questions.
  3. Real-Time Updates: My IP lists update automatically every 24 hours without me touching a config file.
  4. Security: I still have Cloud Armor at the edge handling volumetric DDoS, but the smart filtering happens where it belongs: close to the logic.

Sometimes, “Enterprise” solutions are just paying for convenience. With a little bit of code, you can build something that fits your specific needs much better, for a fraction of the price. That said, I would not use this for a banking app, LOL.


Technical Implementation Details for the Nerds

  • Stack: Node.js (Express), Redis, Google Cloud.
  • Libraries: ip-range-check for CIDR matching, recaptcha-enterprise for scoring.
  • Regex Optimization: To limit exposure to ReDoS (Regular Expression Denial of Service), I cap the input scan size. I only regex-match the first 50 KB of a request body. If a payload is larger than that and contains an attack deep inside, the API schema validation catches it later anyway.
