Background#

The AI defense team needed to handle two things: real-time bot detection and post-event analysis. Embedding that logic directly in application code would mean adding an SDK to every service, which would be a maintenance nightmare. With Istio’s ext_authz extension point, requests can be intercepted transparently at the mesh level, without touching app code, and forwarded to the AI engine. I proposed this approach to the AI team and we built it together.

Division of Responsibility#

| Owner | Work |
|---|---|
| Infrastructure (me) | Istio ext_authz integration design, Go gRPC adapter development, EnvoyFilter authoring, Helm deployment configuration |
| AI team | Critical API selection, verdict spec design, AI Defense API (Python FastAPI) development, behavioral analysis engine |

Core idea: Envoy is already proxying every request — just hook ext_authz into it and get an AI verdict there.


Architecture#

Full Request Flow#

[Browser]
    ↓ HTTPS
[Istio IngressGateway (Envoy)]
    ↓ ext_authz gRPC (Critical APIs only)
[Authz Adapter (Go :9001)]
    ↓ POST /ai/evaluate
[AI Defense API (Python :8000)]
    ↓ Redis (session/blacklist)
    ↓ Brain Decision Engine
    ↓
Response: NONE | CHALLENGE | THROTTLE | GATE | BLOCK
    ↓
[Authz Adapter]
    ├─ NONE      → 200 (pass)
    ├─ CHALLENGE → 428 (challenge required)
    ├─ THROTTLE  → 200 + delay
    ├─ GATE      → 429 (temporary block)
    └─ BLOCK     → 403 + Redis blacklist registration
    ↓
[Envoy → Backend API]

Defense Layer Structure#

Layer 1: CloudFront + AWS WAF     (Edge - L7 DDoS, bulk blocking)
Layer 2: Istio EnvoyFilter WAF    (Mesh - SQLi/XSS/CmdInj detection)
Layer 3: Istio Rate Limiting      (Mesh - per-path request limiting)
Layer 4: ext_authz + AI Defense   (Mesh - behavior-based bot detection) ← this post

Authz Adapter (Go)#

Role#

A Go gRPC server that implements Istio’s ext_authz protocol. Acts as a bridge between Envoy and the AI Defense API.

Core Features#

1. Envoy ext_authz gRPC Server#

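import (
    "context"
    authv3 "github.com/envoyproxy/go-control-plane/envoy/service/auth/v3"
    "google.golang.org/genproto/googleapis/rpc/status"
    "google.golang.org/grpc/codes"
)

// Envoy calls this via gRPC for each request
func (h *Handler) Check(ctx context.Context, req *authv3.CheckRequest) (*authv3.CheckResponse, error) {
    // 1. Extract IP, path, X-Bot-Token from request headers
    // 2. Check whitelist (team IPs)
    // 3. Check blacklist (Redis DB 5)
    // 4. Call AI Defense API
    // 5. Respond with allow/block/challenge based on result
    return &authv3.CheckResponse{Status: &status.Status{Code: int32(codes.OK)}}, nil // placeholder: OK means allow
}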
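The ordering of the steps in this outline can be sketched as a plain decision function, independent of the gRPC types. This is a minimal sketch rather than the adapter's actual code: `Deps`, `decide`, and the `Outcome` values are hypothetical names, the lookups are injected as functions so the order of checks is visible, and X-Bot-Token validation is left out.

```go
package main

// Outcome is what the adapter tells Envoy to do with the request.
type Outcome string

const (
	Allow     Outcome = "ALLOW"
	Deny      Outcome = "DENY"
	Challenge Outcome = "CHALLENGE"
)

// Deps bundles the lookups used by decide. All names are hypothetical.
type Deps struct {
	Whitelisted func(ip string) bool                  // team IPs
	Blacklisted func(ip string) bool                  // e.g. a Redis DB 5 lookup
	Evaluate    func(ip, path string) (string, error) // AI Defense API call
}

// decide applies the checks in the order listed in the outline:
// whitelist, then blacklist, then the AI verdict.
func decide(d Deps, ip, path string) Outcome {
	if d.Whitelisted(ip) {
		return Allow // never reaches the AI
	}
	if d.Blacklisted(ip) {
		return Deny // blocked without calling the AI
	}
	verdict, err := d.Evaluate(ip, path)
	if err != nil {
		return Allow // fail-open: an AI error never blocks the user
	}
	switch verdict {
	case "BLOCK", "GATE":
		return Deny // the HTTP status differs (403 vs 429); both reject
	case "CHALLENGE":
		return Challenge
	default:
		return Allow // NONE, THROTTLE, and anything unrecognized
	}
}
```

Because the whitelist and blacklist checks come first, a whitelisted or already blocked IP costs one lookup and no AI round trip.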

2. Critical API Filtering#

Sending every request to the AI would be costly and slow, so only 7 core ticketing APIs are evaluated:

/api/queue/entry          ← queue entry
/api/seat/select          ← seat selection
/api/payment/process      ← payment processing
/api/booking/confirm      ← booking confirmation
/api/user/signup          ← user registration
/api/auth/login           ← login
/api/order/create         ← order creation

All other paths bypass ext_authz and pass through directly.
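A minimal sketch of the path check in Go, assuming an exact match on the path after the query string is removed. The post does not say whether matching is exact or prefix-based, so treat `isCritical` and `criticalPaths` as hypothetical names and the matching rule as an assumption.

```go
package main

import "strings"

// criticalPaths mirrors the seven endpoints listed above.
var criticalPaths = map[string]bool{
	"/api/queue/entry":     true,
	"/api/seat/select":     true,
	"/api/payment/process": true,
	"/api/booking/confirm": true,
	"/api/user/signup":     true,
	"/api/auth/login":      true,
	"/api/order/create":    true,
}

// isCritical reports whether a request path should be sent to the AI.
// Assumption: exact match after dropping the query string.
func isCritical(path string) bool {
	if i := strings.IndexByte(path, '?'); i >= 0 {
		path = path[:i]
	}
	return criticalPaths[path]
}
```

A map lookup keeps the check constant-time, which matters because it runs on every request before any network call.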

3. IP Blacklist / Whitelist (Redis)#

Redis DB 3: CDN Edge blacklist (managed by Lambda, TTL 30 days)
Redis DB 5: Real-time blacklist (managed by Adapter, TTL 7 days)
  • On a BLOCK verdict, the Adapter registers the IP in Redis DB 5, so subsequent requests from that IP are blocked immediately without calling the AI
Whitelist: 22 team member IPs always pass through

4. X-Bot-Token Validation#

The frontend (Next.js, Vercel) generates an X-Bot-Token via getBotToken() and includes it in request headers. The Adapter validates this token.

[Frontend (Next.js)]
    → generate X-Bot-Token via getBotToken()
    → request header: X-Bot-Token: <token>
        ↓
[Authz Adapter]
    → validate token
    → missing or invalid → 403

5. Fail-Open Policy#

If the AI Defense API does not respond within the 800 ms timeout, the request is allowed through. This deliberately favors availability over security: a failure in the defense system during a ticket sale must never block legitimate users.


AI Defense API (Python)#

A FastAPI-based bot detection engine built by the AI team. Below is the integration structure from an infrastructure perspective.

API Endpoints#

| Endpoint | Role |
|---|---|
| POST /ai/evaluate | Evaluate request → behavioral verdict (called by Adapter) |
| POST /ai/challenge/start | Issue VQA challenge token |
| POST /ai/challenge/verify | Verify challenge response |
| POST /ai/telemetry/ingest | Collect browser telemetry |
| POST /ai/precheck | Pre-validation before queue entry |

Verdict Actions#

| Action | HTTP Response | Description |
|---|---|---|
| NONE | 200 | Normal pass-through |
| CHALLENGE | 428 | VQA challenge (Catch Ball mini-game) required |
| THROTTLE | 200 + delay | Delay of 200 ms (T1) or 1800 ms (T2) |
| GATE | 429 | Temporary block |
| BLOCK | 403 | Block + IP blacklist registration (Redis DB 5, TTL 7 days) |

Session State Management#

Redis DB 0: Session state
  - Per-user request history
  - Accumulated behavior patterns
  - Bot confidence score

Redis DB 5: Blacklist
  - IPs with BLOCK verdict
  - TTL 7 days

Istio Integration (EnvoyFilter)#

ext_authz Cluster Definition#

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ext-authz-cluster
  namespace: istio-system
spec:
  configPatches:
    - applyTo: CLUSTER
      patch:
        operation: ADD
        value:
          name: authz-adapter
          type: STRICT_DNS
          connect_timeout: 0.8s
          lb_policy: ROUND_ROBIN
          typed_extension_protocol_options:
            envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
              "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
              explicit_http_config:
                http2_protocol_options: {}
          load_assignment:
            cluster_name: authz-adapter
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: authz-adapter.staging-ai.svc.cluster.local
                          port_value: 9001

Lua Path Matching + ext_authz Trigger#

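- applyTo: HTTP_FILTER
  patch:
    operation: INSERT_BEFORE
    value:
      name: envoy.filters.http.lua
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
        inlineCode: |
          local critical = {
            ["/api/queue/entry"] = true, ["/api/seat/select"] = true,
            ["/api/payment/process"] = true, ["/api/booking/confirm"] = true,
            ["/api/user/signup"] = true, ["/api/auth/login"] = true,
            ["/api/order/create"] = true,
          }
          -- Assumed rule: exact match on the path with any query string removed
          local function matches_critical_path(path)
            return path ~= nil and critical[path:match("^[^?]*")] == true
          end
          function envoy_on_request(request_handle)
            local headers = request_handle:headers()
            -- Trigger ext_authz only for Critical API paths
            if matches_critical_path(headers:get(":path")) then
              headers:replace("x-need-authz", "true")
            else
              headers:remove("x-need-authz")  -- drop client-supplied values
            end
          end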

Deployment Configuration#

Helm Values#

# values-authz-adapter.yaml
replicaCount: 2
image:
  repository: ghcr.io/goorm-gongbang/authz-adapter
ports:
  grpc: 9001
  metrics: 9090
env:
  AI_DEFENSE_URL: "http://ai-defense.staging-ai.svc.cluster.local:8000/ai/evaluate"
  AI_DEFENSE_TIMEOUT: "800ms"
  REDIS_ADDR: "redis.staging-data.svc.cluster.local:6379"
  OTEL_ENABLED: "true"

Observability#

| Item | Configuration |
|---|---|
| Metrics | Prometheus :9090 (request count, latency, verdict distribution) |
| Traces | OpenTelemetry → Collector → Tempo |
| Logs | Structured JSON → Loki |

Summary#

| Component | Language | Role | Owner |
|---|---|---|---|
| Authz Adapter | Go | Envoy ↔ AI bridge, IP management, token validation | Infrastructure (me) |
| AI Defense API | Python | Behavioral analysis, bot verdict, challenge issuance | AI team |
| EnvoyFilter | YAML | ext_authz integration, path filtering | Infrastructure (me) |
| Helm Charts | YAML | Deployment config, per-environment values | Infrastructure (me) |

By leveraging Istio’s ext_authz extension point, AI bot detection is applied transparently at the mesh level without modifying any application code. The Fail-Open policy ensures availability, while evaluating only Critical APIs keeps latency to a minimum.


Reference Repos#