Billing modes: infra_only, demo, live

Wassla runs every tenant in one of three billing modes — infra_only, demo, or live. The mode is checked on the server before any AI work (text reply, voice turn, knowledge search, transcription, embedding) is allocated, and it is the single switch that decides whether your workspace can spend on AI or not. New tenants default to infra_only; the platform operator promotes you to demo for evaluation and to live when you are ready to be billed.

This article explains what each mode does, when it applies, how the demo allowance is enforced, and how to move between modes. If you are looking for plan tiers and pricing, see Wassla billing and plans instead — modes are the underlying gate, plans are the commercial wrapping.

The three modes at a glance

Every AI-spend caller in Wassla (the WhatsApp inbound webhook, the voice agent, the knowledge-search endpoint, the sentiment analyzer) runs through a shared server-side gate before it does any paid work. That gate resolves your tenant's effective mode and then behaves in one of three ways, depending on which mode applies.

infra_only — the default lockdown

infra_only is the safe default. AI spend is blocked entirely regardless of credit balance, subscription state, or how many channels you have connected. Inbound messages are still accepted and stored, and your team can still reply manually from the inbox, but the AI agent will not generate a reply, a voice room will not start a session, and knowledge lookups will not call the language model.

This mode exists because Wassla onboards new workspaces well before any commercial agreement is in place. We want you to be able to wire up your WhatsApp number, invite your team, and try the inbox without us silently burning API spend on your behalf. Every new tenant starts here and stays here until an operator explicitly promotes them.

You will know you are in infra_only if the AI agent never replies to test messages and the Billing page shows the mode badge as Infra only. The block is enforced server-side — there is no way to bypass it from the client.

demo — evaluation with a monthly allowance

demo is for evaluation tenants. The platform grants a fixed allowance of free credits per calendar month (set by the operator in platform_settings.demo_allowance_credits_per_month), and AI spend is allowed up to that cap. Once the month-to-date spend plus in-flight reservations reaches the allowance, further AI calls are rejected with a demo_exhausted reason and the gate behaves like infra_only for the rest of the month. The cap resets on the first day of the next calendar month.
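As a rough illustration of the monthly cap, the Python sketch below sums spend inside the current calendar month and compares it with the allowance. The function names, the (timestamp, credits) entry shape, and the use of UTC for the month boundary are assumptions made for this example, not Wassla's actual schema or code.

```python
from datetime import datetime, timezone

def month_start(now):
    """First instant of the calendar month containing `now` (UTC is an assumption)."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def month_to_date_spend(entries, now):
    """Sum credits spent since the start of the current month.

    `entries` is a hypothetical list of (timestamp, credits_spent) tuples.
    """
    start = month_start(now)
    return sum(credits for ts, credits in entries if start <= ts <= now)

def demo_allowed(allowance, entries, in_flight, now):
    """Spend is allowed only while spend plus in-flight reservations is below the cap."""
    return month_to_date_spend(entries, now) + in_flight < allowance

utc = timezone.utc
entries = [
    (datetime(2024, 5, 30, 12, 0, tzinfo=utc), 900),
    (datetime(2024, 6, 2, 9, 0, tzinfo=utc), 100),
]
late_may = datetime(2024, 5, 31, tzinfo=utc)
early_june = datetime(2024, 6, 3, tzinfo=utc)
```

With a hypothetical allowance of 1000 credits, 900 credits of May spend plus 100 in flight reaches the cap on 31 May, so `demo_allowed` returns False. On 3 June only the 100 credits spent in June count, and the gate opens again.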

The cap check is atomic. When an AI call is about to start, the gate calls a database function called reserve_demo_spend that takes a per-tenant lock, sums your month-to-date ledger spend plus any in-flight reservations, and either inserts a new reservation row or returns demo_exhausted — all in a single transaction. This closes the race window where two concurrent calls could otherwise both pass the check and overshoot the allowance.
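The real reserve_demo_spend is a database function, so the following is only an in-memory Python sketch of the same check-then-reserve pattern. A threading.Lock stands in for the per-tenant database lock, the class name DemoSpendGate is hypothetical, and the sketch assumes the requested amount is included in the comparison so that the total never exceeds the allowance.

```python
import threading
from collections import defaultdict

class DemoSpendGate:
    """In-memory stand-in for the check-and-reserve step (illustrative only)."""

    def __init__(self, allowance):
        self.allowance = allowance
        self._locks = defaultdict(threading.Lock)  # one lock per tenant
        self._locks_guard = threading.Lock()       # protects the lock table itself
        self.ledger_spend = defaultdict(int)       # month-to-date debits per tenant
        self.reserved = defaultdict(int)           # in-flight reservations per tenant

    def _lock_for(self, tenant_id):
        with self._locks_guard:
            return self._locks[tenant_id]

    def reserve(self, tenant_id, credits):
        # The sum, the comparison, and the insert all happen under one lock,
        # so two concurrent callers cannot both pass the check.
        with self._lock_for(tenant_id):
            committed = self.ledger_spend[tenant_id] + self.reserved[tenant_id]
            if committed + credits > self.allowance:
                return "demo_exhausted"
            self.reserved[tenant_id] += credits
            return "reserved"

gate = DemoSpendGate(allowance=1000)
results = []

def worker():
    results.append(gate.reserve("tenant-a", 300))

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten concurrent callers each ask for 300 credits against a 1000-credit allowance. Because the check and the reservation are serialized per tenant, exactly three succeed and the rest receive demo_exhausted, whatever order the threads run in.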

Reservations expire after one minute, and a background cron sweeps expired rows every minute. That means if a caller reserves a worst-case 2000 credits but only actually debits 50, the unused headroom comes back automatically within about two minutes: up to one minute for the reservation to expire, plus up to one minute for the next sweep. You do not need to do anything to reclaim it.
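The effect of expiry on available headroom can be worked through with a small Python sketch. The Reservation type, the function names, and the 5000-credit allowance are hypothetical. Only the one-minute lifetime and the 2000-versus-50 figures come from the description above.

```python
from dataclasses import dataclass

RESERVATION_TTL_SECONDS = 60  # reservations expire after one minute

@dataclass
class Reservation:
    credits: int
    created_at: float  # seconds on an arbitrary clock

def sweep_expired(reservations, now):
    """Return only the reservations that are still within their lifetime."""
    return [r for r in reservations if now - r.created_at < RESERVATION_TTL_SECONDS]

def headroom(allowance, ledger_spend, reservations):
    """Credits still available after debits and live reservations."""
    return allowance - ledger_spend - sum(r.credits for r in reservations)

allowance = 5000                      # hypothetical monthly allowance
reservations = [Reservation(credits=2000, created_at=0.0)]
ledger_spend = 50                     # what the call actually debited

# 30 seconds in, the worst-case reservation still counts against the cap.
before = headroom(allowance, ledger_spend, sweep_expired(reservations, now=30.0))
# After expiry and a sweep, only the real debit remains.
after = headroom(allowance, ledger_spend, sweep_expired(reservations, now=120.0))
```

Here `before` is 2950 and `after` is 4950: the 1950 credits of unused headroom, plus the 50 that were double-counted while the reservation was live, return without any action from the tenant.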

live — production billing

live is the production mode. The billing-mode gate is effectively a no-op: every AI call is allowed, and your existing credit balance and subscription plan handle the accounting. If your credit balance reaches zero, AI replies pause and inbound conversations queue for human handling, following the rules described in How Wassla credits work.

This is the only mode where Wassla actually bills you for AI usage. We do not flip you to live automatically — an operator does it after the commercial agreement is in place, the payment method is on file, and the credit balance or subscription is funded.

How mode resolution works

When the gate runs, it resolves your effective mode by checking two places in order:

Your tenant's override (tenants.billing_mode_override). If this column is set, it wins.
The platform default (platform_settings.billing_mode_default). The gate falls back to this value when the override is null.

The platform default is infra_only. So unless an operator has explicitly set an override on your tenant row, you are in infra_only. There is one important cross-check: if your tenant is suspended (tenants.suspended_at is set), the gate short-circuits to tenant_suspended regardless of mode — a suspended tenant cannot burn AI spend even on live.
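The resolution order can be sketched in a few lines of Python. The Tenant dataclass and the resolve_gate function are hypothetical names for this example. The field names mirror the columns mentioned above, and the sketch assumes suspension is checked before anything else.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tenant:
    billing_mode_override: Optional[str] = None  # mirrors tenants.billing_mode_override
    suspended_at: Optional[str] = None           # mirrors tenants.suspended_at

def resolve_gate(tenant, platform_default="infra_only"):
    """Return 'tenant_suspended' or the tenant's effective billing mode."""
    # Suspension short-circuits everything, including live.
    if tenant.suspended_at is not None:
        return "tenant_suspended"
    # The tenant override wins when it is set.
    if tenant.billing_mode_override is not None:
        return tenant.billing_mode_override
    # Otherwise fall back to the platform default.
    return platform_default

new_tenant = Tenant()
evaluating = Tenant(billing_mode_override="demo")
suspended_live = Tenant(billing_mode_override="live", suspended_at="2024-01-01T00:00:00Z")
```

A brand-new tenant resolves to infra_only, a tenant with an override resolves to that override, and a suspended tenant resolves to tenant_suspended even when its override is live.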

The gate is also fail-closed. If the database is unreachable or the mode column returns something unexpected, the gate returns spend_check_unavailable and the AI call is dropped. We would rather refuse a call than burn money on a tenant whose billing state we cannot verify.
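Fail-closed behavior means every error path ends in a refusal. The Python sketch below shows one way to express that. The check_spend function and its load_mode callback are hypothetical, and the real gate presumably distinguishes more cases than this example does.

```python
VALID_MODES = {"infra_only", "demo", "live"}

def check_spend(load_mode):
    """Resolve the mode, refusing the call whenever the state cannot be verified.

    `load_mode` is a hypothetical callable that reads the mode from storage
    and may raise if the database is unreachable.
    """
    try:
        mode = load_mode()
    except Exception:
        # Any lookup failure is treated as "cannot verify", never as "allow".
        return "spend_check_unavailable"
    if mode not in VALID_MODES:
        # Unexpected or missing values are refused as well.
        return "spend_check_unavailable"
    return mode

def db_down():
    raise ConnectionError("database unreachable")
```

The important design choice is that there is no default branch that allows spend: an exception, a null, or an unknown string all lead to spend_check_unavailable.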

How to switch modes

Switching modes is operator-controlled. There is no self-service toggle in the workspace UI — only Wassla platform owners with the platform.billing_mode capability can flip a tenant's mode, and the action is audited.

If you want to move from infra_only to demo for an evaluation, or from demo to live to start paying, reach out to your Wassla account contact or email [email protected]. The request triggers the following flow on our side:

A platform owner opens the Staff portal, finds your tenant, and calls the set_tenant_billing_mode action with the new mode.
The action is written to audit_events with the requester's identity, IP, and timestamp.
For platform-wide default flips (changing the new-tenant default for everyone), a second owner must approve the change within 24 hours via the Staff approvals queue — single-owner platform-wide flips are not allowed.
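The two-owner rule for platform-wide flips comes down to two checks: the approver must be a different owner, and the approval must arrive within 24 hours of the request. A minimal Python sketch follows, with a hypothetical function name and arguments that are not taken from the Staff portal's actual implementation.

```python
from datetime import datetime, timedelta, timezone

APPROVAL_WINDOW = timedelta(hours=24)

def can_apply_default_flip(requested_by, requested_at, approved_by, approved_at):
    """True only if a second, different owner approved within the window."""
    if approved_by is None or approved_at is None:
        return False  # no approval yet
    if approved_by == requested_by:
        return False  # single-owner flips are not allowed
    elapsed = approved_at - requested_at
    return timedelta(0) <= elapsed <= APPROVAL_WINDOW

requested_at = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)
in_time = requested_at + timedelta(hours=2)
too_late = requested_at + timedelta(hours=25)
```

For example, a request from "alice" approved by "bob" two hours later passes, while a self-approval by "alice" or an approval that arrives 25 hours later does not.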

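Tenant-level overrides take effect immediately. You can verify the new mode by sending a test message to your AI agent. If it replies, you are on demo or live. If it stays silent, you are most likely still on infra_only, although an exhausted demo allowance or a suspended tenant produces the same silence, so confirm against the mode badge on the Billing page.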

What to expect in each mode

A few practical differences worth knowing:

Inbox always works. Regardless of mode, your team can read, reply, and tag conversations from the inbox. Mode only gates AI-generated spend.
Channel webhooks always work. Inbound WhatsApp, Instagram, Facebook, Twilio, and web-widget messages are received and stored in all modes. They just sit waiting for a human reply if AI is gated off.
Credit ledger is always written. Even on demo, every AI call writes a row to the credit transactions ledger with the delta, balance, and reason. That ledger is what the demo allowance check sums against.
The ledger you see is always real. The Billing page reads from the immutable per-transaction ledger, so what you see in demo is exactly what would be billed on live.