RBAC at Scale: Flat Permissions vs. Multi-Tenant Module Gating

300+ permission constants

2 designs (production deployments)

11 modules (permission namespaces)

Problem & Constraints

Two production Go backends, same stack, but very different access-control requirements. The first is a single-tenant service running one operational context per deployment, with the main RBAC challenge being row-level data scoping — which organisational unit a user can see, not who they are across tenants. The second is a multi-tenant platform where the same user account can hold different roles across independent tenants, permissions are organised into feature modules, and isolation must be enforced strictly per tenant.

Both share the same structural primitives: User → Role → Permission join graphs, soft-delete reactivation, and per-request permission resolution. The constraints diverge enough that each required a different shape. This document covers both designs, compares them explicitly, and records what the simpler design revealed once requirements expanded.

Correctness over caching: revoked permissions must not stay live after an admin removes them.
Soft-delete reactivation: revoke-and-regrant must not collide with UNIQUE indexes.
Per-request resolution: tokens carry identity, never permissions — permissions are resolved on every request.
Auditability: every grant/revoke is recoverable from the database, not from logs.

Comparative Overview

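At a glance, the two designs differ chiefly on five axes: tenancy model, role tiering, scope mechanism, module gating, and revocation strategy. The table below summarises where each design lands on these, plus three related points (per-tenant isolation, permission count, and the superadmin escape).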

Axis	Design A — Flat	Design B — Module-Gated Multi-Tenant
Tenancy	Single operational context	Many independent tenants per deployment
Role tiers	One tier (flat roles)	Two tiers: system + tenant-scoped
Per-tenant isolation	Not required	TenantMemberRole 3-way join
Data scoping	ScopeContext injected on every request	DataAccess enum (single_tenant / all_tenant)
Module gating	None — flat permission catalog	TenantModule check is the final gate
Permission count	~100 constants, resource.action	300+ constants across 11 modules
Revocation	Immediate session kill via Redis	Per-request DB resolution, no cache
Superadmin escape	Hardcoded 'Super Admin' role name	is_superadmin user flag + system role

Design A — Project Structure

Single-tenant service. Permissions live in one flat catalog; scope is a per-request context, not a per-role attribute.

design-a/
├── ent/schema/
│   ├── role.go                 # Role: name (unique), is_active
│   ├── permission.go           # Permission: code, is_active
│   ├── user_role.go            # User → Role (soft-delete)
│   └── role_permission.go      # Role → Permission (soft-delete)
├── internal/
│   ├── transport/http/middleware/
│   │   └── permission_middleware.go  # RequirePermission, ScopeContext injection
│   ├── pkg/
│   │   ├── permissions/
│   │   │   └── constants.go    # ~100 resource.action constants
│   │   └── scope/
│   │       └── context.go      # ScopeContext {ScopeID, DeptIDs, DataAccess}
│   └── app/
│       ├── auth/               # Session service, Redis revocation
│       └── role/               # Role assignment + revocation
└── migrations/                 # Atlas versioned schema migrations

Design A — Flat Roles with Scope Context Injection

The data model is a textbook RBAC join graph: User → UserRole → Role → RolePermission → Permission. Every join row carries is_active so previously-revoked assignments can be reactivated without violating the UNIQUE (user_id, role_id) and (role_id, permission_id) indexes.

schema.sql

CREATE TABLE roles (
    id        UUID PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    is_active BOOLEAN NOT NULL DEFAULT true
);

CREATE TABLE user_roles (
    user_id   UUID NOT NULL REFERENCES users(id),
    role_id   UUID NOT NULL REFERENCES roles(id),
    is_active BOOLEAN NOT NULL DEFAULT true,
    UNIQUE(user_id, role_id)
);

CREATE TABLE role_permissions (
    role_id        UUID NOT NULL REFERENCES roles(id),
    permission_id  UUID NOT NULL REFERENCES permissions(id),
    is_active      BOOLEAN NOT NULL DEFAULT true,
    UNIQUE(role_id, permission_id)
);
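The reactivation rule can be illustrated with a small in-memory sketch. `UserRoleStore` and its methods are hypothetical stand-ins for the user_roles table, not code from either backend; the map key plays the part of the UNIQUE(user_id, role_id) index, and the map value is the is_active flag.

```go
package main

// UserRoleKey mirrors the UNIQUE(user_id, role_id) index from the schema.
type UserRoleKey struct {
	UserID string
	RoleID string
}

// UserRoleStore is a hypothetical in-memory stand-in for the user_roles
// table. The map value is the row's is_active flag.
type UserRoleStore struct {
	rows map[UserRoleKey]bool
}

func NewUserRoleStore() *UserRoleStore {
	return &UserRoleStore{rows: make(map[UserRoleKey]bool)}
}

// Grant inserts a new row or reactivates a previously revoked one.
// Either way there is at most one row per (user, role) pair.
func (s *UserRoleStore) Grant(userID, roleID string) {
	s.rows[UserRoleKey{UserID: userID, RoleID: roleID}] = true
}

// Revoke soft-deletes: the row stays, is_active becomes false.
// It reports whether a row existed.
func (s *UserRoleStore) Revoke(userID, roleID string) bool {
	k := UserRoleKey{UserID: userID, RoleID: roleID}
	if _, ok := s.rows[k]; !ok {
		return false
	}
	s.rows[k] = false
	return true
}

// IsActive reports whether the assignment currently grants access.
func (s *UserRoleStore) IsActive(userID, roleID string) bool {
	return s.rows[UserRoleKey{UserID: userID, RoleID: roleID}]
}

// RowCount returns the number of stored rows, active or not.
func (s *UserRoleStore) RowCount() int { return len(s.rows) }
```

Granting, revoking, and granting again leaves a single row whose flag flips back to true, which is the property that keeps the unique index satisfied.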

Permission resolution flow

Flow: Design A permission resolution

HTTP request
     │
     ▼
[1] JWT middleware            ── extract user_id, set on ctx
     │
     ▼
[2] RequirePermission("item.create")
     │
     ▼
[3] Load active UserRoles      ── WHERE user_id=? AND is_active
     │       │
     │       └─► If role.name == "Super Admin" → BYPASS (allow)
     │
     ▼
[4] Load active RolePermissions── WHERE role_id IN (...) AND is_active
     │
     ▼
[5] Build O(1) lookup map      ── map[code]struct{}
     │
     ▼
[6] Build ScopeContext         ── {ScopeID, DeptIDs, DataAccess}
     │
     ▼
[7] code ∈ map ?
     ├── yes → enrich ctx, call handler
     └── no  → 403 Forbidden
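Steps 3 to 7 of the flow above can be sketched roughly as follows. The `Role` struct is a hypothetical in-memory stand-in for the active UserRole and RolePermission rows that the real middleware loads from the database; only the bypass and the set lookup are illustrated.

```go
package main

// Role is a hypothetical stand-in for one active UserRole row together
// with the permission codes of its active RolePermission rows.
type Role struct {
	Name  string
	Codes []string
}

// superAdminRole is the hardcoded bypass name described in the flow.
const superAdminRole = "Super Admin"

// Allowed reports whether any of the user's active roles grants code.
func Allowed(roles []Role, code string) bool {
	// Step 5: build a set for O(1) membership checks.
	granted := make(map[string]struct{})
	for _, r := range roles {
		if r.Name == superAdminRole {
			return true // step 3: bypass, no catalog lookup needed
		}
		for _, c := range r.Codes {
			granted[c] = struct{}{}
		}
	}
	// Step 7: code in set means allow, otherwise the caller returns 403.
	_, ok := granted[code]
	return ok
}
```

A middleware built on this would call `Allowed` once per request and answer 403 Forbidden when it returns false.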

The non-obvious part of Design A is step 6 — scope context injection. Beyond permission checking, the middleware injects ScopeContext{ScopeID, DeptIDs, DataAccess} into the request context. Handlers use these for row-level filtering without re-querying: SELECT * FROM items WHERE scope_id = $1. The ScopeContext is the row-level multi-tenancy substitute in a system that has no tenant axis.
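A minimal sketch of the injection itself, assuming the standard `context` package carries the value. The field names come from the text above; the field types, the unexported key type, and the helper names `WithScope`, `ScopeFrom`, and `NewRequestContext` are assumptions made for illustration.

```go
package main

import "context"

// ScopeContext carries the row-level scope resolved by the middleware.
// Field types are assumed; the source names only the fields.
type ScopeContext struct {
	ScopeID    string
	DeptIDs    []string
	DataAccess string
}

// scopeKey is unexported so no other package can collide with this key.
type scopeKey struct{}

// WithScope returns a child context carrying sc.
func WithScope(ctx context.Context, sc ScopeContext) context.Context {
	return context.WithValue(ctx, scopeKey{}, sc)
}

// ScopeFrom extracts the scope; ok is false if the middleware never ran.
func ScopeFrom(ctx context.Context) (ScopeContext, bool) {
	sc, ok := ctx.Value(scopeKey{}).(ScopeContext)
	return sc, ok
}

// NewRequestContext is a hypothetical helper that seeds a fresh context,
// standing in for what the middleware does with the incoming request.
func NewRequestContext(sc ScopeContext) context.Context {
	return WithScope(context.Background(), sc)
}
```

A handler would call `ScopeFrom(r.Context())` and pass `sc.ScopeID` as the `$1` parameter of the scoped query, treating a missing scope as an error rather than as an unfiltered read.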

Permission catalog shape

permissions.go

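// ~80 CRUD constants generated by resource × action.
// String values are assumed to follow the resource.action form
// (e.g. "item.create"); the original listing elides them.
const (
    ITEM_VIEW, ITEM_CREATE, ITEM_UPDATE, ITEM_DELETE     = "item.view", "item.create", "item.update", "item.delete"
    STOCK_VIEW, STOCK_CREATE, STOCK_UPDATE, STOCK_DELETE = "stock.view", "stock.create", "stock.update", "stock.delete"
    USER_VIEW, USER_CREATE, USER_UPDATE, USER_DELETE     = "user.view", "user.create", "user.update", "user.delete"
    ROLE_VIEW, ROLE_CREATE, ROLE_UPDATE, ROLE_DELETE     = "role.view", "role.create", "role.update", "role.delete"
    // ... 20+ resource types × 4 verbs
)

// ~20 named workflow actions that can't be derived from CRUD verbs.
const (
    WORKFLOW_APPROVE  = "workflow.approve"  // Approve a pending workflow item
    WORKFLOW_REJECT   = "workflow.reject"   // Reject a workflow item
    COMMITTEE_CREATE  = "committee.create"  // Create a recommendation
    STAGE_ACTION      = "stage.action"      // Act on a multi-step approval stage
    ADMIN_OVERRIDE    = "admin.override"    // Bypass approval chain (admin only)
    ITEM_ISSUE        = "item.issue"        // Issue items against a requisition
    DOCUMENT_DOWNLOAD = "document.download" // Download a generated document
    BULK_IMPORT       = "bulk.import"       // Bulk import resources
)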

Revocation strategy

On any RBAC mutation (role assign, permission revoke), the service immediately invalidates the user's active sessions via Redis. A revoked operator must not retain access while their session is still live, so any TTL-based cache was ruled out from the start.
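The ordering matters: the RBAC write happens first, and the session kill follows only if the write succeeded. The sketch below shows that shape with a hypothetical `SessionStore` interface and an in-memory implementation; the real service uses Redis, and none of these names come from it.

```go
package main

// SessionStore is a hypothetical abstraction over the session backend
// (Redis in the source; an in-memory map here).
type SessionStore interface {
	RevokeAll(userID string)
	Valid(sessionID string) bool
}

// memStore maps sessionID to the owning userID.
type memStore struct {
	owner map[string]string
}

// RevokeAll deletes every session owned by userID.
func (m *memStore) RevokeAll(userID string) {
	for sid, uid := range m.owner {
		if uid == userID {
			delete(m.owner, sid) // deleting during range is safe in Go
		}
	}
}

// Valid reports whether the session still exists.
func (m *memStore) Valid(sessionID string) bool {
	_, ok := m.owner[sessionID]
	return ok
}

// MutateAndRevoke applies an RBAC mutation and, only if it succeeded,
// invalidates every live session of the affected user.
func MutateAndRevoke(store SessionStore, userID string, mutate func() error) error {
	if err := mutate(); err != nil {
		return err // nothing changed, so sessions stay valid
	}
	store.RevokeAll(userID)
	return nil
}
```

Because sessions are deleted rather than left to expire, the affected user has to re-authenticate, and no cached decision can outlive the mutation.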

Design B — Project Structure

Multi-tenant platform. Same user account can be a member of many tenants with different roles in each. Permissions are organised by feature module; tenants opt in to modules independently.

design-b/
├── ent/schema/
│   ├── tenant.go                # Tenant
│   ├── module.go                # Module: 11 feature namespaces
│   ├── tenant_module.go         # Tenant × Module activation (is_enabled)
│   ├── role.go                  # Role: tenant_id (nullable), is_system
│   ├── permission.go            # Permission: module_id FK, code
│   ├── user.go                  # User: data_access, is_superadmin
│   ├── user_role.go             # User → system Role (global)
│   ├── tenant_member.go         # User × Tenant membership
│   ├── tenant_member_role.go    # Tenant × Member × Role (3-way)
│   └── role_permission.go       # Role → Permission (soft-delete)
├── internal/
│   ├── transport/http/middleware/
│   │   ├── tenant_context.go    # Resolver: 5-step ladder
│   │   └── permission_middleware.go  # checkViaTenantMembership / checkViaSystemRoles
│   ├── pkg/
│   │   ├── permissions/
│   │   │   ├── constants.go     # 300+ constants tagged by ModuleKey
│   │   │   └── modules.go       # 11 ModuleKey identifiers
│   │   └── tenantaccess/
│   │       └── resolver.go      # EffectiveTenantScope, CanAccessTenant
│   └── app/
│       ├── auth/                # Session, JWT (tenant_id claim)
│       └── role/                # System + tenant role assignment
└── migrations/                  # Atlas versioned schema migrations

Design B — Two-Tier Roles with Module Gating

A flat single-tier model cannot satisfy multi-tenant isolation. Design B introduces two-tier roles: system roles at the platform level (tenant_id = NULL, is_system = true) and tenant-scoped roles tied to a specific tenant. The same role name can exist independently across tenants because uniqueness is scoped per (tenant_id, name).

schema.sql

-- Two-tier role: system (tenant_id NULL) or tenant-scoped (tenant_id N)
CREATE TABLE roles (
    id        INT PRIMARY KEY,
    name      TEXT NOT NULL,
    tenant_id INT REFERENCES tenants(id),         -- NULL for system roles
    is_system BOOLEAN NOT NULL DEFAULT false,
    is_active BOOLEAN NOT NULL DEFAULT true,
    CHECK (NOT is_system OR tenant_id IS NULL),    -- system → no tenant
    UNIQUE NULLS NOT DISTINCT (tenant_id, name)    -- per-tenant role names
);

-- Per-tenant membership
CREATE TABLE tenant_members (
    id        INT PRIMARY KEY,
    user_id   INT NOT NULL REFERENCES users(id),
    tenant_id INT NOT NULL REFERENCES tenants(id),
    is_active BOOLEAN NOT NULL DEFAULT true,
    UNIQUE(user_id, tenant_id)
);

-- 3-way join: this is the per-tenant role isolation mechanism
CREATE TABLE tenant_member_roles (
    tenant_id        INT NOT NULL REFERENCES tenants(id),
    tenant_member_id INT NOT NULL REFERENCES tenant_members(id),
    tenant_role_id   INT NOT NULL REFERENCES roles(id),
    is_active        BOOLEAN NOT NULL DEFAULT true,
    UNIQUE(tenant_id, tenant_member_id, tenant_role_id)
);

-- Module activation per tenant: the final gate at check time
CREATE TABLE tenant_modules (
    tenant_id  INT NOT NULL REFERENCES tenants(id),
    module_id  INT NOT NULL REFERENCES modules(id),
    is_enabled BOOLEAN NOT NULL DEFAULT false,
    is_active  BOOLEAN NOT NULL DEFAULT true,
    UNIQUE(tenant_id, module_id)
);

Permission check flow

Flow: Design B permission check

Request enters checkUserPermission(user, tenant, code)
        │
        ▼
   is_superadmin? ── yes ─► ALLOW (fast bypass)
        │ no
        ▼
   user.data_access ?
        │
        ├── single_tenant
        │       │
        │       ▼
        │   Path A: checkViaTenantMembership
        │       │
        │       ├─[1]─► Active TenantMember (user, tenant)?  no → skip
        │       ├─[2]─► Load active TenantMemberRoles
        │       ├─[3]─► Collect active role IDs (dedup)
        │       ├─[4]─► RolePermission(role IN ..., code=?)
        │       └─[5]─► TenantModule(tenant, perm.module).is_enabled?
        │                 yes → ALLOW
        │                 no  → fall through
        │
        └── all_tenant
                │
                ▼
            Path B: checkViaSystemRoles
                │
                ├─► Active UserRole → system Role
                ├─► RolePermission(role IN ..., code=?)
                └─► TenantModule gate still applies
                      yes → ALLOW
                      no  → DENY

Step 5 — TenantModule gating — is the architectural commitment that makes module subscriptions tractable. A permission granted via a role is denied if the tenant has not enabled the owning module. Activating a module on a tenant therefore unlocks all of its associated permissions automatically; no role reassignment, no migration. Conversely, deactivating a module instantly closes the door on all of its permissions, regardless of who holds them in their role.

DataAccess value	Check path	Tenant membership required	Use case
single_tenant	Path A (membership) → Path B fallback	Yes (for Path A)	Regular tenant users
all_tenant	Path B (system roles only)	No	Platform admins, integrations
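The two check paths and the module gate can be condensed into one decision function. This is a sketch under stated assumptions: `Store` and `memberKey` are hypothetical in-memory stand-ins for the joined tables, and the grants are pre-flattened into code sets, whereas the real check resolves them through TenantMember, TenantMemberRole, and RolePermission queries.

```go
package main

type DataAccess string

const (
	SingleTenant DataAccess = "single_tenant"
	AllTenant    DataAccess = "all_tenant"
)

type User struct {
	ID           int
	IsSuperadmin bool
	DataAccess   DataAccess
}

// memberKey identifies one (user, tenant) membership.
type memberKey struct {
	UserID   int
	TenantID int
}

// Store is a hypothetical in-memory view of the RBAC tables.
type Store struct {
	TenantGrants map[memberKey]map[string]bool // codes via active tenant roles
	SystemGrants map[int]map[string]bool       // userID -> codes via system roles
	ModuleOf     map[string]string             // permission code -> module key
	Enabled      map[int]map[string]bool       // tenantID -> module -> is_enabled
}

// moduleEnabled is the final gate: the permission's owning module must be
// enabled for the tenant. Unknown codes are denied.
func (s *Store) moduleEnabled(tenantID int, code string) bool {
	mod, ok := s.ModuleOf[code]
	return ok && s.Enabled[tenantID][mod]
}

// Check mirrors the flow: superadmin bypass, Path A for single_tenant
// users, Path B as fallback or for all_tenant users, then the module gate.
func (s *Store) Check(u User, tenantID int, code string) bool {
	if u.IsSuperadmin {
		return true
	}
	granted := false
	if u.DataAccess == SingleTenant {
		granted = s.TenantGrants[memberKey{UserID: u.ID, TenantID: tenantID}][code] // Path A
	}
	if !granted {
		granted = s.SystemGrants[u.ID][code] // Path B
	}
	return granted && s.moduleEnabled(tenantID, code)
}
```

Flipping a single `Enabled` entry changes the outcome for every holder of that module's permissions, which is the property that removes the need for role reassignment when a tenant subscribes or unsubscribes.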

Permission catalog shape

permissions.go

// 11 feature modules; each permission belongs to exactly one.
const (
    ModuleCore       = "core"
    ModuleBilling    = "billing"
    ModuleReporting  = "reporting"
    ModuleAssessment = "assessment"
    ModuleAttendance = "attendance"
    // + 6 more
)

type Definition struct {
    Code      string
    Name      string
    ModuleKey string // gates the permission at check time
}

func Definitions() []Definition {
    return []Definition{
        {Code: RECORD_VIEW, ModuleKey: ModuleCore},
        {Code: BILLING_VIEW, ModuleKey: ModuleBilling},
        {Code: ATTENDANCE_MARK, ModuleKey: ModuleAttendance},
        // ... 300+ entries total
    }
}
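With 300+ entries, the rule that each permission belongs to exactly one module is worth checking mechanically, for instance at startup. The sketch below is one way to do that; `ModuleIndex` is a hypothetical helper, not part of the catalog shown above, and the codes in the usage example are illustrative.

```go
package main

import "fmt"

// Definition repeats the catalog entry shape from the listing above.
type Definition struct {
	Code      string
	Name      string
	ModuleKey string
}

// ModuleIndex builds a code -> module lookup and rejects entries that
// would break the one-module-per-permission rule: a missing module key
// or a code that appears more than once.
func ModuleIndex(defs []Definition) (map[string]string, error) {
	idx := make(map[string]string, len(defs))
	for _, d := range defs {
		if d.ModuleKey == "" {
			return nil, fmt.Errorf("permission %q has no module", d.Code)
		}
		if _, dup := idx[d.Code]; dup {
			return nil, fmt.Errorf("duplicate permission code %q", d.Code)
		}
		idx[d.Code] = d.ModuleKey
	}
	return idx, nil
}
```

Calling `ModuleIndex(Definitions())` once during initialisation would surface a mis-tagged constant before any request reaches the module gate.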

Tenant Context Resolution

Before any permission check runs in Design B, the middleware must decide which tenant governs the request. The Resolver implements a 5-step ladder, centralised so no handler re-implements tenant selection.

Step	Condition	Outcome
1	Request specifies tenant + user may access it	Use requested tenant
2	Request specifies tenant + user may NOT access	Return Denied=true (403)
3	No request tenant + JWT carries tenant_id	Use JWT tenant
4	No JWT tenant + user is single_tenant	Use MIN active membership
5	None of the above	Resolved=false (caller handles)

tenant_context_resolver.go

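func (r *Resolver) Resolve(
    ctx context.Context, usr *ent.User,
    requestedTenantID, jwtTenantID NullableTenantID,
) (EffectiveTenantScope, error) {
    // Steps 1-2: an explicitly requested tenant wins, but only if the user may access it.
    if requestedTenantID.IsSet {
        ok, err := r.CanAccessTenant(ctx, usr, requestedTenantID.Value)
        if err != nil {
            return EffectiveTenantScope{}, err
        }
        if !ok {
            return EffectiveTenantScope{Denied: true}, nil
        }
        return EffectiveTenantScope{TenantID: requestedTenantID.Value, Resolved: true}, nil
    }
    // Step 3: fall back to the tenant carried in the JWT.
    if jwtTenantID.IsSet {
        return EffectiveTenantScope{TenantID: jwtTenantID.Value, Resolved: true}, nil
    }
    // Step 4: single_tenant users default to their lowest-ID active membership.
    if usr.DataAccess == DataAccessSingleTenant {
        minID, err := r.minActiveMembership(ctx, usr.ID)
        if err != nil {
            // Do not report Resolved=true with a zero tenant ID on lookup failure.
            return EffectiveTenantScope{}, err
        }
        return EffectiveTenantScope{TenantID: minID, Resolved: true}, nil
    }
    // Step 5: nothing resolved; the caller decides how to handle it.
    return EffectiveTenantScope{}, nil
}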

Access enforcement is DataAccess-aware: all_tenant users may access any active tenant; single_tenant users may only access tenants where they have an active TenantMember row. This keeps the access boundary consistent between the tenant-selection step and the permission-resolution step — a single source of truth for 'can this user see this tenant at all?'.
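The access rule in the paragraph above can be sketched as a single predicate. `TenantDirectory` is a hypothetical in-memory stand-in for the tenants and tenant_members tables; the real CanAccessTenant takes a context and queries the database.

```go
package main

type DataAccess string

const (
	SingleTenant DataAccess = "single_tenant"
	AllTenant    DataAccess = "all_tenant"
)

type User struct {
	ID         int
	DataAccess DataAccess
}

// TenantDirectory is a hypothetical in-memory view of the tenant tables.
type TenantDirectory struct {
	ActiveTenants map[int]bool         // tenantID -> tenant is active
	ActiveMembers map[int]map[int]bool // userID -> tenantID -> active TenantMember row
}

// CanAccessTenant applies the DataAccess-aware rule:
// all_tenant users may access any active tenant, single_tenant users only
// tenants where they hold an active membership. Unknown values are denied.
func (d *TenantDirectory) CanAccessTenant(u User, tenantID int) bool {
	switch u.DataAccess {
	case AllTenant:
		return d.ActiveTenants[tenantID]
	case SingleTenant:
		return d.ActiveMembers[u.ID][tenantID]
	default:
		return false
	}
}
```

Using the same predicate for tenant selection and for permission resolution is what keeps the two steps from disagreeing about who can see which tenant.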

Concurrent Role-Assignment Validation

Assigning multiple roles in one request requires validating that the user and every target role exist and are active. These are independent DB lookups, so they run in parallel — the worst-case latency is the slowest individual query, not the sum.

assign_roles.go

func (s *rbacService) AssignRoles(ctx context.Context, userID uuid.UUID, roleIDs []uuid.UUID) error {
    var (
        wg   sync.WaitGroup
        mu   sync.Mutex // guards errs across goroutines
        errs []string
    )
    // One goroutine for the user lookup plus one per role.
    wg.Add(1 + len(roleIDs))
    go func() {
        defer wg.Done()
        if _, err := s.domain.GetActiveUser(ctx, userID); err != nil {
            mu.Lock()
            errs = append(errs, err.Error())
            mu.Unlock()
        }
    }()
    for _, id := range roleIDs {
        go func(id uuid.UUID) {
            defer wg.Done()
            if _, err := s.domain.GetActiveRole(ctx, id); err != nil {
                mu.Lock()
                errs = append(errs, err.Error())
                mu.Unlock()
            }
        }(id)
    }
    wg.Wait()
    // Report every failed lookup at once rather than only the first.
    if len(errs) > 0 {
        return fmt.Errorf("validation failed: %s", strings.Join(errs, "; "))
    }
    // The write that persists the assignments is not shown in this excerpt.
    // Invalidate the user's live sessions so the change applies immediately.
    return s.revokeUserAccess(ctx, userID, "rbac_role_change")
}

Key Design Decisions

Soft-delete join rows (is_active=false) in both designs

Why: Revoke-and-regrant would hit the UNIQUE (role_id, permission_id) and (user_id, role_id) indexes if implemented as hard delete + insert. Soft-delete + reactivate avoids the violation and preserves the full assignment history for audit.

Alternative: Hard delete: cleaner table but requires ON CONFLICT handling and destroys audit trail.

Flat single-tier roles for Design A, two-tier for Design B

Why: Design A has one operational context; roles don't need tenant-level scoping. Adding a second tier would only introduce join complexity. Design B has strict multi-tenant isolation where the same user account can serve different roles per tenant — exactly what TenantMemberRole solves.

Alternative: One unified model for both: either over-engineers A or under-engineers B.

TenantModule gating at check time (Design B)

Why: Tenants subscribe to feature modules; a tenant without Billing should not have billing permissions regardless of their roles. Gating at check time means activating a module unlocks its permissions automatically — no role reassignment, no migration.

Alternative: Revoke all permissions in deactivated modules explicitly: requires migration on every module change and is error-prone.

Per-request DB resolution, no permission cache

Why: Both designs resolve permissions from the database on every request. Combined with Redis session revocation (Design A) and DataAccess-aware tenant resolution (Design B), a revoked permission stops working on the user's next request. Compliance-critical systems can't tolerate stale ALLOW decisions.

Alternative: Permission cache with TTL: faster per-request, but revoked permissions stay live until expiry.

DataAccess enum (single_tenant vs all_tenant) on users in Design B

Why: Platform admins managing many tenants shouldn't be enrolled as TenantMember in every tenant. DataAccess=all_tenant bypasses the membership requirement and routes to system roles only — avoiding a proliferation of TenantMember rows while keeping the check path consistent.

Alternative: Enrol all_tenant users in every tenant: uniform path, but requires a data migration every time a new tenant is created.

Superadmin escape — name-based (A) vs flag-based (B)

Why: Both designs need an emergency-access path that survives catalog corruption. Design A uses a hardcoded 'Super Admin' role name. Design B adds an is_superadmin boolean on User for a faster bypass; changing the flag is itself gated by a separate MANAGE_PRIVILEGE permission to prevent escalation. The flag is more robust because it survives role renames.

Alternative: Superadmin as just another role with all permissions: correct normally, but breaks if a permission catalog migration partially fails.

Tradeoffs Summary

Single-tier vs two-tier roles: flat is simpler and correct when there is one tenancy axis; the second tier is justified the moment 'role per tenant' becomes a real requirement, not a hypothetical.
Per-request DB resolution vs cached permissions: chose correctness — revoked access propagates within one request. A targeted per-user-per-tenant cache with explicit invalidation is the right next step under load.
Module gating at check vs at grant: check-time gating decouples module subscriptions from role assignments. The cost is one extra index lookup per request; the saving is zero migrations on module changes.
Scope context (A) vs tenant context (B): both encode 'what data may this user see', but on different axes. A injects row-level scope from the user's roles; B routes through tenant membership and module enablement.
Soft delete vs hard delete on joins: soft delete preserves audit trail and avoids UNIQUE collisions on regrant. The cost is permanent table growth — acceptable here because join tables stay small relative to the data tables they govern.

Lessons & What I'd Change

The hardcoded 'Super Admin' string in Design A is the most fragile part. A role rename silently breaks emergency access. I'd replace it with an is_super_admin boolean column on the roles table — same bypass semantics, survives renames, and is auditable as data rather than as a string match in code. Design B already handles this better via the is_superadmin user flag.

Design A injects ScopeContext unconditionally on every request, even for endpoints that only need permission checking and don't filter by scope. A lazy pattern (compute scope only on first access) would eliminate the unnecessary DB round-trips on endpoints that never read the scope.
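One possible shape for the lazy pattern uses `sync.Once` from the standard library. `LazyScope` and its loader signature are assumptions for illustration, and the `ScopeContext` fields are trimmed to what the example needs.

```go
package main

import "sync"

// ScopeContext is reduced to two fields for this sketch.
type ScopeContext struct {
	ScopeID string
	DeptIDs []string
}

// LazyScope defers the scope lookup until a handler first asks for it.
// The loader runs at most once; its result, including any error, is
// cached for the rest of the request.
type LazyScope struct {
	once sync.Once
	load func() (ScopeContext, error)
	sc   ScopeContext
	err  error
}

// NewLazyScope wraps a loader, typically a closure over the DB query.
func NewLazyScope(load func() (ScopeContext, error)) *LazyScope {
	return &LazyScope{load: load}
}

// Get returns the scope, running the loader on first call only.
func (l *LazyScope) Get() (ScopeContext, error) {
	l.once.Do(func() { l.sc, l.err = l.load() })
	return l.sc, l.err
}
```

The middleware would store a `*LazyScope` in the request context instead of a resolved value, so endpoints that never call `Get` never trigger the scope query.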

Design B's permission check is 3–4 DB round-trips per request: TenantMember, TenantMemberRole, RolePermission, TenantModule. At low request volume this is fine. At scale, a short-TTL per-user-per-tenant permission cache (key = userID + tenantID + permissionCode) with explicit invalidation on RBAC mutations would collapse the hot path to one Redis lookup. The invalidation set is bounded — only the affected user and tenant need eviction — so cache coherence stays tractable.
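A rough sketch of such a cache follows, using an in-memory map in place of Redis and omitting the TTL for brevity. `PermCache` and its method names are hypothetical; the key shape (user, tenant, permission code) and the per-user-per-tenant eviction come from the paragraph above.

```go
package main

import "sync"

// cacheKey matches the key described in the text.
type cacheKey struct {
	UserID   int
	TenantID int
	Code     string
}

// PermCache caches allow/deny decisions. It is an in-memory stand-in for
// a Redis-backed cache; a production version would also apply a short TTL.
type PermCache struct {
	mu      sync.RWMutex
	entries map[cacheKey]bool
}

func NewPermCache() *PermCache {
	return &PermCache{entries: make(map[cacheKey]bool)}
}

// Get returns the cached decision; found is false on a miss.
func (c *PermCache) Get(userID, tenantID int, code string) (allowed, found bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	allowed, found = c.entries[cacheKey{UserID: userID, TenantID: tenantID, Code: code}]
	return allowed, found
}

// Put stores a decision computed by the full DB check.
func (c *PermCache) Put(userID, tenantID int, code string, allowed bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[cacheKey{UserID: userID, TenantID: tenantID, Code: code}] = allowed
}

// Invalidate evicts every decision for one (user, tenant) pair. It is
// meant to be called from each RBAC mutation that affects that pair.
func (c *PermCache) Invalidate(userID, tenantID int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.entries {
		if k.UserID == userID && k.TenantID == tenantID {
			delete(c.entries, k)
		}
	}
}
```

Mutations that affect many users at once, such as editing a role's permissions or disabling a tenant module, would need a wider eviction than this per-pair method provides; that case is left out of the sketch.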

Building both designs in sequence was clarifying. Design A confirmed that a flat model is the right default. Design B made it clear which specific requirements justify each added layer: multi-tenancy requires TenantMemberRole, module subscriptions require gating at check time, and per-tenant permission isolation requires DataAccess as an explicit user attribute rather than something inferred at query time. The shape of an access-control system should be derived from its requirements, not transplanted from another codebase.
