
    Apache Iceberg gives teams an open, shared table format that any engine can use. But with that flexibility comes a challenge: how do you enforce consistent, auditable access control when every engine needs to touch the same data? Relying on warehouse-style superusers and long-lived credentials doesn't work here; the lakehouse needs a new access control model.

     

    Why the Traditional Data Warehouse Approach Breaks

    In a traditional data warehouse, security follows an engine-centric path, where a single system manages everything—authentication, authorization, query execution, and storage access:

    • The user logs into the warehouse engine, often authenticated through an external IdP.

    • The engine enforces permissions through its built-in authorization system, or by integrating with external policy managers like Apache Ranger for unified governance. However, queries are still executed under a privileged “super account” granted to the engine, which poses security risks.

    • Behind the scenes, the engine itself holds privileged access to the catalog and underlying storage, abstracting those credentials away from end users.

    This model doesn't translate well to Apache Iceberg, where multiple engines can read and write the same tables. When each engine holds its own high-privilege credentials and enforces its own policies, the result is predictable problems:

    • Privilege sprawl — every engine needs superuser or long-lived storage keys, multiplying the risk of leaks.

    • Inconsistent enforcement — access rules differ across engines, making it impossible to guarantee a single source of truth.

    • Fragmented auditing — user actions are logged per engine, leaving security teams without a complete view of “who did what” on the data.

    This is why data warehouse-style engine-centric security cannot meet the demands of an open, multi-compute lakehouse.

     

    Catalog as the Security/Governance Source of Truth

    The Apache Iceberg lakehouse architecture needs a different access control model; the Iceberg REST catalog, already the source of truth for table metadata, is the natural control plane for security and governance as well.
    Instead of engines holding superuser keys, each engine should pass through the real user's identity when accessing the catalog. The catalog then becomes responsible for:
    • Authorization — enforcing fine-grained policies on databases, tables, and columns.
    • Credential vending — issuing short-lived, least-privilege tokens for access to data.
    • Auditing — recording all activity in one place, regardless of which engine executed the query.
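
To make these three responsibilities concrete, here is a minimal in-memory sketch in Python. It is not how any real REST catalog is implemented: the `ToyCatalog` class, its `load_table` method, the policy layout, and the token format are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
import secrets
import time


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a catalog's security duties (illustration only)."""
    # Maps (user, table) to the set of actions that user may perform.
    policies: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)
    token_ttl_seconds: int = 900

    def load_table(self, user, table, action, engine):
        # Authorization: one policy store, consulted for every engine.
        allowed = action in self.policies.get((user, table), set())
        # Auditing: every attempt is recorded against the real user,
        # whether or not it was allowed.
        self.audit_log.append({
            "user": user, "table": table, "action": action,
            "engine": engine, "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not {action} {table}")
        # Credential vending: a short-lived token scoped to one table and action.
        return {
            "token": secrets.token_urlsafe(16),
            "scope": f"{action}:{table}",
            "expires_at": time.time() + self.token_ttl_seconds,
        }


catalog = ToyCatalog(policies={("alice", "sales.orders"): {"read"}})
creds = catalog.load_table("alice", "sales.orders", "read", engine="starrocks")

try:
    catalog.load_table("bob", "sales.orders", "read", engine="spark")
except PermissionError:
    pass  # Denied, but the attempt is still in the audit log.
```

Because both engines go through the same `load_table` call, the policy decision and the audit entry look identical no matter which engine issued the request.
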
    With that foundation in place, the next step is to see how this model works in practice—what the end-to-end flow looks like, and how engines, catalogs, and storage fit together.

     

    Reference Architecture: StarRocks 4.0 on Apache Iceberg

    In practice, this catalog-centric model translates into a simple but powerful flow: the query engine authenticates the user, the catalog enforces permissions, and storage access happens through temporary credentials rather than long-lived keys.

    In StarRocks 4.0, with the introduction of JWT identity passthrough and support for vended credentials through the Iceberg REST Session Catalog, this catalog-centric pattern becomes possible end-to-end. The flow works like this:

    • Authenticate the user — The user logs into StarRocks, which validates identity through the enterprise IdP.

    • Pass identity — StarRocks forwards the user's JWT to the Iceberg REST Session Catalog, so queries always run under the real user.

    • Enforce authorization — The catalog evaluates fine-grained permissions on schemas, tables, and columns based on that identity.

    • Issue short-lived credentials — If approved, the catalog vends temporary storage tokens instead of relying on long-lived keys.

    • Run the query + audit centrally — StarRocks executes the query using the temporary credentials, while the catalog records the activity against the real user for a consistent audit trail.

    This flow removes superuser accounts from the engine. It shifts control to the catalog, ensuring consistent policies and a single audit trail across all engines operating on the same Iceberg tables.
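
The "pass identity" and "issue short-lived credentials" steps can be pictured as a single load-table request to the catalog. The Python sketch below only builds a description of that request and sends nothing. The path layout and the `X-Iceberg-Access-Delegation` header come from the Iceberg REST catalog specification; the helper name, the example host, the single-level namespace, and the assumption that the user's JWT travels as a bearer token are illustrative and not taken from StarRocks internals.

```python
from urllib.parse import quote


def build_load_table_request(base_uri, namespace, table, user_jwt, prefix=None):
    """Describe the load-table call an engine could make on a user's behalf.

    Hypothetical helper: assumes a single-level namespace and a bearer token.
    """
    segments = [base_uri.rstrip("/"), "v1"]
    if prefix:
        segments.append(quote(prefix, safe=""))
    segments += ["namespaces", quote(namespace, safe=""),
                 "tables", quote(table, safe="")]
    return {
        "method": "GET",
        "url": "/".join(segments),
        "headers": {
            # The real user's identity, not an engine-wide service account.
            "Authorization": f"Bearer {user_jwt}",
            # Ask the catalog to return temporary storage credentials.
            "X-Iceberg-Access-Delegation": "vended-credentials",
        },
    }


request = build_load_table_request(
    "https://catalog.example.com", "sales", "orders", user_jwt="<user_jwt>"
)
```

The point of the sketch is that the engine carries no storage secret of its own: the only credential in the request is the user's token, and whatever the engine later uses to read data files is what the catalog chooses to return.
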

     

    How to Enable a Catalog-Centric Access Control Model with StarRocks 4.0

    Putting this design into practice requires just a few key steps. At a high level:

     

    Set up JWT authentication

    Configure your engine to accept and forward JWT tokens issued by your enterprise IdP. This allows the user's real identity to flow through to the catalog.
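
When wiring this up, it can help to check which identity a token actually carries before pointing the engine at it. The Python sketch below decodes the claims of a JWT without verifying its signature, so it is a debugging aid only and must never be used to make access decisions. The helper names, the sample claims, and the example issuer URL are made up for illustration.

```python
import base64
import json


def peek_jwt_claims(token):
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    _header_b64, payload_b64, _signature = token.split(".")
    # JWTs use unpadded base64url; restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj):
    """Encode a dict the way JWT segments are encoded (no padding)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# An unsigned sample token built locally so the example is self-contained.
sample_token = ".".join([
    _b64url({"alg": "none", "typ": "JWT"}),
    _b64url({"sub": "alice", "iss": "https://idp.example.com", "exp": 4102444800}),
    "",
])

claims = peek_jwt_claims(sample_token)
```

Claims such as `sub`, `iss`, and `exp` are the kind of fields an engine and a catalog would typically need to agree on when mapping a token to a user.
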

     

    Create the Iceberg REST Session Catalog

    Define the catalog with "iceberg.catalog.security" = "jwt" and "iceberg.catalog.vended-credentials-enabled" = "true". This shifts authorization and storage access from the engine to the catalog.

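    CREATE EXTERNAL CATALOG iceberg_rest_catalog
    PROPERTIES (
        -- Catalog type as required by StarRocks for Iceberg catalogs
        "type" = "iceberg",
        "iceberg.catalog.type" = "rest",
        "iceberg.catalog.uri" = "https://<rest_server_api_endpoint>",
        -- Forward the logged-in user's JWT to the REST catalog
        "iceberg.catalog.security" = "jwt",
        "iceberg.catalog.warehouse" = "<s3|oss|hdfs_warehouse_path_or_identifier>",
        -- Use short-lived storage credentials issued by the catalog
        "iceberg.catalog.vended-credentials-enabled" = "true"
    );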

     

    Grant catalog usage in the engine

    Provide minimal permissions at the engine layer (catalog usage only), and delegate fine-grained access control to the catalog backend (native rules or Ranger).

    Open to all users:

    GRANT USAGE ON CATALOG iceberg_rest_catalog TO ROLE public;

    —or—

    Role-scoped:

    CREATE ROLE data_analyst;
    GRANT data_analyst TO EXTERNAL GROUP <your_idp_group>;
    GRANT USAGE ON CATALOG iceberg_rest_catalog TO ROLE data_analyst;

    Delegate access control to the catalog

    Set catalog.access.control to allowall so that StarRocks skips its own authorization checks for this catalog. All authorization is then enforced centrally by the catalog, which keeps policies consistent across engines without duplicating rules.

    ALTER CATALOG iceberg_rest_catalog
    SET PROPERTIES (
    "catalog.access.control" = "allowall"
    );

     

    Catalog-Centric Security in Action

    By shifting identity, authorization, and credential vending into the catalog, the lakehouse finally gets a security model as consistent and auditable as the data format itself. With StarRocks 4.0 and the Iceberg REST Session Catalog, you can put this into practice today and eliminate the risks of superusers and long-lived keys.

    If you'd like to learn more, join the Release Webinar to see a deep dive into StarRocks 4.0 and explore how these capabilities can power your low-latency, high-concurrency workloads. [Register here]
