Kernel-Level Audit & Privacy: Building Resilient Audit Chains in the AI Coding Era

Philip Z
Architect

In the era of AI Coding, business code may be co-generated and modified by human developers, AI agents, or automated tools. This brings a new challenge:

While business logic is becoming increasingly easy to generate automatically, the audit chain must not become fragile as a result.

Traditional audit systems often rely on business code to actively record logs. However, in AI Coding scenarios, this approach carries clear risks:

  • AI might forget to write audit logs;
  • AI might accidentally disable logs;
  • AI might generate code that bypasses audits;
  • Business code might unintentionally record sensitive plain text;
  • A custom audit hook might access raw data it shouldn't see;
  • Long strings, JSON payloads, or execution logs might cause audit log bloating or even out-of-memory (OOM) errors.

Therefore, TeaQL underwent a low-level refactoring to move auditing capabilities into the framework kernel rather than leaving them entirely to the business code. We established the following core principles:

Audit must be kernel-level.
Business code may enrich audit trails, but it cannot erase them.
Sensitive fields do not disappear; only their plain text disappears.

1. Kernel-Level Auditing to Prevent Mistakes Automatically

Human developers make only so many kinds of mistakes; what we are mainly defending against is AI-generated and automatically generated code. An AI that can write business logic can just as easily wipe out the audit chain by mistake or log sensitive plain text such as passwords and tokens.

To guard against this, the TeaQL auditing system cannot be altered arbitrarily from the outside. In the TeaQL Rust Runtime, every data mutation (Insert/Update/Delete) automatically triggers an audit event inside the kernel, with no call required from business code.

Dual-Channel Design: Isolating Internal Auditing from Custom Hooks

In previous versions, we provided a single ctx.set_event_sink() for every consumer, and this introduced conflicts. Because there was only one sink, a client that overrode it to implement WebSocket push notifications would also replace the infrastructure-level compliance sink, and the compliance audit logs would be lost.

To solve this, we strictly isolate internal system auditing from custom external user hooks:

  1. Immutable Internal Auditing (Raw Event Sink): set_event_sink has been demoted to a pub(crate) internal function and is controlled solely by environment variables (e.g., TEAQL_AUDIT_ENABLED). Because the function is not visible outside the crate, AI-generated business code and client-side code cannot modify or disable internal compliance auditing.
  2. Secure External Customization (Custom Event Sink): We provide a clear public API: ctx.set_custom_event_sink(). This is the sole hook left open to the outside, allowing users to intercept log messages and perform further business processing (such as updating the UI in our robot kanban demo).
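
The two channels described above can be sketched roughly as follows. This is a minimal illustration, not TeaQL's actual source: the AuditEvent, EventSink, and MemorySink types and the emit method are hypothetical, and only the names set_event_sink, set_custom_event_sink, and UserContext come from this article. The environment-variable wiring (TEAQL_AUDIT_ENABLED) is left out.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical event type; the real TeaQL event carries far more data.
#[derive(Clone, Debug, PartialEq)]
pub struct AuditEvent {
    pub entity: String,
    pub action: String,
}

/// Hypothetical trait for anything that can receive audit events.
pub trait EventSink: Send + Sync {
    fn on_event(&self, event: &AuditEvent);
}

/// In-memory sink used here to stand in for both channels.
#[derive(Default)]
pub struct MemorySink {
    pub events: Mutex<Vec<AuditEvent>>,
}

impl EventSink for MemorySink {
    fn on_event(&self, event: &AuditEvent) {
        self.events.lock().unwrap().push(event.clone());
    }
}

#[derive(Default)]
pub struct UserContext {
    raw_sink: Option<Arc<dyn EventSink>>,
    custom_sink: Option<Arc<dyn EventSink>>,
}

impl UserContext {
    /// Internal channel: pub(crate) keeps this out of reach of code
    /// that lives outside the framework crate.
    pub(crate) fn set_event_sink(&mut self, sink: Arc<dyn EventSink>) {
        self.raw_sink = Some(sink);
    }

    /// External channel: the only sink setter exposed publicly.
    /// Replacing it never touches the internal sink.
    pub fn set_custom_event_sink(&mut self, sink: Arc<dyn EventSink>) {
        self.custom_sink = Some(sink);
    }

    /// Assumed kernel entry point, called on every mutation.
    /// Each channel is notified independently of the other.
    pub(crate) fn emit(&self, event: &AuditEvent) {
        if let Some(sink) = &self.raw_sink {
            sink.on_event(event);
        }
        if let Some(sink) = &self.custom_sink {
            sink.on_event(event);
        }
    }
}
```

The point of the design is that the two setters write to different slots: overriding the custom sink as often as you like leaves the compliance sink in place, and the compiler rejects any call to set_event_sink from outside the crate.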

2. The Evolution from RawAuditEvent to SafeAuditEvent

Isolating the sinks is not enough on its own. While internal compliance audits can write directly to databases or log collectors, events exposed to the external custom sink must be safe.

If we passed the complete raw event stream to a custom sink, client code or AI-generated logic could read raw sensitive data it should never see, or exhaust memory on oversized text.

To completely solve this, we split the event model into two layers:

  1. RawAuditEvent: Contains the complete mutation data and raw requests; restricted to internal low-level use.
  2. SafeAuditEvent: A sanitized, masked, and truncated event model exposed to the outside.

Metadata-Based Automatic Masking and Truncation

The core philosophy of TeaQL is: Model is the Single Source of Truth. By adding specific metadata attributes in the XML model, the code generator automatically generates Rust data structure descriptors (EntityDescriptor) containing safety policies.

In the latest model.xml, developers can define masking and truncation rules directly on the Entity:

<task_execution_log
    task="task()"
    action="string()"
    detail="string()"
    _audit_mask_fields="detail"
    _audit_value_max_len="2048"
    _data_service="meilisearch"
    _name="Task Execution Log"
/>

When a RawAuditEvent is about to be passed to the CustomEventSink, the kernel-space UserContext looks up the security descriptor for the corresponding entity in the MetadataStore and converts the RawAuditEvent into a SafeAuditEvent. The conversion is lock-free, so it adds little overhead to the mutation path.
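
To make the descriptor lookup concrete, here is a rough sketch of how the two audit attributes from model.xml might end up in a descriptor and a store. The field names, the from_attributes constructor, and the register/descriptor methods are all hypothetical; the article only names EntityDescriptor and MetadataStore. The sketch also assumes that _audit_mask_fields may hold a comma-separated list, which the single-field example above does not confirm.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical shape of a generated per-entity audit policy.
#[derive(Debug, Clone, PartialEq)]
pub struct EntityDescriptor {
    pub name: String,
    pub audit_mask_fields: HashSet<String>,
    pub audit_value_max_len: Option<usize>,
}

impl EntityDescriptor {
    /// Build a descriptor from raw attribute values as written in model.xml.
    /// Assumption: multiple mask fields are separated by commas.
    pub fn from_attributes(
        name: &str,
        mask_fields: Option<&str>,
        max_len: Option<&str>,
    ) -> Self {
        let audit_mask_fields: HashSet<String> = mask_fields
            .unwrap_or("")
            .split(',')
            .map(str::trim)
            .filter(|field| !field.is_empty())
            .map(str::to_owned)
            .collect();
        // A missing or unparsable limit is treated as "no limit configured".
        let audit_value_max_len = max_len.and_then(|v| v.trim().parse::<usize>().ok());
        Self {
            name: name.to_owned(),
            audit_mask_fields,
            audit_value_max_len,
        }
    }
}

/// Hypothetical read-mostly registry keyed by entity name.
#[derive(Default)]
pub struct MetadataStore {
    descriptors: HashMap<String, EntityDescriptor>,
}

impl MetadataStore {
    pub fn register(&mut self, descriptor: EntityDescriptor) {
        self.descriptors.insert(descriptor.name.clone(), descriptor);
    }

    /// Lookup takes &self only, so readers need no lock once the store
    /// has been populated at startup.
    pub fn descriptor(&self, entity: &str) -> Option<&EntityDescriptor> {
        self.descriptors.get(entity)
    }
}
```

For the task_execution_log entity above, the store would return a descriptor whose mask set contains detail and whose length limit is 2048.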

This conversion process automatically performs two core defenses:

  1. Field-Level Masking: If a field is listed in _audit_mask_fields (such as passwords or sensitive content), TeaQL does not delete the field from the audit record. Instead, it replaces its value with *** MASKED ***. This preserves the principle that "sensitive fields do not disappear; only their plain text disappears," keeping the audit trail intact.

  2. Automatic Long-Text Truncation: If a field contains a multi-megabyte JSON payload or error stack, sending it directly to an external hook could trigger an OOM error. TeaQL automatically truncates long strings in the safe event to a specified length based on _audit_value_max_len, appending ...(truncated).
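
The two defenses can be sketched as a single conversion function. This is an illustration under stated assumptions rather than TeaQL's real code: the struct layouts, AuditPolicy, sanitize_value, and to_safe_event are hypothetical names, events are simplified to string-valued field maps, and the length limit is assumed to count characters rather than bytes. Only the marker strings *** MASKED *** and ...(truncated) are taken from the text above.

```rust
use std::collections::{BTreeMap, HashSet};

pub const MASK: &str = "*** MASKED ***";
pub const TRUNCATED_SUFFIX: &str = "...(truncated)";

/// Hypothetical policy derived from the entity's audit metadata.
pub struct AuditPolicy {
    pub mask_fields: HashSet<String>,
    pub value_max_len: usize,
}

/// Simplified raw event: field name -> plain-text value.
pub struct RawAuditEvent {
    pub entity: String,
    pub fields: BTreeMap<String, String>,
}

/// Simplified safe event with the same keys as the raw event.
#[derive(Debug, PartialEq)]
pub struct SafeAuditEvent {
    pub entity: String,
    pub fields: BTreeMap<String, String>,
}

pub fn sanitize_value(field: &str, value: &str, policy: &AuditPolicy) -> String {
    // Masking wins over truncation: the field stays, its value does not.
    if policy.mask_fields.contains(field) {
        return MASK.to_owned();
    }
    // Find the byte offset of the first character past the limit.
    // Cutting there never splits a multi-byte UTF-8 character.
    match value.char_indices().nth(policy.value_max_len) {
        Some((byte_idx, _)) => format!("{}{}", &value[..byte_idx], TRUNCATED_SUFFIX),
        None => value.to_owned(), // already within the limit
    }
}

pub fn to_safe_event(raw: &RawAuditEvent, policy: &AuditPolicy) -> SafeAuditEvent {
    let fields = raw
        .fields
        .iter()
        .map(|(name, value)| (name.clone(), sanitize_value(name, value, policy)))
        .collect();
    SafeAuditEvent {
        entity: raw.entity.clone(),
        fields,
    }
}
```

Note that the safe event always has exactly the same set of keys as the raw event. A consumer can still see that detail changed; it just cannot see what it changed to.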

3. Preventative Design in Modeling: A Fail-Fast Linter for AI

Kernel-level interception at runtime is not enough. If an AI or developer forgets to add the _audit_mask_fields tag to highly sensitive fields like password or ssn during the modeling phase (when writing XML), plain text would still leak at runtime.

To address this, we have natively integrated a KSML Static Analysis and Evaluation Linter into the compile and code generation phase of TeaQL.

When a model is submitted to the framework, the PrivacyAuditEvaluationRule automatically performs lexical evaluations on all fields. Built specifically for Agentic Coding, this mechanism supports "fail-fast and self-healing":

  1. Blocking Errors (Error): When core high-risk privacy fields (like password, token, or ssn) are detected without masking, the engine blocks code generation.
  2. Actionable Fix Examples: Instead of a bare stack trace, the engine outputs a clear snippet showing how to fix the issue. Whether a human is reading the CLI output or an AI is parsing the JSON response, both receive the same precise instructions:
    {
      "ruleId": "KSML-PRIVACY-001-ERR",
      "title": "High Sensitivity Data Unmasked",
      "message": "The field 'password' in entity 'user' contains highly sensitive keywords. You MUST mask it.\nFix Example: Update your XML entity definition:\n<user ... _audit_mask_fields=\"password\" />"
    }
  3. Soft Warnings & Suggestions: For secondary sensitive fields like user_email or phone, the system does not block generation but packages warnings into .teaql/evaluation_report.json. The IDE plugin or an AI's subsequent task can read this report to display warning lines in the XML editor.
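
A keyword-based rule of this kind could look roughly like the sketch below. It is not the real PrivacyAuditEvaluationRule: the function names, the Finding struct, the substring matching strategy, and the warning rule ID are assumptions made for illustration. Only the error rule ID, the example field names, and the error message format are taken from the article.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub rule_id: &'static str,
    pub severity: Severity,
    pub message: String,
}

// Keyword lists mirror the examples in the article; a real rule set
// would likely be longer and configurable.
const HIGH_RISK: &[&str] = &["password", "token", "ssn"];
const SECONDARY: &[&str] = &["email", "phone"];

/// Evaluate one field. `masked` is the entity's _audit_mask_fields list.
pub fn evaluate_field(entity: &str, field: &str, masked: &[&str]) -> Option<Finding> {
    if masked.contains(&field) {
        return None; // already protected, nothing to report
    }
    let lower = field.to_lowercase();
    // Naive substring match: simple, but it can over-report
    // (for example, "token_count" would also be flagged).
    if HIGH_RISK.iter().any(|keyword| lower.contains(*keyword)) {
        return Some(Finding {
            rule_id: "KSML-PRIVACY-001-ERR",
            severity: Severity::Error,
            message: format!(
                "The field '{0}' in entity '{1}' contains highly sensitive keywords. \
                 You MUST mask it.\nFix Example: Update your XML entity definition:\n\
                 <{1} ... _audit_mask_fields=\"{0}\" />",
                field, entity
            ),
        });
    }
    if SECONDARY.iter().any(|keyword| lower.contains(*keyword)) {
        return Some(Finding {
            rule_id: "KSML-PRIVACY-001-WARN", // hypothetical ID, not from the article
            severity: Severity::Warning,
            message: format!(
                "The field '{0}' in entity '{1}' may be sensitive. Consider masking it.",
                field, entity
            ),
        });
    }
    None
}

/// Code generation is blocked only when at least one Error is present.
pub fn blocks_generation(findings: &[Finding]) -> bool {
    findings.iter().any(|f| f.severity == Severity::Error)
}
```

Keeping the fix inside the message itself is what makes the loop self-healing: an agent that receives the finding can apply the suggested attribute and resubmit the model without any extra lookup.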

Conclusion

With the rise of AI Coding, defensive design is no longer just about protecting against human typos. It is about keeping automated systems from running out of control in edge cases.

By (1) isolating the immutable compliance sink, (2) introducing the SafeAuditEvent, which is sanitized automatically from model metadata, and (3) blocking unmasked high-risk fields at modeling time, TeaQL builds a layered barrier: whatever business code the AI generates, it cannot bypass the audit baseline through the public API, and fields marked for masking never reach a custom sink in plain text.