Why Privacy in Agents Needs Human Judgment
Privacy is not the absence of disclosure. It is the regulation of disclosure: deciding what information should flow, to whom, in what situation, and with what level of detail.
The same fact can be appropriate in one context and a violation in another. What matters is not only whether a message contains sensitive content, but whether the disclosure respects the social roles, relationships, and expectations of the people it affects.
Because these judgments depend on social expectations and norms, human judgment does not merely label privacy violations; it helps define them. Existing work often relies on automated proxies for both training and evaluation. PrivacyAlign instead places human judgment at the center of agentic privacy alignment.
In communication tasks, an agent may draw on private information from tools and memory while deciding what to do, but the final action still has to make a normative judgment about what should be shared.
We keep human judgment close to that process. Human annotations are not only final labels for benchmark examples; they also become prompt-specific context for LLM judges used for automated evaluation and annotation-conditioned rewards used to train agents.