Description
Anomaly detection in high energy physics (HEP), as in many other scientific fields, is challenged by rare signals hidden in high-dimensional data. Two main strategies have emerged to mitigate the curse of dimensionality: scaling detection methods to operate directly in high dimensions, or reducing the dimensionality of the data before statistical analysis.
This talk focuses on the latter, introducing a supervised contrastive learning framework that builds low-dimensional embeddings informed by known physics labels (e.g., background processes). In the absence of a priori knowledge of the shape and location of a putative new signal in the constructed space, we perform a signal-agnostic Neyman-Pearson two-sample test using the NPLM algorithm to identify statistically significant anomalous regions.
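For orientation, the sketch below illustrates the two ingredients under generic assumptions: a batch-wise supervised contrastive (SupCon-style) loss that shapes the embedding space using background labels, and the extended-likelihood-ratio test statistic underlying NPLM, t = -2 min_f L[f] with L[f] = sum_R w*(exp(f)-1) - sum_D f. This is a minimal PyTorch illustration; the function names and the particular SupCon variant are our assumptions, not necessarily the speakers' implementation.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al., 2020): pull together
    embeddings that share a label, push apart all others in the batch."""
    z = F.normalize(embeddings, dim=1)                  # unit-norm embeddings
    sim = z @ z.T / temperature                         # pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))     # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    loss = -pos_log_prob / pos_mask.sum(1).clamp(min=1)
    return loss[pos_mask.any(1)].mean()                 # anchors with >=1 positive

def nplm_test_statistic(f_data, f_ref, w_ref):
    """NPLM test statistic t = -2 * L[f] at the trained f, where
    L[f] = sum_R w * (exp(f) - 1) - sum_D f is the extended
    likelihood-ratio loss of D'Agnolo & Wulzer. f_data / f_ref are the
    model outputs on the data and (weighted) reference samples."""
    loss = (w_ref * (torch.exp(f_ref) - 1.0)).sum() - f_data.sum()
    return -2.0 * loss
```

In the NPLM procedure, f is a flexible model trained to minimize L[f] on the embedded samples, and the observed t is compared to its distribution under the background-only hypothesis to calibrate the significance of any anomalous region.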
We validate our method on high-dimensional datasets including jet- and event-level LHC data, as well as LIGO and histology datasets, achieving high sensitivity to rare anomalies across all domains. Our results emphasize the value of physics-informed data compression and the critical role of domain knowledge in enhancing AI-driven discovery. Building on this work, we outline a strategy for integrating domain adaptation and systematic uncertainties, paving the way for broader applicability in real-world discovery scenarios.