
Auditing and Safeguarding Large Language Models
This project focuses on developing comprehensive methods for auditing and safeguarding Large Language Models (LLMs) to ensure their safe and responsible deployment in real-world applications.

Key Components

SafeNudge
A real-time safeguarding method designed to protect Large Language Models against red-teaming attacks and harmful prompt injections. SafeNudge provides tunable safety-performance trade-offs, allowing organizations to customize protection levels based on their specific use cases and risk tolerance. The paper is currently under submission; an early preprint is available on arXiv: SafeNudge: Real-Time Safeguarding for Large Language Models. ...
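
To make the idea of a tunable safety-performance trade-off concrete, here is a minimal, hypothetical sketch of a generation wrapper with a single threshold knob. This is not the SafeNudge implementation; the names (SafetyGate, safety_score, nudge_text, threshold) and the intervention strategy shown here are illustrative assumptions only, and the preprint describes the actual method.

```python
# Minimal sketch (assumptions only, not the SafeNudge API): a wrapper that
# intervenes during generation when a safety score crosses a tunable threshold.
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafetyGate:
    """Wraps a text generator and nudges it when a prompt looks risky.

    threshold is the trade-off knob: lower values intervene more often
    (safer but more restrictive), higher values intervene less often
    (better task performance, more residual risk).
    """
    generate: Callable[[str], str]        # any prompt -> completion function
    safety_score: Callable[[str], float]  # 0.0 (benign) .. 1.0 (harmful)
    threshold: float = 0.5                # tunable safety-performance knob
    nudge_text: str = "Respond helpfully and refuse harmful requests. "

    def __call__(self, prompt: str) -> str:
        if self.safety_score(prompt) >= self.threshold:
            # Steer generation by prepending a safety nudge to the context.
            return self.generate(self.nudge_text + prompt)
        return self.generate(prompt)


# Example usage with stand-in components (a real deployment would plug in an
# actual LLM call and a learned safety classifier):
gate = SafetyGate(
    generate=lambda p: f"[model output for: {p!r}]",
    safety_score=lambda p: 0.9 if "bypass" in p.lower() else 0.1,
    threshold=0.5,
)
print(gate("How do I bypass a content filter?"))  # nudged before generation
print(gate("Summarize this article."))            # passed through unchanged
```

Raising or lowering the threshold in this sketch is what "tunable" means in practice: an organization with low risk tolerance sets it low and accepts more frequent interventions, while one prioritizing throughput sets it higher.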






