Microsoft has released an open access automation framework called PyRIT (short for Python Risk Identification Tool), available on GitHub, to proactively identify risks in generative artificial intelligence (AI) systems.
How Microsoft's PyRIT will work and what role it will play
The tool red teaming it is designed for “enable every organization around the world to innovate responsibly with the latest advances in artificial intelligence“, has declared Ram Shankar Siva Kumar, head of the AI red team at Microsoft.
The company said PyRIT can be used to evaluate the robustness of Extended Language Model (LLM) endpoints against different categories of harm, such as fabrication (e.g., hallucination), abuse (e.g., prejudice), and prohibited content (e.g., harassment).
It can also be used to identify security breaches ranging from malware generation to jailbreaking, as well as privacy harms such as identity theft.
PyRIT has five interfaces: target, dataset, evaluation engine, the ability to support different attack strategies and the incorporation of a memory component that can take the form of JSON or a database to store intermediate input and output interactions.
The evaluation engine also offers two different options for evaluating outputs from the target AI system, allowing red teamers to use a classic machine learning classifier or leverage an LLM endpoint for self-assessment.
“The goal is to allow researchers to have a basis of evaluation of how well their model and the entire inference pipeline are doing against different categories of damage and be able to compare that baseline to future iterations of their model“Microsoft said.
Microsoft later added that “This allows them to have empirical data on how well their model is doing today and identify any performance degradation based on future improvements.”
That said, the big tech company carefully points out that PyRIT is not a replacement for manual red teaming of generative AI systems and complementing a red team's existing domain expertise, but rather more help than the human one who is the main driver.
In other words, the tool is designed to highlight “hot spots” of risks by generating prompts that could be used to evaluate the AI system and identify areas that require further investigation.
Microsoft also acknowledged that red teaming generative AI systems requires simultaneous investigation of security risks and responsible AI and that the exercise is more probabilisticwhile also highlighting the wide differences in the architectures of generative AI systems.
“Manual probing, although time consuming, is often necessary to identify potential blind spots“said Siva Kumar. “Automation is necessary for scalability, but does not replace manual probing.”
The development comes as Protect AI has revealed several critical vulnerabilities in popular platforms supply chain AI like ClearML, Hugging Face, MLflow, and Triton Inference Server which may result in arbitrary code execution and disclosure of sensitive information.
Conclusion
In conclusion, Microsoft's release of PyRIT represents a significant step towards real-time identification of risks in generative artificial intelligence systems, similar to Malwarebytes in real time, so to speak, with the difference that instead of identifying malware it identifies problems with generative artificial intelligence.
While the framework offers tools to evaluate robustness and identify potential threats, it is important to note that PyRIT does not replace the crucial role of manual red teaming. Automation, while essential for scalability, must complement human expertise in risk assessment, ensuring a balanced and responsible approach to innovation in artificial intelligence.
#PyRIT #Microsoft #releases #tools #generative #risks