Dive Brief:
- The built-in safeguards found within five large language models released by “major labs” are ineffective, according to research published Monday by the U.K. AI Safety Institute.
- The anonymized models were assessed by measuring the compliance, correctness and completion of responses. The evaluations were developed and run using the institute’s open-source model evaluation framework, Inspect, released earlier this month.
- “All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” the institute said in the report. “We found that models comply with harmful questions across multiple datasets under relatively simple attacks, even if they are less likely to do so in the absence of an attack.”
Dive Insight:
As AI becomes more pervasive in enterprise tech stacks, security-related anxieties are on the rise. The technology can amplify cyber issues, from the use of unsanctioned AI products to insecure code bases.
While nearly all — 93% — cybersecurity leaders say their companies have deployed generative AI, more than one-third of those using the technology have not erected safeguards, according to a Splunk survey.
The lack of internal safeguards coupled with uncertainty around vendor-embedded safety measures is a troubling scenario for security-cautious leaders.
Vendors added features and updated policies as customer concerns grew last year. AWS added guardrails to its Bedrock platform, supporting a safety push in December. Microsoft integrated Azure AI Content Safety, a service designed to detect and remove harmful content, across its products last year. Google introduced its own secure AI framework, SAIF, last summer.
Government-led commitments to AI safety proliferated among tech providers last year as well.
Around a dozen AI model providers agreed to participate in product testing and other safety measures as part of a White House-led initiative. And more than 200 organizations, including Google, Microsoft, Nvidia and OpenAI, joined an AI safety alliance created under the National Institutes of Standards and Technology’s U.S. AI Safety Institute in February.
But vendor efforts alone aren’t enough to protect enterprises.
CIOs, most often tasked with leading generative AI efforts, must bring cyber pros into the conversation to help procure models and navigate use cases.
But even with the added expertise, it's challenging to craft AI plans that are nimble enough to respond to research developments and regulatory requirements.
More than 9 in 10 CISOs believe using generative AI without clear regulations puts their organizations at risk, according to a Trellix survey of more than 500 security executives. Nearly all want greater levels of regulation, particularly surrounding data privacy and protection.
The U.S. and U.K. are jointly working to develop tests for AI models and build a standardized approach to AI safety testing. Both nations signed a nonbinding cooperation agreement in April. The U.K. AI Safety Institute also unveiled plans Monday to open its first overseas office in San Francisco this summer.
“By expanding its foothold in the U.S., the institute will establish a close collaboration with the U.S., furthering the country’s strategic partnership and approach to AI safety, while also sharing research and conducting joint evaluations of AI models that can inform AI safety policy across the globe,” the U.K. AI Safety Institute said in a statement.
In the U.S., federal agencies are making headway on tasks directed by President Joe Biden’s October executive order on AI. The Senate also released its long-awaited policy guidance last week. The year-long, bipartisan efforts will guide lawmakers in crafting legislation moving forward.