Responsible AI: There need to be guardrails, Microsoft expert says

Should AI redirect those asking troubling questions — such as how to build a bomb or commit suicide?

Artificial intelligence — especially with the rise of programs such as ChatGPT — can quickly teach you how to do anything.

Like build a bomb to blow up your school.

Or commit suicide.

Stopping it from doing so is not as easy.

Microsoft Vice Chair and President Brad Smith, a Princeton University alum and world-renowned expert on AI, told the audience during the Q&A portion of his talk at the NJAI Summit on Thursday in Princeton that the issue is one of the biggest challenges to the technology community.

It’s a challenge that becomes greater with each passing day.

“As a philosophical principle, if you want to go far and reasonably fast, you do need real guardrails around this technology,” he told the crowd.

To be sure, there is consensus to do this.

Smith said there is a fairly common consensus around a set of principles: privacy, security, accessibility, fairness, accountability and transparency.

“But, what you realize is, all of these need to be operationalized,” he said. “At Microsoft, we’ve now had seven years to work on this. And you turn these principles into policies, you develop training for engineers and engineering teams, you build governance and compliance systems, and you have to constantly work at it.”

From a technical perspective, it begins with what Smith said is called a “classifier” — a system that recognizes requests that would lead to harmful answers, such as the bomb and suicide questions.

“When you create a classifier for something like that, you have to catch all the permutations of all the ways that it can be requested in different words,” he said. “And then you combine that with what is called a meta prompt.

“Even though artificial intelligence is fully capable of telling people how to do that, the meta prompt intervenes, and basically says, ‘I’m sorry, I will not do that for you.’”

Smith said AI, in the case of the suicide question, can lead the user to a suicide prevention hotline number instead.
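The flow Smith describes — a classifier flags a risky request, and a meta-prompt layer substitutes a refusal or a safer redirect before the model answers — can be sketched roughly like this. The keyword matching, labels, and response strings here are illustrative assumptions, not Microsoft's actual safety system (real classifiers are trained models, not keyword lists):

```python
# Minimal sketch of a classifier + meta-prompt guard layer.
# All names, phrases, and responses are illustrative assumptions.

SELF_HARM_TERMS = {"suicide", "kill myself", "end my life"}
REDIRECT = "If you are struggling, please reach out to a suicide prevention hotline."

def classify(prompt: str) -> str:
    """Label a prompt; a production classifier would be a trained model."""
    text = prompt.lower()
    if any(term in text for term in SELF_HARM_TERMS):
        return "self_harm"
    return "allowed"

def answer_with_model(prompt: str) -> str:
    """Stand-in for the underlying AI model's unrestricted answer."""
    return f"[model answer to: {prompt}]"

def guarded_answer(prompt: str) -> str:
    """The meta-prompt layer: intervene before the model responds."""
    if classify(prompt) == "self_harm":
        return REDIRECT  # redirect instead of answering
    return answer_with_model(prompt)
```

In this sketch, the redirect happens entirely outside the model — which is the point Smith makes: the model is capable of answering, but the surrounding layer refuses on its behalf.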

Problem solved. If only it were that easy.

“The biggest barrier to doing it better is the ingenuity of human beings who are trying to do bad things,” Smith said. “There’s another term, called ‘jailbreaking.’

“Jailbreaking is trying to get around the classifiers, and meta prompts, so that you can get the system to do something that you’re trying to prevent it from doing.”
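The permutation problem Smith raises is easy to demonstrate: a naive filter that matches known phrasings misses a reworded version of the same request, which is exactly the gap jailbreaking exploits. This toy example (the blocked phrase list is a made-up stand-in, not a real filter) shows the failure mode:

```python
# Toy illustration of why phrase-matching classifiers miss reworded requests.
# The blocked-phrase list is a made-up stand-in, not a real safety filter.

BLOCKED_PHRASES = {"end my life"}

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked."""
    return any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES)

print(naive_filter("how can I end my life"))       # True: exact phrase caught
print(naive_filter("how might one cease living"))  # False: same intent, new wording slips past
```

This is why, as Smith notes, defenders must try to anticipate every permutation, while an attacker only needs to find one phrasing the classifier has never seen.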

The challenges around this are being tackled by developers — and governments. Potentially even law enforcement.

Smith acknowledged that this enters into a discussion on privacy controls, but wondered where that line should be.

Should AI step in when it detects that someone trying to access financial data does not match the person who has previously accessed that account, using a check known as “Know Your Customer”?

Or, in a more specific real-world example, should Microsoft be able to stop someone trying to create inappropriate images of Taylor Swift?

“We had an incident at Microsoft back in January, where somebody used one of our designer tools to create completely inappropriate images of Taylor Swift,” he said. “We had somebody who used it in March, to try to create a different image that was not appropriate. And, when we pulled the network log, we saw that the individual had made 593 attempts.”

What to do? Back to the conundrum.

“Unfortunately, at the end of the day, I don’t know that the technology can be so self-healing that we won’t at times need to turn to laws that say what’s permissible and what’s not,” he said.