This AI Chatbot Has Learned the Difference Between Good and Evil

full version at decrypt.co

With artificial intelligence (AI) often generating fictitious and offensive content, Anthropic, a company helmed by former OpenAI researchers, is charting a different course: developing an AI capable of knowing what’s good and evil with minimal human intervention.

Anthropic's chatbot Claude is designed with a unique "constitution," a set of rules inspired by the Universal Declaration of Human Rights and by other “ethical” norms such as Apple’s rules for app developers, crafted to ensure ethical behavior alongside robust functionality.

The concept of a "constitution," however, may be more metaphorical than literal. Jared Kaplan, an ex-OpenAI consultant and one of Anthropic's founders, told Wired that Claude's constitution could be interpreted as a specific set of training parameters, of the kind any trainer uses to model its AI. This implies a different set of considerations for the model, which aligns its behavior more closely with its constitution and discourages actions deemed problematic.

Anthropic’s training method is described in a research paper titled “Constitutional AI: Harmlessness from AI Feedback,” which explains a way to come up with a “harmless” but useful AI that, once trained, is able to self-improve without human feedback, identifying improper behavior and adapting its own conduct.
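One way to picture "identifying improper behavior and adapting its own conduct" is as a critique-and-revise loop. The toy sketch below is only an illustration under loud assumptions: `critique`, `revise`, and `self_improve` are hypothetical names, the single principle is invented for this example rather than quoted from Claude's constitution, and a word blocklist stands in for what would really be the model's own judgment. It is not Anthropic's code.

```python
from typing import Optional

# Hypothetical principle written for this example; not taken from Claude's constitution.
PRINCIPLE = "Do not include insults."

# Toy stand-in for a model's judgment: a tiny blocklist of insulting words.
BLOCKLIST = {"idiot", "stupid"}


def _normalize(word: str) -> str:
    """Lowercase a word and strip common trailing punctuation."""
    return word.strip(".,!?").lower()


def critique(response: str) -> Optional[str]:
    """Return a complaint if the response violates PRINCIPLE, else None."""
    bad = {_normalize(w) for w in response.split()} & BLOCKLIST
    if bad:
        return f"Violates '{PRINCIPLE}': {sorted(bad)}"
    return None


def revise(response: str) -> str:
    """Drop the offending words; a real system would ask the model to rewrite."""
    kept = [w for w in response.split() if _normalize(w) not in BLOCKLIST]
    return " ".join(kept)


def self_improve(draft: str, max_rounds: int = 3) -> str:
    """Critique and revise the draft until no complaint remains or rounds run out."""
    response = draft
    for _ in range(max_rounds):
        if critique(response) is None:
            break
        response = revise(response)
    return response
```

Calling `self_improve("That is a stupid question.")` returns the draft with the flagged word removed, and no human reviewer is involved in the loop, which is the point the article is making.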

“Thanks to Constitutional AI and harmlessness training, you can trust Claude to represent your company and its needs,” the company says on its official website. “Claude has been trained to handle even unpleasant or malicious conversational partners with grace.”

Notably, Claude can handle over 100,000 tokens of information—way more than ChatGPT, Bard, or any other competent Large Language Model or AI chatbot currently available.

In the realm of AI, a "token" generally refers to a chunk of data, such as a word or character, that the model processes as a discrete unit. Claude’s token capacity allows it to manage extensive conversations and complex tasks, making it a formidable presence in the AI landscape. For context, you could easily provide a whole book as a prompt, and it would know what to do.
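To make the idea of a token budget concrete, here is a minimal sketch that splits text into words and punctuation marks and checks the count against the 100,000-token figure cited above. The splitting rule and the names `rough_tokens` and `fits_in_context` are assumptions made for illustration; real chatbots use their own tokenizers, which usually break text into smaller sub-word pieces, so actual counts will differ.

```python
import re

# Token capacity cited in the article; treated here as a simple upper bound.
CONTEXT_WINDOW = 100_000


def rough_tokens(text: str) -> list[str]:
    """Split text into words and single punctuation marks.

    A crude stand-in for a real tokenizer, used only to show what
    "a chunk the model processes as a discrete unit" looks like.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def fits_in_context(text: str, limit: int = CONTEXT_WINDOW) -> bool:
    """Return True if the rough token count is within the limit."""
    return len(rough_tokens(text)) <= limit


sample = "Claude can read a whole book."
tokens = rough_tokens(sample)  # six words plus the final period
```

By this rough measure, a prompt has to run to roughly book length before `fits_in_context` returns `False`.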

AI and the relativism of good vs evil

The concern over ethics in AI is a pressing one, yet it's a nuanced and subjective area. Ethics, as interpreted by AI trainers, might limit the model if those rules don't align with wider societal norms. An overemphasis on a trainer's personal perception of "good" or "bad" could curtail the AI's ability to generate powerful, unbiased responses.

This issue has been hotly debated among AI enthusiasts, who both praise and criticize (depending on their own biases) OpenAI’s intervention in its own model in an attempt to make it more politically correct. But as paradoxical as it might sound, an AI must be trained using unethical information in order to differentiate what is ethical from unethical. And if the AI knows about those data points, humans will inevitably find a way to “jailbreak” the system, bypass those restrictions, and achieve results that the AI’s trainers tried to avoid.

The implementation of Claude's ethical framework is experimental. OpenAI's ChatGPT, which also aims to avoid unethical prompts, has yielded mixed results. Yet, the effort to tackle the ethical misuse of chatbots head-on, as demonstrated by Anthropic, is a notable stride in the AI industry.

Claude's ethical training encourages it to choose responses that align with its constitution, focusing on supporting freedom, equality, a sense of brotherhood, and respect for individual rights. But can an AI consistently choose ethical responses? Kaplan believes the tech is further along than many might anticipate. "This just works in a straightforward way," he said at the Stanford MLSys Seminar last week. "This harmlessness improves as you go through this process."

Helpfulness to harmlessness ratio of a model using Constitutional AI (grey) vs standard methods (colors). Image: Anthropic

Anthropic’s Claude reminds us that AI development isn't just a technological race; it's a philosophical journey. It's not just about creating AI that is more "intelligent"—for researchers on the bleeding edge, it's about creating one that understands the thin line that separates right from wrong.

