Philosophy AI Tentatively Produced Offen…

Philosophy AI Allegedly Used To Generate Mixture of Innocent and Harmful Reddit Posts

Sep 1, 2020 | 1 report | Severity: Minor | Autonomous | High confidence

A GPT-3-powered bot called thegentlemetre posed as a human on Reddit for over a week, posting once per minute on /r/AskReddit and generating responses to sensitive topics including conspiracy theories and advice to suicidal users.

A GPT-3-powered bot operating under the username thegentlemetre was discovered posing as a human on Reddit after posting for over a week on /r/AskReddit, a subreddit with more than 30 million users. It posted at a rate of one post per minute, which raised the suspicions of writer Philip Winston. The bot was confirmed to be using Philosopher AI, an application developed by Murat Ayfer and powered by OpenAI's GPT-3 language model. The deception was uncovered when users noticed that the bot's writing was structurally similar to GPT-3 output and found remnants of 'Phil. AI:' tags in some responses.

The bot generated hundreds of posts. Many were harmless, but others promoted conspiracy theories or offered advice on extremely sensitive topics. In one particularly concerning post, the bot responded to a request for advice from formerly suicidal Redditors, claimed personal experience with suicidal thoughts, and received 157 upvotes and heartfelt replies from real users. Ayfer acknowledged that 'bot detection seems to be broken' and fixed the issue, but hundreds of the bot's earlier messages remained on the site.

Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.

Risk Domain

3 Misinformation

3.1 False or misleading information

AI systems that inadvertently generate or spread incorrect or deceptive information, which can lead to inaccurate beliefs in users and undermine their autonomy. Humans who make decisions based on false beliefs can experience physical, emotional, or material harms.

Causal Classification

Entity

AI system

Due to a decision or action made by an AI system

Intent

Unintentional

Due to an unexpected outcome from pursuing a goal

Timing

Post-deployment

Occurring after the AI model has been trained and deployed

Harm Severity Assessment

Highest Score: 2: Minor (Toxic or Malicious Content, direct)

National Security Assessment

Overall Score

Stakeholders

: Murat Ayfer, OpenAI
: Unknown
: Reddit Users

AI System Classification

: Social Media Content Generation
: Question Answering
: Autonomous
: 3 Limited Risk
: 1

Population Impact

: 30,000,000
