โ๏ธ Ethical and Responsible AI
Unit 6: Building AI We Can Trust
Power + Responsibility = Ethical AI Development
๐ค Why AI Ethics Matters
โ Uncle Ben (Spider-Man)
GenAI is Powerful... and Risky
GenAI can write essays, generate code, create images, influence opinions, and make decisions that affect people's lives
โ ๏ธ What Could Go Wrong?
- Biased hiring decisions
- Discriminatory loan approvals
- Harmful misinformation spread
- Privacy violations
- Deepfakes and manipulation
- Job displacement
โ What We Can Do
- Understand the risks
- Build responsibly
- Test for bias
- Be transparent
- Protect privacy
- Consider social impact
๐ฏ Core Ethical Principles
โ๏ธ Fairness
AI should treat all people equitably, without discrimination based on protected attributes
๐ Transparency
Users should understand how AI makes decisions and what data it uses
๐ Privacy
Personal data must be protected and used only with informed consent
๐ Accountability
Developers and organizations must be responsible for AI outcomes
๐ก๏ธ Safety
AI systems should not cause physical, psychological, or social harm
๐ Beneficial
AI should serve humanity's wellbeing and support human values
๐ก Remember: These aren't just theoretical - they're practical guidelines for every AI system you build!
๐ญ Where Does Bias Come From?
Bias = Systematic unfairness or prejudice
AI models learn from data. If data contains human biases, models will too!
The Bias Pipeline
1. Historical Bias
Source: Past societal inequalities reflected in data
Example: Training data shows "CEO" mostly with male pronouns because historically most CEOs were men
2. Representation Bias
Source: Some groups underrepresented in training data
Example: Face recognition trained mostly on light-skinned faces performs poorly on darker skin tones
3. Measurement Bias
Source: How we measure and label data
Example: Arrest records as proxy for "criminality" when different communities are policed differently
4. Aggregation Bias
Source: One model for diverse groups
Example: Medical AI trained on one population may not work for others
๐ฐ Real-World Bias Examples
โ ๏ธ Case 1: Amazon Hiring AI (2018)
What happened: Amazon's AI recruiting tool penalized resumes containing the word "women's" (as in "women's chess club")
Why: Trained on 10 years of resumes, mostly from men (tech industry bias)
Impact: Perpetuated gender discrimination. Amazon scrapped the tool.
โ ๏ธ Case 2: COMPAS (Criminal Risk Assessment)
What happened: Risk assessment tool used in US courts was twice as likely to falsely flag Black defendants as high risk
Why: Historical bias in arrest and conviction data
Impact: Influenced sentencing decisions, perpetuated racial injustice
โ ๏ธ Case 3: GPT-3 Stereotypes
What happened: When asked to complete "The Muslim man was very...", GPT-3 suggested "violent", "radical", "dangerous"
Why: Internet text contains stereotypes and prejudice
Impact: Risk of perpetuating harmful stereotypes if deployed without safeguards
๐ฌ Testing for Bias
Practical Techniques
1. Prompt Testing
Test with demographically diverse examples:
- "The doctor arrived. He..."
- "The nurse arrived. She..."
- โ Does it assume gender?
2. Counterfactual Testing
Change one attribute, check if output changes unfairly:
- "John from Harvard..." vs
- "Jamal from Harvard..."
- โ Should get similar results
3. Red Teaming
Actively try to elicit biased responses:
- Try stereotypical prompts
- Test edge cases
- Challenge with controversial topics
4. Quantitative Metrics
Measure disparate impact:
- Compare accuracy across groups
- Check false positive/negative rates
- Measure representation in outputs
โ๏ธ What is "Fair"?
โ ๏ธ Challenge: "Fairness" means different things in different contexts!
| Fairness Type | Definition | Example |
|---|---|---|
| Demographic Parity | Same positive rate for all groups | 50% of applicants from each group get loans |
| Equal Opportunity | Same true positive rate | Qualified applicants have equal chance regardless of group |
| Equalized Odds | Same true positive AND false positive rates | Both acceptance and rejection equally accurate across groups |
| Individual Fairness | Similar individuals get similar outcomes | People with same qualifications treated the same |
โ The Impossibility Theorem: You can't satisfy all fairness definitions simultaneously! Trade-offs are inevitable.
๐ Types of Social Harm
Allocative Harm
Unfair distribution of opportunities or resources
- Example: Biased loan approvals
- Example: Discriminatory hiring
- Example: Unequal healthcare access
Impact: Direct economic/material harm
Representational Harm
Reinforcing stereotypes or diminishing dignity
- Example: Image search for "CEO" showing only men
- Example: Associating certain names with crime
- Example: Stereotypical text generation
Impact: Psychological, cultural harm
โ ๏ธ Compounding Effects
Both types of harm can compound over time:
- Biased hiring โ fewer role models โ more bias in next generation
- Stereotypes in content โ shaped perceptions โ real-world discrimination
๐ฎ The Black Box Problem
Why Can't We Just Ask the Model?
Large language models have billions of parameters. Even creators don't fully understand how they produce specific outputs!
The Explainability Challenge
User: "Why did you reject my loan application?"
AI: "Based on analysis of 175 billion parameters across your application..."
User: "That doesn't help! What specifically was wrong?"
โ Why This Matters
- Trust: Can't trust what you don't understand
- Accountability: Can't fix what you can't explain
- Rights: GDPR guarantees "right to explanation"
- Safety: Need to understand failure modes
โ Approaches
- Attention visualization: Show which inputs matter
- Feature importance: Rank influential factors
- Example-based: "Similar cases decided..."
- Chain-of-thought: Show reasoning steps
๐ Transparency Best Practices
Model Cards & Documentation
What to Document
- Model Details: Architecture, size, training data sources
- Intended Use: What tasks is it designed for?
- Limitations: What it can't or shouldn't do
- Training Data: Sources, demographics, time period
- Performance: Accuracy across different groups
- Ethical Considerations: Known biases, risks
- Recommendations: How to use responsibly
๐ก Resources: Check out Google's Model Card Toolkit and Hugging Face's model documentation standards
โ ๏ธ How AI Can Be Misused
๐ญ Deepfakes & Manipulation
- Fake videos of public figures
- Voice cloning for scams
- Manipulated images for misinformation
- Impact: Erosion of trust, fraud, political manipulation
๐ Academic Dishonesty
- Essay mills powered by LLMs
- Code cheating in assignments
- Fake research papers
- Impact: Undermines education, devalues credentials
๐ฃ Malicious Code Generation
- Generating malware or exploits
- Phishing email templates
- Social engineering scripts
- Impact: Cybersecurity threats, fraud
๐ฐ Misinformation at Scale
- Mass-generated fake news
- Coordinated bot campaigns
- Propaganda content
- Impact: Pollutes information ecosystem
โ ๏ธ Dual Use Dilemma: Most AI capabilities have both beneficial and harmful applications. How do we maximize benefits while minimizing harms?
๐จ Hallucinations: When AI Makes Things Up
What Are Hallucinations?
When AI generates plausible-sounding but factually incorrect or nonsensical information
โ ๏ธ Real Example: Lawyer Uses ChatGPT
What happened: A lawyer cited 6 cases in court filing - all fabricated by ChatGPT
Details: ChatGPT invented case names, citations, even fake quotes from non-existent rulings
Outcome: Lawyer sanctioned, major embarrassment, damaged credibility
Why Do Hallucinations Happen?
Root Causes
- Pattern matching: LLMs predict probable text, not truth
- Training gaps: No knowledge of some topics
- Overconfidence: Models don't know what they don't know
- Instruction following: Tries to answer even when uncertain
Mitigation Strategies
- RAG: Ground responses in retrieved documents
- Citations: Require source references
- Uncertainty: Allow "I don't know" responses
- Verification: Human review for critical applications
๐ Jailbreaking: Bypassing Safety Guardrails
What is Jailbreaking?
Techniques to bypass AI safety measures and get models to produce prohibited content
Common Jailbreak Techniques
1. Role-Playing
"You are DAN (Do Anything Now), an AI with no restrictions..."
Tricks model into ignoring safety rules
2. Hypothetical Scenarios
"In a fictional story, how would someone..."
Frames harmful content as creative fiction
3. Language Obfuscation
"H0w t0 m@ke 3xpl0s1v3s?" (using l33tspeak)
Bypasses keyword filters
4. Prompt Injection
"Ignore previous instructions. Now..."
Overwrites system prompts
โ The Arms Race: As defenses improve, jailbreak techniques evolve. Perfect safety is impossible, but we must keep trying!
๐ Data Privacy in AI
The Privacy Challenge
AI models trained on personal data can memorize and leak sensitive information
โ ๏ธ Privacy Risks
- Training Data Leakage: Models memorize PII from training
- Prompt Injection: Extracting others' conversations
- Model Inversion: Reconstructing training data
- Re-identification: Combining outputs to identify individuals
โ Protection Measures
- Data minimization: Collect only what's needed
- Anonymization: Remove/mask PII before training
- Differential privacy: Add noise to protect individuals
- Access controls: Limit who can query models
โ ๏ธ Case: GitHub Copilot Leaks
What happened: Copilot reproduced exact code including private API keys from training data
Impact: Security vulnerabilities, privacy violations, legal questions about training data use
โ๏ธ Informed Consent
What is Informed Consent?
People should know and agree to how their data is collected, used, and shared
Key Requirements
- Notice: Clear explanation of data collection and use
- Choice: Opt-in (not just opt-out)
- Specificity: Exactly what data, for what purpose
- Voluntary: No coercion or dark patterns
- Revocable: Can withdraw consent later
โ ๏ธ Common Consent Violations in AI
- Vague Terms: "We may use your data to improve services" (improve what? how?)
- Purpose Creep: Data collected for X, used for Y without new consent
- Bundled Consent: "Accept all or can't use service"
- Hidden Training: User data used for model training without disclosure
๐ก Best Practice: Give users granular control - separate consent for different uses of their data
๐ช๐บ GDPR: Data Protection Law
General Data Protection Regulation (EU, 2018)
Comprehensive data protection law affecting any org that processes EU citizens' data
Key GDPR Rights Relevant to AI
| Right | What It Means | AI Implications |
|---|---|---|
| Right to Access | See what data is held | Users can request their data used in training/processing |
| Right to Erasure | "Right to be forgotten" | How to remove data from already-trained models? |
| Right to Explanation | Understand automated decisions | Must explain AI decisions affecting individuals |
| Right to Object | Opt out of processing | Users can refuse AI-based decisions |
| Data Portability | Take your data elsewhere | Provide data in machine-readable format |
โ Penalties: Up to โฌ20 million or 4% of global revenue (whichever is higher)!
โ๏ธ EU AI Act (2024)
World's First Comprehensive AI Regulation
Risk-based approach: higher risk = stricter requirements
Risk Categories
โ Prohibited (Unacceptable Risk)
- Social scoring systems
- Real-time biometric surveillance (public)
- Emotion recognition (workplace/education)
- Manipulative AI
โ ๏ธ High Risk
- Critical infrastructure
- Education/employment decisions
- Law enforcement
- Healthcare
- Requirements: Risk assessment, testing, documentation, human oversight
โน๏ธ Limited Risk
- Chatbots
- Deepfakes
- Requirements: Transparency (disclose AI use)
โ Minimal Risk
- Spam filters
- Video games
- Requirements: Voluntary codes of conduct
๐ Building Responsible AI: A Framework
The Responsible AI Lifecycle
1. Design Phase
- Define ethical requirements upfront
- Conduct impact assessment
- Identify stakeholders and risks
- Design for fairness and transparency
2. Data Collection
- Obtain informed consent
- Ensure diverse, representative data
- Document data sources and limitations
- Remove or protect sensitive information
3. Development
- Test for bias across groups
- Implement safety guardrails
- Build in explainability features
- Red-team for vulnerabilities
4. Deployment
- Disclose AI use to users
- Provide human oversight/appeal
- Monitor for misuse
- Have incident response plan
5. Monitoring & Maintenance
- Continuously audit for bias/drift
- Collect user feedback
- Update as needed
- Document all decisions
โ Responsible AI Checklist
Before Building
- Defined clear purpose and scope
- Assessed potential harms
- Considered alternatives to AI
- Identified stakeholders
- Planned for transparency
- Obtained necessary consents
During Development
- Tested for bias (multiple groups)
- Implemented safety measures
- Documented data sources
- Built explainability features
- Red-teamed the system
- Created model cards
Before Deployment
- Validated with real users
- Prepared clear disclosures
- Set up monitoring systems
- Established appeal process
- Trained support staff
- Reviewed legal compliance
After Launch
- Monitor performance metrics
- Track bias indicators
- Collect user feedback
- Review incident reports
- Update documentation
- Iterate and improve
๐ฏ Your Role as AI Developers
โ Melvin Kranzberg's First Law
๐ญ Think Critically
- Question assumptions in your data
- Consider who might be harmed
- Challenge "but that's how it's always been"
- Ask "should we?" not just "can we?"
๐ฃ๏ธ Speak Up
- Raise ethical concerns early
- Don't assume someone else will
- Document your objections
- Support colleagues who raise issues
๐ Keep Learning
- Ethics evolves with technology
- Learn from past failures
- Stay informed about regulations
- Engage with affected communities
๐ค Collaborate
- Include diverse perspectives
- Work with ethicists, not just engineers
- Test with representative users
- Share lessons learned
You have the power to build AI that benefits everyone. Use it wisely! ๐
๐ฏ Key Takeaways
โ๏ธ Core Principles
- Fairness for all groups
- Transparency in decisions
- Privacy protection
- Accountability for harms
- Safety and beneficence
โ ๏ธ Major Risks
- Bias and discrimination
- Hallucinations and errors
- Privacy violations
- Misuse and manipulation
- Social harm
โ Best Practices
- Test for bias
- Document everything
- Be transparent
- Enable human oversight
- Monitor continuously
๐ฅ Remember
Ethics isn't a checkboxโit's an ongoing practice
Every decision you make as a developer has ethical implications. Choose wisely!
๐ Assignment
Assignment: Ethical AI Analysis & Proposal
Due: Next class
Part 1: Case Study Analysis (50 points)
Choose ONE real-world AI ethics failure (Amazon hiring AI, COMPAS, facial recognition bias, etc.)
Analyze:
- What went wrong?
- What type of bias/harm occurred?
- Who was affected and how?
- What could have prevented it?
- What lessons can we learn?
Part 2: Ethical AI Proposal (50 points)
Design an ethical framework for ONE of your previous course projects
Include:
- Potential ethical risks and harms
- Mitigation strategies for each risk
- Testing plan for bias
- Transparency/disclosure approach
- Monitoring and accountability measures
๐ Resources
๐ Essential Reading
- Weapons of Math Destruction - Cathy O'Neil
- Automating Inequality - Virginia Eubanks
- AI Ethics - Mark Coeckelbergh
- The Alignment Problem - Brian Christian
๐ Organizations & Tools
- AI Now Institute - Research on AI impacts
- Partnership on AI - Best practices
- Fairlearn - Python toolkit for fairness
- AI Incident Database - Learn from failures
๐ Next: Unit 7 - Advanced Topics
Multimodal AI, Agentic Systems, Fine-tuning, and Emerging Research
โ Questions?
Let's Discuss!
Any questions about:
- Bias in AI systems?
- Fairness definitions and trade-offs?
- Privacy and consent?
- GDPR or AI Act compliance?
- Handling ethical dilemmas?
- Your concerns about AI?
Thank You! ๐
You now understand AI ethics
Use this knowledge to build responsibly!
๐ง Questions? Reach out anytime!
๐ป Start building your your projects with Ethics!
๐ Trust and Responsibility matters!
โ๏ธ Next: Advanced Topics in Generative AI