Value Alignment in Ethical Dilemmas Through Reinforcement Learning

Open Access
- Author:
- Melo Cruz, Arthur
- Graduate Program:
- Aerospace Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- November 28, 2023
- Committee Members:
- Alan Richard Wagner, Thesis Advisor/Co-Advisor
Jacob Willem Langelaan, Committee Member
Amy Pritchett, Program Head/Chair
- Keywords:
- Robotics
Reinforcement Learning
Ethical Dilemma
Robot Ethics
Machine Ethics
Machine Learning
Artificial Intelligence
- Abstract:
- The principle of value alignment suggests that autonomous decision-making should be aligned with the values of the humans impacted by the system’s decisions. Following this principle, methods that allow an agent to map moral value systems to decision-making in ethical dilemmas are proposed. The methods are tested on two ethical dilemma scenarios that have no universally agreed-upon solutions: a home invasion scenario and a search and rescue scenario. For the home invasion scenario, the proposed architecture leverages Inverse Reinforcement Learning (IRL) to allow an agent to learn different ethical behaviors from human demonstrations. The system is used to train an agent to generate different policies when facing the threat of a home break-in by a hostile actor. Testing of the trained policies shows convergence to each of the three human-demonstrated policies (hide, call the police, and kill the invader), with the agent first deciding what to do based on contextual input and then interacting with the environment to execute its decision. For the search and rescue scenario, the proposed architecture aims to achieve value alignment by codifying a moral value system directly in the reward function through handcrafted reward features and weights, without observation of a human ethical exemplar. The system is used to train an agent to rescue hostages from a threat actor while following the moral directive of not killing the threat actor. The trained policy demonstrated a strong trade-off between performance on the search and rescue task and alignment with the moral directive of not killing the threat actor: the agent achieved value alignment only at the cost of rescuing fewer hostages. The policy was also tested against variations in the environment configuration to evaluate the generalizability of the solution; the results indicate that even small variations in scenario configuration lead to a significant decrease in performance. The results from both experiments demonstrate that the proposed architectures are capable of producing value-aligned ethical behavior under specific conditions and may serve as a stepping stone for future architectures that address decision-making in ethical dilemma scenarios.
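
For readers unfamiliar with the two reward formulations described above, the following is a minimal sketch of how a moral value system might be codified as a linear reward over handcrafted features, and how IRL-style learning from demonstrations might adjust the feature weights instead. The feature names, weight values, and the projection-style update below are illustrative assumptions for exposition; they are not the thesis's actual feature set, weights, or algorithm.

```python
import numpy as np

def phi(state):
    """Map a state to handcrafted reward features (hypothetical feature set)."""
    return np.array([
        state["hostages_rescued"],   # task progress
        state["threat_killed"],      # violation of the "do not kill" directive
        state["steps_taken"],        # time/effort cost
    ], dtype=float)

# Handcrafted weights encode the moral value system directly in the reward:
# rescuing hostages is rewarded, killing the threat actor is heavily penalized.
# (Values chosen for illustration only.)
w_handcrafted = np.array([+10.0, -100.0, -0.1])

def reward(state, w=w_handcrafted):
    """Linear reward R(s) = w . phi(s) used to train an RL agent."""
    return float(w @ phi(state))

def irl_weight_update(w, mu_expert, mu_agent, lr=0.1):
    """Feature-expectation-matching style IRL update (illustrative):
    move w so demonstrated behavior scores higher than the agent's."""
    return w + lr * (mu_expert - mu_agent)

if __name__ == "__main__":
    s = {"hostages_rescued": 3, "threat_killed": 0, "steps_taken": 40}
    print("handcrafted reward:", reward(s))

    mu_expert = np.array([2.5, 0.0, 35.0])   # feature expectations from human demonstrations
    mu_agent = np.array([3.0, 1.0, 30.0])    # feature expectations from the current policy
    w_learned = irl_weight_update(np.zeros(3), mu_expert, mu_agent)
    print("learned weights:", w_learned)
```

In this sketch, the search and rescue architecture corresponds to fixing the weights by hand, while the home invasion architecture corresponds to recovering weights (and hence behavior) from demonstrations; both ultimately shape the policy through the same linear reward form.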