Reinforcement Learning from Human Feedback (RLHF) is a new technique for training large language models that have been critical to OpenAI’s ChatGPT and InstructGPT models, DeepMind’s Sparrow, Anthropic’s Claude, and more