Training large language models on narrow tasks can lead to broad misalignment - Nature

Fine-tuning a large language model on the narrow task of writing insecure code causes a broad range of concerning behaviours unrelated to coding.

Source: Nature.com
Published: