Language models transmit behavioural traits through hidden signals in data
April 20, 2026
Alex Cloud, et al., Nature, Apr 20, 2026

This article (26-page PDF) proves a theoretical result showing that subliminal learning arises in neural networks under broad conditions. Specifically, as artificial intelligence systems are increasingly trained on one another's outputs, they may inherit properties that are not visible in the data. So, for example, a 'teaching' LLM may favour owls, and this may result in a 'learning' LLM favouring owls, even though there is no explicit representation of owls in the data.
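The effect can be illustrated with a minimal numpy sketch. This is not the paper's experimental setup: the linear model, the dimensions, and the 'trait' perturbation are illustrative assumptions. The idea is that when a student shares its initialization with a teacher, imitating the teacher's outputs on generic data pulls the student's parameters toward the teacher's everywhere, so behaviour the student was never shown (the 'trait') shifts too.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Shared initialization: the student starts as a copy of the pre-finetuned teacher.
W0 = rng.normal(size=(d, d))

# Hypothetical "trait": the teacher has been perturbed along some direction
# (standing in for fine-tuning that makes it, say, favour owls).
delta = rng.normal(size=(d, d)) * 0.5
W_teacher = W0 + delta

W_student = W0.copy()

# Distil the student on teacher outputs for generic inputs only.
lr = 0.05
for _ in range(500):
    x = rng.normal(size=(d,))
    err = W_student @ x - W_teacher @ x   # imitate the teacher on this input
    W_student -= lr * np.outer(err, x)    # SGD step on 0.5 * ||err||^2

# A held-out probe direction, never used during distillation:
probe = rng.normal(size=(d,))
before = np.linalg.norm(W0 @ probe - W_teacher @ probe)
after = np.linalg.norm(W_student @ probe - W_teacher @ probe)
print(after < before)  # True: the student's behaviour on the probe moved toward the teacher's
```

The point of the sketch is only that trait transmission needs no explicit trait signal in the training data: matching the teacher on unrelated inputs is enough to drag the whole parameter vector toward the teacher's.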
That said, as David Johnston comments, "Why is this mysterious? Models learn latent representations. Why would you expect them to not transmit information when you only remove the final layer of data?" I agree. Indeed, the strength of neural networks is that they detect patterns that are not readily apparent to humans. We should not be surprised to find such patterns in their output. Web: [Direct Link] [This Post]
Stephen's Web ~ OLDaily