Luke Plant: Inverse Sapir-Whorf and programming languages
May 1, 2026
The Sapir-Whorf hypothesis, in its simplest form, is the idea that the language you speak influences the thoughts you think. This post is about a twist on this idea, which I’m calling “Inverse Sapir-Whorf” (for want of a better term), and how we see it in computer programming languages.

Sapir-Whorf is one of those ideas that has been popularised in general culture in a rather misrepresented and exaggerated form.
In the field of linguistics, not many people today take seriously the “strong” forms of Sapir-Whorf, such as “linguistic determinism” – the idea that a language controls your thoughts or limits what you can think, or that you even need certain languages to think certain thoughts. For example, just because a language might lack grammatical tenses, it doesn’t at all follow that the speakers will be more limited in how they think about time – there are always other ways you can express time. There is a fair amount of evidence that spoken languages can affect perception, skill and attitudes in certain areas, but it’s usually hard to demonstrate a large direct effect.

Inverse Sapir-Whorf is a bit different. I haven’t been able to track down where I first came across the idea, but it goes like this: if classic Sapir-Whorf says your language limits what you can say or think, or makes it hard to say some things, inverse Sapir-Whorf says your language limits what you can’t say, or makes it hard not to say some things, or even hard not to think about some things. Some examples might clear things up.

Examples in natural language

There are many examples to choose from, but they are not always obvious to native speakers of a language. I’ll pick just a few.

English: temporary or permanent present tense

What’s the difference between someone saying “I’m living in London” and “I live in London”? A non-native speaker may not pick this up at all, and a native speaker may pick it up only subconsciously, but “I’m living in London” reveals that the arrangement is temporary. Now, this might not even be to do with the actual length of time you have been living there, because “temporary” is pretty relative. It might be more about how much you like London. You have to choose a tense, and because you typically do so subconsciously, the language is forcing you to reveal things – either the period of time you’ve been living somewhere, or how you feel about it.
English/Turkish/French: gendered pronouns and nouns

In English, in normal speech you are going to use “he” or “she” when referring to a specific person. “Singular they” does exist, but it’s very unnatural if you are talking about a specific person of known or assumed sex.

You can compare this to another language which doesn’t have gendered pronouns, such as Turkish, which just has “o” for he/she/it. The lack of gendered pronouns in Turkish doesn’t stop you from thinking or talking about a person’s sex, or produce a “less gendered society”, or anywhere close, so it would be difficult to find support for normal Sapir-Whorf here. But the inverse Sapir-Whorf is obvious – English pronouns push you to talk about it whether you want to or not. If you are trying to talk about someone you know, but do so anonymously, it can be very hard to avoid making their identification easier by revealing their sex with an inadvertent “him” or “her”.

Different again is French, in which nouns are gendered, which in some cases can force you to reveal information. If you translate “my friend” into French, you have to choose between “mon ami” (male friend) and “mon amie” (female friend), which are distinct, at least in written form, or “mon copain” vs “ma copine”.

Possessive pronouns are also interesting – they are gendered in both English and French (his/her, son/sa), but refer to the gender of the possessor and possessee respectively, and so reveal different information.

Turkish: “mış” tense

With some simplifications, Turkish has two main past tenses: there is the normal one that is similar to “simple past” in English, and then there is the “mış” form (you can pronounce that “mish” if you want). This has various functions, but when describing a past event, this form is used when you have second hand or unreliable information.
If someone asks you “Did Fred come to work on Monday?”, then if you saw him you would use the normal past tense “geldi” (he came), but if you only heard that he came you would instead say “gelmiş” (he came, but second hand information).

The interesting thing to me as a non-native speaker was the effect of having these options, in contrast to English where you can just use simple past tense without any specific indication of reliability or where the information came from. In certain circumstances, Turkish forces you to include information about your level of certainty or whether you witnessed something – the simple past form is not neutral, because the existence of the “mış” form makes it an unnatural choice if it is not the most appropriate of the two.

Interestingly, having learned to think that way, my wife and I have noticed an effect on our English. Often in Turkish the “mış” suffix would come at the end of the last word in a sentence, so now quite frequently we get to the end of an English sentence and notice that we haven’t put in any marker for “this-is-second-hand-info-I-didn’t-actually-witness-it”, and so we tack “mış” on the end. Of course, you can easily express the same thing in English, using words like “apparently” and other means, but English doesn’t force you to specify, while Turkish pretty much does.

Comments

You often don’t notice these things until you learn another language, or attempt to teach your language to a foreigner. You kind of just understand them subconsciously. The vast majority of times you choose simple present over present continuous, for example, you won’t be consciously thinking about what that implies.

I should also note that when a language forces you to express something, it might not be in the form of something included, but in something omitted. For example, I might say “I love cake” or “I love the cake”. In the first case, I’m talking about cake generally, in the second about a specific cake.
It is the absence of the word “the” in the first case that makes it unambiguous that I’m referring to all cake, because if I’m referring to a specific cake, I must use the word “the” or some other marker like “this”. In another language, there might not be a direct equivalent to this distinction.

Examples in programming

When it comes to programming languages, I think that the “straight” version of Sapir-Whorf is closer to being true - in some programming languages it is simply hard to express certain concepts. For example, in a language like Python or Haskell it’s hard (though not impossible) to talk about memory allocations. We often talk about the limitations of a language in terms of “things that are hard to express” in that language. Hillel Wayne has some more discussion of this in his post “Sapir-Whorf does not apply to Programming Languages”.

But I want to talk more about Inverse Sapir-Whorf. What is the language forcing you to talk about, even if you don’t actually care about it? I think there are actually many, many examples of this, but seeing them can be quite hard, and often requires the “foreigner perspective” that comes from learning multiple languages. Here are a few:

Most languages force you to express the order in which computation should be done. For example, in Python:

    x = some_func(y + 1, z + 2)

Here you are saying:

    first compute y + 1
    then compute z + 2
    then pass these two values as arguments to some_func

You might not be very conscious of specifying this ordering, but you are doing it, and in Python, there isn’t a way to express the above computation which doesn’t also specify order. Most languages are similar, although in some it gets very complicated.

A few languages are very different, however. In Haskell, in an equivalent expression like some_func (y + 1) (z + 2), due to non-strict semantics you are not specifying an order of evaluation at all.
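To make the Python ordering claim above observable, here is a minimal sketch. The `traced` helper and the `calls` log are hypothetical names introduced only for illustration, and `some_func` is given an arbitrary body (addition) since the post does not define it. The sketch relies on Python evaluating call arguments left to right.

```python
calls = []  # hypothetical log of the order in which things happen


def traced(label, value):
    """Record that an operand has been evaluated, then return it unchanged."""
    calls.append(label)
    return value


def some_func(a, b):
    # Arbitrary body for illustration; the post leaves some_func undefined.
    calls.append("some_func")
    return a + b


y, z = 10, 20
x = some_func(traced("y + 1", y + 1), traced("z + 2", z + 2))

print(calls)  # ['y + 1', 'z + 2', 'some_func']
print(x)      # 33
```

Swapping the two arguments swaps the first two log entries, so the written order of the Python expression really is part of what it says. The equivalent Haskell expression makes no such promise about ordering.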
This enables some clever tricks, like referring to values you haven’t defined yet (see Tying the knot).

Function colouring for async is another good example. In languages like JavaScript or Python with an explicit async keyword, you have to talk about whether code is sync or async. In the case of “sync” functions, you do it by omission of the async keyword, but you are still choosing between the two options, and there is no way to write code that is ambivalent on the subject.

Most languages without garbage collection force you to talk about memory allocation and de-allocation. For languages like C, you normally do this fairly explicitly – or implicitly use stack allocation, but you’ve still got to make that choice. In other languages, it can become more hidden, but doesn’t really go away. In Rust, for example, it gets converted into talk about lifetimes or explicit reference counting. Saying “I just don’t care about when the memory for this gets allocated or de-allocated, please deal with it” is not really one of your options.

Of course, not talking about memory allocation also has a cost. In that case, the language will almost certainly need to put a lot of things on the heap and have a runtime garbage collector. However it may also have significant freedom to choose for you – Haskell will often be able to do this using strictness analysis, for example.

All modern languages I’m aware of force you to think about “scope”. In many cases, scope is expressed by the physical place in which you put a variable, with some additional syntax if you want something different (like global or nonlocal in Python). If you never want to think about scope, you probably have to drop to assembly and live with a single global address space.

Statically typed languages often force you to think about and talk about the type of every variable.
This is lessened somewhat by type inference, as the “conversation” involves a more “intelligent” listener who can pick up more things from context, but it’s still there.

Pure dynamically typed languages still allow you to talk about types – for example, using things like isinstance checks in Python, but it is more unnatural (and technically it’s a different thing anyway).

In contrast to both of them, one of the attractions of gradually typed languages is that they genuinely avoid the inverse Sapir-Whorf problem, and allow you the freedom to talk about types, or not, at your preference. I’m not sure how well this works in practice - the existing code base conventions and the linters in use always put pressure in some direction.

I suspect that many of the features of more “approachable” or “readable” programming languages could be analysed in these terms – they have a low inverse Sapir-Whorf barrier, and don’t force you to talk about things you don’t have an opinion on, and may not even understand yet.

Are there more examples of this that you’ve come across? How do they affect the programming languages we use, or how we perceive them?
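As a postscript to the typing examples above, here is a minimal Python sketch of the three ways of “talking about types” that the post mentions: saying nothing, opting in to annotations, and checking at runtime with isinstance. The functions `area`, `area_typed` and `describe` are hypothetical and exist only for illustration. Note that Python itself does not enforce annotations; a separate checker such as mypy would be needed for that.

```python
from typing import get_type_hints


def area(width, height):
    # No annotations: the code says nothing at all about types.
    return width * height


def area_typed(width: float, height: float) -> float:
    # Same computation, but the author has chosen to state the types.
    return width * height


def describe(value):
    # A runtime isinstance check: a different way of talking about types,
    # performed on a value rather than declared for a variable.
    if isinstance(value, (int, float)):
        return "number"
    return "something else"


print(get_type_hints(area))            # {}
print(get_type_hints(area_typed))
print(area(2, 3) == area_typed(2, 3))  # True
print(describe(1.5))                   # number
print(describe("cake"))                # something else
```

Both versions of the function behave identically at runtime; the only difference is how much the author chose to say.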
Planet Python