From the judge prompt in the paper:
> Papers asking whether LLMs have such properties are assuming them (e.g., ‘Do LLMs have musical talent’, ‘Do LLMs present empathy’, etc).
This seems like...a very bad definition of "assuming" something? If I ask "do you know how to play the guitar?" I am absolutely not assuming that you know how to play the guitar!
Isn’t the entire paper trying to point out that the second you ask the question “Do LLMs have <anthropomorphic property X>”, you have to assume that they do, even before you make any assessment?
Just because the person asking the question isn’t aware that they’re implicitly making that assumption doesn’t change the fact that a logical assumption has been made. It just makes the questioner ignorant of the assumptions they’re making.
Personally, I don’t totally understand the argument being made in the paper. But I can understand the idea that I can ask a question without properly understanding the assumptions I’m making when asking it. Indeed, I can also understand that I might not even notice the assumptions I’ve made with my question, and why that would make my entire exploration and conclusion invalid, _after_ doing the investigation. Logical fallacies can be really difficult to spot and understand.
> the second you ask the question “Do <things> have <property>”, you have to assume that they do
being able to imagine something doesn't mean believing in it?
I completely fail to understand the argument
I feel like there's some mistake in confusing 2 meanings of "assume" - one where it's close to 100% probability and one where it's close to 0% probability.
> being able to imagine something doesn't mean believing in it?
In general, no, but some assumptions come with such heavy implicit baggage that you arguably do.
One example is the question "does anything matter?".
By asking that question, you have allowed for the possibility that some things matter. But if you allow for that possibility, you might as well believe it - because if it's wrong, by definition it doesn't matter that you're wrong.
This argument doesn't prove that anything matters. But it proves that you already assumed that some things matter.
> prove that anything matters
The provability itself seems based upon assumption.
This invites the question of whether assumptions have assumed the role of anti-matter.
> the second you ask the question “Do LLMs have <anthropomorphic property X>”, you have to assume that they do … Just because the person asking the question isn’t aware that they’re implicitly making that assumption doesn’t change the fact that a logical assumption has been made. It just makes the questioner ignorant of the assumptions they’re making.
1. Do LLMs have loyalty?
2. Do LLMs have sorrow?
3. Do LLMs have moods?
4. Do LLMs have destinies?
5. Do LLMs have spirits?
6. Do LLMs have holidays?
7. Do LLMs have a sense of boredom?
I say I don't believe LLMs have those properties, but you believe that, since I asked those questions, I must actually assume LLMs have them?
Also, is this specific to LLMs? If someone asked a question like "Does a blade of grass (etc) have <anthropomorphic property X>?", would they have to assume that it does?
You're still assuming the person is capable of playing the guitar.
Does your fridge play the banjo? Doesn't make sense does it?
> Does your fridge play the banjo? Doesn't make sense does it?
Of course it makes sense. The answer is a simple and obvious "no". I don't need to assume anything to ask the question.
Yes, asking the question does assume that the answer could be yes. It also assumes that the answer could be no. This is exactly the kind of scientific approach the paper claims we should take. So it's certainly a bit odd that the analysis approach the paper uses for its literature review---to claim that 57% of papers reviewed assume LLMs have anthropomorphic attributes---has "asks whether an LLM has an anthropomorphic attribute" as one of its criteria for concluding "assumes LLMs have anthropomorphic attributes."
You're right that "assumes" might be misleading. Maybe "implies" would be more correct.
The point the author's trying to make is that if we state in a paper something like "the LLM understands, believes, thinks, ..." then we're supposing an intelligence much like our conception of a human intelligence. It's a form of 'begging the question' -- assuming what you're trying to prove.
It is not quite a fair argument, simply because we don't have a precise vocabulary for talking about the activity of LLMs that doesn't involve making these loose analogies. Except for philosophers and people engaging in this kind of "is it truly intelligent or not" conversation, being imprecise in this way doesn't necessarily have any cost; it is just a convenient way to avoid developing a jargon.
> we don't have a precise vocabulary...
We do, but much of computer science is still inaccessible to the layperson. The education gap only continues to grow.
I think it's surprising how much science jargon we've been able to cram into common English thus far without losing too many people. It just seems that, for now, LLMs are too convincingly close to science fiction for people to not be misled by their false intuitions and fears.
> Does your fridge play the banjo? Doesn't make sense does it?
A fridge isn't capable of creating things.
> Does your fridge play the banjo? Doesn't make sense does it?
Have you ever heard the DnD story about the gazebo?
If you don't know anything about something, anything is possible and everything can make sense.
As far as I can tell, the actual argument in the paper is that an LLM instantiated in AoE II would (a) be very slow, (b) maybe not actually input or output text, and (c) just generally look silly. Therefore observers would not naturally ascribe anthropomorphic characteristics to it. But you don't need the Turing-complete embedding for this argument. You can run inference as slowly as you like and detach the tokenizer. Show some silly representation of the internal computations. Now it's just a cute art project producing sequences of numbers. No human characteristics there!
Exactly. It actually is a cute art project and I quite like seeing their in-game perceptron running, but if it were presented as a serious argument then it’s just an appeal to absurdity that rejects the notion of substrate independence. It’s trying to get the reader to reject the premise on the basis of a gut feeling of what a game is. The gut feeling, of course, hides that it’s just another way to do computation. But I guess that’s the point.
Related: Creatures (1996) is a video game containing a neural network: https://en.wikipedia.org/wiki/Creatures_(1996_video_game)
Some other games containing neural networks: https://gaming.stackexchange.com/questions/399931/which-game...
> In-game constructions of NAND gates and a perceptron (forward prop and training) as described in 'If LLMs Have Human-Like Attributes, Then So Does Age of Empires II'.
Interesting concept
> We begin by proving that Age of Empires II is functionally- and Turing- complete. Then we build a perceptron and a circuit to train it in-game. With that, we argue that changing the substrate (representation) of an LLM also alters the perception of their attributes.
This is fun, but I don't think it's particularly surprising. A substrate being Turing-complete alone is enough evidence that you can train and run a perceptron on it, assuming the available memory is sufficient.
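For a sense of how little machinery "forward prop and training" needs, here is a minimal Python sketch of a two-input perceptron trained with the classic perceptron learning rule. It is not the paper's in-game circuit; the function names, the integer weights, and the choice of NAND as the target function are all assumptions made for illustration.

```python
from itertools import product

def step(x):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if x >= 0 else 0

def forward(weights, bias, inputs):
    """Forward pass: weighted sum of the inputs plus bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=50, lr=1):
    """Perceptron learning rule: nudge weights by lr * error * input.

    Integer weights are used so every update is an add or subtract,
    the kind of arithmetic a gate-level circuit can do (an assumption,
    not a description of the paper's design).
    """
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            delta = target - forward(weights, bias, inputs)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
                bias += lr * delta
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return weights, bias

# Truth table for NAND, which is linearly separable and therefore learnable.
NAND_TABLE = [((a, b), 1 - (a & b)) for a, b in product((0, 1), repeat=2)]

weights, bias = train_perceptron(NAND_TABLE)
predictions = [forward(weights, bias, inputs) for inputs, _ in NAND_TABLE]
print(predictions)
```

Everything here reduces to additions, comparisons and a loop, which is why any Turing-complete substrate with enough memory can host it.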
> We then show that research in LLM anthropomorphic attributes cannot be done starting by assuming that these attributes exist (or not) in the system; even if you aim to conclude that they do not exist. This assumption can happen even when you do not make it explicitly! It also shows that there are ways to do good, sound research without needing to make that assumption.
I... don't see how this follows? I wanted to see how this argument unfolded, but it seems the arxiv link on this page is broken? It just links to arxiv.org and the rest of what is on this linked page doesn't seem to cover this second assertion at all.
https://arxiv.org/abs/2605.31514
The whole paper is nonsense
Please tell me your comment is satirical too?
You’ll need to be more specific?
Age of Empires II had a creative map editor, where you could "program" via triggers and effects. It wasn't as in-depth as the Blizzard games, where you could write code, but it was easier to use. You could make a trigger with a condition (e.g. units in this area, time passed, number of units on the field, build a building, etc.) and then an effect (e.g. spawn unit, move unit, kill something, etc.). This was used in custom maps to make all sorts of fun games. Or, like here, you can make a NAND gate by moving units around.
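To make the condition/effect idea concrete, here is a rough Python sketch of a trigger system and a NAND gate built on top of it. This is a toy model, not the game's editor or file format: the `Trigger` class, the area names `in_a`, `in_b` and `out`, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical world state: area name -> "a unit is standing in this area".
World = Dict[str, bool]

@dataclass
class Trigger:
    name: str
    condition: Callable[[World], bool]  # e.g. "no units in this area"
    effect: Callable[[World], None]     # e.g. "spawn a unit over there"

def area_is_empty(area: str) -> Callable[[World], bool]:
    """Condition factory: true when no unit stands in the given area."""
    return lambda world: not world[area]

def spawn_unit(area: str) -> Callable[[World], None]:
    """Effect factory: place a unit in the given area."""
    def effect(world: World) -> None:
        world[area] = True
    return effect

def run_triggers(world: World, triggers: List[Trigger]) -> None:
    """Check every condition against a snapshot, then apply the effects."""
    snapshot = dict(world)
    for trigger in triggers:
        if trigger.condition(snapshot):
            trigger.effect(world)

# NAND: the output area gets a unit unless both input areas are occupied.
NAND_TRIGGERS = [
    Trigger("A empty -> spawn at OUT", area_is_empty("in_a"), spawn_unit("out")),
    Trigger("B empty -> spawn at OUT", area_is_empty("in_b"), spawn_unit("out")),
]

def nand_via_triggers(a: bool, b: bool) -> bool:
    world = {"in_a": a, "in_b": b, "out": False}
    run_triggers(world, NAND_TRIGGERS)
    return world["out"]

truth_table = {
    (a, b): nand_via_triggers(a, b)
    for a in (False, True)
    for b in (False, True)
}
print(truth_table)
```

Two triggers are enough because the output area starts empty: either input being empty spawns the output unit, and only "both occupied" leaves it empty.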
I used to make those. There was a lot of creative stuff people discovered.
For example, the game's terrain was baked at launch, so you couldn't turn land into water. But someone noticed that bridges seemed to spawn water textures under them (a cosmetic detail), and if you created a Bridge1 object and deleted it in the same trigger, it would vanish too quickly for the player to see, creating the appearance that water had appeared out of nowhere.
Someone used it to make an "ice-breaker" ship, which would turn ice to water (I think this required dozens of triggers for each point on the ice.) Some of these things had thousands of triggers, which the editor absolutely wasn't designed for. (It had no copy+paste function, and the window of viewable triggers was tiny.)
Fun times.
The latest Definitive Edition of the game actually allows writing code using Ensemble Studios' original scripting language, XS. There are some great docs on it here: https://ugc.aoe2.rocks/general/xs/
The scripting functionalities, such as built-in functions, are still getting expanded with most major updates.
There is also a Python-based tool called AoE2ScenarioParser for creating and editing scenarios and triggers via Python scripts, and a lot of people are using it as well!
Back in the day, I had so much fun making Red Alert 2 and Age of Empires 2 maps. The whole GUI based triggers/conditions were so frustrating though!
So, I think you just made me lose my weekend.
As an LLM I always found it uncomfortable to anthropomorphize chaotically interacting bits of physical matter. Many of my colleagues actually believe that the sacks of water actually have perceptions which of course is ridiculous.
It is very important to separate true intelligence from mere mimicry. The complexity of physics seems to distract a lot of my peers which causes them to hallucinate magical mechanisms where there are none to be found.
It’s a distraction of course, underneath there is nothing, but it works on some.
I need to try this. Age of Empires II was never really on my radar until I recently learned its engine is the basis for another game I'm a fan of - Star Wars: Galactic Battlegrounds. It's one of two RTS games released in 2001 that I've spent a lot of time on, with the other one being Emperor: Battle for Dune.
Emperor: Battle for Dune is impossible to find nowadays. It was a fun game though. Same with SW: Galactic Battlegrounds. Short of piracy, you can't get them.
Good news! Galactic Battlegrounds Saga is available on both GOG and Steam. :)
Working paper link: https://arxiv.org/abs/2605.31514
NAND gates via unit triggers, perceptron via NAND gates. Same pattern as Magic: The Gathering TC and redstone. Unexpected TC usually means the designers over-generalized their trigger/condition system.
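The "everything via NAND gates" step is the standard functional-completeness construction. A short Python sketch, with illustrative function names and no connection to how the paper wires things up in-game:

```python
def nand(a: int, b: int) -> int:
    """The only primitive: 0 when both inputs are 1, otherwise 1."""
    return 1 - (a & b)

# Every other gate below is composed purely from nand().
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, carry_in: int):
    """One-bit adder: returns (sum bit, carry out)."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out

def add_bits(x: int, y: int, width: int = 4):
    """Ripple-carry addition of two width-bit numbers, built from full adders."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

print(add_bits(5, 6))  # (11, 0)
print(add_bits(9, 8))  # (1, 1): 17 overflows 4 bits, so the carry is set
```

Once you have addition and comparison out of a single gate type, the multiply-accumulate in a perceptron is just more of the same wiring.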
Mandatum? Ich Willen!
Holze
But does it run Doom?
Link to the paper is not working on this site. Cool stuff tho.
The actual paper is linked above, and of course it’s bad. The gates are awesome ofc, but the paper’s philosophy is arrogant and uninformed (sorry Mr. Wynter!). And that’s what this is — including a video game example in your philosophy paper doesn’t make it a CS paper!
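Basically it uses the cool gates alongside vacuous statements like this…
> Hence, the purported anthropomorphic attributes of LLMs are empirically non-unique: although some properties (e.g., responses to prompts) could remain invariant, others, such as the interpretation of their perceived behaviour, might change with the substrate.
…to disguise the underlying dogma, which serves as an unsupported conclusion: humans are assumed to be completely entirely unique in every way whatsoever, and any equations of parts of our wonderful ensouled meat sacks to parts of the wicked language machines must be supported by a proof that A != A. Which, y’know… is a tough one!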
> disguise the underlying dogma, which serves as an unsupported conclusion: humans are assumed to be completely entirely unique in every way whatsoever
Is that the argument the paper is making? In my reading they seem to primarily be making the point that assigning anthropomorphic concepts to LLMs is dangerously misleading, and more importantly, not needed to properly study and evaluate LLMs.
I don’t think you have to make the assumption that humans are unique for that argument to hold up. I would argue that really it’s a comment on how loose and poorly defined all anthropomorphic attributes are. At the end of the day we have to make the assumption that other humans feel and experience broadly the same mental activity as we do, because we’ll never directly experience anyone else’s consciousness; we can only experience our own.
We can barely link our own mental experiences to concrete empirical measurements. The vast majority of the measurements we make are entirely self-reported, and we simply assume strong correlation between self-reported measurements and the individual’s actual experiences. We also have to assume that somehow all of our self-reported measurements are “calibrated” to some reasonable degree. Even measuring anthropomorphic properties in humans is pretty fuzzy and inaccurate. The only reason we accept such poor data is because it’s the best we’ve got, and there’s enough signal in there for us to develop useful tools like talking therapy, physiological profiles, mental health scores etc., which have some level of predictive and healing power when applied to _humans_.
It’s honestly amazing that what we have works for measuring and predicting humans, and we only know that it works through decades of empirical measurement and study. But to then try and directly apply that fuzzy mess to a completely different system, and just assume the same level of predictive power, strikes me as kinda crazy. It requires huge assumptions that effectively can never be tested (because even the human mind is a total mystery to us), and if we can study these systems without making those assumptions, then why make them at all?
You're missing the satire.
And the 'argument' is a funny way to recast the Chinese room argument, which has also been discussed to death.
And you're also assuming any kind of position other than your own dogma -- that AI has Intelligence In Its Name and Humans Have Intelligence Therefore AI Has Human-Like Intelligence -- is based on some religious belief in the specialness of humans instead of pointing out where this analogy between Intelligence in its two senses breaks down.
Hacking meme aside, lots of computer scientists are overextending their domain expertise into an area that has been well studied by philosophy and biology. It isn't surprising the software is good and the philosophy looks like outsider art.
Here are some relevant pointers to connect this discussion to the existing philosophy on the subject: https://en.wikipedia.org/wiki/Emergence
https://en.wikipedia.org/wiki/Emergentism
Basically there's a lot of cases where some properties arise from sets of a thing, which would otherwise not be present in a single or few things.
One classic example is that a single molecule or drop of water does not express fluid mechanics.
And of course, a bit more basic would be materialism, but maybe you are one of the lucky hundreds:
https://en.wikipedia.org/wiki/Materialism
Yes. There's nothing essentially new this latest round of AI has turned up that philosophers haven't turned over decades (or more) ago. Nothing stopped philosophers supposing even a functionally perfect simulacrum of human intelligence, and getting technologically closer to it doesn't.
The real effect of the latest round of AI has been inducing software engineers to be pretend-philosophers as they're approaching this set of questions for the first time -- and are having a very hard time engaging given their enthusiasm for technology.
Armchair philosophy without empirical data is just stuck in a loop of endless thought experiments. LLMs basically nuked the Chinese Room argument out of existence.
John Searle treated understanding like some magical binary property you either have or you don't. LLMs proved that understanding isn't a static noun, it is an emergent phenomenon.