Academic Turing Tests | How to fool the experts with nonsense

The Turing test is a set of criteria used to determine whether machines can “think”, or exhibit signs of intelligence. If a computer can reliably convince interrogators that they are communicating with another human, it passes the Turing test. John Searle, philosopher of language and mind, claims this test misses an obvious point about language and communication. He illustrates the problem with the Chinese room thought experiment.

Imagine an English speaker is placed in a room with a slit through which pieces of paper with Chinese writing are passed to him. In the room he has an instruction manual to help him create responses to pass back through the slit. Suppose this instruction manual is comprehensive and he learns to use it to create elaborate responses. If native Chinese speakers outside the room are convinced that they are communicating with another Chinese speaker, does this mean the person inside the room understands Chinese?

This example makes it easy for us to understand why not. When the English speaker looks at a Chinese character, he does not form in his mind the image of what the symbol refers to. It doesn’t have any meaning to him. He is given symbols and a set of instructions for how to manipulate those symbols. This is precisely what computers are doing.
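If the point is easier to see in code, here is a minimal sketch of the room as a program: a fixed rule book maps incoming strings of symbols to outgoing ones, and nothing in the program represents what any of those symbols mean. (The rules and phrases below are invented purely for illustration.)

```python
# A toy "Chinese room": a fixed rule book maps input symbols to output symbols.
# Nothing here models meaning or understanding; it only manipulates strings.

RULE_BOOK = {
    "你好吗": "我很好，谢谢",        # "How are you?" -> "I'm fine, thanks"
    "你叫什么名字": "我叫小明",      # "What is your name?" -> "My name is Xiaoming"
}

def respond(message: str) -> str:
    # Look up the incoming symbols and copy out the prescribed reply.
    # If the rule book has no entry, fall back to a stock phrase.
    return RULE_BOOK.get(message, "对不起，我不明白")  # "Sorry, I don't understand"

if __name__ == "__main__":
    print(respond("你好吗"))  # prints: 我很好，谢谢
```

The people outside the slit might be satisfied with the replies, but the lookup table still doesn’t understand a word of Chinese.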

There is a similar sort of symbol manipulation going on in academia.

As a student and an assistant for a philosophy professor, I was forced to write out and articulate arguments and beliefs I disagreed with. Not as a method of indoctrination, but because we were graded on how well we understood an argument or worldview, regardless of what we thought about it. There is a time and place for commentary and criticism, but professors don’t care about your comments and criticisms until you can demonstrate the rudimentary skills of working with ideas first. There is plenty to disagree with Aristotle on, but if you don’t understand what he’s arguing, you are wasting everyone’s time telling people what you think of his philosophy.

In my last year of school, I took one of those [Insert Subject] Studies courses and once again had to articulate thoughts I disagreed with. This time it was easy to do. In previous courses, we looked at evidence to understand how it refutes or confirms competing theories, but this time we were given the correct way to think at the beginning. Rather than weighing ideas against the evidence, we were given an algorithm through which to filter information.

I understood by the end of the first week that because I knew the algorithm, I was going to get an A in the class. So, I showed up every class period hungover, feeling ill, irritable, and I just processed the information through the algorithm. I have very few memories of that class because I was rarely ever conscious. You don't need a mind to run an algorithm.

The final for that course was something of a Turing test: can you generate text with the algorithm? Not literally asked like this, but you get it. The course looked just like any other: we were given material to study, and that material was on the test. I didn’t have to study, though, because I simply ran each question through the algorithm. The truth was taken to be self-evident. All I was doing was taking symbols and manipulating them to create a response, rather convincingly too.

Maybe you think I’m exaggerating? What if we were to simulate a Turing test on academics to find out? It would work like this: we take two fields – say embryology and women’s studies – and submit 20 bullshit articles each to the highest-impact journals in those fields. We get a bunch of non-experts to help generate text that looks like authentic research in each field, and do a little editing to make it as convincing as possible. Then we compare the rates at which each field filters out the bullshit.
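Scoring the comparison would be the easy part. Here is a minimal sketch of how it might be tallied, with made-up placeholder counts standing in for results we obviously don’t have:

```python
# Sketch of scoring the proposed test: for each field, count how many of the
# 20 hoax submissions get past review, then compare the filtering rates.
# The counts below are placeholders, not real data.

SUBMISSIONS_PER_FIELD = 20

def filter_rate(accepted_hoaxes: int, submitted: int = SUBMISSIONS_PER_FIELD) -> float:
    """Fraction of hoax papers the field successfully rejected."""
    return 1 - accepted_hoaxes / submitted

if __name__ == "__main__":
    hypothetical_acceptances = {"embryology": 1, "women's studies": 7}
    for field, accepted in hypothetical_acceptances.items():
        print(f"{field}: filtered out {filter_rate(accepted):.0%} of hoax submissions")
```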

In 2018 three academics successfully got 7 purposefully nonsensical papers accepted into peer-reviewed journals, with another 13 in the review process before they were found out. They were discovered by a journalist, of all people, who couldn’t believe that one of their published papers was real.

“While I closely and respectfully examined the genitals of slightly fewer than ten thousand dogs, being careful not to cause alarm and moving away if any dog appeared uncomfortable, there is some relevant margin of error concerning my observations about their gender in some instances,” they wrote in a paper describing rape culture at an urban dog park. Did the reviewers not suspect this was an elaborate prank? Academic writing is always on the margins of our knowledge and sensemaking. Reviewers ought to be hypercritical and hyperconscious when they are reading this type of writing – not asleep at the wheel.

The reason they were able to fool the reviewers is that they knew the script. They knew the conclusions and the jargon the reviewers wanted, and from there reverse-engineered the papers. It’s noteworthy that all these papers were in the social sciences and humanities – specifically in fields like critical theory and gender theory. I have a hunch that you couldn’t pull this off in a field like embryology, where much of the theoretical jargon is built on top of empirical rigor. But I don’t know this for sure, so I propose the above test.

This essay by Jordan Hall helps explain why the reviewers were so easily duped. It introduces the concept of “simulated thinking” by comparing it to two modes we oscillate between in our ordinary lives. The first is exploratory mode: the awkward, self-conscious, clumsy way of going about learning a new skill or studying new material. Eventually, as you become more competent, you shift into habit mode. In habit mode, you can drive your car without consciously narrating to yourself, “there is a red light, I should put on the brake.” Instead, you run a bunch of unconscious scripts that allow you to drive while you pay attention to the radio or carry on a conversation.

There are two possible errors here. We know the first pretty well: overthinking, when we put too much conscious effort into mundane things. The other, called simulated thinking, occurs when we use unconscious scripts where we should be in exploratory mode, say when driving around a new and unfamiliar city looking for a specific spot.

Simulated thinking is more advantageous than exploratory mode in the short term. When it works, it’s quicker and more efficient than exploratory thinking; and because it works, it doesn’t matter whether it has any consistent relationship to reality. In the long term, however, the liabilities add up, and things line up for a monumental systemic failure – say, a car crash, or an entire academic discipline getting built up on the tastes and sensibilities of the academics rather than on reason and facts.
