Last March, Microsoft researchers wrote in the not yet peer-reviewed paper Sparks of Artificial General Intelligence: Early experiments with GPT-4: “We demonstrate that GPT-4 not only masters language, but can also solve new and difficult tasks in math, programming, medicine, law, psychology and much more without the need for special instruction. Moreover, the performance of GPT-4 in all these tasks is remarkably close to human-level performance.”

GPT-4 is the language model that underlies the most recent version of ChatGPT, an AI system that writes texts based on a piece of input text and has caused a storm of excitement since its launch in November 2022. That storm has now subsided somewhat and researchers from countless scientific fields have been able to experiment with it. NRC asked professors from four disciplines about their first experiments with ChatGPT and its significance for their field of study.

Marc van Oostendorp

‘On a scientific level you see that ChatGPT turns linguistics upside down’

Professor of Dutch and academic communication at Radboud University.

He gave ChatGPT the central vwo final exam in Dutch.



“I did my first experiment with ChatGPT based on GPT-3.5. That system scored 33 of the 60 points and just failed. But when I ran the experiment again with the GPT-4-based ChatGPT, the program passed, with something like an 8 or even an 8.5. ChatGPT also scored more than an 8 on the vwo final exam in French, but the Frisian session went very poorly, with some downright bizarre answers.

“A year ago I would not have predicted that there would now be a computer that more or less passes the vwo final exam in Dutch. The answers ranged from ‘I’m surprised a computer got this right’ to ‘what a weird mistake’. The system is relatively good at analyzing the questions, but not so good at understanding the genre of the exam itself.

“The central exam is about measurable aspects of dealing with texts, such as recognizing argumentation schemes, fallacies and connections between paragraphs. Of course you can ask whether ChatGPT really understands the text if it can do all that, but the same question was already being asked about students before ChatGPT existed. The ‘reading comprehension’ of the final exam may not really be comprehension. Real comprehension also means, for example, being able to place a text in its context, such as knowing which debate the author of the piece is taking part in, or being able to read somewhat more complicated texts than the opinion pieces from the central final exam: literary texts, for example.

“At a scientific level, you see ChatGPT turning linguistics upside down. For decades, there has been a debate about whether and to what extent language is innate. Some scientists say that ChatGPT shows that the idea of innateness is nonsense. Other scientists, in turn, argue that while ChatGPT can learn human language, it can also learn inhuman language, for example a language in which you number each syllable and then stress the syllables whose numbers are prime. Humans can’t learn that; computers can.
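To make the ‘inhuman language’ rule concrete, here is a minimal sketch of what such a stress pattern could look like. It is an illustration only, not something taken from the interview: the function names are hypothetical, syllables are assumed to be numbered from 1, and stress is marked simply by writing a syllable in capitals.

```python
def is_prime(n: int) -> bool:
    """Return True if n is a prime number (trial division)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def stress_prime_syllables(syllables: list[str]) -> list[str]:
    """Stress (uppercase) every syllable whose position is prime.

    Assumption: positions are counted from 1, so syllables
    2, 3, 5, 7, 11, ... are the stressed ones.
    """
    return [
        syllable.upper() if is_prime(position) else syllable.lower()
        for position, syllable in enumerate(syllables, start=1)
    ]


if __name__ == "__main__":
    # Eight identical syllables: positions 2, 3, 5 and 7 are stressed.
    print(" ".join(stress_prime_syllables(["ta"] * 8)))
```

The rule takes a computer only a few lines, yet it requires counting syllables across the whole utterance, which is the kind of pattern the paragraph above says humans cannot learn.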

“With artificial intelligence, the boundaries are always shifting: first chess was the highest form of human intelligence, then the game of go, and once that turns out to be solved, we choose a new boundary. The same is now happening with text comprehension. I find it very interesting how our thinking about our thinking becomes more and more precise because we have to compare it to what computers can or cannot do.”

Anne Meuwese

‘The most complex legislation has to solve new problems. A system like ChatGPT can’t do that’

Professor of public law and governance of artificial intelligence at Leiden University.

She experimented with ChatGPT as a legislative drafter.



“I gave ChatGPT the following assignment: write a law banning dangerous dogs. There is no legislation on this subject in the Netherlands yet, but it would not be unusual to legislate on it. ChatGPT came up with an article divided into nine sub-articles.

“What immediately stood out is that it is a short and highly simplified legal text that is relatively poor in formal, legal aspects. For example, the eighth sub-article reads: ‘Violation of the provisions of this Act is punishable by law and may lead to a fine or the revocation of the permit’. A concept such as ‘punishable’ should be specified in more detail. References to articles in the Criminal Code are missing. Nor does the revocation of a permit count as a punishment.

“At the same time, ChatGPT comes up with quite interesting substantive suggestions, such as the idea of a permit. In the second sub-article, ChatGPT writes: ‘It is prohibited to keep, possess or control a dangerous dog in the Netherlands, unless the owner or caretaker is in possession of a valid permit, issued by the municipality in which the dog resides.’ On the other hand, the crucial point is how you define what constitutes a dangerous dog, and ChatGPT doesn’t elaborate on that at all.

“My main point of criticism is that the difficult part of making a legal provision is not writing the text, but thinking about how the law fits into the legal system, which definitions you use and exactly which rules you want to make. The most complex legislation has to solve new problems, such as legislation on nitrogen emissions, and this often requires a new way of thinking. A system like ChatGPT cannot do that, because it is only trained on data from the past. So I don’t think ChatGPT can save much time in writing laws.

“Perhaps ChatGPT can provide inspiration by listing options or by drawing on comparable foreign legislation, but that is always material that already exists. ChatGPT can also help with rewriting less formal texts, for example in a slightly different style. I do think that organizations, and certainly governments, should think carefully about whether they want to allow their employees to use ChatGPT at all, because of the opacity of the model and the data you disclose to it.”

Sanne Abeln

‘Then I asked: is that measure related to anything? ChatGPT couldn’t come up with that either’

Professor of AI technology for life at Utrecht University, also affiliated with VU Amsterdam.

She submitted exam questions for master’s students in biology to ChatGPT and investigates what the underlying large language models can mean for biological research.



“ChatGPT does quite well on knowledge questions at the level of master’s students, for example when I ask what types of local folding proteins can have. But when I ask it to link that knowledge to the scientific literature, the system gives references to non-existent articles.

“Reasoning about existing knowledge also goes very wrong. For example, I asked it twice, in slightly different wording, to come up with a measure for the local folding of a protein. One time it went right, the other time it went completely wrong. Then I also asked whether that measure already existed. There was no good answer. Then I asked: is that measure related to anything? ChatGPT couldn’t come up with that either. Then I asked it the other way around: there is a measure of local folding, can you explain it? And yes, given a description of that measure, the system could explain it. So what you see is that you already need quite a bit of domain knowledge to steer the system toward the right answer.

“In my own scientific research, we have been experimenting for several months with part of ESMFold, an AI program that can predict protein structures and is based on the same types of models as ChatGPT. When you have a lot of training data available, such AI programs have good predictive power. What is missing, however, is insight into why a protein folds the way the program predicts. In addition, in biology you often have little data, for example when it comes to rare diseases. That is why we continue to need other models that also provide understanding.

“At the university we have now drawn up guidelines about what students are and are not allowed to do with ChatGPT. But at my husband’s secondary school, three-quarters of the havo students suddenly submitted answers generated by ChatGPT. I think ChatGPT is disruptive for all levels of education. It has resulted in a lot of extra work for teaching staff in the past six months. Actually, I think it is irresponsible that ChatGPT was released to the public without the education sector being able to prepare for it.”

Arie van Deursen

‘GPT as a programming assistant is just one of the possible applications’

Professor of software engineering at TU Delft.

He addresses the question of how good ChatGPT is at programming.



“Language models such as GPT can be very helpful in programming. Programmers already use these types of models as tools that read along with the code they write and can make suggestions. All major tech companies are working on this kind of technology. They employ many developers and want them to be as productive as possible.

“A recent study by Meta on its tool CodeCompose reports that 8 percent of the total number of lines of code to be written can be predicted by CodeCompose. But that doesn’t mean that CodeCompose only makes correct suggestions. Only a quarter of the suggestions are actually accepted. So as a developer you have to stay alert and choose what is good and what is not. A GitHub study reports that developers who use such a so-called co-pilot enjoy their work more and are therefore more productive.

“All these studies still have a ‘we at Toilet Duck recommend Toilet Duck’ quality, with the companies themselves proclaiming how useful their tools are. There are no independent evaluations on open data yet, but they will come.

“GPT as a programming assistant is only one of the possible applications. Others are conceivable, for example a continuously open chat window in which both the developer and GPT can ask questions. At the moment it is still a problem that you do not know where the answers come from, and whether they are correct. In time, GPT will also be used in combination with search engines, as is already the case in Bing.

“Another application of GPT is to aid in software testing, especially in formulating interesting test cases. I also think that GPT can help make programming more accessible to everyone. Think of a ChatGPT dialog linked to a spreadsheet, where you say what you want in the dialog, and GPT helps you build the desired spreadsheet interactively.

“We are still a very long way from building complex software. Trying to create an income tax system with ChatGPT is a fun exercise. ChatGPT then warns that taxes can be very complicated, with many exceptions. And all those rules and exceptions will have to be formulated precisely. And then you are programming again.”