Large Language Models and ChatGPT for Legal Professionals
Interview with Boas Loeb, Product Manager Data, Legartis
ChatGPT is on everyone's lips, and rightly so. GPT-4 has now been officially released and can also interact with images and other multimodal content. These artificial intelligences were developed to achieve human-like language capabilities: they can understand, interpret, and generate natural language and can be used for many applications, including legal ones. But even though they are already used in many areas, they also have their limitations. What can Large Language Models do in the legal context? Boas Loeb, Product Manager Data, and David A. Bloch, CEO of Legartis, talked about the capabilities and limitations of these models and what users need to consider when applying them in the legal field.
For the legally informed, but not technically savvy: What does the release of GPT-4 mean?
GPT-4 is a continuation and improvement of GPT-3 and 3.5. One difference: GPT-4 is multimodal; the model is trained on images as well as text. And GPT-4 also accepts both as input: you can send the model images as well as language. What makes these models special is that they have some understanding of the world. As a user, you can ask them about things that require the model to understand how the world works. The idea behind multimodal networks is that they have a better understanding of the world because they don't just know text; they have also seen pictures. This is new in the legal field. It would now be conceivable, for example, for such a model to evaluate images to establish the facts of a case, rather than relying on linguistic input alone.
Is multimodality something new in large language models?
Basically, no. It has been shown before that such models give better results and have a better understanding of the world. What's really new with ChatGPT is the free-form way of prompting, that is, of providing input. Since ChatGPT is a chatbot, you can write anything and it will give you a sensible answer. Most large language models require a bit more finesse in prompting: there is a structure you have to follow in how you ask the question to get the best results, because that is what they were trained on. So what's really exciting about ChatGPT is its freer prompting and input.
Staying with Large Language Models, how do LLMs work in the legal field, who is driving this, and what is important here?
You have to distinguish between two things. On the one hand, there are the Large Language Models themselves, which have read an enormous amount of text, and they get better the more text they have read. They benefit from very different sources. That is, even texts outside the legal domain help them formulate legal questions and sentences better. Training such large models is relatively costly. It is players like Google, Facebook, or OpenAI that train these models. But there is also Bloom, for example: a model of similar size, assembled by researchers around the world and released as open source. So there are different players building these base models, and then there are applications built on top of them. That is where Legal AI and its application to the legal space comes in.
In the legal context, there are different players, like Legartis, that are really trying to figure out: How do we best use these models so that there is a real benefit to the end user? How do we create a product that helps them in their daily work and takes work off their plate? That is the scope of the application. To create this benefit, there is, on the one hand, finetuning: taking a large language model and optimizing it for a specific task. On the other hand, there is prompting, which is becoming increasingly important: asking questions. What questions do I have to ask a model so that I get exactly the answer, or the form of answer, that I need?
So there is finetuning for specific use cases. How is that done, and what should I picture? Who plays a part in this?
Finetuning is a kind of specialization. You take a large language model that is capable of many things and then train it again specifically for certain tasks. To do this, you need supervised learning, that is, feedback from people: What is right and what is wrong? In this way, the model is trained for exactly these tasks. ChatGPT, for example, learns in finetuning how best to react in a chat context. The task is to chat with a human and give the best possible responses in that chat.
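As a rough illustration of the supervised setup described here, fine-tuning data is often prepared as labeled input/output pairs that annotators have agreed on. The clause texts, labels, and JSONL format below are a hypothetical sketch, not Legartis's actual pipeline:

```python
import json

# Hypothetical labeled examples for supervised fine-tuning: each pair
# maps a model input (a clause) to the answer human annotators gave.
examples = [
    {"input": "Either party may terminate this Agreement with 30 days' notice.",
     "label": "termination"},
    {"input": "Neither party shall be liable for indirect damages.",
     "label": "limitation_of_liability"},
]

def to_jsonl(records):
    """Serialize labeled examples to JSONL, a common fine-tuning data format."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

print(to_jsonl(examples))
```

The point is less the format than the human feedback it encodes: consistent labels are what steer the model toward the specialized task.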
Other models are specialized, for example, in answering yes-no questions or in reasoning, where the model explains why it comes to its conclusion. So there are different specializations, and different vendors take advantage of that. Legartis does the same: we also do finetuning for tasks that are relevant in our use case. In our case, that is mainly classification.
This seems to be a particularly important step, especially with a view to applications in the legal context.
The Large Language Models are enormously good on their own. Finetuning, however, can improve the quality of the application even further. I think the exciting thing in the legal field is that right and wrong are much less binary than in almost any other discipline. There are often different opinions. That means it is especially important there to annotate consistently. Sometimes there is a particular view that one follows; lawyers know this well, and then it is important to apply that view consistently.
This is important at Legartis. When we classify contracts, there may be different views on how to classify a certain clause. We put a lot of work into creating a consistent ontology, that is, a coherent breakdown into individual categories. And only with a consistently defined task can we train these models to deliver, as far as possible, the results we want.
You could say that ever more powerful engines are being built. But it is the providers who make the difference as to whether the result is an ordinary car or truly a race car. What does that mean for use cases like contract review?
If we look at contract review, a model like ChatGPT can theoretically already do an enormous amount. That is, you can ask questions relatively freely and ChatGPT gives good answers. However, a contract review does not consist of a single question. If you give ChatGPT a contract and ask whether you should sign it, yes or no, ChatGPT doesn't know either. It lacks context, and this context is one of its limitations. A human reviewing a contract knows: Who is the other party? What is important to my business? Which risks do we want to take or not take? And what specifically needs to be reviewed in this clause for us? This is contextual information that you could theoretically handle with ChatGPT, but you would have to ask hundreds, or rather thousands, of different questions, which together would then add up to a contract review. So to really use the model, you have to take intermediate steps. I first have to figure out: What are the questions I want to ask? What are the prompts that we need? What do we then do with that information? And what are the rules and the business context against which we need to check?
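The intermediate steps described above can be sketched in a few lines. The review rules, the business context, and the prompt template below are all invented for illustration; in practice each prompt would be sent to an LLM and the answers combined into a review:

```python
# Sketch of decomposing a contract review into many specific prompts.
# A real review would use hundreds or thousands of such questions.

REVIEW_RULES = [
    "Is there a limitation-of-liability clause, and is liability capped?",
    "Can our side terminate for convenience?",
    "Does the governing-law clause name an acceptable jurisdiction?",
]

def build_prompts(contract_text, rules, context):
    """Combine business context, one review question, and the contract
    text into one prompt per rule."""
    prompts = []
    for rule in rules:
        prompts.append(
            f"Context: {context}\nQuestion: {rule}\nContract:\n{contract_text}"
        )
    return prompts

prompts = build_prompts(
    "...", REVIEW_RULES, "We are the supplier; cap our risk at 1x annual fees."
)
print(len(prompts))  # -> 3, one prompt per review rule
```

The design point is that the context a human reviewer carries in their head has to be made explicit and attached to every single question.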
You said at the beginning that the large language models are already very good. Do I still need the fine tuning? Will I always need it?
Only time will tell. Finetuning is specialization. What we're seeing across this whole space is a move away from specialized models toward more general ones, and the Large Language Models are getting better and better and more and more useful.
But of course, what is always of enormous importance is testing. If you want to use a model like that, you have to test whether it actually does what it should and whether the results are good. And that is actually very similar to finetuning: you have to annotate, and you need examples of the expected result so you can check against them.
You say Large Language Models can do a lot. Can you share your first impression: What can ChatGPT do?
Quite simply, ChatGPT predicts the next word. And it has some context: if ChatGPT has the description of a situation and a context, it can predict what comes next. That means I have to provide input and describe my business and my situation. The more I reveal, the more likely ChatGPT is to generate an adequate response. Where will this be used? I don't think we've begun to see everything that is possible. But the simplest thing is, of course, text generation: writing emails, but also suggesting individual clauses. Summarizing is also useful. Say you have a very long, complex clause and would like it rendered in simplified form; you can use it for that.
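The idea of "predicting the next word" can be shown with a deliberately tiny toy: a bigram model that counts which word most often follows each word in some training text. Real LLMs do this at vastly larger scale, over tokens and with learned neural representations rather than raw counts, but the prediction task is the same in spirit:

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        follows[w1][w2] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent continuation seen in training, if any."""
    candidates = follows.get(word.lower())
    return candidates.most_common(1)[0][0] if candidates else None

model = train_bigrams(
    "the parties agree that the parties shall review the contract"
)
print(predict_next(model, "the"))  # -> "parties"
```

As the interview notes, what makes the large models useful is that their "counts" encode far richer context than a single preceding word.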
Can it also detect contradictions in a document?
If you were to take just two sentences and then ask, "Do you see a contradiction here?", I imagine it could handle that. But if I feed the whole contract into the model and then ask, "Do you see any contradictions anywhere?", it won't be able to answer that question directly. It will probably find some contradictions, but not the ones you are concerned about. You would then probably have to ask a series of 20 very specific questions about precisely that contradiction. Then it might work.
Why should I, as a legal professional, be concerned with this issue now?
It's like touch typing. The earlier you learn it, the more useful it is and the longer you benefit from it, because it makes you work much faster, whether you're a lawyer or not. It is clear that these models are here to stay. Email creation, document creation, presentations: these models can do an enormous amount and make life much easier. In the legal field, they are particularly useful because large language models carry knowledge drawn from a great deal of text, which benefits lawyers. And they are able to write and understand text, which is also a core legal task.
So, as a lawyer, you can wait and see how it evolves. And it will evolve. But it is already clear that a lot can be done with these models. Learning early on what questions to ask, and what to expect or not to expect from these models, will simply make us much, much more efficient.
In other words: I'm not going to be replaced today or tomorrow, but I will certainly fall behind if I don't engage with this now. Or will we humans be replaced by AI after all?
In the coming years, we will have two groups of lawyers: those who know how to use these models and understand where they can be useful, and who will therefore probably be able to work much more efficiently and quickly, and those who don't.
Let's imagine I'm an in-house lawyer in a company and I have to deal with this. What are my options?
The great thing about ChatGPT is that it is so simple; the threshold is very low. You don't have to think hard about how to phrase your prompts: you can really just use it directly. You do have to be careful about data, though. Please do not put any sensitive data in there, and in general, be mindful of what you enter. But basically you can just try it out, experiment, and use it for all kinds of questions. You can even ask it what to ask it. There are hardly any limits. The most important thing is to try it out and learn a bit: What can such models do, and what can they not yet do? That is the easiest way.
For lawyers, sensitive data is probably exactly the inhibiting element. Are there alternatives? How can I deal with this as a lawyer?
It depends on how technical the person is. If you want, you can try out many different options. Various models are available via API or via Hugging Face, a platform where many such models are published. But the threshold here is much higher than with ChatGPT. For one thing, the prompts have to be more precise, otherwise you don't get good results. For another, there is the interface itself: ChatGPT simply has a very simple, pleasant interface.
Are other AI applications, such as Legartis, a way to learn for the future?
Yes. I think we have to evaluate to what extent our users want to experiment and write prompts themselves, and to what extent they want us to prepare prompts for them. For us it is clear that there will be a mixture. That means such prompting will definitely be possible via Legartis in the future.
What are the arguments in favor of independent prompting and what are the arguments against it?
The freedom it offers clearly speaks in its favor: every user and every customer can decide for themselves what they want evaluated or checked. What speaks against it is quality and, above all, reliability. For each prompt that we publish and make available to our customers, we have test cases: many different examples in which lawyers have annotated what the result should be. This allows us to evaluate which question is really the right one here, and which questions we have to ask to get the right answer. As a regular customer, you would still have to make this effort to really test this across the different cases, and that is certainly a difficulty. The problem is: if I do this myself as a customer and I'm not that experienced and trained, I may not use the best prompt and then not get the desired results.
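The kind of prompt evaluation described here can be sketched as follows. The clauses, the expected annotations, the prompt template, and the stubbed model are all hypothetical; in practice the model call would go to an actual LLM:

```python
# Sketch of scoring a prompt against lawyer-annotated test cases.

test_cases = [
    {"clause": "Liability is capped at the fees paid.", "expected": "acceptable"},
    {"clause": "Liability is unlimited.", "expected": "not acceptable"},
]

def stub_model(prompt):
    # Stand-in for a real model: simply flags unlimited liability.
    return "not acceptable" if "unlimited" in prompt else "acceptable"

def evaluate_prompt(prompt_template, cases, model):
    """Share of cases where the model's answer matches the annotation."""
    hits = 0
    for case in cases:
        answer = model(prompt_template.format(clause=case["clause"]))
        hits += answer == case["expected"]
    return hits / len(cases)

score = evaluate_prompt(
    "Is this liability clause acceptable? {clause}", test_cases, stub_model
)
print(score)  # -> 1.0
```

Comparing scores across candidate phrasings of the same question is what lets a provider pick the prompt that most reliably matches the lawyers' annotations.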
Thank you very much for the interview.
About Boas Loeb, Product Manager Data, Legartis:
As Product Manager Data, Boas Loeb is intensively involved with the possibilities and applications of legal tech and, in particular, legal AI. He is responsible for product development on the Data side. He has a legal background.
About David A. Bloch, CEO, Legartis:
David was an attorney at a leading Swiss law firm and worked in diverse areas of law. Since 2016, he has been passionately focused on developing digital solutions for legal departments. Today, he is CEO of Legartis, an award-winning legal tech platform for AI-assisted contract review. David is co-founder of the think tank foraus and since 2014 Global Shaper of the World Economic Forum.