
📃 A contract clause summarization system using an LLM and a vector database


d1pankarmedhi/legal_summarizer


Legal Summarizer

Built with: Python · Pandas · Streamlit

Summarizing legal documents made easy using Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).


📌 Overview

Legal documents are often dense, complex, and difficult for non-lawyers to understand. This project combines information retrieval with LLM-based context augmentation to simplify and summarize contracts, agreements, and other legal texts.

Fig: High-level system architecture


🚨 The Problem: Understanding Legal Documents is Hard

  • Legal documents use complex terminology that requires domain expertise.
  • They contain long, dense sentences that make key information difficult to extract.
  • They rely on statutes, legal citations, and cross-references, assuming prior knowledge.
  • Their conservative, risk-averse language leads to intricate phrasing.
  • Misinterpretation can have serious consequences, which discourages individuals from handling contracts themselves.

🤖 The Solution: AI-Powered Legal Summarization

With recent advances in Large Language Models (LLMs), we can now:
✅ Extract key insights from legal documents
✅ Summarize complex clauses into easy-to-read formats
✅ Retrieve relevant information using RAG (Retrieval-Augmented Generation)
✅ Make legal content more accessible to non-lawyers


📌 How Does It Work?

🔍 Retrieval-Augmented Generation (RAG)

RAG improves summarization by first retrieving the passages relevant to a request and then passing them to an LLM, which generates the summary from that retrieved context.

Step 1: Document Processing

  • Agreements, contracts, and other legal documents are processed: text is extracted (for example with OCR or transformer-based models), split into chunks, and each chunk is tagged with relevant topics for efficient keyword search.
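A minimal sketch of the chunking and topic-tagging part of this step, assuming the text has already been extracted (OCR and transformer-based extraction are not shown). The word-window sizes, the `TOPIC_KEYWORDS` map, and the `Chunk`, `chunk_text`, `tag_topics`, and `process_document` names are hypothetical illustrations, not the project's actual code:

```python
from dataclasses import dataclass, field

# Hypothetical topic -> keyword map; the project's real tagging scheme may differ.
TOPIC_KEYWORDS = {
    "termination": ["terminate", "termination", "notice period"],
    "payment": ["payment", "fee", "invoice"],
    "confidentiality": ["confidential", "non-disclosure"],
    "liability": ["liability", "indemnify", "indemnification"],
}


@dataclass
class Chunk:
    chunk_id: int
    text: str
    topics: list[str] = field(default_factory=list)


def chunk_text(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split text into windows of up to max_words words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window reached the end of the text
            break
    return chunks


def tag_topics(chunk: str) -> list[str]:
    """Return every topic whose keywords appear (case-insensitively) in the chunk."""
    lowered = chunk.lower()
    return [topic for topic, keywords in TOPIC_KEYWORDS.items()
            if any(kw in lowered for kw in keywords)]


def process_document(text: str, max_words: int = 120, overlap: int = 20) -> list[Chunk]:
    """Chunk already-extracted document text and tag each chunk with topics."""
    return [Chunk(i, c, tag_topics(c))
            for i, c in enumerate(chunk_text(text, max_words, overlap))]


doc = ("Either party may terminate this Agreement with 30 days written notice. "
       "The Client shall pay each invoice within 15 days. "
       "All Confidential Information must be protected.")
chunks = process_document(doc, max_words=12, overlap=2)
for chunk in chunks:
    print(chunk.chunk_id, chunk.topics)
# 0 ['termination']
# 1 ['payment', 'confidentiality']
# 2 ['confidentiality']
```

The overlap keeps a clause that straddles a window boundary partly visible in both neighbouring chunks; a production pipeline would more likely split on clause or section boundaries than on a fixed word count.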

Step 2: Document Retrieval

  • Relevant parts of the documents are fetched using either BM25 ranking (keyword-based) or semantic search (meaning-based, using vector embeddings).
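The keyword-based path can be illustrated with a small, self-contained Okapi BM25 scorer. This is a sketch, not the project's implementation: the `BM25` class, the `tokenize` helper, the `k1=1.5` and `b=0.75` defaults (common textbook values), and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a fixed list of text chunks (illustrative sketch)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.term_freqs = [Counter(tokens) for tokens in self.doc_tokens]
        self.n_docs = len(docs)
        total = sum(len(tokens) for tokens in self.doc_tokens)
        self.avgdl = total / self.n_docs if self.n_docs else 0.0
        # Document frequency: number of chunks containing each term.
        self.df = Counter()
        for tokens in self.doc_tokens:
            self.df.update(set(tokens))

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, index: int) -> float:
        freqs = self.term_freqs[index]
        doc_len = len(self.doc_tokens[index])
        total = 0.0
        for term in tokenize(query):
            f = freqs.get(term, 0)
            if f == 0:
                continue  # term absent from this chunk: contributes nothing
            # avgdl > 0 here, because this chunk contains at least one token.
            norm = 1 - self.b + self.b * doc_len / self.avgdl
            total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        """Return up to k (chunk_index, score) pairs with a positive score, best first."""
        scored = [(i, self.score(query, i)) for i in range(self.n_docs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [(i, s) for i, s in scored if s > 0][:k]


clauses = [
    "Either party may terminate this Agreement with thirty days written notice.",
    "The Client shall pay each invoice within fifteen days of receipt.",
    "The Supplier shall indemnify the Client against third party claims.",
]
bm25 = BM25(clauses)
print([i for i, _ in bm25.top_k("when can I terminate the agreement", k=2)])  # [0, 2]
```

For the semantic-search path, the same `top_k` interface would instead rank chunks by cosine similarity between embedding vectors stored in the vector database; that variant is not shown here.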

Step 3: Context Augmentation

  • The retrieved text is then passed to an LLM to generate a structured and readable summary.
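One way to assemble the retrieved chunks into an LLM prompt is sketched below. The `build_summary_prompt` helper, its instruction wording, and the character budget are hypothetical; the app's actual prompt and LLM client are not documented here, so the sketch stops at producing the prompt string.

```python
def build_summary_prompt(request: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Number the retrieved chunks and wrap them in summarization instructions (hypothetical)."""
    excerpts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        piece = f"[{i}] {chunk.strip()}"
        if used + len(piece) > max_chars:
            break  # rough context budget: stop before exceeding it
        excerpts.append(piece)
        used += len(piece)
    context = "\n".join(excerpts)
    return (
        "You explain legal clauses in plain English for non-lawyers.\n"
        "Using only the numbered excerpts below, write a short, structured summary "
        "that answers the request. Cite excerpt numbers, and say so if the excerpts "
        "do not contain the answer.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Request: {request}\n"
        "Summary:"
    )


retrieved = [
    "Either party may terminate this Agreement with thirty days written notice.",
    "The Supplier shall indemnify the Client against third party claims.",
]
prompt = build_summary_prompt("Summarize the termination terms.", retrieved)
print(prompt)  # this string would then be sent to the LLM (client not shown)
```

Numbering the excerpts lets the summary point back to the clauses it relies on, and instructing the model to use only those excerpts keeps the output grounded in the retrieved text.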

Learn more about RAG: 🔗 Exploring the Power of RAG & OpenAI’s Function Calling for Q&A


🛠 Installation & Setup

1️⃣ Create a Virtual Environment

$ python -m venv venv
$ venv\Scripts\activate  # Windows
$ source venv/bin/activate  # macOS/Linux

2️⃣ Install Dependencies

$ pip install -r requirements.txt

3️⃣ Run the Application

$ streamlit run summarize.py

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


💡 Contributing

Contributions are welcome! Feel free to submit an issue or a pull request.


💡 Need Help?

If you have any questions, feel free to reach out! 🚀
