LLM Infrastructure at Ruli: Challenges and How We Solve Them
Xi Sun, Co-Founder & CTO of Ruli AI
Feb 1, 2025



LLMs are powerful, and many people believe that simply plugging them into a product will make everything work like magic. It doesn’t. We need to solve engineering challenges and make thoughtful infrastructure decisions to make the product shine. We’ve seen many competitors deliver impressive NDA demos, but their solutions underperform in production when dealing with unpredictable data input. Here are a few challenges we faced while building Ruli — and how we solved them:
Processing and Understanding Long Documents – Some legal documents are hundreds or even thousands of pages long, exceeding LLM context limits and creating latency challenges. At Ruli, we built a prompt-chaining and chunking system that automatically scales on a serverless architecture, allowing us to efficiently process massive legal documents while keeping responses accurate and fast.
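The idea above can be sketched as a simple map-reduce over overlapping page windows. This is a minimal illustration, not our production pipeline: `call_llm` is a hypothetical stand-in for a model call, and in practice the per-chunk calls fan out to serverless workers.

```python
from typing import Callable

def chunk_pages(pages: list[str], window: int = 20, overlap: int = 2) -> list[str]:
    """Split a long document into overlapping page windows so that each
    LLM call stays within the model's context limit."""
    chunks = []
    step = window - overlap
    for start in range(0, len(pages), step):
        chunks.append("\n".join(pages[start:start + window]))
        if start + window >= len(pages):
            break
    return chunks

def map_reduce_answer(pages: list[str], question: str,
                      call_llm: Callable[[str], str]) -> str:
    # Map: answer the question over each chunk independently
    # (these calls can run in parallel on serverless workers).
    partials = [call_llm(f"{question}\n\n{chunk}") for chunk in chunk_pages(pages)]
    # Reduce: combine the partial answers into one final response.
    return call_llm(question + "\n\nPartial answers:\n" + "\n".join(partials))
```

Overlapping windows keep clauses that straddle a page boundary intact in at least one chunk, which matters for contracts where a sentence can span pages.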
Reliable Citation – AI should complement, not replace, human expertise. Citations are a critical step to ensure accuracy, reduce hallucination concerns, and close the gap between human and AI work. Standard AI pipelines lose track of original document references after processing, making it difficult to trace answers back. At Ruli, we built a reliable chunking pipeline using fine-tuned OCR, preserving layout information. This allows citations to be directly sourced and rendered as bounding-box highlights in the original document.
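One way to picture this: every chunk carries the page and bounding-box coordinates of the OCR spans it was built from, so an answer that cites the chunk can be rendered as highlights on the original page. The shape below is a hedged sketch, assuming hypothetical OCR output of `(text, page, bbox)` spans rather than any particular OCR library's format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """One OCR span: its text plus where it sits in the source document."""
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coords

def build_chunk(spans: list[Span]) -> dict:
    """Join OCR spans into one retrievable chunk while keeping each span's
    page and bounding box, so an answer citing this chunk can be traced
    back and drawn as highlights in the original PDF."""
    return {
        "text": " ".join(s.text for s in spans),
        "citations": [{"page": s.page, "bbox": s.bbox} for s in spans],
    }
```

The key design choice is that provenance travels with the chunk from ingestion onward, instead of being reconstructed by string matching after the model answers.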
Handling Unstructured Data in Any Format – Many legal AI products only support text-based PDFs and Word files, but a significant portion of contracts exist outside traditional CLM systems, including scanned, handwritten, and wet-ink documents. Ruli processes these formats using advanced OCR and layout retention. We recently onboarded a customer that runs DataGrid to extract renewal clauses from thousands of wet-ink contract documents.
Layout-Aware Understanding – Legal documents, employee handbooks, and compliance policies often contain hierarchical structures with sections, subsections, and nested clauses. Traditional chunking mechanisms process text in isolation, failing to capture structural context. At Ruli, we built semantic-level chunking that retains document hierarchy, enabling AI to understand context better and generate more accurate, section-aware responses.
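To make the hierarchy point concrete, here is a minimal sketch of a chunker that tracks the section path as it scans numbered headings and attaches that path to every chunk. It assumes a simplified `"3.2 Termination"`-style heading convention and is a stand-in for our semantic-level pipeline, not the implementation itself.

```python
import re

# Matches numbered headings like "1 Definitions" or "3.2 Termination".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

def hierarchical_chunks(lines: list[str]) -> list[dict]:
    """Chunk a document by section, prefixing each chunk with its full
    section path so downstream retrieval keeps structural context."""
    path: list[str] = []
    chunks: list[dict] = []
    buf: list[str] = []

    def flush() -> None:
        if buf:
            chunks.append({"path": " > ".join(path), "text": "\n".join(buf)})
            buf.clear()

    for line in lines:
        m = HEADING.match(line)
        if m:
            flush()
            depth = m.group(1).count(".")  # "3" -> depth 0, "3.2" -> depth 1
            path[:] = path[:depth] + [m.group(2)]
        else:
            buf.append(line)
    flush()
    return chunks
```

Because each chunk carries "Definitions > Scope"-style context, a clause retrieved in isolation still tells the model which section and subsection it came from.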