Technical Guide

The Law Firm's Guide to robots.txt, llms.txt, and JSON-LD Schema

Three files determine whether AI recommends your law firm or your competitor. Here's exactly what each one does, why it matters, and how to set them up — even if you're not technical.
By Luna Legal AI · February 2026 · 12 min read

If someone asked ChatGPT to recommend a personal injury lawyer in your city, would your firm show up? For most law firms, the answer depends on three files that either exist on your website or don't. This guide explains each one in plain English, shows you exactly what they look like, and tells you how to implement them.

You don't need to be technical to understand this. But you do need to understand why these files matter so you can tell your web developer or IT person what to do.

The Big Picture: How AI "Reads" Your Website

When ChatGPT, Claude, Perplexity, or Gemini wants to learn about a law firm, it sends a "crawler" — an automated program that visits your website and reads the content. This is similar to how Google sends Googlebot to index websites for search results.

But AI crawlers work differently from Google's crawler in important ways. They're looking for clearly structured, factual information they can use in conversation. They need to understand not just what words are on your pages, but what your firm actually does, who works there, where you operate, and why someone should hire you.

Three files give AI this information:

robots.txt
Location: yourdomain.com/robots.txt
What it does: Controls which AI crawlers are allowed to access your website. Think of it as the bouncer at the door — if your robots.txt says "no AI allowed," ChatGPT will never even see your site.
llms.txt
Location: yourdomain.com/llms.txt
What it does: A README file specifically for AI. It tells AI systems what your firm does, who your attorneys are, what areas you practice in, and how to describe you. Think of it as your firm's resume, written for an AI audience.
JSON-LD Schema
Location: embedded in your HTML <head> tag
What it does: Structured data that communicates facts about your firm in a format AI systems can parse instantly. Name, address, practice areas, attorney credentials, FAQs, reviews — all in a standardized format that every AI platform understands.

File 1: robots.txt — The Gatekeeper

What Is It?

robots.txt is a plain text file that sits at the root of your website (yourdomain.com/robots.txt). Every well-behaved crawler — whether Google's or ChatGPT's — checks this file before accessing your site. It tells them what they're allowed to read and what's off-limits.

Why Law Firms Get This Wrong

Most law firm websites use WordPress with a security plugin (Wordfence, Sucuri) or are behind Cloudflare. These tools often block AI crawlers by default because they look like "bots" — which, technically, they are. But they're helpful bots you want on your site.

Common mistake: Many WordPress security plugins and CDN providers block AI crawlers without notifying you. Your site looks perfectly fine to human visitors while being completely invisible to ChatGPT, Claude, and Perplexity.
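One way to find out whether AI crawlers are reaching your site (or being turned away) is to look for their user-agent strings in your web server's access logs. The Python sketch below is illustrative only: the `SAMPLE_LOG` lines and the `count_ai_hits` helper are made up for this example, and real log formats vary by server.

```python
from collections import Counter

# User-agent substrings for the major AI crawlers
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Illustrative access-log lines in the common combined format;
# in practice, read these from your web server's log file.
SAMPLE_LOG = """\
203.0.113.5 - - [01/Feb/2026] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0; GPTBot/1.2"
203.0.113.9 - - [01/Feb/2026] "GET /about HTTP/1.1" 200 "-" "Mozilla/5.0; ClaudeBot/1.0"
198.51.100.7 - - [01/Feb/2026] "GET /contact HTTP/1.1" 403 "-" "Mozilla/5.0; GPTBot/1.2"
192.0.2.44 - - [01/Feb/2026] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (Windows NT 10.0)"
"""

def count_ai_hits(log_text):
    """Tally requests per AI crawler and how many got a 4xx (blocked)."""
    hits, blocked = Counter(), Counter()
    for line in log_text.splitlines():
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
                # Naive status extraction: the token right after the quoted request.
                status = line.split('" ')[1].split()[0]
                if status.startswith("4"):
                    blocked[bot] += 1
    return hits, blocked

hits, blocked = count_ai_hits(SAMPLE_LOG)
print("visits:", dict(hits))
print("blocked:", dict(blocked))
```

If a crawler shows up in the "blocked" tally, a security plugin or CDN rule is turning it away before it ever reads your content.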

What Good robots.txt Looks Like

# Allow all major AI crawlers
# Note: a crawler obeys only its most specific matching group, so
# sensitive Disallow rules must be repeated inside each named group.
User-agent: GPTBot
Allow: /
Disallow: /client-portal/
Crawl-delay: 10

User-agent: ChatGPT-User
Allow: /
Disallow: /client-portal/

User-agent: ClaudeBot
Allow: /
Disallow: /client-portal/
Crawl-delay: 10

User-agent: PerplexityBot
Allow: /
Disallow: /client-portal/

User-agent: Google-Extended
Allow: /
Disallow: /client-portal/

# Block sensitive areas from all other crawlers
User-agent: *
Allow: /
Disallow: /client-portal/
Disallow: /wp-admin/
Disallow: /wp-login.php

# Tell crawlers where your sitemap is
Sitemap: https://yourdomain.com/sitemap.xml

How to Implement

WordPress: Install the Yoast SEO plugin → go to Tools → File Editor → Edit robots.txt. Paste the above content (with your domain name). If file editing is disabled on your host, you can also create a robots.txt file locally and upload it to your site's root directory via FTP.

Cloudflare users: Go to Security → Bots → make sure "Bot Fight Mode" allows verified AI crawlers. In Workers, you can create a route that returns a custom robots.txt.

Pro tip: After uploading your robots.txt, verify it's working by visiting yourdomain.com/robots.txt in your browser. You should see the plain text content you just uploaded.
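Beyond eyeballing the file in a browser, you can check programmatically that it actually permits a given crawler. This minimal Python sketch uses the standard library's robots.txt parser on an inline copy of the file; in production you would point it at yourdomain.com/robots.txt instead. One caveat: Python's parser applies rules in the order they appear, so the Disallow lines come before the catch-all Allow here.

```python
from urllib.robotparser import RobotFileParser

# Trimmed inline copy of the robots.txt above, for testing without a live site.
# In production: rp.set_url("https://yourdomain.com/robots.txt"); rp.read()
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /client-portal/
Allow: /

User-agent: *
Disallow: /client-portal/
Disallow: /wp-admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Public pages should be fetchable; the client portal should not be.
allowed_home = rp.can_fetch("GPTBot", "https://yourdomain.com/")
allowed_portal = rp.can_fetch("GPTBot", "https://yourdomain.com/client-portal/")
print("home:", allowed_home, "| client portal:", allowed_portal)
```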

File 2: llms.txt — Your AI Resume

What Is It?

llms.txt is a proposed standard (introduced in 2024) that gives AI language models a structured overview of your website. While AI crawlers can read your entire site, llms.txt gives them a curated summary — like an executive brief that helps AI understand your firm quickly and accurately.

Why It Matters for Law Firms

Without llms.txt, AI has to piece together information about your firm from scattered web pages, blog posts, and third-party directories. The result is often incomplete or inaccurate. With llms.txt, you control the narrative. You tell AI exactly how to describe your firm.

What Good llms.txt Looks Like

# Smith & Associates Personal Injury Law

> Smith & Associates is a Chicago-based personal injury law firm with 25 years of experience representing accident victims across Cook County, DuPage County, and the greater Chicagoland area.

## About
Founded in 2001 by James Smith, Smith & Associates has recovered over $150 million in settlements and verdicts for personal injury clients. The firm handles car accidents, truck accidents, slip and fall injuries, medical malpractice, and wrongful death cases. Located at 123 Main Street, Chicago, IL 60601, the firm serves clients across Illinois with free initial consultations and contingency fee arrangements.

## Practice Areas
- Car Accidents: Representation for drivers, passengers, and pedestrians injured in motor vehicle collisions
- Truck Accidents: Semi-truck and commercial vehicle accident claims
- Medical Malpractice: Surgical errors, misdiagnosis, birth injuries
- Wrongful Death: Claims on behalf of families who lost loved ones

## Attorneys
- James Smith, Founding Partner: JD from University of Chicago, 25 years experience, Illinois Super Lawyer 2020-2025
- Maria Rodriguez, Senior Associate: JD from Northwestern, specializes in trucking litigation

## Service Area
Chicago, IL, Cook County, DuPage County, Lake County, Will County, Kane County

## Languages
English, Spanish, Polish

## Contact
Phone: (312) 555-1234
Email: [email protected]
Website: https://smithinjurylaw.com
Hours: Monday-Friday 8:30 AM - 5:30 PM, 24/7 emergency line
Free consultation available

How to Implement

Create a plain text file named llms.txt and upload it to your website's root directory (the same place robots.txt lives). Your web developer can do this via FTP, or if you're on WordPress, you can use a plugin that allows custom file uploads to the root.

There's also llms-full.txt — a more comprehensive version that includes your entire site's content in markdown format. Luna Legal AI generates both files for you automatically.
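If you want a quick sanity check before uploading, the sketch below runs a few checks against the file's text, based on the format's usual conventions: an H1 title, a > summary blockquote, and ## sections. The `check_llms_txt` helper is hypothetical, not part of any standard tooling.

```python
def check_llms_txt(text):
    """Lightweight sanity checks for an llms.txt file.
    Returns a list of problems; an empty list means it looks fine."""
    lines = [line for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line should be a '# Firm Name' title")
    if not any(line.startswith("> ") for line in lines):
        problems.append("missing '> ...' one-paragraph summary")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no '## Section' headings found")
    return problems

SAMPLE = """# Smith & Associates Personal Injury Law

> Chicago-based personal injury firm serving Cook County.

## Practice Areas
- Car Accidents
"""

print(check_llms_txt(SAMPLE))            # empty list: the sample passes
print(check_llms_txt("just some text"))  # flags all three problems
```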

File 3: JSON-LD Schema — The Machine-Readable Facts

What Is It?

JSON-LD (JavaScript Object Notation for Linked Data) is a way to embed structured data in your web pages that machines can read instantly. It uses the schema.org vocabulary — a standardized set of data types that Google, AI platforms, and other services all understand.

When you add JSON-LD schema to your law firm's website, you're essentially saying: "Here are machine-readable facts about my business — my name, address, practice areas, attorney credentials, FAQs, and more."

Why It's Critical for AI Visibility

Schema markup is used in 75% of high-performing Generative Engine Optimization (GEO) pages. AI platforms heavily weight structured data when deciding which firms to recommend because it's unambiguous — there's no interpretation needed. The data is either there or it isn't.

Basic vs. GEO-Optimized Schema

Most law firms that have any schema at all have the basic version auto-generated by WordPress plugins. That typically includes your business name, address, and maybe a phone number. But GEO-optimized schema goes much further:

Field | Basic Schema | GEO-Optimized
Business name & address | Yes | Yes
Phone & hours | Sometimes | Yes
about — detailed firm description | No | Yes, 100+ words
audience — who you serve | No | Yes
areaServed — geographic coverage | No | Yes, with GeoCircle
speakable — content for voice AI | No | Yes
alternativeHeadline | No | Yes, 3-5 variants
knowsAbout — practice areas | No | Yes
FAQPage schema | No | Yes, 8-12 per practice area
Person schema for attorneys | No | Yes, with credentials

What GEO-Optimized Schema Looks Like

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": ["LegalService", "LocalBusiness"],
  "name": "Smith & Associates",
  "url": "https://smithinjurylaw.com",
  "about": "Smith & Associates is a leading Chicago personal injury law firm with 25 years of experience recovering over $150 million for accident victims...",
  "audience": {
    "@type": "Audience",
    "audienceType": "Accident victims and injured individuals in the Chicago area"
  },
  "areaServed": {
    "@type": "GeoCircle",
    "geoMidpoint": {
      "@type": "GeoCoordinates",
      "latitude": 41.8781,
      "longitude": -87.6298
    },
    "geoRadius": "50 miles"
  },
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".firm-description", ".practice-areas"]
  },
  "knowsAbout": ["Personal Injury", "Car Accidents", "Medical Malpractice"]
}
</script>

How to Implement

WordPress with Yoast/RankMath: These plugins handle basic schema but can't add GEO fields like speakable, audience, or alternativeHeadline. You'll need to add a custom JSON-LD block to your theme's header.php or use a plugin like "Schema Pro" that allows custom schema types.

Easiest method: Luna Legal AI generates your complete GEO-optimized JSON-LD schema as a deliverable. Copy the code, paste it into your site's <head> tag, done.
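After pasting the schema in, it's worth confirming that the page's JSON-LD actually parses — a stray comma breaks the whole block silently. This Python sketch extracts application/ld+json blocks from HTML using only the standard library and loads them as JSON; the `JSONLDExtractor` class and the sample HTML are illustrative.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._buf = None
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buf = []
    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None
    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

# Illustrative page source; in practice, fetch your live page's HTML.
HTML = '''<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LegalService", "name": "Smith & Associates"}
</script>
</head><body></body></html>'''

parser = JSONLDExtractor()
parser.feed(HTML)
data = json.loads(parser.blocks[0])  # raises ValueError if the JSON is malformed
print(data["@type"], "-", data["name"])
```

Google's Rich Results Test and the schema.org validator do the same job interactively if you'd rather not script it.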

How These Three Files Work Together

Think of it as a three-layer system:

Layer 1: robots.txt opens the door. Without this, AI can't even enter your website. It's the prerequisite for everything else.

Layer 2: llms.txt hands AI a brochure at the door. "Here's who we are, what we do, and why you should recommend us." It gives AI a quick, authoritative overview without having to crawl every page.

Layer 3: JSON-LD schema provides the detailed facts on every page. When AI is deciding between two firms to recommend, the one with structured data wins — because AI can be confident in the accuracy of the information.

All three are necessary. robots.txt without the other two means AI can read your site but has to guess what's important. Schema without robots.txt means AI has the facts but can't access them. They work as a system.

Luna Generates All Three Files Automatically

Run a GEO scan and get a smart robots.txt, enhanced llms.txt, and GEO-optimized JSON-LD schema — plus 13 more deliverables including FAQ schema, XML sitemap, security headers, and a complete implementation checklist.

Start Free Trial →