
Structured Output

JSON, Tables & Reliable Extraction

Why Structured Output Matters

Free-text AI responses work well for human readers but are hard for code to consume. If your application needs to parse the model's output (to store it in a database, pass it to another API, or display it in a UI), you need structured output.

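The challenge: LLMs generate text token by token, so nothing inherently guarantees valid JSON or well-formed CSV. With the right prompting techniques, though, you can get reliable structured output in the large majority of cases (often 95% or more, depending on the model and the task), and validation in code catches the rest.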

Getting JSON Output

Basic JSON Request

Extract the following fields from this product description and return valid JSON:

Product: "The ProDesk 4K Monitor features a 32-inch IPS panel with 3840x2160 resolution, 99% sRGB coverage, USB-C with 65W charging, and an adjustable stand. Available for $549."

Return format:
{
  "name": string,
  "screen_size": string,
  "resolution": string,
  "price": number,
  "features": string[]
}

Schema Enforcement

For production use, provide the exact schema with types and constraints:

Extract invoice data from the text below. Return a JSON object matching this exact schema:

{
  "invoice_id": string (format: "INV-XXXX"),
  "vendor": string,
  "line_items": [
    {
      "description": string,
      "quantity": number (integer),
      "unit_price": number (2 decimal places),
      "total": number (2 decimal places)
    }
  ],
  "subtotal": number,
  "tax_rate": number (as decimal, e.g. 0.08 for 8%),
  "tax_amount": number,
  "total": number,
  "date": string (ISO 8601 format)
}

Rules:
- If a field cannot be determined, use null
- Do not include any text outside the JSON object
- Ensure all numbers are actual numbers, not strings

The explicit schema, type annotations, and rules dramatically reduce malformed output.
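Type checks alone will not catch an invoice whose numbers do not add up. As one possible follow-up to the schema above, here is a minimal sketch of a consistency check. The names `Invoice`, `checkInvoice`, and `sameCents` are hypothetical, the code assumes null fields have already been handled, and it reads "INV-XXXX" as "INV-" followed by four digits, which is an assumption about the format.

```typescript
// Shape of the invoice object described by the schema in the prompt.
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
  total: number;
}

interface Invoice {
  invoice_id: string;
  vendor: string;
  line_items: LineItem[];
  subtotal: number;
  tax_rate: number;
  tax_amount: number;
  total: number;
  date: string;
}

// Compare two money amounts after rounding to whole cents,
// which avoids false alarms from floating-point noise.
function sameCents(a: number, b: number): boolean {
  return Math.round(a * 100) === Math.round(b * 100);
}

// Returns a list of human-readable problems; an empty list means
// the invoice passed every check.
function checkInvoice(inv: Invoice): string[] {
  const problems: string[] = [];

  // Assumption: "INV-XXXX" means the prefix INV- followed by four digits.
  if (!/^INV-\d{4}$/.test(inv.invoice_id)) {
    problems.push(`invoice_id "${inv.invoice_id}" does not match INV-XXXX`);
  }

  inv.line_items.forEach((item, i) => {
    if (!Number.isInteger(item.quantity)) {
      problems.push(`line_items[${i}].quantity is not an integer`);
    }
    if (!sameCents(item.quantity * item.unit_price, item.total)) {
      problems.push(`line_items[${i}].total != quantity * unit_price`);
    }
  });

  const sum = inv.line_items.reduce((acc, item) => acc + item.total, 0);
  if (!sameCents(sum, inv.subtotal)) {
    problems.push("subtotal != sum of line totals");
  }
  if (!sameCents(inv.subtotal * inv.tax_rate, inv.tax_amount)) {
    problems.push("tax_amount != subtotal * tax_rate");
  }
  if (!sameCents(inv.subtotal + inv.tax_amount, inv.total)) {
    problems.push("total != subtotal + tax_amount");
  }

  // Only checks that the value starts with a YYYY-MM-DD shape.
  if (!/^\d{4}-\d{2}-\d{2}/.test(inv.date)) {
    problems.push("date does not start with YYYY-MM-DD");
  }

  return problems;
}
```

Any non-empty result is a good reason to reject the response or flag it for review, since the model may have misread a figure in the source text.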

Getting Table Output

Markdown tables work well for comparison and summary tasks:

Compare the products in the attached data. Output a markdown table with columns:
| Product | Price | Key Feature | Best For |

Sort by price ascending. Include all products.
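If code needs to consume the table rather than just display it, you can parse the markdown back into rows. This is a minimal sketch, assuming a simple pipe-delimited table with no escaped pipes inside cells; `parseMarkdownTable` is a hypothetical helper name.

```typescript
// Parse a simple pipe-delimited markdown table into an array of row objects
// keyed by header name. Assumes no escaped pipes (\|) inside cells.
function parseMarkdownTable(md: string): Record<string, string>[] {
  const splitRow = (line: string): string[] =>
    line
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());

  const rows = md
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"))
    // Drop the |---|---| separator row (only pipes, dashes, colons, spaces).
    .filter((line) => !(/^[\s:|-]+$/.test(line) && line.includes("-")))
    .map(splitRow);

  const headers = rows[0];
  if (headers === undefined) return [];

  return rows.slice(1).map((cells) => {
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      record[header] = cells[i] ?? "";
    });
    return record;
  });
}
```

With the rows parsed, you can check in code that every product is present and that prices really are in ascending order, instead of trusting the model's sort.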

For CSV output, be explicit about delimiters and quoting:

Convert the following data to CSV format.
- Use comma delimiters
- Quote fields that contain commas
- First row must be headers
- Use ISO 8601 dates (YYYY-MM-DD)
- Output ONLY the CSV, no explanation
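A quick way to check that the model followed the quoting rule is to split each line yourself and compare field counts against the header. The sketch below is one way to do that, assuming double-quoted fields with `""` as the escaped quote and no newlines inside fields; `splitCsvLine` and `findRaggedRows` are hypothetical names. A production pipeline would more likely use an established CSV parser.

```typescript
// Split one CSV line into fields, honoring double-quoted fields and "" escapes.
// Assumes no newlines inside quoted fields.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Returns the 1-based line numbers of rows whose field count
// differs from the header row's field count.
function findRaggedRows(csv: string): number[] {
  const lines = csv.trim().split(/\r?\n/);
  const width = splitCsvLine(lines[0] ?? "").length;
  const bad: number[] = [];
  lines.forEach((line, i) => {
    if (splitCsvLine(line).length !== width) bad.push(i + 1);
  });
  return bad;
}
```

A row that shows up in `findRaggedRows` usually means a field containing a comma was left unquoted.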

Handling Edge Cases

Missing Data

If a field is not mentioned in the source text:
- For strings: use null (not empty string)
- For numbers: use null (not 0)
- For arrays: use empty array []
- For booleans: use null (not false)
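Models sometimes omit a key or return an empty string despite these rules, so it can help to enforce the same defaults in code. Here is a minimal sketch; `FieldSpec` and `applyMissingDefaults` are hypothetical names, and the field spec format is an assumption made for illustration.

```typescript
type FieldKind = "string" | "number" | "boolean" | "array";

// Hypothetical field spec: maps each expected field name to its kind.
type FieldSpec = Record<string, FieldKind>;

// Returns a copy of `raw` restricted to the fields in `spec`,
// with the missing-data rules applied.
function applyMissingDefaults(
  raw: Record<string, unknown>,
  spec: FieldSpec
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const name of Object.keys(spec)) {
    const kind = spec[name];
    const value = raw[name];
    if (value === undefined || value === null) {
      // Missing arrays become [], every other kind becomes null.
      out[name] = kind === "array" ? [] : null;
    } else if (kind === "string" && typeof value === "string" && value.trim() === "") {
      // The model returned "" despite the rule; treat it as missing.
      out[name] = null;
    } else {
      out[name] = value;
    }
  }
  return out;
}
```

Note that a returned 0 or false is kept as-is: code cannot tell whether the model meant "zero" or "unknown", which is exactly why the prompt asks for null.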

Ambiguous Data

If a value is ambiguous:
- Include your best interpretation in the field
- Add an "ambiguous_fields" array listing field names that required interpretation
- Example: {"name": "J. Smith", "ambiguous_fields": ["name"]}
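The "ambiguous_fields" array is only useful if your pipeline acts on it. One option, sketched below with the hypothetical helper `partitionByAmbiguity`, is to route flagged records to a review queue and let the rest through automatically.

```typescript
// Any extracted record, optionally carrying the ambiguous_fields marker.
interface Extracted {
  ambiguous_fields?: string[];
  [field: string]: unknown;
}

// Split records into those safe to accept automatically and
// those where the model flagged at least one field as ambiguous.
function partitionByAmbiguity<T extends Extracted>(
  records: T[]
): { clean: T[]; needsReview: T[] } {
  const clean: T[] = [];
  const needsReview: T[] = [];
  for (const record of records) {
    const flagged =
      Array.isArray(record.ambiguous_fields) &&
      record.ambiguous_fields.length > 0;
    (flagged ? needsReview : clean).push(record);
  }
  return { clean, needsReview };
}
```

A record with a missing or empty "ambiguous_fields" array is treated as clean here; whether that is strict enough depends on how much you trust the model to flag its own guesses.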

Multiple Items

If the text contains multiple invoices, return a JSON array.
Each element must follow the schema above.
Maintain the order they appear in the source text.
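When the text happens to contain a single invoice, the model may return a bare object instead of a one-element array. A small normalizing step keeps downstream code simple. This is a sketch; `toInvoiceList` is a hypothetical name.

```typescript
// Accept either a single object or an array and always return an array,
// preserving the order the model produced.
function toInvoiceList(parsed: unknown): unknown[] {
  if (Array.isArray(parsed)) return parsed;
  if (parsed !== null && typeof parsed === "object") return [parsed];
  throw new TypeError("Expected a JSON object or an array of objects");
}
```

Each element of the returned list can then go through the same schema validation as a single invoice would.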

Validation Strategies

Even with perfect prompts, LLM output can be malformed. Build validation into your pipeline:

1. JSON.parse Check

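function parseAIResponse(text: string): unknown {
  // Strip a leading ```json (or bare ```) fence and a trailing ``` fence, if present
  const cleaned = text
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
  // JSON.parse throws a SyntaxError on malformed input, so callers should catch it
  return JSON.parse(cleaned);
}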

2. Schema Validation

Use a library like Zod or Ajv to validate that the parsed object matches your expected shape. Reject and retry if it doesn't.
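To show what such a check does without pulling in a dependency, here is a hand-rolled sketch for the product schema from the Basic JSON Request example. It is not how Zod or Ajv work internally, and `validateProduct` and `ValidationResult` are hypothetical names; in practice a schema library would replace this function.

```typescript
interface Product {
  name: string;
  screen_size: string;
  resolution: string;
  price: number;
  features: string[];
}

type ValidationResult =
  | { ok: true; value: Product }
  | { ok: false; errors: string[] };

// Check that `input` has the Product shape and collect every mismatch,
// so the full list can be logged or sent back to the model.
function validateProduct(input: unknown): ValidationResult {
  if (input === null || typeof input !== "object" || Array.isArray(input)) {
    return { ok: false, errors: ["root: expected an object"] };
  }
  const obj = input as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of ["name", "screen_size", "resolution"]) {
    if (typeof obj[key] !== "string") errors.push(`${key}: expected string`);
  }

  const price = obj["price"];
  if (typeof price !== "number" || !Number.isFinite(price)) {
    errors.push("price: expected number");
  }

  const features = obj["features"];
  if (!Array.isArray(features) || !features.every((f) => typeof f === "string")) {
    errors.push("features: expected string[]");
  }

  return errors.length === 0
    ? { ok: true, value: obj as unknown as Product }
    : { ok: false, errors };
}
```

Returning a list of errors rather than throwing on the first one gives you more useful context to include in a retry prompt.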

3. Retry with Error Context

If parsing fails, send the error back to the model:

Your previous response was not valid JSON. The error was:
"Unexpected token at position 142"

Please fix the output and return valid JSON only.
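The retry step can be wrapped in a small loop. The sketch below assumes a hypothetical `callModel` function that sends a prompt and resolves to the model's text; swap in whatever client you actually use. `buildRetryPrompt` and `extractWithRetry` are also illustrative names, and including the previous response in the retry prompt is a design choice of this sketch.

```typescript
// Hypothetical signature for whatever client sends a prompt and returns text.
type ModelCall = (prompt: string) => Promise<string>;

// Build a follow-up prompt that shows the model its own bad output
// and the parser's error message.
function buildRetryPrompt(originalPrompt: string, badOutput: string, error: string): string {
  return [
    originalPrompt,
    "",
    "Your previous response was not valid JSON. The error was:",
    `"${error}"`,
    "",
    "Previous response:",
    badOutput,
    "",
    "Please fix the output and return valid JSON only.",
  ].join("\n");
}

// Call the model, parse the result, and retry with error context
// until parsing succeeds or maxAttempts is reached.
async function extractWithRetry(
  callModel: ModelCall,
  prompt: string,
  maxAttempts = 3
): Promise<unknown> {
  let currentPrompt = prompt;
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const text = await callModel(currentPrompt);
    try {
      return JSON.parse(text);
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      currentPrompt = buildRetryPrompt(prompt, text, lastError);
    }
  }
  throw new Error(`No valid JSON after ${maxAttempts} attempts: ${lastError}`);
}
```

Capping the number of attempts matters: if the model keeps failing on the same input, the problem is more likely the prompt or the source text than a one-off formatting slip.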

Building Extraction Prompts

A reliable extraction prompt combines several techniques from earlier modules:

[System]
You are a data extraction specialist. You extract structured data from unstructured business documents. Always return valid JSON matching the provided schema. Never include explanatory text outside the JSON.

[User]
Extract all contact information from this email thread.

Schema:
{
  "contacts": [{
    "name": string,
    "email": string | null,
    "phone": string | null,
    "company": string | null,
    "role": string | null
  }]
}

Email thread:
[paste email data here]

This combines: system prompt (Module 4) + schema enforcement + edge case handling.
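In application code, the same prompt can be assembled from constants so the schema lives in one place. This is a sketch that assumes a chat-style API accepting a list of role/content messages; that message shape and the names `ChatMessage` and `buildExtractionMessages` are assumptions, so check your provider's documentation for the actual request format.

```typescript
// Assumed message shape for a chat-style API.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

const SYSTEM_PROMPT =
  "You are a data extraction specialist. You extract structured data from " +
  "unstructured business documents. Always return valid JSON matching the " +
  "provided schema. Never include explanatory text outside the JSON.";

// The schema text shown to the model, kept in one place so the prompt
// and any validation code can be updated together.
const CONTACT_SCHEMA = `{
  "contacts": [{
    "name": string,
    "email": string | null,
    "phone": string | null,
    "company": string | null,
    "role": string | null
  }]
}`;

// Assemble the system and user messages for one email thread.
function buildExtractionMessages(emailThread: string): ChatMessage[] {
  const userPrompt = [
    "Extract all contact information from this email thread.",
    "",
    "Schema:",
    CONTACT_SCHEMA,
    "",
    "Email thread:",
    emailThread,
  ].join("\n");

  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userPrompt },
  ];
}
```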

Practice Tasks

  • Extract structured data from each email in data/emails.json — return JSON with sender, intent, urgency, and entities
  • Convert data/invoices.csv to validated JSON objects, then back to CSV — verify round-trip accuracy
  • Build an extraction prompt for data/reviews.json that outputs a comparison table across all products

Key Takeaways

  • Provide the exact JSON schema with types, not just a vague "return JSON."
  • Define how to handle missing, ambiguous, and multi-item data explicitly.
  • Always strip markdown fences and validate parsed output in code.
  • Retry with error context when parsing fails — models can fix their own malformed output.
  • Combine system prompts + schema + edge case rules for production-grade extraction.

This is chapter 5 of Prompt Engineering Essentials.
