Structured Output
JSON, Tables & Reliable Extraction
Why Structured Output Matters
Free-text AI responses work well for human readers but are hard for code to consume. If your application needs to parse the model's output (to store it in a database, pass it to another API, or display it in a UI), you need structured output.
The challenge: LLMs generate text token by token. They don't inherently produce valid JSON or well-formed CSV. But with the right prompting techniques, you can get reliable structured output 95%+ of the time.
Getting JSON Output
Basic JSON Request
Extract the following fields from this product description and return valid JSON:
Product: "The ProDesk 4K Monitor features a 32-inch IPS panel with 3840x2160 resolution, 99% sRGB coverage, USB-C with 65W charging, and an adjustable stand. Available for $549."
Return format:
{
"name": string,
"screen_size": string,
"resolution": string,
"price": number,
"features": string[]
}
Schema Enforcement
For production use, provide the exact schema with types and constraints:
Extract invoice data from the text below. Return a JSON object matching this exact schema:
{
"invoice_id": string (format: "INV-XXXX"),
"vendor": string,
"line_items": [
{
"description": string,
"quantity": number (integer),
"unit_price": number (2 decimal places),
"total": number (2 decimal places)
}
],
"subtotal": number,
"tax_rate": number (as decimal, e.g. 0.08 for 8%),
"tax_amount": number,
"total": number,
"date": string (ISO 8601 format)
}
Rules:
- If a field cannot be determined, use null
- Do not include any text outside the JSON object
- Ensure all numbers are actual numbers, not strings
The explicit schema, type annotations, and rules dramatically reduce malformed output.
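On the consuming side, the same schema can be mirrored as a TypeScript type, with a small runtime check for the "numbers, not strings" rule. This is a minimal sketch, not part of the original prompt: the names `Invoice`, `LineItem`, and `findNumberProblems` are hypothetical, and treating every scalar field as nullable is an assumption based on the "use null" rule.

```typescript
// Hypothetical TypeScript mirror of the invoice schema in the prompt above.
// Scalar fields are nullable because the rules ask for null when a value
// cannot be determined.
interface LineItem {
  description: string | null;
  quantity: number | null;
  unit_price: number | null;
  total: number | null;
}

interface Invoice {
  invoice_id: string | null; // "INV-XXXX"
  vendor: string | null;
  line_items: LineItem[];
  subtotal: number | null;
  tax_rate: number | null; // decimal, e.g. 0.08 for 8%
  tax_amount: number | null;
  total: number | null;
  date: string | null; // ISO 8601
}

// Top-level fields that must hold a real number (or null), never a string.
const NUMERIC_FIELDS: (keyof Invoice)[] = ["subtotal", "tax_rate", "tax_amount", "total"];

// Lists violations of the "numbers must be numbers, missing values must be
// null" rules. An empty result means the top-level numeric fields look right.
function findNumberProblems(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of NUMERIC_FIELDS) {
    const value = raw[field];
    if (value !== null && typeof value !== "number") {
      problems.push(`${field}: expected number or null, got ${typeof value}`);
    }
  }
  return problems;
}

// A reply that contains "total": "108.00" (a string) is caught here.
const numberProblems = findNumberProblems({
  subtotal: 100,
  tax_rate: 0.08,
  tax_amount: 8,
  total: "108.00",
});
// numberProblems is ["total: expected number or null, got string"]
```

The check covers only the top-level numeric fields; the same pattern would extend to each entry in `line_items`.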
Getting Table Output
Markdown tables work well for comparison and summary tasks:
Compare the products in the attached data. Output a markdown table with columns:
| Product | Price | Key Feature | Best For |
Sort by price ascending. Include all products.
For CSV output, be explicit about delimiters and quoting:
Convert the following data to CSV format.
- Use comma delimiters
- Quote fields that contain commas
- First row must be headers
- Use ISO 8601 dates (YYYY-MM-DD)
- Output ONLY the CSV, no explanation
Handling Edge Cases
Missing Data
If a field is not mentioned in the source text:
- For strings: use null (not empty string)
- For numbers: use null (not 0)
- For arrays: use empty array []
- For booleans: use null (not false)
Ambiguous Data
If a value is ambiguous:
- Include your best interpretation in the field
- Add an "ambiguous_fields" array listing field names that required interpretation
- Example: {"name": "J. Smith", "ambiguous_fields": ["name"]}
Multiple Items
If the text contains multiple invoices, return a JSON array.
Each element must follow the schema above.
Maintain the order they appear in the source text.
Validation Strategies
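The edge-case rules in the previous section (null for missing values, an optional ambiguous_fields array, one object or an array of them) only help if the consuming code honors them. The sketch below shows one way to do that; `ExtractedRecord`, `toRecordList`, and `fieldsNeedingReview` are hypothetical names, and treating null fields as "needs review" is an assumption rather than something the prompts require.

```typescript
// Hypothetical record shape: any extracted object may carry the optional
// "ambiguous_fields" array requested in the Ambiguous Data rules.
interface ExtractedRecord {
  [field: string]: unknown;
  ambiguous_fields?: string[];
}

// The Multiple Items rule allows a JSON array, while a single document
// yields one object. Normalizing to an array keeps a single code path.
// Anything that is not a plain object is dropped.
function toRecordList(parsed: unknown): ExtractedRecord[] {
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const records: ExtractedRecord[] = [];
  for (const item of items) {
    if (typeof item === "object" && item !== null && !Array.isArray(item)) {
      records.push(item as ExtractedRecord);
    }
  }
  return records;
}

// Fields worth a human look: those the model flagged as ambiguous, followed
// by those it could not determine (null), without duplicates.
function fieldsNeedingReview(record: ExtractedRecord): string[] {
  const flagged = Array.isArray(record.ambiguous_fields) ? record.ambiguous_fields : [];
  const missing = Object.keys(record).filter(
    (key) => record[key] === null && flagged.indexOf(key) === -1
  );
  return flagged.concat(missing);
}

const extracted = toRecordList({ name: "J. Smith", email: null, ambiguous_fields: ["name"] });
const reviewFields = fieldsNeedingReview(extracted[0]);
// reviewFields is ["name", "email"]
```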
Even with perfect prompts, LLM output can be malformed. Build validation into your pipeline:
1. JSON.parse Check
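function parseAIResponse(text: string): unknown {
  // Strip a leading ```json (or bare ```) fence and a trailing ``` fence.
  // Anchoring the patterns leaves backticks inside JSON string values intact.
  const cleaned = text
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
  // JSON.parse throws a SyntaxError on malformed input; callers should catch it.
  return JSON.parse(cleaned);
}
2. Schema Validation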
Use a library like Zod or Ajv to validate that the parsed object matches your expected shape. Reject and retry if it doesn't.
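Zod and Ajv have to be installed separately. To show what such a check does without any dependency, here is a hand-written validator for the product schema from the Basic JSON Request example. It is a sketch under stated assumptions: `Product`, `validateProduct`, and `isProduct` are hypothetical names, and returning a list of error strings is a design choice made so the errors can be sent back to the model on retry.

```typescript
// Hypothetical type for the product schema from the Basic JSON Request example.
interface Product {
  name: string;
  screen_size: string;
  resolution: string;
  price: number;
  features: string[];
}

// Returns one message per violated field; an empty array means the value
// matches the Product shape.
function validateProduct(data: unknown): string[] {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["root: expected a JSON object"];
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of ["name", "screen_size", "resolution"]) {
    if (typeof obj[key] !== "string") errors.push(`${key}: expected string`);
  }

  const price = obj["price"];
  if (typeof price !== "number" || !isFinite(price)) {
    errors.push("price: expected number");
  }

  const features = obj["features"];
  if (!Array.isArray(features) || !features.every((f) => typeof f === "string")) {
    errors.push("features: expected string[]");
  }
  return errors;
}

// Type guard built on the validator, for use at the pipeline boundary.
function isProduct(data: unknown): data is Product {
  return validateProduct(data).length === 0;
}

// The model returned the price as a string, which the schema forbids.
const candidate: unknown = JSON.parse(
  '{"name":"ProDesk 4K Monitor","screen_size":"32-inch","resolution":"3840x2160","price":"$549","features":["USB-C"]}'
);
const productErrors = validateProduct(candidate);
// productErrors is ["price: expected number"]
```

A schema library replaces the hand-written checks with a declarative definition, but the reject-and-retry decision stays the same: an empty error list means accept, anything else means retry.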
3. Retry with Error Context
If parsing fails, send the error back to the model:
Your previous response was not valid JSON. The error was:
"Unexpected token at position 142"
Please fix the output and return valid JSON only.
Building Extraction Prompts
A reliable extraction prompt combines several techniques from earlier modules:
[System]
You are a data extraction specialist. You extract structured data from unstructured business documents. Always return valid JSON matching the provided schema. Never include explanatory text outside the JSON.
[User]
Extract all contact information from this email thread.
Schema:
{
"contacts": [{
"name": string,
"email": string | null,
"phone": string | null,
"company": string | null,
"role": string | null
}]
}
Email thread:
[paste email data here]
This combines: system prompt (Module 4) + schema enforcement + edge case handling.
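In application code, the prompt above can be combined with the parse, validate, and retry steps from Validation Strategies. The following is a minimal sketch, not a definitive implementation. `ModelCall` is a hypothetical stand-in for whatever client you use, kept synchronous so the example stays short (a real client would return a Promise and the loop would await it). `extractContacts`, `isContactList`, and the default of three attempts are likewise assumptions.

```typescript
interface Contact {
  name: string;
  email: string | null;
  phone: string | null;
  company: string | null;
  role: string | null;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical model client: takes the conversation, returns the reply text.
type ModelCall = (messages: ChatMessage[]) => string;

const SYSTEM_PROMPT =
  "You are a data extraction specialist. You extract structured data from " +
  "unstructured business documents. Always return valid JSON matching the " +
  "provided schema. Never include explanatory text outside the JSON.";

const CONTACT_SCHEMA =
  '{ "contacts": [{ "name": string, "email": string | null, "phone": string | null, ' +
  '"company": string | null, "role": string | null }] }';

// Runtime check that the parsed reply matches the contact schema.
function isContactList(data: unknown): data is { contacts: Contact[] } {
  if (typeof data !== "object" || data === null) return false;
  const contacts = (data as { contacts?: unknown }).contacts;
  if (!Array.isArray(contacts)) return false;
  const nullableString = (v: unknown) => v === null || typeof v === "string";
  return contacts.every(
    (c) =>
      typeof c === "object" && c !== null && typeof c.name === "string" &&
      nullableString(c.email) && nullableString(c.phone) &&
      nullableString(c.company) && nullableString(c.role)
  );
}

// Calls the model, validates the reply, and on failure appends the error to
// the conversation and asks again, up to maxAttempts times.
function extractContacts(emailThread: string, callModel: ModelCall, maxAttempts = 3): Contact[] {
  const messages: ChatMessage[] = [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content:
        "Extract all contact information from this email thread.\n" +
        `Schema:\n${CONTACT_SCHEMA}\nEmail thread:\n${emailThread}`,
    },
  ];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = callModel(messages);
    let problem = "Your previous response did not match the schema.";
    try {
      const parsed: unknown = JSON.parse(reply.trim());
      if (isContactList(parsed)) return parsed.contacts;
    } catch (err) {
      problem = `Your previous response was not valid JSON. The error was:\n"${(err as Error).message}"`;
    }
    // Keep the failed reply in the history so the model can see what to fix.
    messages.push({ role: "assistant", content: reply });
    messages.push({ role: "user", content: `${problem}\nPlease fix the output and return valid JSON only.` });
  }
  throw new Error(`No valid output after ${maxAttempts} attempts`);
}
```

Passing the model call in as a parameter keeps the retry logic testable: a stub that returns a malformed reply first and a valid one second exercises the whole loop without network access.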
Practice Tasks
- data/emails.json: return JSON with sender, intent, urgency, and entities
- data/invoices.csv to validated JSON objects, then back to CSV; verify round-trip accuracy
- data/reviews.json that outputs a comparison table across all products
Key Takeaways
This is chapter 5 of Prompt Engineering Essentials.