Data Extraction Prompt
Extract structured data from unstructured text with consistent schemas and reliable parsing.
The Problem
Unstructured text such as emails, logs, documents, and support tickets contains valuable data buried in natural language. Manual extraction is slow and error-prone, and regular expressions break on the variability of human writing. AI models can use context to extract structured data, but only if you define the output schema precisely and handle edge cases upfront.
The Prompt
Extract structured data from the following text. Return ONLY valid JSON matching the schema below.
SCHEMA:
{
  "entities": [{
    "field1": "string: [description and format rules]",
    "field2": "string | null: [description, null if not found]",
    "field3": "number: [description, unit, range]",
    "confidence": "number: 0.0-1.0, how certain you are about this extraction"
  }],
  "unmatched": "string[]: any text that seems relevant but does not fit the schema"
}
EXTRACTION RULES:
- Extract ALL matching entities, not just the first one
- Use null for fields that are not present in the text, never guess
- Normalize dates to ISO 8601 format (YYYY-MM-DD)
- Normalize monetary amounts to a plain number plus a separate ISO 4217 currency code (for example 45000 and "USD")
- If a field is present but ambiguous, extract the most likely value AND set confidence below 0.7
- Include unmatched relevant text so nothing is silently lost
TEXT TO EXTRACT FROM:
"""
[paste your unstructured text here]
"""
Example Output
{
  "entities": [
    {
      "company": "Acme Corp",
      "contact_email": "jane@acme.com",
      "deal_value": 45000,
      "currency": "USD",
      "close_date": "2026-06-15",
      "stage": "negotiation",
      "confidence": 0.95
    },
    {
      "company": "TechStart GmbH",
      "contact_email": null,
      "deal_value": 12000,
      "currency": "EUR",
      "close_date": "2026-07-01",
      "stage": "proposal",
      "confidence": 0.82
    }
  ],
  "unmatched": ["mentioned possible partnership with DataFlow but no deal details provided"]
}
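One way to wire the prompt and a reply like the one above into code is sketched below. It is a minimal illustration, not a definitive implementation: build_prompt and parse_reply are hypothetical helper names, and the model call itself is left out because it depends on your provider. The sketch assumes the reply is the raw JSON text with nothing around it, which is what "Return ONLY valid JSON" asks for.

```python
import json

# Template mirroring the prompt above; {schema}, {rules}, and {text} are filled per call.
PROMPT_TEMPLATE = (
    "Extract structured data from the following text. "
    "Return ONLY valid JSON matching the schema below.\n"
    "SCHEMA:\n{schema}\n"
    "EXTRACTION RULES:\n{rules}\n"
    "TEXT TO EXTRACT FROM:\n"
    '"""\n{text}\n"""\n'
)


def build_prompt(schema: str, rules: list[str], text: str) -> str:
    """Fill the template (hypothetical helper, not part of any library)."""
    # Replace triple quotes in the input so it cannot close the delimiter early.
    safe_text = text.replace('"""', "'''")
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return PROMPT_TEMPLATE.format(schema=schema, rules=rule_lines, text=safe_text)


def parse_reply(reply: str) -> dict:
    """Parse the model reply; raise ValueError if it is not the expected shape."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    if not isinstance(data.get("entities"), list):
        raise ValueError("'entities' must be a list")
    if not isinstance(data.get("unmatched"), list):
        raise ValueError("'unmatched' must be a list")
    return data
```

Failing loudly on a malformed reply makes it straightforward to retry the request or queue the input for manual handling instead of storing partial data.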
When to Use
Use data extraction prompts to turn emails into CRM entries, parse log files into structured incidents, convert free-text reports into database records, or handle any workflow where unstructured input must become structured output. This can reduce hours of manual data entry to seconds of processing, with human review reserved for low-confidence results.
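Before extracted records are written to a CRM or database, it can help to check them against the normalization rules from the prompt. The sketch below assumes the field names from the example output (close_date, deal_value, currency, confidence); check_entity is a hypothetical helper, and the checks are illustrative rather than exhaustive.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
CURRENCY_CODE = re.compile(r"[A-Z]{3}")


def is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_entity(entity: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the entity passed."""
    problems = []

    confidence = entity.get("confidence")
    if not is_number(confidence) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number from 0.0 to 1.0")

    close_date = entity.get("close_date")
    if close_date is not None:  # null is allowed for fields not found in the text
        if not (isinstance(close_date, str) and ISO_DATE.fullmatch(close_date)):
            problems.append("close_date must look like YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(close_date)  # rejects dates such as 2026-02-30
            except ValueError:
                problems.append("close_date is not a real calendar date")

    deal_value = entity.get("deal_value")
    if deal_value is not None and not is_number(deal_value):
        problems.append("deal_value must be a number or null")

    currency = entity.get("currency")
    if currency is not None and not (
        isinstance(currency, str) and CURRENCY_CODE.fullmatch(currency)
    ):
        problems.append("currency must be a three-letter uppercase code or null")

    return problems
```

An entity with a non-empty problem list can be sent to review rather than dropped, in the same spirit as the "unmatched" field.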
Pro Tips
- Include a confidence score — this lets you auto-accept high-confidence extractions and flag low-confidence ones for human review.
- Add an “unmatched” field — text that does not fit your schema should not be silently dropped; capture it for review.
- Provide normalization rules — “convert all dates to ISO 8601” prevents format inconsistency across extractions.
- Test with messy real data — the model handles clean text well; test with abbreviations, typos, and incomplete entries to validate robustness.
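The confidence tip can be sketched as a small routing step. This is a minimal example under stated assumptions: route_by_confidence is a hypothetical helper, and the 0.7 default simply mirrors the threshold used in the extraction rules. Treat it as a starting point to tune against samples you have reviewed by hand.

```python
def route_by_confidence(entities: list[dict], threshold: float = 0.7):
    """Split entities into (auto_accept, needs_review) by their confidence score."""
    auto_accept, needs_review = [], []
    for entity in entities:
        confidence = entity.get("confidence")
        # A missing or non-numeric confidence is treated as low confidence.
        if isinstance(confidence, (int, float)) and confidence >= threshold:
            auto_accept.append(entity)
        else:
            needs_review.append(entity)
    return auto_accept, needs_review


if __name__ == "__main__":
    sample = [
        {"company": "Acme Corp", "confidence": 0.95},
        {"company": "TechStart GmbH", "confidence": 0.82},
        {"company": "Hypothetical Ltd", "confidence": 0.55},  # made-up low-confidence case
    ]
    accepted, review = route_by_confidence(sample)
    print(len(accepted), "accepted;", len(review), "flagged for review")
```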