Data Entry & Extraction
Pull structured data from unstructured text instantly.
One of AI's genuine superpowers is turning messy, unstructured text into clean, structured data. Tasks that used to take an intern all day can now take well under a minute. This is not a marginal improvement -- it is an order-of-magnitude change in how fast you can process information.
The pattern for data extraction is always the same: tell AI exactly what fields you want, exactly what format to use, and what to do when data is missing or ambiguous. Specificity in, accuracy out.
Extracting Data from Text
The basic pattern -- tell AI what fields you want and what format to use:
"Extract the following information from this email/document and format as JSON:
- Company name
- Contact person
- Email address
- Phone number
- Requested service
- Budget mentioned
- Timeline
If any field is not found, use 'N/A'.
If you're unsure about an extraction, add a confidence note.
Document: [paste text]"
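Once the AI returns JSON, a few lines of code can confirm that every requested field is present before the data goes anywhere else. The sketch below is one way to do that in Python. The snake_case field names and the sample reply are assumptions for illustration, not output from a real model.

```python
import json

# Hypothetical snake_case versions of the fields listed in the prompt.
EXPECTED_FIELDS = [
    "company_name", "contact_person", "email_address", "phone_number",
    "requested_service", "budget_mentioned", "timeline",
]

def normalize_extraction(reply_text):
    """Parse a JSON reply and make sure every expected field is present."""
    data = json.loads(reply_text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Fill gaps with the same 'N/A' placeholder the prompt asks for.
    record = {field: data.get(field, "N/A") for field in EXPECTED_FIELDS}
    # Keep unexpected keys for review instead of silently dropping them.
    unexpected = {k: v for k, v in data.items() if k not in EXPECTED_FIELDS}
    return {"record": record, "unexpected": unexpected}

# Made-up reply used only to exercise the function.
sample_reply = (
    '{"company_name": "Acme Co", "email_address": "pat@example.com", '
    '"notes": "unsure about budget"}'
)
result = normalize_extraction(sample_reply)
```

Here `result["record"]` always has all seven fields, and anything the AI added on its own (such as a confidence note) ends up in `result["unexpected"]` for a human to read.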
Invoice Processing
"Extract line items from this invoice text and format as a table:
| Item | Quantity | Unit Price | Total |
Also extract: Invoice number, date, vendor name, subtotal, tax, and grand total.
Double-check that line items add up to the subtotal. Flag any discrepancies.
Invoice: [paste invoice text]"
The "double-check the math" instruction is critical. AI sometimes misreads numbers, transposes digits, or gets confused by formatting. For any financial data extraction, always include a validation step and verify the totals yourself before acting on them.
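Verifying the totals yourself does not have to mean a calculator. As a minimal sketch, assuming the extracted amounts have been copied into plain numeric strings (no currency symbols or thousands separators), a short Python check can redo the arithmetic exactly. The function name, dictionary keys, and sample invoice are all hypothetical.

```python
from decimal import Decimal

def check_invoice(line_items, subtotal, tax, grand_total):
    """Return a list of discrepancies; an empty list means the totals agree."""
    problems = []
    items_sum = Decimal("0")
    for item in line_items:
        line_total = Decimal(item["total"])
        expected = Decimal(item["quantity"]) * Decimal(item["unit_price"])
        if expected != line_total:
            problems.append(
                f"{item['item']}: quantity x unit price is {expected}, "
                f"extracted total is {line_total}"
            )
        items_sum += line_total
    if items_sum != Decimal(subtotal):
        problems.append(f"line items sum to {items_sum}, extracted subtotal is {subtotal}")
    if Decimal(subtotal) + Decimal(tax) != Decimal(grand_total):
        problems.append(
            f"subtotal + tax is {Decimal(subtotal) + Decimal(tax)}, "
            f"extracted grand total is {grand_total}"
        )
    return problems

# Made-up invoice data for illustration.
items = [
    {"item": "Widget", "quantity": "2", "unit_price": "19.99", "total": "39.98"},
    {"item": "Gadget", "quantity": "1", "unit_price": "5.00", "total": "5.00"},
]
clean = check_invoice(items, subtotal="44.98", tax="3.60", grand_total="48.58")
# Same invoice with two digits of the subtotal transposed (44.98 -> 44.89).
transposed = check_invoice(items, subtotal="44.89", tax="3.60", grand_total="48.58")
```

`Decimal` is used instead of `float` so that cents add up exactly. The transposed subtotal is caught twice: it no longer matches the line items, and it no longer adds up to the grand total.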
Business Card / Contact Parsing
"Parse these business card details into a contact database format:
[paste business card text or multiple cards]
Format: CSV with columns: First Name, Last Name, Title, Company, Email, Phone, Address, LinkedIn URL
If a card has multiple phone numbers, use the mobile number for the Phone column and note others in a Notes column."
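Before importing the CSV into a contact database, it can help to confirm that the header matches what the prompt asked for. The following is a small sketch using Python's standard `csv` module. The sample row is invented, and treating Notes as an optional column is an assumption.

```python
import csv
import io

# Columns requested in the prompt; Notes is treated as optional here.
REQUIRED_COLUMNS = [
    "First Name", "Last Name", "Title", "Company",
    "Email", "Phone", "Address", "LinkedIn URL",
]

def parse_contacts(csv_text):
    """Parse the AI's CSV reply and fail loudly if a required column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Invented reply; note the quoted address containing a comma.
sample_csv = (
    "First Name,Last Name,Title,Company,Email,Phone,Address,LinkedIn URL,Notes\n"
    'Dana,Lee,CTO,Example Corp,dana@example.com,(555) 010-2000,'
    '"1 Main St, Springfield",N/A,Office: (555) 010-2001\n'
)
contacts = parse_contacts(sample_csv)
```

Using a real CSV parser rather than splitting on commas matters here, because addresses usually contain commas of their own.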
The Batch Processing Pattern
For processing many items at once, always provide an example first:
"Process each entry below. For each one, extract: name, date, amount, and category.
Example:
Input: 'Paid $450 to ABC Plumbing on March 15 for bathroom repair'
Output: Name: ABC Plumbing | Date: March 15 | Amount: $450 | Category: Home Repair
Now process these entries:
1. 'Bought $89 of office supplies at Staples on Tuesday'
2. 'Monthly Spotify subscription $14.99 charged Jan 1'
3. 'Dinner with client Sarah at Olive Garden $127.50 on 3/10'
4. 'Paid quarterly insurance premium $2,400 to State Farm'
5. 'Gas station fill-up $62.18 Shell on Highway 101'"
Providing one example before the batch dramatically improves consistency. AI learns your expectations for format and detail level from that single example and applies them uniformly across all items. Without the example, you will get inconsistent formatting that takes time to clean up.
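If you run batches regularly, the same pattern can be assembled by a script. This is a rough sketch with two hypothetical helpers: one builds the prompt with its single example, the other splits a reply line in the `Name: ... | Date: ...` format into fields. It does not call any AI service.

```python
def build_batch_prompt(example_input, example_output, entries):
    """Assemble the batch prompt: instruction, one example, then numbered entries."""
    lines = [
        "Process each entry below. For each one, extract: "
        "name, date, amount, and category.",
        "Example:",
        f"Input: '{example_input}'",
        f"Output: {example_output}",
        "Now process these entries:",
    ]
    lines += [f"{number}. '{entry}'" for number, entry in enumerate(entries, start=1)]
    return "\n".join(lines)

def parse_output_line(line):
    """Split 'Name: X | Date: Y | ...' into a dictionary of fields."""
    fields = {}
    for part in line.split("|"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields

example_out = "Name: ABC Plumbing | Date: March 15 | Amount: $450 | Category: Home Repair"
prompt = build_batch_prompt(
    "Paid $450 to ABC Plumbing on March 15 for bathroom repair",
    example_out,
    ["Bought $89 of office supplies at Staples on Tuesday",
     "Monthly Spotify subscription $14.99 charged Jan 1"],
)
parsed = parse_output_line(example_out)
```

Because the example fixes the output format, the parser can stay this simple. If a reply line does not split into the four expected keys, that is a sign the AI drifted from the example.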
Cleaning Messy Data
Real-world data is never clean. AI handles the mess:
"This spreadsheet data has inconsistencies. Standardize it:
- Names: First Last format, proper capitalization
- Phone numbers: (XXX) XXX-XXXX format
- Addresses: Full format with ZIP code
- Dates: YYYY-MM-DD format
- Remove duplicates (keep the most complete entry)
Data: [paste messy data]"
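After the AI returns the cleaned data, a quick script can spot rows that still break the requested formats. The sketch below checks only the phone and date rules from the prompt. The row keys (`phone`, `date`) and the sample rows are assumptions for illustration.

```python
import re
from datetime import datetime

PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")  # (XXX) XXX-XXXX
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")       # YYYY-MM-DD shape

def is_valid_phone(value):
    return PHONE_RE.fullmatch(value) is not None

def is_valid_date(value):
    """Check both the YYYY-MM-DD shape and that the date exists on the calendar."""
    if DATE_RE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

def find_format_problems(rows):
    """Return (row number, field, value) for every value that breaks a rule."""
    problems = []
    for number, row in enumerate(rows, start=1):
        if not is_valid_phone(row["phone"]):
            problems.append((number, "phone", row["phone"]))
        if not is_valid_date(row["date"]):
            problems.append((number, "date", row["date"]))
    return problems

# Invented rows: the second still has an unformatted phone and an impossible date.
rows = [
    {"phone": "(555) 010-2000", "date": "2024-03-05"},
    {"phone": "555-010-2000", "date": "2024-02-30"},
]
problems = find_format_problems(rows)
```

Any row this reports can be pasted back to the AI with a request to fix just those values.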
Transforming Between Formats
"Convert this data from [format A] to [format B]:
- JSON to CSV
- Email thread to structured table
- Meeting notes to Jira tickets
- Resume text to database fields
[paste data]"
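For a purely mechanical conversion such as JSON to CSV, a short script gives you something to compare the AI's output against. This is a minimal sketch, assuming the JSON is a flat list of objects. The function name and the choice of 'N/A' for missing values are illustrative.

```python
import csv
import io
import json

def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect column names in the order they first appear.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="N/A", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are filled with restval
    return buffer.getvalue()

csv_text = json_to_csv('[{"name": "A", "amount": 1}, {"name": "B", "category": "x"}]')
```

Conversions that need judgment, such as turning meeting notes into tickets, are where the AI earns its keep. For those there is no script to compare against, so review the output by hand.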
Tips for Accurate Extraction
1. Always specify the output format (JSON, CSV, table, etc.) -- ambiguity kills accuracy
2. Provide one example for complex extractions -- AI learns your expectations
3. Ask AI to flag uncertain extractions with a confidence indicator
4. Always verify extracted numbers -- AI occasionally misreads or transposes digits
5. For critical data, ask AI to double-check itself: "Review your extraction above. Did you miss anything or make any errors?"
6. Chunk large datasets -- process 20-50 items at a time, not 500
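The chunking tip is easy to automate. As a small sketch, the hypothetical helper below splits a list of entries into batches. The default size of 25 is an arbitrary pick from the 20-50 range.

```python
def chunk(items, size=25):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]

# 120 made-up entries split into batches of 50.
entries = [f"entry {number}" for number in range(1, 121)]
batches = chunk(entries, size=50)
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be pasted into its own prompt, with the same example at the top of every one so the formatting stays consistent across batches.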
Exercises
Find a real receipt, invoice, or email with data in it. Use AI to extract all data into a structured JSON format. Check every field against the original -- how accurate was the extraction?
Hint: Try a receipt with at least 5 line items. Check the math on totals -- that's where AI most commonly makes mistakes.
Create a batch processing prompt that converts 5 informal expense descriptions into a structured expense report table with columns: Date, Vendor, Amount, Category, Payment Method.
Hint: Make up realistic entries like "coffee meeting with client $12 at Starbucks." Always provide one example for AI to follow.
Why should you always provide an example when using the batch processing pattern?
When extracting critical data, you should ask AI to _______ itself to catch missed items or errors.