Documentation Index
Fetch the complete documentation index at: https://docs.extruct.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
This guide explains how to choose and configure AI Tables columns for company research, people discovery, contact enrichment, and scoring.
It is meant to be actionable on its own: each supported column kind below includes the minimum working shape, required context, and the most common mistake.
Prerequisites
```bash
export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"
```

Generate tokens in the Dashboard under API Tokens. For full setup, see Authentication.
Endpoints used
Column Guide in 60 Seconds
A column is one field Extruct fills for each row.
Every column has `kind`, `name`, and `key`.
Some kinds also have `value`.
If `kind` is `agent`, you also choose `agent_type`, `output_format`, and a prompt.
Typical shape:
```json
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
```
Exceptions:
- `input`, `email_finder`, and `phone_finder` do not need a `value` block
- `reverse_email_lookup` uses top-level `email_column_key` instead of `value`
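These shape rules can be condensed into a small pre-flight check. The sketch below is a hypothetical local helper, not part of any Extruct SDK; it only validates a column config dict against the rules above before you send it:

```python
# Hypothetical validator for the column shape rules described above.
AGENT_VALUE_KEYS = ("agent_type", "prompt", "output_format")

def validate_column(col: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = [f"missing required field: {f}" for f in ("kind", "name", "key") if f not in col]
    kind = col.get("kind")
    if kind == "agent":
        value = col.get("value", {})
        problems += [f"agent value missing: {f}" for f in AGENT_VALUE_KEYS if f not in value]
    elif kind == "company_people_finder":
        if "roles" not in col.get("value", {}):
            problems.append("company_people_finder needs value.roles")
    elif kind == "reverse_email_lookup":
        if "email_column_key" not in col:
            problems.append("reverse_email_lookup needs top-level email_column_key")
    elif kind in ("input", "email_finder", "phone_finder"):
        # These kinds take no value block at all.
        if "value" in col:
            problems.append(f"{kind} does not take a value block")
    return problems
```

Running a check like this over a batch of planned columns catches missing `value` blocks and misplaced `email_column_key` fields before table creation.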
Column kinds at a glance
All column kinds require `kind`, `name`, and `key`.
| Kind | What it does | What you provide |
|---|---|---|
| `input` | stores a value you already have | the row value |
| `agent` | researches, reasons, or transforms one field | `agent_type`, `prompt`, `output_format` |
| `company_people_finder` | finds people at each company by role | `roles` |
| `email_finder` | finds a work email for a person | nothing extra |
| `phone_finder` | finds a phone number for a person | nothing extra |
| `reverse_email_lookup` | resolves profile data from a known email | `email_column_key` |
If kind = agent
`agent` is the core Extruct pattern. After choosing `kind: "agent"`, you choose the agent behavior, the output shape, and the prompt.
| Choice | What it controls |
|---|---|
| `agent_type` | how Extruct gets or produces the answer |
| `output_format` | the shape of the returned result |
| `prompt` | the exact job the column should do |
Agent types at a glance
| Agent type | Best for | Not best for |
|---|---|---|
| `research_pro` | researched answers on company tables | people or generic tables, direct contact lookups, LinkedIn fetch from a known URL |
| `research_reasoning` | researched answers on people and generic tables | company-table use cases that need company-context disambiguation |
| `llm` | transforming data already in the row | web research or source discovery |
| `linkedin` | LinkedIn data from a known LinkedIn URL | discovering the LinkedIn URL in the first place |
Output formats at a glance

| Format | Use for | Extra required fields |
|---|---|---|
| `text` | prose or notes | none |
| `url` | one canonical URL | none |
| `email` | one email-shaped answer from an agent | none |
| `select` | exactly one allowed option | `labels` |
| `multiselect` | zero or more allowed options | `labels` |
| `numeric` | counts and measured values | none |
| `money` | structured financial values | none |
| `date` | exact or partial dates | none |
| `phone` | one phone number in structured form | none |
| `grade` | bounded scoring with explanation | none |
| `json` | nested schema-defined output | `output_schema` |
Fast defaults
- On company tables, start with the official website domain or URL as `input`
- On company tables, default to `kind: "agent"` and `agent_type: "research_pro"`
- On people and generic tables, use `kind: "agent"` and `agent_type: "research_reasoning"`
- On company tables, choose `research_reasoning` only if you intentionally want to skip the company-context disambiguation step
- For scoring, use `output_format: "grade"`
- For people or contact workflows, prefer `company_people_finder`, `email_finder`, `phone_finder`, or `reverse_email_lookup` over generic prompts
- When you add new columns, rerun only the new column IDs with `mode: "new"`
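The agent-type defaults above can be encoded in one small chooser. This helper is illustrative; the `table_kind` values are the table kinds named in this guide:

```python
def default_agent_type(table_kind: str, transformation_only: bool = False) -> str:
    """Pick the default agent_type for a custom agent column, per the guidance above."""
    if transformation_only:
        return "llm"  # row-local transformation: no web research needed
    if table_kind == "company":
        return "research_pro"  # keeps the company-context disambiguation step
    return "research_reasoning"  # people and generic tables
```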
Choose the right column kind
Use the simplest column kind that matches the result you want back.
input
What it does:
- stores a value you already have
When to use it:
- company website domain or URL
- manual notes
- internal tags or metadata
- local helper fields such as `company_website` on a standalone people table
Minimum config:
```json
{
  "kind": "input",
  "name": "Company",
  "key": "input"
}
```
Context it needs:
- none; you provide the value directly in row data
Best practice:
- on company tables, the preferred input is the company’s official website domain or URL
- a raw company name is a weaker fallback when you do not have the site
What it returns:
- exactly the row value you send
Most common mistake:
- using weak company names as the default company input when you already have a domain or URL
agent
What it does:
- researches, reasons, or transforms one field
When to use it:
- research a company fact
- classify a company or person
- summarize upstream evidence
- return structured output such as `select`, `money`, `grade`, or `json`
Minimum config:
```json
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
```
Context it needs:
- on company tables, Extruct auto-injects company context
- on people tables, Extruct auto-injects person context
- on generic tables, use explicit prompt references for the row fields you need
What it returns:
- one value in the output format you choose
Most common mistake:
- one prompt tries to do multiple jobs or references `{input}` unnecessarily on company or people tables
company_people_finder
What it does:
- finds people at each company by role
When to use it:
- branch from company research into a people workflow
- find leadership, decision-makers, or role-based contact targets
Minimum config:
```json
{
  "kind": "company_people_finder",
  "name": "Decision Makers",
  "key": "decision_makers",
  "value": {
    "roles": ["VP Sales", "sales leadership", "revenue operations"]
  }
}
```
Context it needs:
- a company table with the correct company rows
What it returns or creates:
- fills the finder cell on the company table
- creates or updates a child people table for downstream enrichment
Good role style:
- broader role families for coverage: `sales leadership`, `engineering leadership`
- exact titles for narrower targeting: `CEO`, `Head of Engineering`
Most common mistake:
- adding it to a people table or using overly narrow role strings before checking coverage
email_finder
What it does:
- finds a work email for a person
When to use it:
- enrich a people table after person rows already exist
Minimum config:
```json
{
  "kind": "email_finder",
  "name": "Work Email",
  "key": "work_email"
}
```
Context it needs:
- `full_name`
- `profile_url`
- `company_website`

`company_website` can come from:
- a parent company row, or
- a local people-table input column keyed exactly `company_website`
What it returns:
- one work email or no result
Most common mistake:
- adding the column before `company_website` exists or trying to configure it with a `value` block
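Since the most common failure is adding the column before `company_website` exists, a local pre-check helps. This is a hypothetical helper; `columns` is assumed to be the list of column configs already on the people table, including local input columns:

```python
def can_add_email_finder(columns: list[dict]) -> bool:
    """True once the people table exposes a column keyed exactly company_website."""
    return any(c.get("key") == "company_website" for c in columns)
```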
phone_finder
What it does:
- finds a phone number for a person
When to use it:
- enrich a people table after person rows already exist
Minimum config:
```json
{
  "kind": "phone_finder",
  "name": "Direct Phone",
  "key": "direct_phone"
}
```
Context it needs:
- `full_name`
- `profile_url`
- `company_website`

`company_website` can come from:
- a parent company row, or
- a local people-table input column keyed exactly `company_website`
What it returns:
- one phone number or no result
Most common mistake:
- expecting high coverage without first verifying that the person rows and company website context are strong
reverse_email_lookup
What it does:
- resolves person or profile data from an email you already have
When to use it:
- you already have an email column and want profile resolution rather than company-to-people branching
Minimum config:
```json
{
  "kind": "reverse_email_lookup",
  "name": "Profile from Email",
  "key": "profile_from_email",
  "email_column_key": "work_email"
}
```
Context it needs:
- a source email column keyed in `email_column_key`
What it returns:
- person or profile metadata such as name, headline, location, and profile URL
Most common mistake:
- putting `email_column_key` inside `value` or using this when the real job is company-to-people discovery
Agent types
Agent types change what Extruct can do for a custom agent column.
`research_pro` and `research_reasoning` can both research, reason, and make judgments.
The difference is whether Extruct first disambiguates the requested company using company context before answering.
| Agent type | Best for | What it actually does |
|---|---|---|
| `research_pro` | researched answers on company tables | web research plus Extruct DB search and similar-company retrieval, with a company-context disambiguation step |
| `research_reasoning` | researched answers on people and generic tables | the same general research tool class, but without the company-context disambiguation step |
| `llm` | row-local transformation | transforms existing row data only; no web research |
| `linkedin` | LinkedIn data from a known LinkedIn URL | uses LinkedIn company or profile lookup behavior |
Research agents cannot do these jobs directly:
- fetch LinkedIn data from a known LinkedIn URL
- find people at a company by role
- find work emails
- find phone numbers
- resolve a person from an email
Use these instead:
- `linkedin`
- `company_people_finder`
- `email_finder`
- `phone_finder`
- `reverse_email_lookup`
research_pro
Use this for researched answers on company tables.
What makes it different:
- it uses the same general research and reasoning capability as `research_reasoning`
- before answering, it gathers or confirms company context
- it uses that company context to disambiguate whether the evidence belongs to the requested company or to some other company
Good fits:
- pricing and packaging
- funding or valuation
- target segments
- recent news
- product or use-case research
- competitor or alternative discovery
- canonical company URLs such as LinkedIn or careers pages
Default role:
- this is the safe default for custom `agent` columns on company tables
- use it unless you intentionally want to skip the disambiguation step and you know the entity is already clear
research_reasoning
Use this for researched answers when you do not want the company-context disambiguation step.
What makes it different:
- it uses the same general research and reasoning capability as `research_pro`
- it does not start with the company-context disambiguation step
- it is the default research agent on people and generic tables
- on company tables, it is an expert choice when the entity is already clear and you intentionally want to skip the disambiguation step
Good fits:
- official-site disambiguation
- custom 1-5 scoring
- SMB vs enterprise judgments
- ranked comparisons
- researched answers on generic tables
llm
Use this for row-local transformation only.
What to expect:
- no web research
- no page visits
- no source URLs
- cheaper downstream transformation after a research step already happened
Good fits:
- turn notes into a `select`
- summarize several upstream columns
- normalize messy evidence
- derive structured `json` from row context
linkedin
Use this when the row already has a known LinkedIn company or person URL and you want LinkedIn-derived data.
Typical chain:
- use `research_pro` to find the right LinkedIn URL
- use `linkedin` to fetch LinkedIn data
- use `llm` to summarize or classify that data
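Written out as column configs, the chain might look like this. The names, keys, and prompts are illustrative, not a prescribed recipe; each column just follows the minimum `agent` shape from this guide:

```python
# Three chained agent columns: research_pro finds the URL,
# linkedin fetches from it, llm summarizes the fetched data.
linkedin_chain = [
    {
        "kind": "agent",
        "name": "LinkedIn URL",
        "key": "linkedin_url",
        "value": {
            "agent_type": "research_pro",
            "prompt": "Find the company's official LinkedIn page URL. Return the single best URL only.",
            "output_format": "url",
        },
    },
    {
        "kind": "agent",
        "name": "LinkedIn Data",
        "key": "linkedin_data",
        "value": {
            "agent_type": "linkedin",
            "prompt": "Fetch LinkedIn company data from {linkedin_url}.",
            "output_format": "text",
        },
    },
    {
        "kind": "agent",
        "name": "LinkedIn Summary",
        "key": "linkedin_summary",
        "value": {
            "agent_type": "llm",
            "prompt": "Summarize the LinkedIn data in {linkedin_data} in under 30 words.",
            "output_format": "text",
        },
    },
]
```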
Important people-table rule:
- custom `agent` columns on people tables can use `llm`, `research_reasoning`, and `linkedin`
- `research_pro` is not supported on custom agent columns on people tables
Built-in company and people context
company tables
company tables automatically create and fill:
- `company_profile`
- `company_name`
- `company_website`
This is why most company-table prompts should be plain natural-language instructions.
Weak:
Research the company {input} and summarize what it does.
Strong:
Describe what the company does in under 25 words.
people tables
Custom agent columns on people tables get baseline row context automatically:
- `full_name`
- `profile_url`
- `role`
Use plain natural-language prompts when that baseline context is enough.
Add explicit prompt references only when the column depends on another custom field or a local input field.
Weak:
Classify this person using {input}.
Strong:
Classify this person's seniority level based on their current role.
Example where an explicit reference is appropriate:

```
Choose the company's pricing model based on:

Pricing Notes
---
{pricing_notes}
---
```
Output formats
Choose the output format that best matches how the result will be used later.
| Output format | Best for | Example |
|---|---|---|
| `text` | short prose or notes | company description |
| `url` | one canonical URL | careers page |
| `email` | one email address | normalized email |
| `select` | one allowed option | pricing model |
| `multiselect` | zero or more allowed options | markets served |
| `numeric` | counts or measured values | employee count |
| `money` | structured financial values | annual revenue |
| `date` | exact or partial dates | founded date |
| `phone` | one phone number | direct phone |
| `grade` | bounded scoring with explanation | ICP fit |
| `json` | nested structured output | competitors list |
Guidance:
- use `text` for short prose meant to be read directly
- use `select` or `multiselect` when the answer should stay within labels
- use `money` for revenue, ARR, valuation, or funding
- use `grade` for scoring
- use `json` only when the structure is genuinely useful downstream
For custom agent columns, the main result lives in `rows[].data.<key>.value.answer`.
What changes by `output_format` is the shape of `answer`. Supporting fields such as `explanation` and `sources` may sit alongside it in `rows[].data.<key>.value`.
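A minimal sketch of reading that path, using a hand-built row dict that mirrors the structure described (not a real API response):

```python
def read_answer(row: dict, key: str):
    """Pull answer, explanation, and sources out of rows[].data.<key>.value."""
    cell = row["data"][key]["value"]
    return cell.get("answer"), cell.get("explanation"), cell.get("sources", [])

# Hand-built example row mirroring the documented result path.
row = {
    "data": {
        "company_description": {
            "value": {
                "answer": "AI research automation for company and people workflows.",
                "explanation": "Company site describes structured research automation.",
                "sources": ["https://extruct.ai"],
            }
        }
    }
}
```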
text
Use this for prose or short notes that will be read directly.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
```
Typical returned shape:
```json
{
  "answer": "AI research automation for company and people workflows.",
  "explanation": "Company site and product messaging describe structured research automation.",
  "sources": ["https://extruct.ai"]
}
```
Best practice:
- use `text` only when the result is meant to be read directly; if you need filtering or sorting later, prefer a structured format
url
Use this for one canonical URL.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Careers URL",
  "key": "careers_page_url",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's official careers or jobs page URL.",
    "output_format": "url"
  }
}
```
Typical returned shape:
```json
{
  "answer": "https://extruct.ai/careers",
  "explanation": "This is the company's official careers page.",
  "sources": ["https://extruct.ai/careers"]
}
```
Best practice:
- ask for the single best URL only; do not ask for a list unless you actually need a `json` structure
email
Use this when you want one email-shaped answer from a custom agent.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Support Email",
  "key": "support_email",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's primary support email address.",
    "output_format": "email"
  }
}
```
Typical returned shape:
```json
{
  "answer": "support@extruct.ai",
  "explanation": "The support contact is listed on the company's website.",
  "sources": ["https://extruct.ai"]
}
```
Best practice:
- if the real job is finding a work email for a person, prefer `email_finder`
select
Use this for exactly one allowed option.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Pricing Model",
  "key": "pricing_model",
  "value": {
    "agent_type": "llm",
    "prompt": "Select the company's primary pricing model.\n\nPricing Notes\n---\n{pricing_notes}\n---",
    "output_format": "select",
    "labels": ["Free", "Freemium", "Subscription", "Usage-Based", "Custom Quote"]
  }
}
```
`labels` is required in the column config.
Typical returned shape:
```json
{
  "answer": {
    "value": "Subscription"
  },
  "explanation": "The company sells recurring subscription plans.",
  "sources": ["https://extruct.ai/pricing"]
}
```
Best practice:
- `labels` should be stable categories, not free-form text or sentence-length answers
multiselect
Use this for zero or more allowed options.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Markets Served",
  "key": "markets_served",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Select the markets this company clearly serves.",
    "output_format": "multiselect",
    "labels": ["SMB", "Mid-Market", "Enterprise", "Public Sector", "Healthcare"]
  }
}
```
`labels` is required in the column config.
Typical returned shape:
```json
{
  "answer": {
    "values": ["Mid-Market", "Enterprise"]
  },
  "explanation": "The company shows clear messaging for larger commercial customers.",
  "sources": ["https://extruct.ai"]
}
```
Best practice:
- use `multiselect` only when multiple labels can legitimately be true at the same time
numeric
Use this for counts, ranges, or measured values.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Employee Count",
  "key": "employee_count",
  "value": {
    "agent_type": "research_pro",
    "prompt": "How many employees does the company currently have?",
    "output_format": "numeric"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "value": 5234,
    "min_value": null,
    "max_value": null,
    "unit": "employees",
    "max_possible": null
  },
  "explanation": "Recent company sources point to approximately 5,234 employees.",
  "sources": ["https://example.com"]
}
```
Best practice:
- prefer `numeric` over `text` when the result may be sorted, filtered, or bucketed later
money
Use this for structured financial values such as revenue, ARR, valuation, or funding.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Annual Revenue",
  "key": "annual_revenue",
  "value": {
    "agent_type": "research_pro",
    "prompt": "What is the company's latest annual revenue or ARR?",
    "output_format": "money"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "amount": 15000000,
    "min_amount": null,
    "max_amount": null,
    "currency": "USD",
    "period": "annual"
  },
  "explanation": "The latest reliable figure is approximately $15M annually.",
  "sources": ["https://example.com"]
}
```
Best practice:
- prefer `money` over prose for any financial figure you may compare or score later
date
Use this for exact dates, partial dates, or date ranges.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Founded Date",
  "key": "founded_date",
  "value": {
    "agent_type": "research_pro",
    "prompt": "When was the company founded?",
    "output_format": "date"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "year": 2021,
    "month": null,
    "day": null,
    "end_year": null,
    "end_month": null,
    "end_day": null
  },
  "explanation": "Only the founding year is clearly supported by the available sources.",
  "sources": ["https://example.com"]
}
```
Best practice:
- partial dates are valid and expected; do not force full day-level precision when the source does not support it
phone
Use this for one phone number in structured form.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Main Phone",
  "key": "main_phone",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's primary phone number.",
    "output_format": "phone"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "number": "+14155552671",
    "country_code": "1",
    "extension": null
  },
  "explanation": "This is the main published company phone number.",
  "sources": ["https://example.com"]
}
```
Best practice:
- if the real job is finding a phone number for a person, prefer `phone_finder`
grade
Use this when you want a bounded score plus an explanation.
The recommended scoring pattern is a custom agent column with `output_format: "grade"`.
Example:
```json
{
  "kind": "agent",
  "name": "Enterprise Fit",
  "key": "enterprise_fit",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Assess this statement about the company: The company clearly sells to enterprise customers. Use the default 1-5 grade scale. If there is not enough evidence to score confidently, return Not found.",
    "output_format": "grade"
  }
}
```
Default scale:
- 5: strong yes
- 4: likely yes
- 3: mixed, partial, or uncertain
- 2: likely no
- 1: strong no
- Not found: not enough evidence to score confidently
Typical returned shape:
```json
{
  "answer": 4,
  "explanation": "Strong mid-market and enterprise signals, but not purely enterprise-only.",
  "sources": ["https://example.com"]
}
```
Not found shape:
```json
{
  "answer": "Not found",
  "explanation": "Not enough reliable evidence to score this confidently.",
  "sources": []
}
```
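Downstream code has to handle both shapes: a numeric `answer` and the literal string `Not found`. A minimal sketch of mapping a grade answer back to the default scale:

```python
# Default 1-5 grade scale from this guide, plus the "Not found" case.
GRADE_LABELS = {
    5: "strong yes",
    4: "likely yes",
    3: "mixed, partial, or uncertain",
    2: "likely no",
    1: "strong no",
}

def grade_label(answer) -> str:
    """Map a grade-format answer to its meaning on the default scale."""
    if answer == "Not found":
        return "not enough evidence to score confidently"
    return GRADE_LABELS[int(answer)]
```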
json
Use this for user-defined nested output.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Competitors",
  "key": "competitors",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's closest competitors and alternatives. Return one object with a competitors array. Each competitor should include name, domain, and short_reason.",
    "output_format": "json",
    "output_schema": {
      "type": "object",
      "properties": {
        "competitors": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "domain": { "type": "string" },
              "short_reason": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```
`output_schema` is required.
Typical returned shape:
```json
{
  "answer": {
    "competitors": [
      {
        "name": "Example Co",
        "domain": "example.com",
        "short_reason": "Targets the same buyer with a similar core workflow."
      }
    ]
  },
  "explanation": "These companies are the closest alternatives based on product and buyer overlap.",
  "sources": ["https://example.com"]
}
```
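Because the shape of `answer` follows your `output_schema`, downstream parsing is schema-specific rather than fixed. For the competitors schema above, a sketch (the `result` dict mirrors the typical returned shape shown):

```python
# `result` mirrors the typical returned shape for the competitors schema.
result = {
    "answer": {
        "competitors": [
            {
                "name": "Example Co",
                "domain": "example.com",
                "short_reason": "Targets the same buyer with a similar core workflow.",
            }
        ]
    },
    "sources": ["https://example.com"],
}

# Pull just the domains, tolerating missing per-competitor fields.
domains = [c.get("domain") for c in result["answer"].get("competitors", [])]
```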
Best practice:
- unlike the fixed formats above, the shape of `answer` is defined by your `output_schema`
Dependencies and prompt writing
For custom agent columns, dependencies come from:
- prompt references such as `{pricing_notes}`
- `extra_dependencies`

Prompt references are usually the clearest default because the dependency is visible in the prompt itself.
Use `extra_dependencies` only when you intentionally need an upstream dependency that is not referenced in the prompt.
Dependency example:
```json
{
  "kind": "agent",
  "name": "Pricing Model",
  "key": "pricing_model",
  "value": {
    "agent_type": "llm",
    "prompt": "Select the company's primary pricing model.\n\nPricing Notes\n---\n{pricing_notes}\n---",
    "output_format": "select",
    "labels": ["Free", "Freemium", "Subscription", "Usage-Based", "Custom Quote"]
  }
}
```
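Prompt references like `{pricing_notes}` can be detected with a simple scan, which is useful for checking that every referenced key exists before running. A hypothetical helper:

```python
import re

def prompt_dependencies(prompt: str) -> list[str]:
    """Return column keys referenced as {key} placeholders in a prompt."""
    return re.findall(r"\{([a-z0-9_]+)\}", prompt)
```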
Prompt writing rules:
- each column should do one job
- say exactly what artifact should come back
- prefer natural-language prompts on company and people tables
- use explicit prompt references only when a column truly depends on another custom field
Research prompt pattern:

```
Find the company's official careers page URL. Return the single best URL only.
```

Competitor discovery pattern:

```
Find the company's closest competitors and alternatives.
Prefer companies that solve the same core problem for a similar buyer.
Return one object with a competitors array.
Each competitor should include name, domain, and short_reason.
If you are not confident a company is a real competitor, leave it out.
```
Troubleshooting
Unresolved column references
Cause:
- the prompt references a missing column key
Fix:
- create the source column first
- or change the prompt to reference an existing key
Column is not compatible with the table kind
Common examples:
- `company_people_finder` on a people table
- `research_pro` custom agent columns on a people table
Cells never start running
Cause:
- upstream dependency cells are not done
Fix:
- inspect the source columns first
- keep dependency chains short
- rerun only the new column IDs with `mode: "new"`
Results are hard to reuse downstream
Cause:
- repeated decisions are returned as free-form text instead of structured formats
Fix:
- move repeated decisions into `select`, `multiselect`, `numeric`, `money`, `date`, `phone`, `grade`, or `json`
A column feels too expensive
Cause:
- web research is being used for a transformation problem
Fix:
- research the source fact once
- derive downstream fields with `llm`
Recommended defaults
If you are unsure:
- default to company tables
- default to official website domain or URL as company-table `input`
- default to built-in or purpose-built column kinds when available
- default to `research_pro` on company tables
- default to `research_reasoning` on people and generic tables
- on company tables, switch to `research_reasoning` only when you intentionally want to skip company-context disambiguation
- default to `llm` for downstream transformations
- default to `agent` + `grade` for scoring
- default to plain natural-language prompts on company and people tables
- add explicit prompt references only for intentional chaining or on generic tables
For fuller reusable configs and multi-step patterns, continue to Column Library.