Documentation Index
Fetch the complete documentation index at: https://docs.extruct.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
This guide explains how to choose and configure AI Tables columns for company research, people discovery, contact enrichment, and scoring.
It is meant to be actionable on its own: each supported column kind below includes the minimum working shape, required context, and the most common mistake.
Prerequisites
```bash
export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"
```

Generate tokens in the Dashboard under API Tokens. For full setup, see Authentication.
Endpoints used
Column Guide in 60 Seconds
A column is one field Extruct fills for each row.
Every column has `kind`, `name`, and `key`.
Some kinds also have `value`.
If `kind` is `agent`, you also choose `agent_type`, `output_format`, and a prompt.
Typical shape:
```json
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
```
Exceptions:
- `input`, `email_finder`, and `phone_finder` do not need a `value` block
- `reverse_email_lookup` uses top-level `email_column_key` instead of `value`
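These shape rules can be condensed into a small pre-flight check. The sketch below is a hypothetical local helper, not part of any Extruct SDK; it only validates a column config dict against the rules above before you send it:

```python
# Hypothetical validator for the column shape rules described above.
AGENT_VALUE_KEYS = ("agent_type", "prompt", "output_format")

def validate_column(col: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = [f"missing required field: {f}" for f in ("kind", "name", "key") if f not in col]
    kind = col.get("kind")
    if kind == "agent":
        value = col.get("value", {})
        problems += [f"agent value missing: {f}" for f in AGENT_VALUE_KEYS if f not in value]
    elif kind == "company_people_finder":
        if "roles" not in col.get("value", {}):
            problems.append("company_people_finder needs value.roles")
    elif kind == "reverse_email_lookup":
        if "email_column_key" not in col:
            problems.append("reverse_email_lookup needs top-level email_column_key")
    elif kind in ("input", "email_finder", "phone_finder"):
        # These kinds take no value block at all.
        if "value" in col:
            problems.append(f"{kind} does not take a value block")
    return problems
```

Running a check like this over a batch of planned columns catches missing `value` blocks and misplaced `email_column_key` fields before table creation.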
Column kinds at a glance
All column kinds require `kind`, `name`, and `key`.
| Kind | What it does | What you provide |
|---|---|---|
| `input` | stores a value you already have | the row value |
| `agent` | researches, reasons, or transforms one field | `agent_type`, `prompt`, `output_format` |
| `company_people_finder` | finds people at each company by role | `roles` |
| `email_finder` | finds a work email for a person | nothing extra |
| `phone_finder` | finds a phone number for a person | nothing extra |
| `reverse_email_lookup` | resolves profile data from a known email | `email_column_key` |
If kind = agent
`agent` is the core Extruct pattern. After choosing `kind: "agent"`, you choose the agent behavior, the output shape, and the prompt.
| Choice | What it controls |
|---|---|
| `agent_type` | how Extruct gets or produces the answer |
| `output_format` | the shape of the returned result |
| `prompt` | the exact job the column should do |
Agent types at a glance
| Agent type | Best for | Not best for |
|---|---|---|
| `research_pro` | researched answers on company tables | people or generic tables, direct contact lookups, LinkedIn fetch from a known URL |
| `research_reasoning` | researched answers on people and generic tables | company-table use cases that need company-context disambiguation |
| `llm` | transforming data already in the row | web research or source discovery |
| `linkedin` | LinkedIn data from a known LinkedIn URL | discovering the LinkedIn URL in the first place |
Output formats at a glance

| Format | Use for | Extra required fields |
|---|---|---|
| `text` | prose or notes | none |
| `url` | one canonical URL | none |
| `email` | one email-shaped answer from an agent | none |
| `select` | exactly one allowed option | `labels` |
| `multiselect` | zero or more allowed options | `labels` |
| `numeric` | counts and measured values | none |
| `money` | structured financial values | none |
| `date` | exact or partial dates | none |
| `phone` | one phone number in structured form | none |
| `grade` | bounded scoring with explanation | none |
| `json` | nested schema-defined output | `output_schema` |
Fast defaults
- On company tables, start with the official website domain or URL as `input`
- On company tables, default to `kind: "agent"` and `agent_type: "research_pro"`
- On people and generic tables, use `kind: "agent"` and `agent_type: "research_reasoning"`
- On company tables, choose `research_reasoning` only if you intentionally want to skip the company-context disambiguation step
- For scoring, use `output_format: "grade"`
- For people or contact workflows, prefer `company_people_finder`, `email_finder`, `phone_finder`, or `reverse_email_lookup` over generic prompts
- When you add new columns, rerun only the new column IDs with `mode: "new"`
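The agent-type defaults above can be encoded in one small chooser. This helper is illustrative; the `table_kind` values are the table kinds named in this guide:

```python
def default_agent_type(table_kind: str, transformation_only: bool = False) -> str:
    """Pick the default agent_type for a custom agent column, per the guidance above."""
    if transformation_only:
        return "llm"  # row-local transformation: no web research needed
    if table_kind == "company":
        return "research_pro"  # keeps the company-context disambiguation step
    return "research_reasoning"  # people and generic tables
```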
Choose the right column kind
Use the simplest column kind that matches the result you want back.
input
What it does:
- stores a value you already have
When to use it:
- company website domain or URL
- manual notes
- internal tags or metadata
- local helper fields such as `company_website` on a standalone people table
Minimum config:
```json
{
  "kind": "input",
  "name": "Company",
  "key": "input"
}
```
Context it needs:
- none; you provide the value directly in row data
Best practice:
- on company tables, the preferred input is the company’s official website domain or URL
- a raw company name is a weaker fallback when you do not have the site
What it returns:
- exactly the row value you send
Most common mistake:
- using weak company names as the default company input when you already have a domain or URL
agent
What it does:
- researches, reasons, or transforms one field
When to use it:
- research a company fact
- classify a company or person
- summarize upstream evidence
- return structured output such as `select`, `money`, `grade`, or `json`
Minimum config:
```json
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
```
Context it needs:
- on company tables, Extruct auto-injects company context
- on people tables, Extruct auto-injects person context
- on generic tables, use explicit prompt references for the row fields you need
What it returns:
- one value in the output format you choose
Most common mistake:
- one prompt tries to do multiple jobs or references `{input}` unnecessarily on company or people tables
company_people_finder
What it does:
- finds people at each company by role
When to use it:
- branch from company research into a people workflow
- find leadership, decision-makers, or role-based contact targets
Minimum config:
```json
{
  "kind": "company_people_finder",
  "name": "Decision Makers",
  "key": "decision_makers",
  "value": {
    "roles": ["VP Sales", "sales leadership", "revenue operations"]
  }
}
```
Context it needs:
- a company table with the correct company rows
What it returns or creates:
- fills the finder cell on the company table
- creates or updates a child people table for downstream enrichment
Good role style:
- broader role families for coverage: `sales leadership`, `engineering leadership`
- exact titles for narrower targeting: `CEO`, `Head of Engineering`
Most common mistake:
- adding it to a people table or using overly narrow role strings before checking coverage
email_finder
What it does:
- finds a work email for a person
When to use it:
- enrich a people table after person rows already exist
Minimum config:
```json
{
  "kind": "email_finder",
  "name": "Work Email",
  "key": "work_email"
}
```
Context it needs:
- `full_name`
- `profile_url`
- `company_website`

`company_website` can come from:
- a parent company row, or
- a local people-table input column keyed exactly `company_website`
What it returns:
- one work email or no result
Most common mistake:
- adding the column before `company_website` exists or trying to configure it with a `value` block
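Since the most common failure is adding the column before `company_website` exists, a local pre-check helps. This is a hypothetical helper; `columns` is assumed to be the list of column configs already on the people table, including local input columns:

```python
def can_add_email_finder(columns: list[dict]) -> bool:
    """True once the people table exposes a column keyed exactly company_website."""
    return any(c.get("key") == "company_website" for c in columns)
```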
phone_finder
What it does:
- finds a phone number for a person
When to use it:
- enrich a people table after person rows already exist
Minimum config:
```json
{
  "kind": "phone_finder",
  "name": "Direct Phone",
  "key": "direct_phone"
}
```
Context it needs:
- `full_name`
- `profile_url`
- `company_website`

`company_website` can come from:
- a parent company row, or
- a local people-table input column keyed exactly `company_website`
What it returns:
- one phone number or no result
Most common mistake:
- expecting high coverage without first verifying that the person rows and company website context are strong
reverse_email_lookup
What it does:
- resolves person or profile data from an email you already have
When to use it:
- you already have an email column and want profile resolution rather than company-to-people branching
Minimum config:
```json
{
  "kind": "reverse_email_lookup",
  "name": "Profile from Email",
  "key": "profile_from_email",
  "email_column_key": "work_email"
}
```
Context it needs:
- a source email column keyed in `email_column_key`
What it returns:
- person or profile metadata such as name, headline, location, and profile URL
Most common mistake:
- putting `email_column_key` inside `value` or using this when the real job is company-to-people discovery
Agent types
Agent types change what Extruct can do for a custom agent column.
`research_pro` and `research_reasoning` can both research, reason, and make judgments.
The difference is whether Extruct first disambiguates the requested company using company context before answering.
| Agent type | Best for | What it actually does |
|---|---|---|
| `research_pro` | researched answers on company tables | web research plus Extruct DB search and similar-company retrieval, with a company-context disambiguation step |
| `research_reasoning` | researched answers on people and generic tables | the same general research tool class, but without the company-context disambiguation step |
| `llm` | row-local transformation | transforms existing row data only; no web research |
| `linkedin` | LinkedIn data from a known LinkedIn URL | uses LinkedIn company or profile lookup behavior |
Research agents cannot do these jobs directly:
- fetch LinkedIn data from a known LinkedIn URL
- find people at a company by role
- find work emails
- find phone numbers
- resolve a person from an email
Use these instead:
- `linkedin`
- `company_people_finder`
- `email_finder`
- `phone_finder`
- `reverse_email_lookup`
research_pro
Use this for researched answers on company tables.
What makes it different:
- it uses the same general research and reasoning capability as `research_reasoning`
- before answering, it gathers or confirms company context
- it uses that company context to disambiguate whether the evidence belongs to the requested company or to some other company
Good fits:
- pricing and packaging
- funding or valuation
- target segments
- recent news
- product or use-case research
- competitor or alternative discovery
- canonical company URLs such as LinkedIn or careers pages
Default role:
- this is the safe default for custom `agent` columns on company tables
- use it unless you intentionally want to skip the disambiguation step and you know the entity is already clear
research_reasoning
Use this for researched answers when you do not want the company-context disambiguation step.
What makes it different:
- it uses the same general research and reasoning capability as `research_pro`
- it does not start with the company-context disambiguation step
- it is the default research agent on people and generic tables
- on company tables, it is an expert choice when the entity is already clear and you intentionally want to skip the disambiguation step
Good fits:
- official-site disambiguation
- custom 1-5 scoring
- SMB vs enterprise judgments
- ranked comparisons
- researched answers on generic tables
llm
Use this for row-local transformation only.
What to expect:
- no web research
- no page visits
- no source URLs
- cheaper downstream transformation after a research step already happened
Good fits:
- turn notes into a `select`
- summarize several upstream columns
- normalize messy evidence
- derive structured `json` from row context
linkedin
Use this when the row already has a known LinkedIn company or person URL and you want LinkedIn-derived data.
Typical chain:
- use `research_pro` to find the right LinkedIn URL
- use `linkedin` to fetch LinkedIn data
- use `llm` to summarize or classify that data
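Written out as column configs, the chain might look like this. The names, keys, and prompts are illustrative, not a prescribed recipe; each column just follows the minimum `agent` shape from this guide:

```python
# Three chained agent columns: research_pro finds the URL,
# linkedin fetches from it, llm summarizes the fetched data.
linkedin_chain = [
    {
        "kind": "agent",
        "name": "LinkedIn URL",
        "key": "linkedin_url",
        "value": {
            "agent_type": "research_pro",
            "prompt": "Find the company's official LinkedIn page URL. Return the single best URL only.",
            "output_format": "url",
        },
    },
    {
        "kind": "agent",
        "name": "LinkedIn Data",
        "key": "linkedin_data",
        "value": {
            "agent_type": "linkedin",
            "prompt": "Fetch LinkedIn company data from {linkedin_url}.",
            "output_format": "text",
        },
    },
    {
        "kind": "agent",
        "name": "LinkedIn Summary",
        "key": "linkedin_summary",
        "value": {
            "agent_type": "llm",
            "prompt": "Summarize the LinkedIn data in {linkedin_data} in under 30 words.",
            "output_format": "text",
        },
    },
]
```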
Important people-table rule:
- custom `agent` columns on people tables can use `llm`, `research_reasoning`, and `linkedin`
- `research_pro` is not supported on custom agent columns on people tables
Built-in company and people context
company tables
company tables automatically create and fill:
- `company_profile`
- `company_name`
- `company_website`
This is why most company-table prompts should be plain natural-language instructions.
Weak:
Research the company {input} and summarize what it does.
Strong:
Describe what the company does in under 25 words.
people tables
Custom agent columns on people tables get baseline row context automatically:
- `full_name`
- `profile_url`
- `role`
Use plain natural-language prompts when that baseline context is enough.
Add explicit prompt references only when the column depends on another custom field or a local input field.
Weak:
Classify this person using {input}.
Strong:
Classify this person's seniority level based on their current role.
Example where an explicit reference is appropriate:

```
Choose the company's pricing model based on:

Pricing Notes
---
{pricing_notes}
---
```
Output formats
Choose the output format that best matches how the result will be used later.
| Output format | Best for | Example |
|---|---|---|
| `text` | short prose or notes | company description |
| `url` | one canonical URL | careers page |
| `email` | one email address | normalized email |
| `select` | one allowed option | pricing model |
| `multiselect` | zero or more allowed options | markets served |
| `numeric` | counts or measured values | employee count |
| `money` | structured financial values | annual revenue |
| `date` | exact or partial dates | founded date |
| `phone` | one phone number | direct phone |
| `grade` | bounded scoring with explanation | ICP fit |
| `json` | nested structured output | competitors list |
Guidance:
- use `text` for short prose meant to be read directly
- use `select` or `multiselect` when the answer should stay within labels
- use `money` for revenue, ARR, valuation, or funding
- use `grade` for scoring
- use `json` only when the structure is genuinely useful downstream
For custom agent columns, the main result lives in `rows[].data.<key>.value.answer`.
What changes by `output_format` is the shape of `answer`. Supporting fields such as `explanation` and `sources` may sit alongside it in `rows[].data.<key>.value`.
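A minimal sketch of reading that path, using a hand-built row dict that mirrors the structure described (not a real API response):

```python
def read_answer(row: dict, key: str):
    """Pull answer, explanation, and sources out of rows[].data.<key>.value."""
    cell = row["data"][key]["value"]
    return cell.get("answer"), cell.get("explanation"), cell.get("sources", [])

# Hand-built example row mirroring the documented result path.
row = {
    "data": {
        "company_description": {
            "value": {
                "answer": "AI research automation for company and people workflows.",
                "explanation": "Company site describes structured research automation.",
                "sources": ["https://extruct.ai"],
            }
        }
    }
}
```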
text
Use this for prose or short notes that will be read directly.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
```
Typical returned shape:
```json
{
  "answer": "AI research automation for company and people workflows.",
  "explanation": "Company site and product messaging describe structured research automation.",
  "sources": ["https://extruct.ai"]
}
```
Best practice:
- use `text` only when the result is meant to be read directly; if you need filtering or sorting later, prefer a structured format
url
Use this for one canonical URL.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Careers URL",
  "key": "careers_page_url",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's official careers or jobs page URL.",
    "output_format": "url"
  }
}
```
Typical returned shape:
```json
{
  "answer": "https://extruct.ai/careers",
  "explanation": "This is the company's official careers page.",
  "sources": ["https://extruct.ai/careers"]
}
```
Best practice:
- ask for the single best URL only; do not ask for a list unless you actually need a `json` structure
email
Use this when you want one email-shaped answer from a custom agent.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Support Email",
  "key": "support_email",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's primary support email address.",
    "output_format": "email"
  }
}
```
Typical returned shape:
```json
{
  "answer": "support@extruct.ai",
  "explanation": "The support contact is listed on the company's website.",
  "sources": ["https://extruct.ai"]
}
```
Best practice:
- if the real job is finding a work email for a person, prefer `email_finder`
select
Use this for exactly one allowed option.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Pricing Model",
  "key": "pricing_model",
  "value": {
    "agent_type": "llm",
    "prompt": "Select the company's primary pricing model.\n\nPricing Notes\n---\n{pricing_notes}\n---",
    "output_format": "select",
    "labels": ["Free", "Freemium", "Subscription", "Usage-Based", "Custom Quote"]
  }
}
```
`labels` is required in the column config.
Typical returned shape:
```json
{
  "answer": {
    "value": "Subscription"
  },
  "explanation": "The company sells recurring subscription plans.",
  "sources": ["https://extruct.ai/pricing"]
}
```
Best practice:
- `labels` should be stable categories, not free-form text or sentence-length answers
multiselect
Use this for zero or more allowed options.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Markets Served",
  "key": "markets_served",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Select the markets this company clearly serves.",
    "output_format": "multiselect",
    "labels": ["SMB", "Mid-Market", "Enterprise", "Public Sector", "Healthcare"]
  }
}
```
`labels` is required in the column config.
Typical returned shape:
```json
{
  "answer": {
    "values": ["Mid-Market", "Enterprise"]
  },
  "explanation": "The company shows clear messaging for larger commercial customers.",
  "sources": ["https://extruct.ai"]
}
```
Best practice:
- use `multiselect` only when multiple labels can legitimately be true at the same time
numeric
Use this for counts, ranges, or measured values.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Employee Count",
  "key": "employee_count",
  "value": {
    "agent_type": "research_pro",
    "prompt": "How many employees does the company currently have?",
    "output_format": "numeric"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "value": 5234,
    "min_value": null,
    "max_value": null,
    "unit": "employees",
    "max_possible": null
  },
  "explanation": "Recent company sources point to approximately 5,234 employees.",
  "sources": ["https://example.com"]
}
```
Best practice:
- prefer `numeric` over `text` when the result may be sorted, filtered, or bucketed later
money
Use this for structured financial values such as revenue, ARR, valuation, or funding.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Annual Revenue",
  "key": "annual_revenue",
  "value": {
    "agent_type": "research_pro",
    "prompt": "What is the company's latest annual revenue or ARR?",
    "output_format": "money"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "amount": 15000000,
    "min_amount": null,
    "max_amount": null,
    "currency": "USD",
    "period": "annual"
  },
  "explanation": "The latest reliable figure is approximately $15M annually.",
  "sources": ["https://example.com"]
}
```
Best practice:
- prefer `money` over prose for any financial figure you may compare or score later
date
Use this for exact dates, partial dates, or date ranges.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Founded Date",
  "key": "founded_date",
  "value": {
    "agent_type": "research_pro",
    "prompt": "When was the company founded?",
    "output_format": "date"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "year": 2021,
    "month": null,
    "day": null,
    "end_year": null,
    "end_month": null,
    "end_day": null
  },
  "explanation": "Only the founding year is clearly supported by the available sources.",
  "sources": ["https://example.com"]
}
```
Best practice:
- partial dates are valid and expected; do not force full day-level precision when the source does not support it
phone
Use this for one phone number in structured form.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Main Phone",
  "key": "main_phone",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's primary phone number.",
    "output_format": "phone"
  }
}
```
Typical returned shape:
```json
{
  "answer": {
    "number": "+14155552671",
    "country_code": "1",
    "extension": null
  },
  "explanation": "This is the main published company phone number.",
  "sources": ["https://example.com"]
}
```
Best practice:
- if the real job is finding a phone number for a person, prefer `phone_finder`
grade
Use this when you want a bounded score plus an explanation.
The recommended scoring pattern is a custom agent column with `output_format: "grade"`.
Example:
```json
{
  "kind": "agent",
  "name": "Enterprise Fit",
  "key": "enterprise_fit",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Assess this statement about the company: The company clearly sells to enterprise customers. Use the default 1-5 grade scale. If there is not enough evidence to score confidently, return Not found.",
    "output_format": "grade"
  }
}
```
Default scale:
- 5: strong yes
- 4: likely yes
- 3: mixed, partial, or uncertain
- 2: likely no
- 1: strong no
- Not found: not enough evidence to score confidently
Typical returned shape:
```json
{
  "answer": 4,
  "explanation": "Strong mid-market and enterprise signals, but not purely enterprise-only.",
  "sources": ["https://example.com"]
}
```
Not found shape:
```json
{
  "answer": "Not found",
  "explanation": "Not enough reliable evidence to score this confidently.",
  "sources": []
}
```
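Downstream code has to handle both shapes: a numeric `answer` and the literal string `Not found`. A minimal sketch of mapping a grade answer back to the default scale:

```python
# Default 1-5 grade scale from this guide, plus the "Not found" case.
GRADE_LABELS = {
    5: "strong yes",
    4: "likely yes",
    3: "mixed, partial, or uncertain",
    2: "likely no",
    1: "strong no",
}

def grade_label(answer) -> str:
    """Map a grade-format answer to its meaning on the default scale."""
    if answer == "Not found":
        return "not enough evidence to score confidently"
    return GRADE_LABELS[int(answer)]
```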
json
Use this for user-defined nested output.
Minimum config:
```json
{
  "kind": "agent",
  "name": "Competitors",
  "key": "competitors",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's closest competitors and alternatives. Return one object with a competitors array. Each competitor should include name, domain, and short_reason.",
    "output_format": "json",
    "output_schema": {
      "type": "object",
      "properties": {
        "competitors": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "domain": { "type": "string" },
              "short_reason": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```
`output_schema` is required.
Typical returned shape:
```json
{
  "answer": {
    "competitors": [
      {
        "name": "Example Co",
        "domain": "example.com",
        "short_reason": "Targets the same buyer with a similar core workflow."
      }
    ]
  },
  "explanation": "These companies are the closest alternatives based on product and buyer overlap.",
  "sources": ["https://example.com"]
}
```
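Because the shape of `answer` follows your `output_schema`, downstream parsing is schema-specific rather than fixed. For the competitors schema above, a sketch (the `result` dict mirrors the typical returned shape shown):

```python
# `result` mirrors the typical returned shape for the competitors schema.
result = {
    "answer": {
        "competitors": [
            {
                "name": "Example Co",
                "domain": "example.com",
                "short_reason": "Targets the same buyer with a similar core workflow.",
            }
        ]
    },
    "sources": ["https://example.com"],
}

# Pull just the domains, tolerating missing per-competitor fields.
domains = [c.get("domain") for c in result["answer"].get("competitors", [])]
```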
Best practice:
- unlike the fixed formats above, the shape of `answer` is defined by your `output_schema`
Dependencies and prompt writing
For custom agent columns, dependencies come from:
- prompt references such as `{pricing_notes}`
- `extra_dependencies`

Prompt references are usually the clearest default because the dependency is visible in the prompt itself.
Use `extra_dependencies` only when you intentionally need an upstream dependency that is not referenced in the prompt.
Dependency example:
```json
{
  "kind": "agent",
  "name": "Pricing Model",
  "key": "pricing_model",
  "value": {
    "agent_type": "llm",
    "prompt": "Select the company's primary pricing model.\n\nPricing Notes\n---\n{pricing_notes}\n---",
    "output_format": "select",
    "labels": ["Free", "Freemium", "Subscription", "Usage-Based", "Custom Quote"]
  }
}
```
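Prompt references like `{pricing_notes}` can be detected with a simple scan, which is useful for checking that every referenced key exists before running. A hypothetical helper:

```python
import re

def prompt_dependencies(prompt: str) -> list[str]:
    """Return column keys referenced as {key} placeholders in a prompt."""
    return re.findall(r"\{([a-z0-9_]+)\}", prompt)
```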
Prompt writing rules:
- each column should do one job
- say exactly what artifact should come back
- prefer natural-language prompts on company and people tables
- use explicit prompt references only when a column truly depends on another custom field
Research prompt pattern:

```
Find the company's official careers page URL. Return the single best URL only.
```

Competitor discovery pattern:

```
Find the company's closest competitors and alternatives.
Prefer companies that solve the same core problem for a similar buyer.
Return one object with a competitors array.
Each competitor should include name, domain, and short_reason.
If you are not confident a company is a real competitor, leave it out.
```
Troubleshooting
Unresolved column references
Cause:
- the prompt references a missing column key
Fix:
- create the source column first
- or change the prompt to reference an existing key
Column is not compatible with the table kind
Common examples:
- `company_people_finder` on a people table
- `research_pro` custom agent columns on a people table
Cells never start running
Cause:
- upstream dependency cells are not done
Fix:
- inspect the source columns first
- keep dependency chains short
- rerun only the new column IDs with `mode: "new"`
Results are hard to reuse downstream
Cause:
- repeated decisions are returned as free-form text instead of structured formats
Fix:
- move repeated decisions into `select`, `multiselect`, `numeric`, `money`, `date`, `phone`, `grade`, or `json`
A column feels too expensive
Cause:
- web research is being used for a transformation problem
Fix:
- research the source fact once
- derive downstream fields with `llm`
Recommended defaults
If you are unsure:
- default to company tables
- default to official website domain or URL as company-table `input`
- default to built-in or purpose-built column kinds when available
- default to `research_pro` on company tables
- default to `research_reasoning` on people and generic tables
- on company tables, switch to `research_reasoning` only when you intentionally want to skip company-context disambiguation
- default to `llm` for downstream transformations
- default to `agent` + `grade` for scoring
- default to plain natural-language prompts on company and people tables
- add explicit prompt references only for intentional chaining or on generic tables
For fuller reusable configs and multi-step patterns, continue to Column Library.