POST /v1/discovery_tasks
Create Discovery Task
curl --request POST \
  --url https://api.extruct.ai/v1/discovery_tasks \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "query": "<string>",
  "desired_num_results": 100,
  "table": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "run": false,
    "columns": [
      "3c90c3cc-0d44-4b50-8888-8dd25736052a"
    ],
    "auto_import": false
  },
  "auto_data_sources": true,
  "data_sources": [],
  "criteria": [
    {
      "key": "<string>",
      "name": "<string>",
      "criterion": "<string>"
    }
  ],
  "exclude_domains": [
    "<string>"
  ],
  "exclude_task_ids": [
    "3c90c3cc-0d44-4b50-8888-8dd25736052a"
  ]
}
'
{
  "id": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "status": "created",
  "query": "<string>",
  "desired_num_results": 123,
  "is_exhausted": false,
  "num_results_discovered": 0,
  "num_results_enriched": 0,
  "num_results_evaluated": 0,
  "num_results": 0,
  "table_id": "<string>",
  "auto_data_sources": true,
  "data_sources": [],
  "criteria": [
    {
      "key": "<string>",
      "name": "<string>",
      "criterion": "<string>"
    }
  ],
  "exclude_domains": [
    "<string>"
  ]
}

Overview

This endpoint creates a Deep Search task and returns a task ID immediately. Deep Search is the asynchronous discovery path to use when indexed similarity search and firmographic filters fall short.

Example request

export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"

curl -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 100
  }'
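
The same request can be sketched in Python using only the standard library. The helper name build_create_request is illustrative and not part of any Extruct SDK; it only constructs the request, so the headers and body can be inspected before anything is sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.extruct.ai/v1/discovery_tasks"


def build_create_request(query, desired_num_results=100, token=None):
    """Build (but do not send) the POST request shown in the curl example."""
    # Fall back to the same environment variable the curl example exports.
    token = token or os.environ.get("EXTRUCT_API_TOKEN", "")
    body = json.dumps(
        {"query": query, "desired_num_results": desired_num_results}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_create_request(
    "vertical SaaS companies serving freight forwarding",
    token="YOUR_API_TOKEN",
)
# To send it, pass `request` to urllib.request.urlopen and parse the
# response body with json.load.
```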

Key parameters

  • query (required): Natural-language description of companies to find.
  • desired_num_results (optional): target result count; default 100, min 1, max 250.
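  • auto_data_sources (optional): let Extruct pick the best data sources for the query; default true.
  • data_sources (optional): manual selection from web_search, linkedin, and maps; ignored when auto_data_sources=true.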
  • criteria (optional): explicit criterion definitions; each is graded 1-5 in results. If omitted, criteria are inferred from query.
  • exclude_domains (optional): domains or URLs to exclude from results. The server normalizes each entry to its registered domain (https://www.Acme.com/about → acme.com). Duplicates are collapsed.
  • exclude_task_ids (optional): up to 100 IDs of completed (status = "done") Deep Search tasks. The visible (returned) companies from those tasks are added to the exclusion set at creation time. Tasks must be owned by you (or someone in your organization).
The combined exclusion set (exclude_domains ∪ visible results of exclude_task_ids) is capped at 1000 unique domains. The resolved list is stored on the task and returned in the response as exclude_domains.
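
To preview roughly what an exclude_domains list will collapse to, a local approximation can help. The sketch below is an assumption-laden stand-in for the server's normalization: approximate_domain and preview_exclusions are hypothetical helpers, and stripping a leading "www." is not true registered-domain logic (that would need the Public Suffix List). It also cannot see domains contributed by exclude_task_ids, so the server-side count may be higher.

```python
from urllib.parse import urlsplit

MAX_EXCLUDED_DOMAINS = 1000  # cap stated above


def approximate_domain(entry):
    """Rough local stand-in for the server's normalization (assumption)."""
    candidate = entry.strip()
    # urlsplit only finds a host when a scheme or "//" prefix is present.
    if "://" not in candidate:
        candidate = "//" + candidate
    host = urlsplit(candidate).hostname or ""
    # Only a leading "www." is stripped; other subdomains are left alone.
    return host[4:] if host.startswith("www.") else host


def preview_exclusions(entries):
    """Deduplicate and sort entries, failing early if over the cap."""
    domains = sorted({approximate_domain(e) for e in entries} - {""})
    if len(domains) > MAX_EXCLUDED_DOMAINS:
        raise ValueError(
            f"{len(domains)} domains exceeds the cap of {MAX_EXCLUDED_DOMAINS}"
        )
    return domains


preview = preview_exclusions(
    ["https://www.Acme.com/about", "acme.com", "https://beta.io/about"]
)
```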

Endpoint behavior

  • This call creates the task and returns immediately. Use the task ID to poll status and read results.
  • Every Deep Search result includes criteria and scores. If you omit criteria, Extruct generates them for you. If you provide criteria, those are used instead.
  • exclude_* inputs are resolved once at creation time and frozen on the task. Resuming or re-reading the task uses the same resolved set — references are not re-evaluated later.
  • Use Get Task to monitor progress and Get Task Results to review the companies.
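
The create-then-poll flow described above can be sketched as a small loop. This page does not give the Get Task URL, so the sketch takes a hypothetical fetch_task callable that wraps it; the polling interval and poll limit are arbitrary illustrative defaults, not documented values.

```python
import time

# Terminal values taken from the status enum in the response schema.
TERMINAL_STATUSES = {"done", "failed"}


def wait_for_task(fetch_task, task_id, interval_seconds=5.0, max_polls=120,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal status or polls run out.

    fetch_task is a hypothetical callable wrapping Get Task: it takes a
    task ID and returns the task as a dict with at least a "status" key.
    """
    for _ in range(max_polls):
        task = fetch_task(task_id)
        if task["status"] in TERMINAL_STATUSES:
            return task
        sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

Passing sleep as a parameter keeps the loop testable without real delays.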

Excluding prior results

A common pattern is to run a follow-up Deep Search that skips companies already returned by an earlier run:

curl -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 100,
    "exclude_domains": ["acme.com", "https://beta.io/about"],
    "exclude_task_ids": ["<prior-task-uuid>"]
  }'
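
When the follow-up body is assembled in code, the documented limit on exclude_task_ids can be checked before the call is made. build_followup_payload below is a hypothetical helper, not part of an official client. It cannot tell whether a prior task is done or owned by you; those checks still happen on the server.

```python
import uuid

MAX_EXCLUDE_TASK_IDS = 100  # limit stated for exclude_task_ids


def build_followup_payload(query, prior_task_ids, exclude_domains=(),
                           desired_num_results=100):
    """Assemble a follow-up request body that skips earlier results."""
    if len(prior_task_ids) > MAX_EXCLUDE_TASK_IDS:
        raise ValueError("exclude_task_ids accepts at most 100 task IDs")
    # uuid.UUID raises ValueError on malformed IDs, catching typos locally.
    task_ids = [str(uuid.UUID(task_id)) for task_id in prior_task_ids]
    payload = {"query": query, "desired_num_results": desired_num_results}
    if exclude_domains:
        payload["exclude_domains"] = list(exclude_domains)
    if task_ids:
        payload["exclude_task_ids"] = task_ids
    return payload
```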

Success signal

A successful response includes the task id and its initial status. Save the id and pass it to Get Task and Get Task Results.

Common errors

401 Unauthorized

Check that your header is Authorization: Bearer ${EXTRUCT_API_TOKEN}.

400 Bad Request

Returned when an exclude_task_ids entry references a task that is not in status = "done".

403 Forbidden

Returned when an exclude_task_ids entry references a task you do not own (and that is not owned by your organization).

404 Not Found

Returned when an exclude_task_ids entry references a task that does not exist.

422 Unprocessable Entity

Common causes:
  • Invalid JSON or unsupported fields.
  • desired_num_results outside 1..250.
  • An entry in exclude_domains that cannot be parsed as a domain.
  • The resolved exclusion set exceeds the 1000-domain cap. Reduce exclude_domains or exclude_task_ids.
To rule out malformed JSON, validate the body locally first:
echo '<json-body>' | jq empty
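
In client code, the status codes above can be mapped to troubleshooting hints. The table below is a minimal sketch that paraphrases this page; explain_error is a hypothetical helper, and the wording of real error bodies returned by the API may differ.

```python
ERROR_HINTS = {
    400: "An exclude_task_ids entry references a task that is not done yet.",
    401: "Check the Authorization: Bearer <token> header.",
    403: "An exclude_task_ids entry references a task outside your organization.",
    404: "An exclude_task_ids entry references a task that does not exist.",
    422: "Body failed validation: check JSON, field names, ranges, "
         "and the 1000-domain exclusion cap.",
}


def explain_error(status_code):
    """Return a troubleshooting hint for a failed create call."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}.")
```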

Authorizations

Authorization · string · header · required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
query · string · required

Ideal description of companies to find

desired_num_results · integer · default: 100

Target number of results for this task. Maximum is 250.

Required range: 1 <= x <= 250

table · TableForDiscoveryTaskInput · object

auto_data_sources · boolean · default: true

Automatically determine the best data sources based on the query

data_sources · enum<string>[] | null

Manual data source selection (ignored if auto_data_sources=true)

Available options: web_search, linkedin, maps

criteria · CriterionDefinition · object[] | null

Optional criteria for evaluating discovered companies. Each criterion will be graded on a 1-5 scale.

exclude_domains · string[] | null

Domains (or URLs) of companies the run must not return. Normalized server-side to registered domains.

exclude_task_ids · string<uuid>[] | null

IDs of completed (status = "done") Deep Search tasks whose returned companies must be excluded from the run. Up to 100 tasks. Tasks must belong to the caller (or their organization).

Maximum array length: 100
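
The body constraints listed above can be checked locally before sending. The sketch below is illustrative only: build_body is a hypothetical helper, and treating key, name, and criterion as all required on each criterion is an assumption based on the example payload, not something this page states.

```python
ALLOWED_DATA_SOURCES = {"web_search", "linkedin", "maps"}
CRITERION_FIELDS = ("key", "name", "criterion")  # assumed all required


def build_body(query, desired_num_results=100, data_sources=None, criteria=None):
    """Build a request body, checking the documented constraints locally."""
    if not 1 <= desired_num_results <= 250:
        raise ValueError("desired_num_results must be between 1 and 250")
    body = {"query": query, "desired_num_results": desired_num_results}
    if data_sources:
        unknown = set(data_sources) - ALLOWED_DATA_SOURCES
        if unknown:
            raise ValueError(f"unknown data sources: {sorted(unknown)}")
        # Manual sources are ignored unless automatic selection is off.
        body["auto_data_sources"] = False
        body["data_sources"] = list(data_sources)
    for criterion in criteria or []:
        missing = [f for f in CRITERION_FIELDS if f not in criterion]
        if missing:
            raise ValueError(f"criterion missing fields: {missing}")
    if criteria:
        body["criteria"] = list(criteria)
    return body
```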

Response

Successful Response

id · string · required

created_at · string<date-time> · required

status · enum<string> · required

Available options: created, in_progress, done, failed

query · string · required

desired_num_results · integer · required

is_exhausted · boolean · default: false

num_results_discovered · integer · default: 0

Total number of company candidates discovered from search

num_results_enriched · integer · default: 0

Number of candidates enriched with company profiles

num_results_evaluated · integer · default: 0

Number of candidates that had criteria evaluation completed

num_results · integer · default: 0

Total number of results

table_id · string | null

auto_data_sources · boolean · default: true

data_sources · enum<string>[] | null

Available options: web_search, linkedin, maps

criteria · CriterionDefinition · object[] | null

exclude_domains · string[] | null

The resolved, sorted list of registered domains that were excluded from this run (union of exclude_domains and the visible results of exclude_task_ids at creation time). Capped at 1000.
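
The counters in the response lend themselves to a one-line progress summary while a task is running. summarize_progress below is a hypothetical helper that only reads the fields listed above; it takes a task response that has already been parsed into a dict.

```python
def summarize_progress(task):
    """Render the pipeline counters from a task response as one line."""
    # The counters default to 0 in the schema, so missing keys count as 0.
    discovered = task.get("num_results_discovered", 0)
    enriched = task.get("num_results_enriched", 0)
    evaluated = task.get("num_results_evaluated", 0)
    results = task.get("num_results", 0)
    target = task["desired_num_results"]
    line = (
        f"{task['status']}: discovered={discovered} enriched={enriched} "
        f"evaluated={evaluated} results={results}/{target}"
    )
    if task.get("is_exhausted"):
        line += " (exhausted)"
    return line
```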