# Deep Search

> Run asynchronous discovery tasks for criteria-based candidate evaluation.

## Overview

Deep Search is the asynchronous discovery path for cases where instant index-based search is not enough.
It ranks companies against natural-language criteria using web, Extruct DB, Maps, and LinkedIn, and is implemented through the `discovery_tasks` endpoints in the API Reference.

## This Path Works Best When

* Ranking depends on qualitative criteria that are hard to express as firmographic filters.
* You need explanations and criterion-level scoring for each result.
* You are willing to wait for an asynchronous task in exchange for more deliberate evaluation.

## Choose Another Path If

* You want fast recall-first exploration over the Extruct index. Use [Semantic Search](/api-guides/search/semantic-search).
* You already have a strong seed company and want instant similarity expansion. Use [Lookalike Search](/api-guides/search/lookalike-search).

## Prerequisites

```bash theme={null}
export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"
```

Generate tokens in [Dashboard API Tokens](https://app.extruct.ai/api-tokens). For full setup, see [Authentication](/api-reference/authentication).

## Endpoints used

* [Create task (`POST /v1/discovery_tasks`)](/api-reference/discover/create-company-discovery-task)
* [Get task (`GET /v1/discovery_tasks/:task_id`)](/api-reference/discover/get-company-discovery-task)
* [Get results (`GET /v1/discovery_tasks/:task_id/results`)](/api-reference/discover/get-company-discovery-task-results)
* [Resume task (`POST /v1/discovery_tasks/:task_id/resume`)](/api-reference/discover/resume-company-discovery-task)

## Workflow

### 1) Create a Deep Search task

Set `desired_num_results` to the number of results you want. The target is capped at `250` per task.
If you omit `criteria`, Extruct infers evaluation criteria from `query`.
The create-task response includes `id`, which you will reuse as `TASK_ID`.

```bash theme={null}
TASK_RESPONSE=$(curl -sS -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 150
  }')

TASK_ID=$(echo "${TASK_RESPONSE}" | jq -r '.id')
echo "${TASK_ID}"
```

This snippet requires `jq`. If it is unavailable, copy `id` manually from the response.
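Because `jq -r` prints the literal string `null` when a field is missing, it can help to check the captured value before reusing it. The following is a minimal sketch; the helper name `require_task_id` is illustrative and not part of the Extruct API.

```bash
# Hypothetical helper: fail fast if the captured task ID is empty or "null".
require_task_id() {
  local task_id="$1"
  if [ -z "${task_id}" ] || [ "${task_id}" = "null" ]; then
    echo "No task id found; inspect the create-task response." >&2
    return 1
  fi
  # Echo the validated ID so the caller can capture it.
  echo "${task_id}"
}

# Usage: TASK_ID=$(require_task_id "${TASK_ID}") || exit 1
```

A failed check usually means the create request returned an error body, so printing `TASK_RESPONSE` is a reasonable next step.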

Optional: to define the scoring rubric yourself, use this alternate create request instead of the minimal one above.

```bash theme={null}
TASK_RESPONSE=$(curl -sS -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 150,
    "criteria": [
      {
        "key": "has_logistics_focus",
        "name": "Logistics Focus",
        "criterion": "Company serves freight forwarding or logistics operations."
      },
      {
        "key": "b2b_gtm",
        "name": "B2B GTM Fit",
        "criterion": "Company sells primarily to business buyers."
      }
    ]
  }')

TASK_ID=$(echo "${TASK_RESPONSE}" | jq -r '.id')
echo "${TASK_ID}"
```
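Either create request can be preceded by a local check on `desired_num_results`, so that an out-of-range value is caught before it reaches the API. This is a sketch only: the upper bound of `250` comes from this guide, while the lower bound of `1` and the helper name `check_desired_num_results` are assumptions.

```bash
# Hypothetical pre-flight check for desired_num_results.
# Upper bound (250) is from this guide; the lower bound of 1 is an assumption.
check_desired_num_results() {
  local n="$1"
  # Reject empty values and anything containing a non-digit character.
  case "${n}" in
    ''|*[!0-9]*)
      echo "desired_num_results must be a positive integer" >&2
      return 1
      ;;
  esac
  if [ "${n}" -lt 1 ] || [ "${n}" -gt 250 ]; then
    echo "desired_num_results must be between 1 and 250" >&2
    return 1
  fi
}

# Usage: check_desired_num_results 150 || exit 1
```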

### 2) Check task progress

Use task status and counters to monitor progress (`num_results_discovered`, `num_results_enriched`, `num_results_evaluated`, `num_results`).

```bash theme={null}
curl --get "https://api.extruct.ai/v1/discovery_tasks/${TASK_ID}" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}"
```

When to proceed: continue after `num_results_evaluated` starts increasing.
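The check above can be wrapped in a small polling loop. This sketch assumes `num_results_evaluated` is a top-level field of the task response, and it uses a 20-second default interval (within the 15-30 second range suggested under Troubleshooting). The helper names `fetch_task` and `wait_for_evaluated` are illustrative.

```bash
# Hypothetical helper: print the task JSON for the given task ID.
fetch_task() {
  curl -sS --get "https://api.extruct.ai/v1/discovery_tasks/$1" \
    -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}"
}

# Poll until num_results_evaluated is above zero, or give up after max_attempts.
# Assumes the counter is a top-level field of the task response.
wait_for_evaluated() {
  local task_id="$1" interval="${2:-20}" max_attempts="${3:-30}"
  local attempt=1 evaluated
  while [ "${attempt}" -le "${max_attempts}" ]; do
    # Treat a failed request or unparsable body as "nothing evaluated yet".
    evaluated=$(fetch_task "${task_id}" | jq -r '.num_results_evaluated // 0') || evaluated=0
    if [ "${evaluated}" -gt 0 ] 2>/dev/null; then
      echo "${evaluated}"
      return 0
    fi
    sleep "${interval}"
    attempt=$((attempt + 1))
  done
  echo "Gave up after ${max_attempts} attempts" >&2
  return 1
}

# Usage: wait_for_evaluated "${TASK_ID}" 20 30
```

Bounding the number of attempts keeps a stalled task from blocking a script indefinitely.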

### 3) Retrieve results with pagination

```bash theme={null}
RESULTS_RESPONSE=$(curl -sS --get "https://api.extruct.ai/v1/discovery_tasks/${TASK_ID}/results" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  --data-urlencode "offset=0" \
  --data-urlencode "limit=50")

echo "${RESULTS_RESPONSE}" | jq '.results[0]'
```

Example result fragment:

```json theme={null}
{
  "company_name": "FreightFlow",
  "company_website": "https://freightflow.example",
  "relevance": 86,
  "scores": {
    "has_logistics_focus": {
      "grade": 5,
      "explanation": "Primary product serves freight forwarding operations.",
      "sources": ["https://freightflow.example/about"]
    },
    "b2b_gtm": {
      "grade": 4,
      "explanation": "Positioning and case studies are B2B-focused.",
      "sources": ["https://freightflow.example/customers"]
    }
  }
}
```

Rule of thumb: shortlist candidates with `grade` 4-5 on must-have criteria and no `grade` below 3 on blockers.
Use `explanation` and `sources` for manual verification before enrichment.
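The must-have half of this rule of thumb can be approximated with a `jq` filter. The sketch below assumes each result carries a `scores` object keyed by criterion `key`, as in the example fragment, and treats a missing criterion as grade `0`. The helper name `shortlist` is illustrative, and `--args` requires `jq` 1.6 or later.

```bash
# Hypothetical helper: read a results response on stdin and print
# "<company_website><TAB><relevance>" for results whose grade is >= 4
# on every criterion key passed as an argument.
shortlist() {
  jq -r '
    $ARGS.positional as $keys
    | .results[]
    | . as $r
    | select(all($keys[]; . as $k | ($r.scores[$k].grade // 0) >= 4))
    | [.company_website, .relevance]
    | @tsv
  ' --args "$@"
}

# Usage: echo "${RESULTS_RESPONSE}" | shortlist has_logistics_focus b2b_gtm
```

The blocker rule (no `grade` below 3) is not covered here; it would need a second list of keys with a lower threshold.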

When to proceed: move forward once you have enough high-fit candidates for your workflow.
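To read past the first page, the `offset` and `limit` parameters can be advanced in a loop. This is a sketch under one assumption: a page with fewer than `limit` results is taken to mean there are no more results. The helper names `fetch_results_page` and `fetch_all_results` are illustrative.

```bash
# Hypothetical helper: fetch one page of results as JSON.
fetch_results_page() {
  local task_id="$1" offset="$2" limit="$3"
  curl -sS --get "https://api.extruct.ai/v1/discovery_tasks/${task_id}/results" \
    -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
    --data-urlencode "offset=${offset}" \
    --data-urlencode "limit=${limit}"
}

# Print every result as one compact JSON object per line.
# Assumes a page shorter than `limit` is the last page.
fetch_all_results() {
  local task_id="$1" limit="${2:-50}" offset=0 page count
  while :; do
    page=$(fetch_results_page "${task_id}" "${offset}" "${limit}") || return 1
    count=$(printf '%s\n' "${page}" | jq '.results | length') || return 1
    printf '%s\n' "${page}" | jq -c '.results[]?'
    if [ "${count}" -ge "${limit}" ]; then
      offset=$((offset + limit))
    else
      break
    fi
  done
}

# Usage: fetch_all_results "${TASK_ID}" 50 > results.jsonl
```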

### 4) Resume discovery when needed

Use resume to request additional results while staying within the task cap.

```bash theme={null}
curl -X POST "https://api.extruct.ai/v1/discovery_tasks/${TASK_ID}/resume" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{"desired_new_results": 25}'
```
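Before resuming, the request size can be clamped locally so the task total does not exceed the cap. This sketch assumes the `250` cap applies to the task's cumulative `num_results`; the helper name `clamp_new_results` is illustrative.

```bash
# Hypothetical helper: clamp a resume request so the task total stays within 250.
# Assumes the cap counts all results the task has produced so far.
clamp_new_results() {
  local current="$1" wanted="$2" cap=250 room
  room=$((cap - current))
  if [ "${room}" -le 0 ]; then
    echo 0
  elif [ "${wanted}" -gt "${room}" ]; then
    echo "${room}"
  else
    echo "${wanted}"
  fi
}

# Usage: NEW_RESULTS=$(clamp_new_results 240 25)
```

A result of `0` means the task is already at the cap, in which case a new task with exclusions is the remaining option.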

### 5) Re-run with exclusions (optional)

Resume is the right tool when you want more results from the same query and criteria. When you want to **change** the query or criteria but skip companies you've already reviewed, start a new task and reference the prior one with `exclude_task_ids`. The new task's results exclude all companies returned by the referenced task at creation time. You can also pass an explicit `exclude_domains` list.

```bash theme={null}
curl -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d "{
    \"query\": \"vertical SaaS companies serving freight forwarding in EMEA\",
    \"desired_num_results\": 100,
    \"exclude_task_ids\": [\"${TASK_ID}\"],
    \"exclude_domains\": [\"acme.com\"]
  }"
```

You can reference up to 100 task IDs, with a combined cap of 1000 unique resolved domains. Exclusions are frozen at task creation: referenced tasks aren't re-read later.
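As an alternative to escaping quotes by hand, the request body can be assembled with `jq -n`, which also handles quoting of the query string. This is a sketch; the helper name `build_exclusion_payload` and its comma-separated arguments are illustrative choices, not part of the API.

```bash
# Hypothetical helper: build a create-task body with exclusions.
# Task IDs and domains are passed as comma-separated strings.
build_exclusion_payload() {
  local query="$1" desired="$2" task_ids="$3" domains="$4"
  jq -n \
    --arg query "${query}" \
    --argjson desired "${desired}" \
    --arg task_ids "${task_ids}" \
    --arg domains "${domains}" \
    '{
      query: $query,
      desired_num_results: $desired,
      exclude_task_ids: ($task_ids | split(",") | map(select(length > 0))),
      exclude_domains: ($domains | split(",") | map(select(length > 0)))
    }'
}

# Usage:
# curl -sS -X POST "https://api.extruct.ai/v1/discovery_tasks" \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
#   -d "$(build_exclusion_payload "vertical SaaS companies serving freight forwarding in EMEA" 100 "${TASK_ID}" "acme.com")"
```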

### 6) Move selected companies to AI Tables

After reviewing Deep Search output, you can move shortlisted companies into AI Tables for enrichment and scoring.
This is only one handoff path. AI Tables also works independently when you already have your own company list.

Use this handoff snippet to map the returned `company_website` into AI Tables `rows[].data.input`:

```bash theme={null}
SHORTLIST_INPUT=$(echo "${RESULTS_RESPONSE}" | jq -r '.results[0].company_website')

curl -X POST "https://api.extruct.ai/v1/tables/${TABLE_ID}/rows" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d "{\"rows\":[{\"data\":{\"input\":\"${SHORTLIST_INPUT}\"}}],\"run\":false}"
```

If you are starting fresh, create a table first by following [AI Tables Basics](/api-guides/ai-tables/ai-table-basics), and use its ID as `TABLE_ID`.
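The handoff snippet sends a single company. To send every company on a results page in one request, the rows body can be generated with `jq`. This sketch reuses the `rows[].data.input` shape and `run` flag shown above; the helper name `build_rows_payload` is illustrative.

```bash
# Hypothetical helper: turn a results response (stdin) into an AI Tables rows body.
build_rows_payload() {
  jq -c '{
    rows: [.results[] | {data: {input: .company_website}}],
    run: false
  }'
}

# Usage:
# echo "${RESULTS_RESPONSE}" | build_rows_payload \
#   | curl -sS -X POST "https://api.extruct.ai/v1/tables/${TABLE_ID}/rows" \
#       -H "Content-Type: application/json" \
#       -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
#       -d @-
```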

## When to choose Deep Search over index search

Prefer Deep Search over Semantic Search or Lookalike Search when the ranking depends on criteria like:

* whether a company serves a specific workflow or sub-vertical
* whether it sells to a specific buyer or team
* whether it meets a custom ICP definition
* whether you need explicit evidence for why each company ranked well

If your need is mostly "find more companies like this" or "search broadly and filter by firmographics," stay on the instant index-based paths first.

## Troubleshooting

### `401 Unauthorized`

The token is missing or invalid.
Check that `EXTRUCT_API_TOKEN` is set and the header is exactly `Authorization: Bearer ${EXTRUCT_API_TOKEN}`.
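A quick local check can rule out an unset or empty variable before any request is sent. This is a minimal sketch; the helper name `check_token` is illustrative, and it cannot tell whether a non-empty token is actually valid.

```bash
# Hypothetical pre-flight check: confirm the token variable is set and non-empty.
check_token() {
  if [ -z "${EXTRUCT_API_TOKEN:-}" ]; then
    echo "EXTRUCT_API_TOKEN is not set" >&2
    return 1
  fi
}

# Usage: check_token || exit 1
```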

### `422 Unprocessable Entity`

Common causes:

* Invalid JSON body.
* `desired_num_results` or `desired_new_results` out of allowed range.
* Unsupported request fields.
* An entry in `exclude_domains` that cannot be parsed as a domain, or the resolved exclusion set exceeds 1000 domains.

Validate your request body locally before sending: `echo '<json-body>' | jq empty`.

### Task progress seems slow

Deep Search is asynchronous and includes criteria evaluation per candidate. Track progress through task status and counters.
Poll every 15-30 seconds and continue while `num_results_evaluated` is increasing.
