> Create a new Deep Search task.

# Create Discovery Task

## Overview

This endpoint creates a Deep Search task and returns a task ID immediately.
Deep Search is the asynchronous discovery path, intended for cases where indexed similarity and firmographic filters fall short.

## Example request

```bash
export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"

curl -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 100
  }'
```
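
For clients that are not shell scripts, the same request can be assembled with Python's standard library. This is a minimal sketch: the URL and headers are copied from the curl example above, while `build_create_request` is a hypothetical helper name, not part of any Extruct SDK.

```python
import json
import urllib.request

API_URL = "https://api.extruct.ai/v1/discovery_tasks"  # URL taken from the curl example


def build_create_request(token: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, the POST request that creates a Deep Search task."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Usage (performs a real API call, so it is left commented out):
# import os
# request = build_create_request(
#     os.environ["EXTRUCT_API_TOKEN"],
#     {"query": "vertical SaaS companies serving freight forwarding",
#      "desired_num_results": 100},
# )
# with urllib.request.urlopen(request, timeout=30) as response:
#     task = json.load(response)
```

Keeping request construction separate from sending makes the payload and headers easy to inspect or unit-test before any task is created.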

## Key parameters

* `query` (required): natural-language description of the companies to find.
* `desired_num_results` (optional): target result count; default `100`, min `1`, max `250`.
* `auto_data_sources` (optional): when `true`, Extruct picks the data sources based on the query; default `true`.
* `data_sources` (optional): manual data source selection from `web_search`, `linkedin`, and `maps`; ignored when `auto_data_sources=true`.
* `criteria` (optional): explicit criterion definitions, each with a `key`, a `name`, and the `criterion` text; each is graded 1-5 in results. If omitted, criteria are inferred from `query`.
* `exclude_domains` (optional): domains or URLs to exclude from results. The server normalizes each entry to its registered domain (`https://www.Acme.com/about` → `acme.com`) and collapses duplicates.
* `exclude_task_ids` (optional): up to 100 IDs of completed (`status = "done"`) Deep Search tasks. The visible (returned) companies from those tasks are added to the exclusion set at creation time. Tasks must be owned by you (or someone in your organization).
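
If you supply `criteria`, the constraints in the OpenAPI schema at the end of this page (`CriterionDefinition`) can be checked locally before sending. The sketch below mirrors those constraints; `criterion_problems` is a hypothetical helper name, and the server remains the authority on what is accepted.

```python
import re

# Constraints copied from CriterionDefinition in the OpenAPI schema.
KEY_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def criterion_problems(criterion: dict) -> list[str]:
    """Return human-readable problems for one criterion; an empty list means none found."""
    problems = [
        f"{field} is required and must be a string"
        for field in ("key", "name", "criterion")
        if not isinstance(criterion.get(field), str)
    ]
    if problems:
        return problems
    if not KEY_PATTERN.fullmatch(criterion["key"]):
        problems.append("key must start with a letter and use only letters, digits, or _")
    if not 1 <= len(criterion["name"]) <= 100:
        problems.append("name must be 1 to 100 characters long")
    if not 10 <= len(criterion["criterion"]) <= 500:
        problems.append("criterion must be 10 to 500 characters long")
    return problems
```

A criterion such as `{"key": "serves_freight", "name": "Serves freight forwarders", "criterion": "The product is sold to freight forwarding companies."}` passes all three checks.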

The combined exclusion set (`exclude_domains` ∪ visible results of `exclude_task_ids`) is capped at **1000 unique domains**. The resolved list is stored on the task and returned in the response as `exclude_domains`.
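
To get a rough idea of how much of the 1000-domain budget your own `exclude_domains` list uses, you can approximate the normalization client-side. This is only a sketch: it lowercases the host and strips a leading `www.`, which is an assumption and not the server's actual algorithm (it does not reduce subdomains to a registered domain), and it cannot see the domains contributed by `exclude_task_ids`. Both function names are hypothetical.

```python
from urllib.parse import urlsplit

MAX_EXCLUDED_DOMAINS = 1000  # cap on the combined exclusion set, as stated above


def approximate_domain(entry: str) -> str:
    """Rough client-side guess at the normalized domain (not the server's algorithm)."""
    text = entry.strip()
    if "//" not in text:
        text = "//" + text  # lets urlsplit treat a bare domain as a host
    host = (urlsplit(text).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_exclusions(entries: list[str]) -> list[str]:
    """Collapse duplicates and fail early if the list alone already exceeds the cap."""
    domains = sorted({approximate_domain(entry) for entry in entries} - {""})
    if len(domains) > MAX_EXCLUDED_DOMAINS:
        raise ValueError(
            f"{len(domains)} unique domains exceeds the cap of {MAX_EXCLUDED_DOMAINS}"
        )
    return domains
```

Because prior-task results also count toward the cap, a list that passes this check can still be rejected by the server.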

## Endpoint behavior

* This call creates the task and returns immediately. Use the task ID to poll status and read results.
* Every Deep Search result includes criteria and scores. If you omit `criteria`, Extruct generates them for you. If you provide `criteria`, those are used instead.
* `exclude_*` inputs are resolved once at creation time and frozen on the task. Resuming or re-reading the task uses the same resolved set; the references are not re-evaluated later.
* Use [Get Task](/api-reference/discover/get-company-discovery-task) to monitor progress and [Get Task Results](/api-reference/discover/get-company-discovery-task-results) to review the companies.
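
Because the task runs asynchronously, a client typically polls until the status is terminal. The sketch below shows one way to structure that loop. `fetch_status` is a hypothetical callable that you would implement on top of the Get Task endpoint; the status names come from the `Status` enum in the OpenAPI schema on this page, and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"done", "failed"}  # remaining enum values: created, in_progress


def wait_for_task(
    fetch_status: Callable[[], str],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task reaches a terminal status, then return that status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Treat `failed` as a result to handle explicitly and not as a reason to keep polling.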

## Excluding prior results

A common pattern is to run a follow-up Deep Search that skips companies already returned by an earlier run:

```bash
curl -X POST "https://api.extruct.ai/v1/discovery_tasks" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${EXTRUCT_API_TOKEN}" \
  -d '{
    "query": "vertical SaaS companies serving freight forwarding",
    "desired_num_results": 100,
    "exclude_domains": ["acme.com", "https://beta.io/about"],
    "exclude_task_ids": ["<prior-task-uuid>"]
  }'
```

## Success signal

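A successful response (HTTP `200`) includes the task `id` and its initial `status`, along with `created_at`, `query`, and `desired_num_results`. Save `id` and pass it to [Get Task](/api-reference/discover/get-company-discovery-task) and [Get Task Results](/api-reference/discover/get-company-discovery-task-results).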
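
A client can confirm the success signal by checking the response body for the fields that the `DiscoveryTaskOutput` schema on this page marks as required. The sketch below does that and returns the two values you need next; `parse_created_task` is a hypothetical helper name.

```python
import json

# Fields listed under `required` in DiscoveryTaskOutput.
REQUIRED_FIELDS = ("id", "created_at", "status", "query", "desired_num_results")


def parse_created_task(body: str) -> tuple[str, str]:
    """Return (task_id, status) from a create-task response body."""
    task = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in task]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return task["id"], task["status"]
```

The returned status is one of `created`, `in_progress`, `done`, or `failed`, per the `Status` enum.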

## Common errors

### `401 Unauthorized`

Check that the request sends the header `Authorization: Bearer ${EXTRUCT_API_TOKEN}` and that `EXTRUCT_API_TOKEN` is set to a valid token.

### `400 Bad Request`

Returned when an `exclude_task_ids` entry references a task that is not in `status = "done"`.

### `403 Forbidden`

Returned when an `exclude_task_ids` entry references a task you do not own (and that is not owned by your organization).

### `404 Not Found`

Returned when an `exclude_task_ids` entry references a task that does not exist.

### `422 Unprocessable Entity`

Common causes:

* Invalid JSON or unsupported fields.
* `desired_num_results` outside `1..250`.
* An entry in `exclude_domains` that cannot be parsed as a domain.
* The resolved exclusion set exceeds the 1000-domain cap. Reduce `exclude_domains` or `exclude_task_ids`.

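Validate that the body is well-formed JSON before sending it (`jq empty` prints nothing on success and exits non-zero on a parse error):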

```bash
echo '<json-body>' | jq empty
```
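
The error cases in this section can be folded into one client-side helper that turns a failed response into an actionable message. This is a sketch, not an official client: the hint texts paraphrase this page, the `422` parsing assumes the `HTTPValidationError` shape from the OpenAPI schema (a `detail` list of items with `loc` and `msg`), and `explain_error` is a hypothetical name.

```python
import json

HINTS = {
    400: "An exclude_task_ids entry references a task that is not done yet.",
    401: "Check the Authorization: Bearer header and the token value.",
    403: "An exclude_task_ids entry references a task you or your organization do not own.",
    404: "An exclude_task_ids entry references a task that does not exist.",
}


def explain_error(status_code: int, body: str) -> str:
    """Map an error response to a short hint for the caller."""
    if status_code != 422:
        return HINTS.get(status_code, f"Unexpected HTTP status {status_code}.")
    try:
        detail = json.loads(body)["detail"]
    except (ValueError, KeyError, TypeError):
        return "Validation failed; the body was not in the documented shape."
    if not isinstance(detail, list):
        return str(detail)  # tolerate a plain-message detail
    lines = []
    for item in detail:
        if isinstance(item, dict) and "loc" in item and "msg" in item:
            location = ".".join(str(part) for part in item["loc"])
            lines.append(f"{location}: {item['msg']}")
        else:
            lines.append(str(item))
    return "\n".join(lines) or "Validation failed."
```

Joining `loc` with dots yields a path such as `body.desired_num_results`, which points directly at the offending field.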

## Related endpoints

* [Get Task Endpoint](/api-reference/discover/get-company-discovery-task)
* [Get Task Results Endpoint](/api-reference/discover/get-company-discovery-task-results)

## Related guides

* [Deep Search](/api-guides/search/deep-search)


## OpenAPI

````yaml post /v1/discovery_tasks
openapi: 3.1.0
info:
  title: FastAPI
  version: 0.1.0
servers: []
security: []
paths:
  /v1/discovery_tasks:
    post:
      tags:
        - company-discovery
      summary: Create Discovery Task
      operationId: create_discovery_task_v1_discovery_tasks_post
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DiscoveryTaskInput'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DiscoveryTaskOutput'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - HTTPBearer: []
components:
  schemas:
    DiscoveryTaskInput:
      properties:
        query:
          type: string
          title: Query
          description: Ideal description of companies to find
        desired_num_results:
          type: integer
          maximum: 250
          minimum: 1
          title: Desired Num Results
          description: Target number of results for this task. Maximum is 250.
          default: 100
        table:
          anyOf:
            - $ref: '#/components/schemas/TableForDiscoveryTaskInput'
            - type: 'null'
        auto_data_sources:
          type: boolean
          title: Auto Data Sources
          description: Automatically determine best data sources based on query
          default: true
        data_sources:
          anyOf:
            - items:
                $ref: '#/components/schemas/DiscoveryDataSource'
              type: array
            - type: 'null'
          description: Manual data source selection (ignored if auto_data_sources=True)
        criteria:
          anyOf:
            - items:
                $ref: '#/components/schemas/CriterionDefinition'
              type: array
            - type: 'null'
          description: >-
            Optional criteria for evaluating discovered companies. Each
            criterion will be graded on a 1-5 scale.
        exclude_domains:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Exclude Domains
          description: >-
            Domains (or URLs) of companies the run must not return. Normalized
            server-side to registered domains.
        exclude_task_ids:
          anyOf:
            - items:
                type: string
                format: uuid
              type: array
              maxItems: 100
            - type: 'null'
          title: Exclude Task Ids
          description: >-
            IDs of completed (DONE) Deep Search tasks whose returned companies
            must be excluded from the run. Up to 100 tasks. Tasks must belong to
            the caller (or their organization).
      type: object
      required:
        - query
      title: DiscoveryTaskInput
    DiscoveryTaskOutput:
      properties:
        id:
          type: string
          title: Id
        created_at:
          type: string
          format: date-time
          title: Created At
        status:
          $ref: '#/components/schemas/Status'
        query:
          type: string
          title: Query
        desired_num_results:
          type: integer
          title: Desired Num Results
        is_exhausted:
          type: boolean
          title: Is Exhausted
          default: false
        num_results_discovered:
          type: integer
          title: Num Results Discovered
          description: Total number of company candidates discovered from search
          default: 0
        num_results_enriched:
          type: integer
          title: Num Results Enriched
          description: Number of candidates enriched with company profiles
          default: 0
        num_results_evaluated:
          type: integer
          title: Num Results Evaluated
          description: Number of candidates that had criteria evaluation completed
          default: 0
        num_results:
          type: integer
          title: Num Results
          description: Total number of results
          default: 0
        table_id:
          anyOf:
            - type: string
            - type: 'null'
        auto_data_sources:
          type: boolean
          title: Auto Data Sources
          default: true
        data_sources:
          anyOf:
            - items:
                $ref: '#/components/schemas/DiscoveryDataSource'
              type: array
            - type: 'null'
        criteria:
          anyOf:
            - items:
                $ref: '#/components/schemas/CriterionDefinition'
              type: array
            - type: 'null'
        exclude_domains:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Exclude Domains
          description: >-
            The resolved, sorted list of registered domains that were excluded
            from this run (union of `exclude_domains` and the visible results of
            `exclude_task_ids` at creation time). Capped at 1000.
      type: object
      required:
        - id
        - created_at
        - status
        - query
        - desired_num_results
      title: DiscoveryTaskOutput
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    TableForDiscoveryTaskInput:
      properties:
        id:
          type: string
          format: uuid
          title: Id
          description: Table to associate with discovery results
        run:
          type: boolean
          title: Run
          description: Run the table after adding the results
          default: false
        columns:
          anyOf:
            - items:
                type: string
                format: uuid
              type: array
            - type: 'null'
        auto_import:
          type: boolean
          title: Auto Import
          description: Automatically import discovered companies to the table
          default: false
      type: object
      required:
        - id
      title: TableForDiscoveryTaskInput
    DiscoveryDataSource:
      type: string
      enum:
        - web_search
        - linkedin
        - maps
      title: DiscoveryDataSource
    CriterionDefinition:
      properties:
        key:
          type: string
          pattern: ^[a-zA-Z][a-zA-Z0-9_]*$
          title: Key
          description: Unique identifier for this criterion (e.g., "is_ai_company")
        name:
          type: string
          maxLength: 100
          minLength: 1
          title: Name
          description: Human-readable name for this criterion (e.g., "AI Company")
        criterion:
          type: string
          maxLength: 500
          minLength: 10
          title: Criterion
          description: The actual criterion text to evaluate against companies
      type: object
      required:
        - key
        - name
        - criterion
      title: CriterionDefinition
      description: Definition of a criterion for evaluating companies in discovery tasks
    Status:
      type: string
      enum:
        - created
        - in_progress
        - done
        - failed
      title: Status
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

````