
# Search Methods

This page documents all search methods on `TextIndex<T>`. The unified `search()` method dispatches to any combination of modes via `SearchOptions.modes`, while the specialized methods provide direct access to individual engines.

---

## search()

Hybrid search combining multiple modes with score fusion.

```gcl
fn search(query: String, k: int, options: SearchOptions?): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query text |
| `k` | `int` | Number of results to return |
| `options` | `SearchOptions?` | Optional search options (overrides config) |

**Returns:** Array of `TextResult` sorted by score descending

**Example**

```gcl
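// Pass null to use the options from the index configuration
var _results = index.search("machine learning", 10, null);

// Per-call mode weights: BM25 weighted 0.7, semantic weighted 0.3
var w = Map<SearchMode, float> {};
w.set(SearchMode::bm25, 0.7);
w.set(SearchMode::semantic, 0.3);
var options = SearchOptions { weights: w };
var _results2 = index.search("query", 5, options);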
```

**Notes**

- By default, combines the BM25, exact, fuzzy, and semantic modes
- Uses RRF or linear fusion (configurable per call via `options.fusionMethod`)
- Short-circuit optimization may skip modes when BM25 has a strong winner
- Set `options.typoTolerance = true` to enable BM25 typo tolerance for the query
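
To make the two fusion methods in the notes above concrete, here is a minimal sketch in Python (used for illustration only; it is not the library's implementation). The function names are hypothetical, the RRF constant of 60 is the value commonly quoted for Reciprocal Rank Fusion and is an assumption here, and the linear variant assumes per-mode scores are already normalized to a comparable range.

```python
from collections import defaultdict

def rrf_fuse(rankings, rrf_k=60):
    """Reciprocal Rank Fusion: each list adds 1 / (rrf_k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for ranked_keys in rankings.values():
        for rank, key in enumerate(ranked_keys, start=1):
            scores[key] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by key for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def linear_fuse(mode_scores, weights):
    """Weighted sum of per-mode scores; a document missing from a mode contributes 0."""
    scores = defaultdict(float)
    for mode, doc_scores in mode_scores.items():
        for key, score in doc_scores.items():
            scores[key] += weights.get(mode, 0.0) * score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# RRF only looks at rank positions, not raw scores.
rankings = {"bm25": ["a", "b", "c"], "semantic": ["b", "a", "d"]}
fused_rrf = rrf_fuse(rankings)

# Linear fusion uses the raw scores and the per-mode weights.
mode_scores = {"bm25": {"a": 0.9, "b": 0.5}, "semantic": {"b": 0.8, "c": 0.4}}
fused_linear = linear_fuse(mode_scores, {"bm25": 0.7, "semantic": 0.3})
```

RRF tends to be the safer choice when the modes produce scores on different scales, since it ignores score magnitude; linear fusion gives finer control when the scores are comparable.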

---

## search_bm25()

BM25 probabilistic ranking search.

```gcl
fn search_bm25(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Returns:** Results ranked by BM25 score

**Example**

```gcl
var _results = index.search_bm25("machine learning algorithms", 10);
```

**Notes**

- Respects the configured BM25 variant (Lucene, BM25+, etc.)
- Expands synonyms if configured
- Filters stop words

---

## search_bm25_f()

Multi-field BM25 search with field weights.

```gcl
fn search_bm25_f(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Returns:** Results ranked by field-weighted BM25F scores

**Example**

```gcl
var _results = index.search_bm25_f("machine learning", 10);
// With the title field weighted above the body field, title matches score higher than body matches
```

**Notes**

- Requires `fields` configuration in `TextIndexConfig`
- Falls back to standard BM25 if no fields configured

---

## search_bm25_batch()

Execute multiple BM25 searches in batch.

```gcl
fn search_bm25_batch(queries: Array<String>, k: int): Array<Array<TextResult>>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `queries` | `Array<String>` | Array of query strings |
| `k` | `int` | Number of results per query |

**Returns:** Array of result arrays (one per query)

**Example**

```gcl
var queries = Array<String> {};
queries.add("machine learning");
queries.add("neural networks");
var _allResults = index.search_bm25_batch(queries, 5);
```

---

## search_semantic()

Vector similarity search using embeddings.

```gcl
fn search_semantic(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Returns:** Results ranked by cosine similarity

**Example**

```gcl
var _results = index.search_semantic("artificial intelligence", 10);
// Finds conceptually similar documents
```

**Notes**

- Requires `embed` function in config (user-provided `fn(text: String): Tensor`)
- Embeds query and searches VectorIndex for nearest neighbors
- Score = `1.0 / (1.0 + distance)`
- Returns empty array if no embed function configured or no vectorIndex exists
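
The score mapping in the notes above turns a distance into a value in (0, 1], so that closer neighbours score higher. A minimal Python sketch of just that mapping follows; the neighbour keys and distances are made-up illustration data, and the distance metric itself is left open because it depends on the vector index.

```python
def distance_to_score(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1]; smaller distance, higher score."""
    return 1.0 / (1.0 + distance)

# Hypothetical nearest-neighbour hits as (key, distance) pairs.
neighbours = [("doc-a", 0.0), ("doc-b", 1.0), ("doc-c", 3.0)]
scored = [(key, distance_to_score(d)) for key, d in neighbours]
```

An identical vector (distance 0) scores 1.0, and the score decays toward 0 as the distance grows.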

---

## search_exact()

Exact normalized substring matching.

```gcl
fn search_exact(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Returns:** Matching documents (score = 1.0)

**Example**

```gcl
var _results = index.search_exact("machine learning", 10);
// Matches "Machine Learning", "MACHINE LEARNING", etc.
```

**Notes**

- Case-insensitive (normalized)
- Exact substring containment

> **Note:** `search_exact` is a binary substring filter. Every match returns `score = 1.0` and results are sorted alphabetically by key, not by relevance. It is intended as a binary boost in hybrid fusion (e.g. the `keyword` preset). For graded relevance ranking, use `search_bm25` or `search_phrase`.

---

## search_fuzzy()

Levenshtein distance fuzzy search.

```gcl
fn search_fuzzy(query: String, k: int, options: FuzzyOptions?): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |
| `options` | `FuzzyOptions?` | Optional fuzzy parameters (default: `maxEdits` 2, `mode` `key`) |

**Returns:** Results ranked by similarity score

**Example**

```gcl
// Default: whole-document fuzzy match with maxEdits=2
var _results = index.search_fuzzy("algoritm", 10, null);

// Custom edits and mode
var _results2 = index.search_fuzzy("algoritm lerning", 10, FuzzyOptions {
    maxEdits: 1,
    mode: FuzzyMode::term
});
```

**FuzzyOptions**

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxEdits` | `int?` | `2` | Maximum Levenshtein edit distance |
| `mode` | `FuzzyMode?` | `key` | `key` = whole-document, `term` = per-token vocabulary fuzzy |
| `maxTextLength` | `int?` | from `config.fuzzyMaxTextLength` | Skip docs longer than this for `key` mode |

**Notes**

- `FuzzyMode::key` matches whole document keys via Levenshtein on full strings
- `FuzzyMode::term` matches individual query terms against the vocabulary
- Key-mode scoring depends on string length. See [Fuzzy Search algorithm](search-modes.md#fuzzy-search-document-level) for details.
  - Short strings (byte length < 20): Jaro-Winkler is used directly (`score = query.jarowinkler(target)`)
  - Longer strings: `score = 1.0 - (distance / maxLength)`, where `maxLength` is the larger of the two byte lengths
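
As an illustration of the key-mode formula for longer strings, the Python sketch below computes a Levenshtein distance and normalizes it by the larger byte length. It is a sketch under assumptions, not the library's code: the helper names are hypothetical, the distance is computed over UTF-8 bytes to stay consistent with the byte-length normalization, and the Jaro-Winkler branch for short strings is not reproduced.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Classic dynamic-programming edit distance, keeping only two rows in memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def long_key_score(query: str, target: str) -> float:
    """score = 1.0 - (distance / maxLength), with maxLength the larger byte length."""
    q, t = query.encode("utf-8"), target.encode("utf-8")
    max_length = max(len(q), len(t))
    if max_length == 0:
        return 1.0  # two empty strings are treated as identical in this sketch
    return 1.0 - levenshtein(q, t) / max_length

# One missing letter across a 26-byte key gives a score close to 1.0.
score = long_key_score("machine learning algorithm", "machine learning algoritm")
```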

---

## search_boolean()

Boolean query search with AND/OR/NOT operators.

```gcl
fn search_boolean(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Boolean query string |
| `k` | `int` | Number of results |

**Returns:** Matching documents ranked by BM25 score of matching terms

**Example**

```gcl
var _results = index.search_boolean("(machine OR deep) AND learning NOT survey", 10);
```

**Syntax**

- **AND** -- Both terms required
- **OR** -- Either term matches
- **NOT** -- Exclude term
- **()** -- Grouping
- Implicit AND for adjacent terms
- **WEAKAND(N, t1, t2, ...)** -- At least N of the listed terms must match

---

## search_proximity()

Finds documents in which two terms occur within N positions of each other.

```gcl
fn search_proximity(term1: String, term2: String, k: int, options: ProximityOptions?): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `term1` | `String` | First term |
| `term2` | `String` | Second term |
| `k` | `int` | Number of results |
| `options` | `ProximityOptions?` | Optional proximity options (default: distance 5) |

**Returns:** Results ranked by proximity score

**Example**

```gcl
var _results = index.search_proximity("machine", "learning", 10, ProximityOptions { distance: 5 });
// Matches documents where the two terms occur within 5 tokens of each other
```

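**Scoring:** `score = max(0.0, 1.0 - (minDistance / (distance + 1)))`, where `distance` is the configured maximum (`ProximityOptions.distance`) and `minDistance` is the smallest distance found between the two terms in a document.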
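
The scoring formula can be tried out with a short Python sketch. The helper names are hypothetical, and measuring `minDistance` as the smallest absolute difference between token positions is an assumption made for illustration.

```python
def min_distance(positions1, positions2):
    """Smallest absolute gap between any position of term1 and any position of term2."""
    return min(abs(p1 - p2) for p1 in positions1 for p2 in positions2)

def proximity_score(min_dist, distance=5):
    """score = max(0.0, 1.0 - (minDistance / (distance + 1)))"""
    return max(0.0, 1.0 - (min_dist / (distance + 1)))

# "machine" at token positions 0 and 7, "learning" at position 3.
gap = min_distance([0, 7], [3])
score = proximity_score(gap, distance=5)
```

Adjacent terms (gap 1) score about 0.83 with the default distance of 5, and the score reaches 0.0 once the gap is `distance + 1` or more.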

---

## search_phrase()

Exact phrase matching with positional verification.

```gcl
fn search_phrase(phrase: String, k: int, options: PhraseOptions?): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `phrase` | `String` | Exact phrase to search |
| `k` | `int` | Number of results |
| `options` | `PhraseOptions?` | Optional phrase options (default: slop 0) |

**Returns:** Results containing the phrase, ranked by BM25

**Example**

```gcl
var _results = index.search_phrase("machine learning", 10, null);
// Matches "machine learning" but not "machine deep learning"

var _sloppy = index.search_phrase("brown fox", 10, PhraseOptions { slop: 1 });
// Allows 1 intervening word: "brown lazy fox"
```

**Notes**

- Terms must appear consecutively (when slop = 0)
- Scored using BM25 of phrase terms

---

## search_prefix()

Finds documents containing terms that start with the given prefix.

```gcl
fn search_prefix(prefix: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `prefix` | `String` | Term prefix to match |
| `k` | `int` | Number of results |

**Returns:** Results ranked by IDF accumulation

**Example**

```gcl
var _results = index.search_prefix("algo", 10);
// Matches documents with "algorithm", "algorithms", "algorithmic"
```

**Notes**

- Uses the trie built during `build()` for O(prefix_len + matches) lookup
- When `edgeNgram.enabled = true`, uses the edge n-gram index for O(1) lookup
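
For readers unfamiliar with trie lookups, the Python sketch below shows why the cost is proportional to the prefix length plus the number of matches: the walk follows one node per prefix character, then enumerates only the subtree under it. The structure and function names are hypothetical and are not taken from the library.

```python
END = ""  # end-of-term marker; cannot collide with a single-character edge label

def build_trie(terms):
    """Build a nested-dict trie; each complete term is stored under the END key."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = term
    return root

def terms_with_prefix(trie, prefix):
    """Walk down the prefix, then collect every term in the subtree below it."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    matches, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == END:
                matches.append(child)
            else:
                stack.append(child)
    return sorted(matches)  # sorted only to make the output deterministic

trie = build_trie(["algorithm", "algorithms", "algorithmic", "alpha", "beta"])
found = terms_with_prefix(trie, "algo")
```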

---

## search_wildcard()

Wildcard pattern search on term vocabulary.

```gcl
fn search_wildcard(pattern: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `pattern` | `String` | Wildcard pattern (`*` = any sequence, `?` = any character) |
| `k` | `int` | Number of results |

**Returns:** Results ranked by accumulated BM25 scores from matching terms

**Example**

```gcl
var _results = index.search_wildcard("algo*", 10);
// Matches terms: "algorithm", "algorithms", "algorithmic"

var _results2 = index.search_wildcard("te?t", 10);
// Matches "test", "text"
```

**Notes**

- `*` matches any sequence of characters (including empty)
- `?` matches exactly one character
- Scans term vocabulary and collects matching terms
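
The wildcard semantics above can be sketched in Python by translating the pattern into a regular expression and scanning a vocabulary. This is one possible way to implement the matching and is an assumption, not a description of the library's internals; the function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate `*` to "any sequence" and `?` to "exactly one character"; escape the rest."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def matching_terms(pattern, vocabulary):
    """Scan the vocabulary and keep the terms that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [term for term in vocabulary if regex.fullmatch(term)]

vocabulary = ["algo", "algorithm", "algorithms", "test", "text", "tet", "tests"]
star_matches = matching_terms("algo*", vocabulary)
question_matches = matching_terms("te?t", vocabulary)
```

Note that `algo*` also matches the bare term `algo`, because `*` may match an empty sequence, while `te?t` rejects `tet` and `tests` because `?` consumes exactly one character.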

---

## search_span()

Span query search with ordered/unordered positional constraints.

```gcl
fn search_span(spanQuery: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `spanQuery` | `String` | Span query string |
| `k` | `int` | Maximum results |

**Returns:** Matching documents (score = 1.0 for all matches)

**Syntax**

- **NEAR(t1, t2, dist)** -- Both terms within *dist* positions (any order)
- **ONEAR(t1, t2, dist)** -- t1 appears before t2 within *dist* positions
- **FIRST(t, window)** -- Term appears in first *window* tokens (position < window)

**Example**

```gcl
var _results = index.search_span("NEAR(machine, learning, 3)", 10);

var _ordered = index.search_span("ONEAR(quick, fox, 5)", 10);

var _first = index.search_span("FIRST(introduction, 10)", 10);
```
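
The three span operators can be read as predicates over token positions. The Python sketch below is an illustration under assumptions: "within *dist* positions" is interpreted as a position difference of at most `dist`, and all helper names are hypothetical.

```python
def positions(tokens, term):
    """Token positions (0-based) at which the term occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def near(pos1, pos2, dist):
    """NEAR: some pair of occurrences is at most dist positions apart, in any order."""
    return any(abs(a - b) <= dist for a in pos1 for b in pos2)

def onear(pos1, pos2, dist):
    """ONEAR: t1 occurs before t2, with at most dist positions between them."""
    return any(0 < b - a <= dist for a in pos1 for b in pos2)

def first(pos, window):
    """FIRST: the term occurs at a position < window."""
    return any(p < window for p in pos)

tokens = "the quick brown fox jumps".split()
quick, fox = positions(tokens, "quick"), positions(tokens, "fox")

near_hit = near(quick, fox, 2)       # positions 1 and 3 are 2 apart
ordered_hit = onear(quick, fox, 5)   # quick precedes fox
reversed_hit = onear(fox, quick, 5)  # fox does not precede quick
first_hit = first(fox, 3)            # fox is at position 3, outside the window
```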

---

## more_like_this()

Find similar documents via top TF-IDF terms.

```gcl
fn more_like_this(key: String, k: int, options: MoreLikeThisOptions?): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `key` | `String` | Reference document text (the first argument passed to `add()`) |
| `k` | `int` | Number of similar documents to return |
| `options` | `MoreLikeThisOptions?` | Optional MLT options (default: maxQueryTerms 10) |

**Returns:** Similar documents ranked by BM25 score, excluding the reference document

**Example**

```gcl
var _similar = index.more_like_this("Machine learning algorithms for data", 5, null);
// Returns documents similar to the reference

var _customMlt = index.more_like_this("Machine learning algorithms for data", 10, MoreLikeThisOptions { maxQueryTerms: 20 });
```

**Notes**

- Extracts top TF-IDF terms from the reference document
- Uses those terms as a BM25 query
- Excludes the reference document from results
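
The first step in the notes above, picking the top TF-IDF terms of the reference document, can be sketched in Python as follows. The exact TF-IDF weighting used by the library is not stated here, so the smoothed IDF below is an assumption, as are the function and parameter names.

```python
import math
from collections import Counter

def top_tfidf_terms(doc_tokens, corpus, max_query_terms=10):
    """Rank the reference document's terms by tf * idf and keep the best ones."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    scored = []
    for term, count in tf.items():
        df = sum(1 for tokens in corpus if term in tokens)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF (assumed variant)
        scored.append((count * idf, term))
    # Highest weight first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [term for _, term in scored[:max_query_terms]]

corpus = [
    ["machine", "learning", "algorithms", "for", "data"],
    ["data", "for", "cooking"],
    ["for", "data", "pipelines"],
]
query_terms = top_tfidf_terms(corpus[0], corpus, max_query_terms=3)
```

Terms that occur in every document ("for", "data") carry little weight, so the derived query concentrates on the terms that distinguish the reference document.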

---

## search_faceted()

Faceted search with term and numeric range facets.

```gcl
fn search_faceted(query: String, k: int, facetRequests: Array<FacetRequest>): AdvancedFacetedResult
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |
| `facetRequests` | `Array<FacetRequest>` | Facet request specifications |

**Returns:** `AdvancedFacetedResult` with ranked results, term facets, and numeric facets

See [Facets & Aggregations](./facets-aggregations.md) for detailed examples and the `FacetRequest` schema.

---

## search_dfr()

Divergence From Randomness scoring search.

```gcl
fn search_dfr(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Notes**

- Uses DFR basic model, after-effect, and normalization from `config.dfr`
- Configure via `dfr: DFROptions { basicModel, afterEffect, normalization }` in `TextIndexConfig`

---

## search_lm_dirichlet()

Language Model search with Dirichlet smoothing.

```gcl
fn search_lm_dirichlet(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Notes**

- Uses Dirichlet prior smoothing (`config.lmDirichlet.mu`, default 2000)
- Configure via `lmDirichlet: LMDirichletOptions { mu }` in `TextIndexConfig`
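
For intuition about the `mu` parameter, the Python sketch below implements the textbook Dirichlet-smoothed query likelihood, where each query term contributes `log((tf + mu * P(term | collection)) / (docLength + mu))`. The library's exact formula may differ (for example in how it normalizes or clamps scores), and the function name and toy data are hypothetical.

```python
import math

def lm_dirichlet_score(query_terms, doc_tokens, collection_tokens, mu=2000.0):
    """Sum of log-probabilities of the query terms under a Dirichlet-smoothed document model."""
    doc_len = len(doc_tokens)
    coll_len = len(collection_tokens)
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        p_coll = collection_tokens.count(term) / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the whole collection: skipped in this sketch
        score += math.log((tf + mu * p_coll) / (doc_len + mu))
    return score

doc_a = ["machine", "learning", "rocks"]
doc_b = ["deep", "sea", "diving"]
collection = doc_a + doc_b

score_a = lm_dirichlet_score(["machine"], doc_a, collection)
score_b = lm_dirichlet_score(["machine"], doc_b, collection)
```

A larger `mu` pulls every document's model toward the collection statistics, which reduces the advantage of short documents that happen to contain a query term.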

---

## search_phonetic()

Phonetic matching search using the Double Metaphone algorithm.

```gcl
fn search_phonetic(query: String, k: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |

**Notes**

- Requires `usePhonetic: true` in the config and a call to `build()` to generate the phonetic index
- Matches sound-alike terms (e.g., "Smith" matches "Smyth")

---

## search_quorum()

Minimum-should-match (quorum) search.

```gcl
fn search_quorum(query: String, k: int, minMatch: int): Array<TextResult>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Search query |
| `k` | `int` | Number of results |
| `minMatch` | `int` | Minimum number of query terms that must match |

**Notes**

- Documents must contain at least `minMatch` of the query terms
- Scored by the fraction of query terms matched (`matchedCount / queryTermCount`, range 0.0 to 1.0)
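
The quorum rule and its score can be illustrated with a few lines of Python. This is a sketch under assumptions: query terms are deduplicated before counting, and the function name is hypothetical.

```python
def quorum_score(query_terms, doc_terms, min_match):
    """Return matchedCount / queryTermCount, or None when fewer than min_match terms match."""
    unique_query = set(query_terms)
    if not unique_query:
        return None  # nothing to match against
    matched = len(unique_query & set(doc_terms))
    if matched < min_match:
        return None
    return matched / len(unique_query)

query = ["machine", "learning", "data"]
doc = ["machine", "learning", "rocks"]

accepted = quorum_score(query, doc, min_match=2)  # 2 of 3 terms match
rejected = quorum_score(query, doc, min_match=3)  # quorum not reached
```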

---

## suggest()

Auto-suggest term completions matching a prefix.

```gcl
fn suggest(prefix: String, k: int): Array<Suggestion>
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `prefix` | `String` | Term prefix to complete |
| `k` | `int` | Number of suggestions |

**Returns:** Array of `Suggestion` (term, score, document frequency)

---

## did_you_mean()

Spell correction / "did you mean?" for mistyped queries.

```gcl
fn did_you_mean(query: String): DidYouMeanResult
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `String` | Potentially misspelled query |

**Returns:** `DidYouMeanResult` with original query, corrected query, and per-term corrections

---

> **Note:** Internal helper methods (e.g. `search_exact_core`) appear in the type signature but are not part of the public API. They share normalization and tokenization work between the unified `search()` dispatcher and the specialized methods documented above; call the public methods instead.

---

## Equivalent capabilities through `search()`

Several specialized features are reached through unified options on `search()` and `SearchOptions` rather than dedicated methods.

### Typo tolerance for BM25

Set `SearchOptions.typoTolerance = true` on a search that includes BM25 mode (part of the default mode set), or enable it globally via `config.typoTolerance.enabled = true`.

```gcl
var modes = Array<SearchMode> {};
modes.add(SearchMode::bm25);

var options = SearchOptions {
    modes: modes,
    typoTolerance: true
};
var _results = index.search("algorthm lerning", 10, options);
```

### Proximity filter on BM25

Set `SearchOptions.proximityFilter = true` and tune the distance via `proximity: ProximityOptions { distance }`.

```gcl
var modes = Array<SearchMode> {};
modes.add(SearchMode::bm25);

var options = SearchOptions {
    modes: modes,
    proximityFilter: true,
    proximity: ProximityOptions { distance: 5 }
};
var _results = index.search("machine learning", 10, options);
```

### Term-level fuzzy

Pass `FuzzyOptions { mode: FuzzyMode::term }` to `search_fuzzy()` to match individual query tokens against the vocabulary instead of whole document keys.

### Boosting per query term

Use `SearchOptions.termBoosts: Array<TermBoost>` to apply per-term multipliers in BM25 scoring.

```gcl
var boosts = Array<TermBoost> {};
boosts.add(TermBoost { term: "machine", boost: 2.0 });
boosts.add(TermBoost { term: "data", boost: 0.5 });

var modes = Array<SearchMode> {};
modes.add(SearchMode::bm25);

var options = SearchOptions { modes: modes, termBoosts: boosts };
var _results = index.search("machine learning data", 10, options);
```

### Restricting to a subset of keys

Use `SearchOptions.filter: Array<String>` to restrict search to a subset of document keys.

```gcl
var allowed = Array<String> {};
allowed.add("doc1");
allowed.add("doc3");

var modes = Array<SearchMode> {};
modes.add(SearchMode::bm25);

var options = SearchOptions { modes: modes, filter: allowed };
var _results = index.search("query", 10, options);
```
