
# Types

Data types used across the `TextIndex<T>` API for results, configuration, and diagnostics.

---

## TextIndex<T>

The main user-facing index type. Holds documents, term vocabulary, and derived structures (tries, vector index, phonetic codes, etc.). Constructed from a `TextIndexConfig` and parametrized by the value type `T` stored alongside each document.

```gcl
type TextIndex<T> {
    config: TextIndexConfig;
    totalEntries: int?;
    totalTokens: int?;
    totalTerms: int?;
    avgTokenCount: float?;
    entries: nodeIndex<String, node<IndexEntry>>?;
    normalizedTerms: nodeIndex<String, node<NormalizedTerm>>?;
    contentHashes: nodeIndex<String, node<IndexEntry>>?;
    vectorIndex: node<VectorIndex<node<IndexEntry>>>?;
    built: bool?;
    // ... derived caches and indices populated by build()
}
```

User code interacts with `TextIndex<T>` exclusively through its instance methods (see [Indexing Methods](./indexing.md) and [Search Methods](./search-methods.md)). Derived structures such as `trigramIndex`, `trieRoot`, `phoneticIndex`, and `cachedTFCache` are populated automatically by `build()` and should not be set manually.

**Example**

```gcl
var index = TextIndex<String> { config: TextIndexConfig::keyword() };
index.add("Machine learning algorithms", "doc1");
index.build();
var _results = index.search_bm25("learning", 5);
```

---

## TextResult

Search result entry with score and metadata.

```gcl
@volatile
type TextResult {
    key: String;                       // Document text key
    value: any?;                       // Associated value (cast to T)
    score: float;                      // Relevance score
    matchedTerms: Array<String>?;      // Query terms that matched
    chunkKey: String?;                 // "<key>#<position>" when chunk-level semantic search is enabled
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `key` | `String` | Document text key |
| `value` | `any?` | Associated value (cast to `T`); may be null when the entry has no user-provided value |
| `score` | `float` | Relevance score |
| `matchedTerms` | `Array<String>?` | Query terms that matched |
| `chunkKey` | `String?` | `"<key>#<position>"` when chunk-level semantic search is enabled (`config.chunking.strategy != none` and `config.embed != null`). Null otherwise. |
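
The loop below is a minimal sketch of reading these fields. It assumes that `search_bm25()` returns an `Array<TextResult>` (the return type is not stated in this section) and that `index` is the built `TextIndex<String>` from the earlier example.

```gcl
// Assumption: search_bm25() returns Array<TextResult>.
var results = index.search_bm25("learning", 5);
for (_, r in results) {
    println("${r.key} -> ${r.score}");
    if (r.chunkKey != null) {
        // Only set when chunk-level semantic search is enabled.
        println("  chunk: ${r.chunkKey}");
    }
}
```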

---

## TextEntry

Batch entry for `add_batch()` with optional pre-computed vector.

```gcl
@volatile
type TextEntry {
    key: String;        // Document text
    value: any?;        // Associated value
    vector: Tensor?;    // Pre-computed embedding (optional)
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `key` | `String` | Document text |
| `value` | `any?` | Associated value |
| `vector` | `Tensor?` | Pre-computed embedding (optional, skips inference) |
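
A short sketch of building a batch follows. It assumes `add_batch()` accepts an `Array<TextEntry>`; the exact signature is documented in [Indexing Methods](./indexing.md), not here.

```gcl
var entries = Array<TextEntry> {};
entries.add(TextEntry { key: "Machine learning algorithms", value: "doc1" });
entries.add(TextEntry { key: "Deep learning networks", value: "doc2" });

// `vector` is left unset here; supply a Tensor to skip inference for an entry.
index.add_batch(entries); // assumed signature
index.build();
```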

---

## TextIndexStats

Index statistics summary.

```gcl
@volatile
type TextIndexStats {
    totalEntries: int;     // Number of indexed documents
    totalTerms: int;       // Vocabulary size
    avgTokenCount: float;  // Average document length
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `totalEntries` | `int` | Number of indexed documents |
| `totalTerms` | `int` | Vocabulary size |
| `avgTokenCount` | `float` | Average document length in tokens |

---

## SearchOptions

Per-query options that override config defaults. All fields are optional; unset fields fall back to the index config.

```gcl
@volatile
type SearchOptions {
    modes: Array<SearchMode>?;
    weights: Map<SearchMode, float>?;
    fusionMethod: FusionMethod?;
    normalization: Normalization?;
    rrf_k: int?;
    fuzzy: FuzzyOptions?;
    phrase: PhraseOptions?;
    proximity: ProximityOptions?;
    typoTolerance: bool?;
    minScore: float?;
    diversify: bool?;
    diversityLambda: float?;
    offset: int?;
    proximityFilter: bool?;
    filter: Array<String>?;
    termBoosts: Array<TermBoost>?;
    quorumMinMatch: int?;
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `modes` | `Array<SearchMode>?` | Search modes to execute. Null/empty = default hybrid (BM25+exact+fuzzy+semantic). Single entry = direct dispatch. Multiple entries = fuse with weights. |
| `weights` | `Map<SearchMode, float>?` | Per-mode weights in hybrid fusion (overrides `config.fusion.weights`) |
| `fusionMethod` | `FusionMethod?` | Score fusion method (RRF or linear) |
| `normalization` | `Normalization?` | Score normalization method |
| `rrf_k` | `int?` | RRF constant k (default 60) |
| `fuzzy` | `FuzzyOptions?` | Per-engine fuzzy parameters (used when `SearchMode::fuzzy` is in `modes`) |
| `phrase` | `PhraseOptions?` | Per-engine phrase parameters (used when `SearchMode::phrase` is in `modes`) |
| `proximity` | `ProximityOptions?` | Per-engine proximity parameters (used when `SearchMode::proximity` is in `modes`) |
| `typoTolerance` | `bool?` | Enable typo tolerance in BM25 mode |
| `minScore` | `float?` | Minimum score threshold for results |
| `diversify` | `bool?` | Enable MMR diversity re-ranking |
| `diversityLambda` | `float?` | MMR lambda (0.0 = max diversity, 1.0 = pure relevance, default: 0.7) |
| `offset` | `int?` | Skip the first N results for pagination (default: 0) |
| `proximityFilter` | `bool?` | Discard docs where no query term pair appears within `proximity.distance` |
| `filter` | `Array<String>?` | Restrict the search to a subset of document keys |
| `termBoosts` | `Array<TermBoost>?` | Per-term boost multipliers for BM25 scoring |
| `quorumMinMatch` | `int?` | Minimum match count for quorum queries (default: 1) |

> **Note on function scoring, curation, and ranking rules:** These features are implemented as standalone helpers (`FunctionScoreEngine::apply()`, `CurationHelper::apply_curation()`, `RankingRulesEngine::apply()`) that you call after `search_bm25()` / `search()`. They are not driven through `SearchOptions`. See [Function Scoring & Curation](./function-scoring.md).

**Example**

```gcl
var w = Map<SearchMode, float> {};
w.set(SearchMode::bm25, 0.7);
w.set(SearchMode::fuzzy, 0.3);

var options = SearchOptions {
    weights: w,
    minScore: 0.2,
    diversify: true
};
var _results = index.search("query", 10, options);
```

---

## FuzzyOptions

Per-query fuzzy parameters used by `search_fuzzy()` and the `SearchMode::fuzzy` engine.

```gcl
@volatile
type FuzzyOptions {
    maxEdits: int?;
    mode: FuzzyMode?;
    maxTextLength: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxEdits` | `int?` | `2` | Maximum Levenshtein edit distance |
| `mode` | `FuzzyMode?` | `key` | `key` = whole-document matching, `term` = per-token vocabulary matching |
| `maxTextLength` | `int?` | from `config.fuzzyMaxTextLength` | Skip docs whose text exceeds this length |
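
As an illustration, the fragment below routes a query to the fuzzy engine only and tightens the edit distance. It reuses the `index.search(query, k, options)` call shown under `SearchOptions`; the query string is arbitrary.

```gcl
var modes = Array<SearchMode> {};
modes.add(SearchMode::fuzzy); // single entry = direct dispatch

var options = SearchOptions {
    modes: modes,
    fuzzy: FuzzyOptions { maxEdits: 1, mode: FuzzyMode::term }
};
var _results = index.search("learnin", 10, options);
```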

---

## PhraseOptions

Per-query phrase parameters used by `search_phrase()` and the `SearchMode::phrase` engine.

```gcl
@volatile
type PhraseOptions {
    slop: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `slop` | `int?` | `0` | Maximum positional deviation between query terms (0 = exact phrase) |

---

## ProximityOptions

Per-query proximity parameters used by `search_proximity()` and the `SearchMode::proximity` engine.

```gcl
@volatile
type ProximityOptions {
    distance: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `distance` | `int?` | `5` | Maximum token distance between the two terms |
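
A possible way to combine this type with `SearchOptions.proximityFilter` is sketched below; the distance of 3 is an arbitrary illustrative value.

```gcl
var options = SearchOptions {
    proximity: ProximityOptions { distance: 3 },
    // Discard docs where no query term pair appears within proximity.distance.
    proximityFilter: true
};
var _results = index.search("machine algorithms", 10, options);
```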

---

## MoreLikeThisOptions

Per-query options for `more_like_this()`.

```gcl
@volatile
type MoreLikeThisOptions {
    maxQueryTerms: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxQueryTerms` | `int?` | `10` | Maximum number of top TF-IDF terms extracted from the source document |

---

## SnippetOptions

Per-query options for `snippet()` and `snippets()`.

```gcl
@volatile
type SnippetOptions {
    maxLength: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxLength` | `int?` | `200` | Maximum snippet length in characters |

---

## Snippet

Result of `snippet()` and `snippets()` — plain text plus highlighted text in one shape.

```gcl
@volatile
type Snippet {
    text: String;
    highlighted: String;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `text` | `String` | Plain snippet text (no markup) |
| `highlighted` | `String` | Same snippet with matched query terms wrapped in `config.highlight.preTag` / `postTag` |

---

## ScoreExplanation

BM25 score explanation with per-term details.

```gcl
@volatile
type ScoreExplanation {
    totalScore: float;
    terms: Array<TermExplanation>;
    variant: BM25Variant;
    k1: float;
    b: float;
    docLen: int;
    avgDocLen: float;
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `totalScore` | `float` | Total BM25 score for the query-document pair |
| `terms` | `Array<TermExplanation>` | Per-term score breakdowns |
| `variant` | `BM25Variant` | BM25 variant used for scoring |
| `k1` | `float` | BM25 k1 parameter |
| `b` | `float` | BM25 b parameter |
| `docLen` | `int` | Document length in tokens |
| `avgDocLen` | `float` | Average document length across the index |

---

## TermExplanation

Per-term BM25 score breakdown.

```gcl
@volatile
type TermExplanation {
    term: String;
    tf: float;         // Term frequency
    idf: float;        // Inverse document frequency
    tfNorm: float;     // Length-normalized TF
    score: float;      // Contribution to total score
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `term` | `String` | The term being scored |
| `tf` | `float` | Term frequency in the document |
| `idf` | `float` | Inverse document frequency |
| `tfNorm` | `float` | Length-normalized TF |
| `score` | `float` | This term's contribution to the total score |
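
The helper below is a hypothetical sketch (it is not part of the library) that prints a `ScoreExplanation` and its `TermExplanation` entries using only the fields listed above. How the explanation is obtained is outside the scope of this section.

```gcl
// Hypothetical helper for debugging BM25 scores.
fn print_explanation(exp: ScoreExplanation) {
    println("total=${exp.totalScore} k1=${exp.k1} b=${exp.b} docLen=${exp.docLen}");
    for (_, t in exp.terms) {
        println("  ${t.term}: tf=${t.tf} idf=${t.idf} tfNorm=${t.tfNorm} score=${t.score}");
    }
}
```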

---

## FieldConfig

Field configuration for BM25F multi-field scoring. The `f` field is a typed
GCL `field` reference (compile-time checked against the document type), not a
string name.

```gcl
@volatile
type FieldConfig {
    f: field;           // Typed reference to the document field whose text is indexed
    weight: float;      // Field importance weight
    fieldB: float?;     // Field-specific length normalization
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to a `String`/`String?` field on the document type |
| `weight` | `float` | Field importance weight (higher = more important) |
| `fieldB` | `float?` | Field-specific length normalization (overrides global `bm25.b`) |

**Example**

```gcl
type Article { title: String; body: String; tags: String?; }

var _fieldCfg = FieldConfig {
    f: Article::title,
    weight: 3.0,
    fieldB: 0.3
};
```

If `TextIndexConfig.fields` is left null, `TextIndex.add_fields` auto-discovers
every `String`/`String?` field on the document type at weight 1.0 on the first
call.

---

## NormOptions

Advanced normalization options applied before the standard pipeline.

```gcl
@volatile
type NormOptions {
    stripAccents: bool?;
    stripControlChars: bool?;
    stripHtmlTags: bool?;
    decodeHtmlEntities: bool?;
    stripUrls: bool?;
    stripEmails: bool?;
    normalizeQuotes: bool?;
    normalizeLineBreaks: bool?;
    normalizeRepeatingChars: bool?;
    maxRepeat: int?;
    rejoinHyphenatedWords: bool?;
}
```

**Fields**

| Field | Type | Description |
|-------|------|-------------|
| `stripAccents` | `bool?` | Remove diacritical marks (e.g., `é` -> `e`) |
| `stripControlChars` | `bool?` | Remove control characters |
| `stripHtmlTags` | `bool?` | Strip HTML tags from text |
| `decodeHtmlEntities` | `bool?` | Decode HTML entities (e.g., `&amp;` -> `&`) |
| `stripUrls` | `bool?` | Remove URLs from text |
| `stripEmails` | `bool?` | Remove email addresses from text |
| `normalizeQuotes` | `bool?` | Normalize smart quotes to standard quotes |
| `normalizeLineBreaks` | `bool?` | Normalize line breaks to spaces |
| `normalizeRepeatingChars` | `bool?` | Collapse repeating characters |
| `maxRepeat` | `int?` | Maximum allowed repetitions (used with `normalizeRepeatingChars`) |
| `rejoinHyphenatedWords` | `bool?` | Rejoin hyphenated words (e.g., "self-driving" -> "selfdriving") |

---

## TokenizationOptions

Tokenization and term-normalization settings nested inside `TextIndexConfig.tokenization`.

```gcl
type TokenizationOptions {
    separators: Array<String>?;
    minTermLength: int?;
    maxTermLength: int?;
    filterNumericTerms: bool?;
    caseFold: bool?;
    stripPunctuation: bool?;
    stemming: bool?;
    charMap: Map<String, String>?;
    useDefaultCharMap: bool?;
    normOptions: NormOptions?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `separators` | `Array<String>?` | `[" "]` | Token separator characters |
| `minTermLength` | `int?` | `2` | Minimum term length to index |
| `maxTermLength` | `int?` | `100` | Maximum term length to index |
| `filterNumericTerms` | `bool?` | `true` | Filter out purely numeric terms |
| `caseFold` | `bool?` | `true` | Apply case folding/lowercasing |
| `stripPunctuation` | `bool?` | `true` | Strip punctuation from terms |
| `stemming` | `bool?` | `false` | Apply Porter stemming |
| `charMap` | `Map<String, String>?` | `null` | Custom character mapping for normalization |
| `useDefaultCharMap` | `bool?` | `true` | Use built-in Unicode -> ASCII map |
| `normOptions` | `NormOptions?` | `null` | Advanced normalization options |
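
A minimal configuration sketch follows, assuming the `tokenization` field of the config returned by `TextIndexConfig::keyword()` can be reassigned before the index is created. Fields left unset are expected to use the defaults in the table above.

```gcl
var config = TextIndexConfig::keyword();
config.tokenization = TokenizationOptions {
    stemming: true,
    minTermLength: 3,
    normOptions: NormOptions {
        stripHtmlTags: true,
        decodeHtmlEntities: true,
        stripUrls: true
    }
};
var _index = TextIndex<String> { config: config };
```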

---

## StopWordOptions

Stop word handling configuration nested inside `TextIndexConfig.stopWords`.

```gcl
type StopWordOptions {
    mode: StopWordMode?;
    language: TextSearchLanguage?;
    custom: Array<String>?;
    autoThreshold: float?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mode` | `StopWordMode?` | `none` | Stop word handling mode |
| `language` | `TextSearchLanguage?` | `en` | Language for built-in stop word list |
| `custom` | `Array<String>?` | `null` | Custom stop word list (used with `StopWordMode::custom`) |
| `autoThreshold` | `float?` | `0.85` | Document-frequency threshold for `StopWordMode::auto` |
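
For example, a custom stop word list could be configured as sketched here. The word list is illustrative, and assigning `config.stopWords` directly is an assumption based on the nesting described above.

```gcl
var stop = Array<String> {};
stop.add("lorem");
stop.add("ipsum");

var config = TextIndexConfig::keyword();
config.stopWords = StopWordOptions {
    mode: StopWordMode::custom,
    custom: stop
};
```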

---

## BM25Options

BM25 scoring parameters nested inside `TextIndexConfig.bm25`.

```gcl
type BM25Options {
    k1: float?;
    b: float?;
    variant: BM25Variant?;
    delta: float?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `k1` | `float?` | `1.5` | Term-frequency saturation parameter |
| `b` | `float?` | `0.75` | Length normalization parameter |
| `variant` | `BM25Variant?` | `lucene` | Scoring variant |
| `delta` | `float?` | `0.5` | Delta parameter for BM25+ and BM25L variants |
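
The fragment below sketches a BM25 override with illustrative values. In standard BM25, a lower `b` weakens document-length normalization and a lower `k1` makes term frequency saturate sooner.

```gcl
var config = TextIndexConfig::keyword();
// variant and delta are left unset and fall back to their defaults.
config.bm25 = BM25Options { k1: 1.2, b: 0.5 };
```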

---

## RRFOptions

Reciprocal Rank Fusion parameters nested inside `FusionOptions.rrf`.

```gcl
type RRFOptions {
    k: int?;
    topRankBonus: bool?;
    topBonus: float?;
    nearTopBonus: float?;
    nearTopCutoff: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `k` | `int?` | `60` | RRF k parameter (higher = less rank emphasis) |
| `topRankBonus` | `bool?` | `true` | Enable top-rank bonus for #1 results |
| `topBonus` | `float?` | `0.05` | Bonus added to rank-1 results |
| `nearTopBonus` | `float?` | `0.02` | Bonus added to near-top results |
| `nearTopCutoff` | `int?` | `2` | Near-top rank cutoff threshold |

---

## FusionOptions

Score-fusion configuration for hybrid search, nested inside `TextIndexConfig.fusion`.

```gcl
type FusionOptions {
    method: FusionMethod?;
    normalization: Normalization?;
    weights: Map<SearchMode, float>?;
    rrf: RRFOptions?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `method` | `FusionMethod?` | `rrf` | Fusion method |
| `normalization` | `Normalization?` | `minmax` | Score normalization for linear fusion |
| `weights` | `Map<SearchMode, float>?` | built-in | Per-mode weights; missing keys fall back to defaults (bm25=0.4, semantic=0.6, fuzzy=0.2, exact=0.3, ...) |
| `rrf` | `RRFOptions?` | `null` | RRF sub-options |
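
A sketch of an index-level fusion setup, combining `FusionOptions` with the `RRFOptions` type above. The weights are illustrative, and direct assignment to `config.fusion` is an assumption.

```gcl
var weights = Map<SearchMode, float> {};
weights.set(SearchMode::bm25, 0.6);
weights.set(SearchMode::fuzzy, 0.4);

var config = TextIndexConfig::keyword();
config.fusion = FusionOptions {
    method: FusionMethod::rrf,
    weights: weights,
    rrf: RRFOptions { k: 60, topRankBonus: false }
};
```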

---

## TypoOptions

Typo-tolerance configuration for the BM25 search path, nested inside `TextIndexConfig.typoTolerance`.

```gcl
type TypoOptions {
    enabled: bool?;
    minWordLength: int?;
    maxEdits1: int?;
    maxEdits2: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | `bool?` | `false` | Enable automatic typo tolerance |
| `minWordLength` | `int?` | `4` | Minimum word length to apply typo tolerance |
| `maxEdits1` | `int?` | `1` | Maximum typos for words 5-8 characters long |
| `maxEdits2` | `int?` | `2` | Maximum typos for words 9+ characters long |

---

## EdgeNgramOptions

Edge n-gram indexing configuration for fast prefix search, nested inside `TextIndexConfig.edgeNgram`.

```gcl
type EdgeNgramOptions {
    enabled: bool?;
    min: int?;
    max: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | `bool?` | `false` | Enable edge n-gram indexing |
| `min` | `int?` | `2` | Minimum prefix length to index |
| `max` | `int?` | `20` | Maximum prefix length to index |

---

## ShortCircuitOptions

Short-circuit optimization for strong BM25 signals during hybrid search, nested inside `TextIndexConfig.shortCircuit`.

```gcl
type ShortCircuitOptions {
    enabled: bool?;
    minScore: float?;
    minGap: float?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | `bool?` | `true` | Enable short-circuit optimization |
| `minScore` | `float?` | `0.85` | Minimum normalized score to trigger short-circuit |
| `minGap` | `float?` | `0.15` | Minimum score gap to second result required |

---

## DiversifyOptions

MMR diversity re-ranking configuration nested inside `TextIndexConfig.diversify`.

```gcl
type DiversifyOptions {
    enabled: bool?;
    lambda: float?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | `bool?` | `false` | Enable MMR diversity re-ranking |
| `lambda` | `float?` | `0.7` | Lambda 0-1 (0.0 = max diversity, 1.0 = pure relevance) |

---

## ChunkingOptions

Text chunking for semantic search and RAG pipelines, nested inside `TextIndexConfig.chunking`.

```gcl
type ChunkingOptions {
    strategy: ChunkStrategy?;
    size: int?;
    overlap: int?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `strategy` | `ChunkStrategy?` | `none` | Chunking strategy |
| `size` | `int?` | `256` | Chunk size in words |
| `overlap` | `int?` | `50` | Chunk overlap in words |

---

## DFROptions

Divergence From Randomness scoring parameters nested inside `TextIndexConfig.dfr`.

```gcl
type DFROptions {
    basicModel: DFRBasicModel?;
    afterEffect: DFRAfterEffect?;
    normalization: DFRNormalization?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `basicModel` | `DFRBasicModel?` | `G` | Basic information model |
| `afterEffect` | `DFRAfterEffect?` | `Laplace` | After-effect model |
| `normalization` | `DFRNormalization?` | `H2` | Length normalization |

---

## LMDirichletOptions

Language Model Dirichlet smoothing parameters nested inside `TextIndexConfig.lmDirichlet`.

```gcl
type LMDirichletOptions {
    mu: float?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mu` | `float?` | `2000` | Dirichlet smoothing parameter |

---

## HighlightOptions

Snippet/highlight markup configuration nested inside `TextIndexConfig.highlight`. Used by `snippet()` and `snippets()`.

```gcl
type HighlightOptions {
    preTag: String?;
    postTag: String?;
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `preTag` | `String?` | `"<em>"` | Markup inserted before a matched term |
| `postTag` | `String?` | `"</em>"` | Markup inserted after a matched term |

---

## TermBoost

Term-level boost for weighted BM25 scoring.

```gcl
@volatile
type TermBoost {
    term: String;
    boost: float;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `term` | `String` | Term to boost |
| `boost` | `float` | Boost multiplier (>1.0 increases, <1.0 decreases weight) |
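
Term boosts are passed per query through `SearchOptions.termBoosts`. A minimal sketch with illustrative terms and multipliers:

```gcl
var boosts = Array<TermBoost> {};
boosts.add(TermBoost { term: "learning", boost: 2.0 });     // emphasize
boosts.add(TermBoost { term: "introduction", boost: 0.5 }); // de-emphasize

var options = SearchOptions { termBoosts: boosts };
var _results = index.search("introduction to machine learning", 10, options);
```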

---

## CurationRule

A curation rule for pinning, boosting, or suppressing specific documents.

```gcl
@volatile
type CurationRule {
    documentKey: String;
    position: int?;
    boost: float?;
    suppress: bool?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `documentKey` | `String` | Document key to apply the rule to |
| `position` | `int?` | Pin to this position (0-indexed) |
| `boost` | `float?` | Multiply score by this factor |
| `suppress` | `bool?` | Remove from results if true |
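
The three kinds of rule can be sketched as follows. The document keys are made up for illustration, and the call that applies the rules (`CurationHelper::apply_curation()`) is omitted because its signature is not given in this section.

```gcl
var rules = Array<CurationRule> {};
rules.add(CurationRule { documentKey: "Getting started guide", position: 0 });   // pin to the top
rules.add(CurationRule { documentKey: "Machine learning algorithms", boost: 1.5 }); // multiply score
rules.add(CurationRule { documentKey: "Deprecated API notes", suppress: true });  // remove
```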

---

## FunctionScoreConfig

Configuration for function scoring (decay functions and field value factors).

```gcl
@volatile
type FunctionScoreConfig {
    decayFunctions: Array<DecayFunction>?;
    fieldValueFactors: Array<FieldValueFactor>?;
    scoreMode: ScoreMode?;
    boostMode: BoostMode?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `decayFunctions` | `Array<DecayFunction>?` | Decay functions to apply |
| `fieldValueFactors` | `Array<FieldValueFactor>?` | Field value factors to apply |
| `scoreMode` | `ScoreMode?` | How to combine multiple function scores (default: `multiply`) |
| `boostMode` | `BoostMode?` | How to combine function score with base score (default: `multiply`) |

---

## FacetRequest

A facet request specifying which field to aggregate and how.

```gcl
@volatile
type FacetRequest {
    f: field;
    facetType: FacetType?;
    ranges: Array<NumericRangeBucket>?;
    maxTerms: int?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the document field to facet on |
| `facetType` | `FacetType?` | Type of facet (term or numericRange, default: term) |
| `ranges` | `Array<NumericRangeBucket>?` | Range buckets (required for numericRange type) |
| `maxTerms` | `int?` | Maximum number of facet values to return (default: 10) |

---

## NumericRangeBucket

A numeric range bucket for numeric range faceting.

```gcl
@volatile
type NumericRangeBucket {
    label: String;
    from: float?;
    to: float?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `label` | `String` | Human-readable label (e.g., "0-100", "cheap") |
| `from` | `float?` | Lower bound (inclusive). Null means -infinity. |
| `to` | `float?` | Upper bound (exclusive). Null means +infinity. |
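
The sketch below builds a numeric range facet with open-ended first and last buckets. `Product` is a hypothetical document type introduced only for this example.

```gcl
// Hypothetical document type.
type Product { name: String; price: float; }

var ranges = Array<NumericRangeBucket> {};
ranges.add(NumericRangeBucket { label: "cheap", to: 100.0 });            // (-inf, 100)
ranges.add(NumericRangeBucket { label: "mid", from: 100.0, to: 500.0 }); // [100, 500)
ranges.add(NumericRangeBucket { label: "premium", from: 500.0 });        // [500, +inf)

var _priceFacet = FacetRequest {
    f: Product::price,
    facetType: FacetType::numericRange,
    ranges: ranges
};
```

Because `from` is inclusive and `to` is exclusive, a price of exactly 100.0 falls into "mid", not "cheap".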

---

## AdvancedFacetedResult

Result of an advanced faceted search with term and numeric range facets.

```gcl
@volatile
type AdvancedFacetedResult {
    results: Array<TextResult>;
    termFacets: Map<field, Array<TermCount>>;
    numericFacets: Map<field, Array<NumericBucketCount>>;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `results` | `Array<TextResult>` | Ranked search results |
| `termFacets` | `Map<field, Array<TermCount>>` | Term facet counts keyed by typed field ref |
| `numericFacets` | `Map<field, Array<NumericBucketCount>>` | Numeric range facet counts keyed by typed field ref |

---

## TermCount

A single facet value with its count.

```gcl
@volatile
type TermCount {
    value: String;
    count: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `value` | `String` | Facet value |
| `count` | `int` | Number of matching documents |

---

## NumericBucketCount

A numeric range bucket with its document count.

```gcl
@volatile
type NumericBucketCount {
    label: String;
    count: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `label` | `String` | Bucket label |
| `count` | `int` | Number of matching documents |

---

## Suggestion

Auto-suggest result with term, relevance score, and document frequency.

```gcl
@volatile
type Suggestion {
    term: String;
    score: float;
    df: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `term` | `String` | Suggested term |
| `score` | `float` | Relevance score |
| `df` | `int` | Document frequency (number of documents containing this term) |

---

## DidYouMeanResult

Spell correction result with original query, corrected query, and per-term corrections.

```gcl
@volatile
type DidYouMeanResult {
    originalQuery: String;
    correctedQuery: String?;
    corrections: Array<String>;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `originalQuery` | `String` | The original (possibly misspelled) query |
| `correctedQuery` | `String?` | The corrected query, or `null` if no correction needed |
| `corrections` | `Array<String>` | Per-term corrections |
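
Since `correctedQuery` is null when no correction is needed, callers typically fall back to the original query. The helper below is a hypothetical sketch of that pattern, not part of the library.

```gcl
// Hypothetical helper: pick the query to actually run.
fn effective_query(r: DidYouMeanResult): String {
    return r.correctedQuery ?? r.originalQuery;
}
```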

---

## DecayFunction

Decay function configuration for function scoring (gaussian, linear, or exponential decay).

```gcl
@volatile
type DecayFunction {
    f: field;
    origin: float;
    scale: float;
    offset: float?;
    decayType: DecayType;
    decayValue: float?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric (`float`/`int`) field to compute distance from |
| `origin` | `float` | Origin point (optimal value) |
| `scale` | `float` | Scale parameter controlling decay width |
| `offset` | `float?` | Offset before decay starts (default: 0) |
| `decayType` | `DecayType` | Decay function type (gaussian, linear, exponential) |
| `decayValue` | `float?` | Target decay value at `scale` distance from origin (default: 0.5) |

---

## FieldValueFactor

Field value factor for function scoring: boosts results based on the value of a numeric field.

```gcl
@volatile
type FieldValueFactor {
    f: field;
    factor: float?;
    modifier: FieldModifier?;
    missing: float?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric (`float`/`int`) field to read |
| `factor` | `float?` | Multiplier for the field value (default: 1.0) |
| `modifier` | `FieldModifier?` | Transformation function (default: `none`) |
| `missing` | `float?` | Default value if field is not present (default: 1.0) |
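
A sketch that assembles a `FunctionScoreConfig` from one decay function and one field value factor. The `Article` type and its numeric fields are hypothetical, and the values are illustrative.

```gcl
// Hypothetical document type with numeric fields.
type Article { title: String; ageDays: float; views: int; }

var decays = Array<DecayFunction> {};
decays.add(DecayFunction {
    f: Article::ageDays,
    origin: 0.0,   // newest documents score best
    scale: 30.0,
    decayType: DecayType::gaussian,
    decayValue: 0.5 // target decay value at `scale` (30 days) from origin
});

var factors = Array<FieldValueFactor> {};
factors.add(FieldValueFactor { f: Article::views, factor: 0.1, missing: 1.0 });

// scoreMode and boostMode are left unset and default to multiply.
var _fsConfig = FunctionScoreConfig {
    decayFunctions: decays,
    fieldValueFactors: factors
};
```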

---

## BooleanQuery

Parsed boolean-query AST produced by `BooleanParser::parse()` and consumed by `BooleanEngine`. Each node is either an operator (AND/OR/NOT/WEAKAND) with `left`/`right` children (or `children` for WEAKAND) or a leaf term.

```gcl
type BooleanQuery {
    operator: BooleanOperator?;          // null for leaves
    term: String?;                       // leaf term text
    left: BooleanQuery?;                 // left operand
    right: BooleanQuery?;                // right operand / NOT operand
    weakAndThreshold: int?;              // Minimum match threshold for WEAKAND queries
    children: Array<BooleanQuery>?;      // Child queries for WEAKAND
}
```

---

## ParseResult

Return shape of `BooleanParser::parse()` — the AST plus the position in the input stream where parsing stopped.

```gcl
type ParseResult {
    query: BooleanQuery;
    nextPos: int;
}
```

---

## SpanQuery

Parsed span-query AST for `search_span()` with NEAR / ONEAR / FIRST / TERM nodes.

```gcl
type SpanQuery {
    operator: SpanOperator;
    term: String?;        // TERM nodes only
    left: SpanQuery?;     // q1 for NEAR/ONEAR, q for FIRST
    right: SpanQuery?;    // q2 for NEAR/ONEAR; null for FIRST
    distance: int;        // max distance (NEAR/ONEAR) or window size (FIRST)
}
```

---

## RankingCandidate

Candidate row fed into `RankingRulesEngine::apply()` — carries the precomputed signals that each `RankingRule` consults as tie-breakers.

```gcl
@volatile
type RankingCandidate {
    key: String;
    value: any?;
    score: float;
    matchedTerms: Array<String>?;
    matchedWordCount: int?;     // 'words' rule
    typoCount: int?;            // 'typo' rule
    minProximity: int?;         // 'proximity' rule
    firstMatchPosition: int?;   // 'attribute' rule
    isExactMatch: bool?;        // 'exactness' rule
    sortValue: float?;          // 'sort' rule
}
```

| Field | Type | Description |
|-------|------|-------------|
| `key` | `String` | Document key |
| `value` | `any?` | Associated value |
| `score` | `float` | Base score (from BM25 or another engine) |
| `matchedTerms` | `Array<String>?` | Query terms that matched |
| `matchedWordCount` | `int?` | Number of query terms matched (for `words` rule) |
| `typoCount` | `int?` | Number of typos in matches (for `typo` rule) |
| `minProximity` | `int?` | Minimum proximity between matched terms (for `proximity` rule) |
| `firstMatchPosition` | `int?` | Position of first match (for `attribute` rule) |
| `isExactMatch` | `bool?` | Whether query matches exactly (for `exactness` rule) |
| `sortValue` | `float?` | Custom sort-field value (for `sort` rule) |

The native C `qsort` routine in `ranking_rules_native.c` consumes this type directly.

---

## PercolateIndex

Reverse search index: register stored queries and match incoming documents against them.

```gcl
type PercolateIndex {
    config: TextIndexConfig;
    queries: nodeIndex<String, node<PercolatedQuery>>?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `config` | `TextIndexConfig` | Shared configuration for normalization and tokenization |
| `queries` | `nodeIndex<String, node<PercolatedQuery>>?` | Registered percolate queries by ID |

**Methods**

| Method | Description |
|--------|-------------|
| `add_query(id, queryText, mode)` | Register a query for percolation |
| `remove_query(id)` | Remove a registered query |
| `percolate(text, k)` | Match a document against registered queries, return matching query IDs |
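
A minimal usage sketch based on the method table above. The query IDs and texts are illustrative, and the return value of `percolate()` is kept in an untyped variable because its exact type is not stated here.

```gcl
var perc = PercolateIndex { config: TextIndexConfig::keyword() };
perc.add_query("q1", "machine learning", PercolateMode::bm25);
perc.add_query("q2", "rust AND compiler", PercolateMode::boolean);

// Returns the IDs of registered queries that match the document.
var _matches = perc.percolate("A survey of machine learning methods", 10);

perc.remove_query("q2");
```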

---

## PercolatedQuery

A registered percolate query with cached tokens for efficient matching.

```gcl
type PercolatedQuery {
    id: String;
    queryText: String;
    mode: PercolateMode;
    cachedTokens: Array<String>?;
    cachedBooleanTerms: Array<String>?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `id` | `String` | Unique query identifier |
| `queryText` | `String` | The query text |
| `mode` | `PercolateMode` | Matching mode (`bm25` for any-term matching, `boolean` for AND/OR logic) |
| `cachedTokens` | `Array<String>?` | Pre-tokenized query terms (cached for reuse) |
| `cachedBooleanTerms` | `Array<String>?` | Pre-extracted boolean terms (cached for boolean mode) |

---

## MetricAggregation

Metric aggregation request for computing statistics over search results.

```gcl
@volatile
type MetricAggregation {
    f: field;
    metric: MetricType;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric document field to aggregate |
| `metric` | `MetricType` | Aggregation type (sum, avg, min, max, cardinality) |

---

## HistogramAggregation

Histogram aggregation request for bucketing numeric field values.

```gcl
@volatile
type HistogramAggregation {
    f: field;
    interval: float;
    minValue: float?;
    maxValue: float?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric document field to aggregate |
| `interval` | `float` | Bucket interval width |
| `minValue` | `float?` | Minimum value for bucketing (auto-detected if null) |
| `maxValue` | `float?` | Maximum value for bucketing (auto-detected if null) |

---

## AggregationRequest

Combined aggregation request with metrics and histograms.

```gcl
@volatile
type AggregationRequest {
    metrics: Array<MetricAggregation>?;
    histograms: Array<HistogramAggregation>?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `metrics` | `Array<MetricAggregation>?` | Metric aggregation requests |
| `histograms` | `Array<HistogramAggregation>?` | Histogram aggregation requests |
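
As a minimal sketch, a request that combines one metric and one histogram over the same field might be assembled as below. The helper name `priceSummaryRequest` and its `priceField` parameter are hypothetical, `MetricType::avg` assumes the enum value is spelled as listed in the `MetricAggregation` table, and how a `field` reference is obtained is left to the caller.

```gcl
// Hypothetical helper: builds an aggregation request over one numeric field.
// `priceField` is supplied by the caller; obtaining a `field` reference is not shown here.
fn priceSummaryRequest(priceField: field): AggregationRequest {
    return AggregationRequest {
        metrics: Array<MetricAggregation> {
            // Assumes the enum value is named `avg`, as listed for MetricType.
            MetricAggregation { f: priceField, metric: MetricType::avg }
        },
        histograms: Array<HistogramAggregation> {
            // minValue / maxValue are left unset so the range is auto-detected.
            HistogramAggregation { f: priceField, interval: 10.0 }
        }
    };
}
```

Either array can be omitted, since both fields are nullable.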

---

## AggregatedSearchResult

Search result with aggregation outputs.

```gcl
@volatile
type AggregatedSearchResult {
    results: Array<TextResult>;
    metricResults: Array<MetricResult>?;
    histogramResults: Array<HistogramResult>?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `results` | `Array<TextResult>` | Ranked search results |
| `metricResults` | `Array<MetricResult>?` | Computed metric values |
| `histogramResults` | `Array<HistogramResult>?` | Computed histogram buckets |
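
Because both aggregation arrays are nullable, a reader of this type has to guard before iterating. The sketch below shows one way to do that; `printAggregated` is a hypothetical helper, and it assumes the usual `for (index, value in array)` loop form and that a null check on a local variable is enough for the compiler to accept the iteration.

```gcl
// Hypothetical helper: prints ranked hits, then whichever aggregation outputs are present.
fn printAggregated(res: AggregatedSearchResult) {
    for (_i, hit in res.results) {
        println("${hit.key} score=${hit.score}");
    }
    // metricResults is nullable: copy to a local and check before iterating.
    var metrics = res.metricResults;
    if (metrics != null) {
        for (_j, m in metrics) {
            println("${m.metric} = ${m.value}");
        }
    }
    // histogramResults is nullable as well.
    var histograms = res.histogramResults;
    if (histograms != null) {
        for (_k, h in histograms) {
            for (_l, b in h.buckets) {
                // Each bucket covers the half-open range [from, to).
                println("[${b.from}, ${b.to}): ${b.count}");
            }
        }
    }
}
```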

---

## MetricResult

Result of a metric aggregation.

```gcl
@volatile
type MetricResult {
    f: field;
    metric: MetricType;
    value: float;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the aggregated field |
| `metric` | `MetricType` | Aggregation type used |
| `value` | `float` | Computed result |

---

## HistogramBucket

A single histogram bucket with range and count.

```gcl
@volatile
type HistogramBucket {
    from: float;
    to: float;
    count: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `from` | `float` | Bucket lower bound (inclusive) |
| `to` | `float` | Bucket upper bound (exclusive) |
| `count` | `int` | Number of documents in this bucket |
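
The inclusive lower bound and exclusive upper bound make each bucket a half-open range `[from, to)`, so adjacent buckets never overlap. A minimal sketch of the membership test, using a hypothetical helper name:

```gcl
// Hypothetical helper: true when `v` falls inside the half-open range [from, to).
fn bucketContains(b: HistogramBucket, v: float): bool {
    return v >= b.from && v < b.to;
}
```

For a bucket with `from: 10.0` and `to: 20.0`, the value `10.0` belongs to the bucket while `20.0` belongs to the next one.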

---

## HistogramResult

Result of a histogram aggregation.

```gcl
@volatile
type HistogramResult {
    f: field;
    buckets: Array<HistogramBucket>;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the aggregated field |
| `buckets` | `Array<HistogramBucket>` | Histogram buckets with counts |

---

## Document

Structured document with section decomposition for hierarchical text processing.

```gcl
type Document {
    name: String;
    path: String;
    format: String?;
    documentType: String?;
    wordCount: int?;
    charCount: int?;
    fileSize: int?;
    sections: Array<Section>;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `name` | `String` | Document name |
| `path` | `String` | File path or URI |
| `format` | `String?` | File format (e.g., `"md"`, `"html"`, `"pdf"`) |
| `documentType` | `String?` | Document type classification |
| `wordCount` | `int?` | Total word count |
| `charCount` | `int?` | Total character count |
| `fileSize` | `int?` | File size in bytes |
| `sections` | `Array<Section>` | Hierarchical sections |

---

## Section

A section within a Document with title, sentences, and type.

```gcl
type Section {
    title: String;
    position: int;
    sentences: Array<Sentence>;
    sectionType: SectionType;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `title` | `String` | Section title (heading text) |
| `position` | `int` | Ordinal position in the document |
| `sentences` | `Array<Sentence>` | Sentences within this section |
| `sectionType` | `SectionType` | Type of section (paragraph, heading, table, etc.) |

---

## Sentence

A sentence within a Section.

```gcl
type Sentence {
    text: String;
    position: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `text` | `String` | Sentence text |
| `position` | `int` | Ordinal position within the section |
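
The `Document` → `Section` → `Sentence` nesting can be walked with plain loops. The sketch below counts sentences across all sections; `countSentences` is a hypothetical helper, not part of the library, and it assumes `Array` exposes `size()`.

```gcl
// Hypothetical helper: total number of sentences across every section of a document.
fn countSentences(doc: Document): int {
    var total = 0;
    for (_i, section in doc.sections) {
        // Each section carries its own ordered list of sentences.
        total = total + section.sentences.size();
    }
    return total;
}
```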

---

## DocumentStats

Statistics about a processed document.

```gcl
@volatile
type DocumentStats {
    file: String?;
    format: String?;
    file_size_bytes: int?;
    success: bool?;
    word_count: int?;
    char_count: int?;
    line_count: int?;
    sentence_count: int?;
    heading_count: int?;
    document_type: String?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `file` | `String?` | Source file path |
| `format` | `String?` | Detected format |
| `file_size_bytes` | `int?` | File size in bytes |
| `success` | `bool?` | Whether parsing succeeded |
| `word_count` | `int?` | Number of words |
| `char_count` | `int?` | Number of characters |
| `line_count` | `int?` | Number of lines |
| `sentence_count` | `int?` | Number of sentences |
| `heading_count` | `int?` | Number of headings |
| `document_type` | `String?` | Classified document type |

---

## TokenInfo

Token with normalized form, original form, and positional information. Used by pre-tokenized search methods.

```gcl
@volatile
type TokenInfo {
    text: String;
    original: String;
    position: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `text` | `String` | Normalized token text (stemmed, case-folded) |
| `original` | `String` | Original token form before normalization |
| `position` | `int` | Token position in document (0-indexed) |
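
To illustrate how the three fields relate, the sketch below hand-builds the tokens that the input `"Running Dogs"` could plausibly yield. The normalized forms `"run"` and `"dog"` are illustrative assumptions; the actual output depends on the configured stemming and normalization. `exampleTokens` is a hypothetical helper.

```gcl
// Hypothetical helper: hand-built tokens for the input "Running Dogs".
// The normalized `text` values are illustrative, not guaranteed analyzer output.
fn exampleTokens(): Array<TokenInfo> {
    return Array<TokenInfo> {
        // position is 0-indexed, so the first token sits at 0.
        TokenInfo { text: "run", original: "Running", position: 0 },
        TokenInfo { text: "dog", original: "Dogs", position: 1 }
    };
}
```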

---

## TermFrequency

Term frequency metadata with original form and positional offsets.

```gcl
@volatile
type TermFrequency {
    original: String;
    count: int;
    positions: Array<int>;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `original` | `String` | Original term form (before normalization) |
| `count` | `int` | Number of occurrences in document |
| `positions` | `Array<int>` | Positional offsets of each occurrence |

---

## TermScorePair

Term-score pair used internally for More Like This query term extraction.

```gcl
@volatile
type TermScorePair {
    termNode: node<NormalizedTerm>;
    score: float;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `termNode` | `node<NormalizedTerm>` | Reference to the normalized term |
| `score` | `float` | TF-IDF score for this term in the source document |

---

## ChunkInfo

Text chunk metadata returned by `TextChunker.chunk()`. Used for semantic search and RAG pipelines.

```gcl
@volatile
type ChunkInfo {
    content: String;
    position: int;
    startChar: int;
    endChar: int;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `content` | `String` | Chunk text content |
| `position` | `int` | Chunk position within source document (0-indexed) |
| `startChar` | `int` | Start character offset in source document |
| `endChar` | `int` | End character offset in source document |

---

## TrieNode

Prefix tree node for O(prefix_length + matches) prefix and wildcard search. Built at `build()` time from the normalized term vocabulary.

```gcl
type TrieNode {
    children: Map<int, node<TrieNode>>?;
    terms: Array<node<NormalizedTerm>>?;
    isTerminal: bool?;
}
```

| Field | Type | Description |
|-------|------|-------------|
| `children` | `Map<int, node<TrieNode>>?` | Character codepoint to child node mapping |
| `terms` | `Array<node<NormalizedTerm>>?` | Terms at this node (when a vocabulary term ends here) |
| `isTerminal` | `bool?` | Whether a vocabulary term ends at this node. Nullable for ABI safety with older indexes that pre-date this field — readers must treat `null` as `false`. |
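
The null-as-false rule for `isTerminal` can be captured in a single accessor so that callers never branch on `null` themselves. A minimal sketch, assuming the `??` null-coalescing operator is available; `endsTerm` is a hypothetical helper name.

```gcl
// Hypothetical helper: reads isTerminal, treating null (older indexes) as false.
fn endsTerm(n: TrieNode): bool {
    return n.isTerminal ?? false;
}
```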

---

## Internal types

These types appear in the inventory but are not constructed by user code. They are documented here only so the public type names that reference them are unambiguous.

- **`IndexEntry`** — One per indexed document. Stores the normalized text, content hash, token count, forward/inverted-index links, packed positions, and (optionally) the embedding vector. Exposed on `TextIndex.entries` for diagnostics; mutated only through `add()` / `remove()` / `update()`.
- **`NormalizedTerm`** — Vocabulary entry (extends `Term`). Holds IDF, max term score, and compact posting arrays (`postingEntries`, `postingTFs`, `postingDocLens`, `postingFieldnormIds`, `postingBlockMaxScores`). Built and refreshed by `build()`.
- **`BM25Result`** — Internal scoring record produced by `BM25Engine` (`engine/bm25_engine.gcl`). Mapped to `TextResult` before being returned to the caller; user code never observes it directly.
