
# Indexing Methods

Methods for adding, removing, and updating documents in a `TextIndex<T>`, and for building the index.

---

## add()

Add a single document to the index.

```gcl
fn add(key: String, value: T): String
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `key` | `String` | Document text content to index (the text that will be searched against) |
| `value` | `T` | Associated data to store with the document (returned in search results, typically a document ID) |

**Returns:** The normalized form of `key` (after case folding and other normalization steps).

**Example**

```gcl
// key = text content to index, value = associated identifier
var _normalizedKey = index.add("Machine learning algorithms", "doc1");
var _normalizedKey2 = index.add("Natural language processing", "doc2");
```

**Notes**

- Normalizes and tokenizes the text, then updates the inverted index
- Embeds the document if an `embed` function is configured
- Skips duplicates if `deduplicateContent` is enabled
- `build()` must be called before searching

---

## add_batch()

Add multiple documents in a single operation.

```gcl
fn add_batch(batchEntries: Array<TextEntry>)
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `batchEntries` | `Array<TextEntry>` | Array of `TextEntry` records, each with a `key`, a `value`, and an optional precomputed `vector` |

**Example**

```gcl
var entries = Array<TextEntry> {};
entries.add(TextEntry { key: "doc1 text", value: "doc1", vector: null });
entries.add(TextEntry { key: "doc2 text", value: "doc2", vector: precomputedTensor });
index.add_batch(entries);
```

**Notes**

- Entries that carry a precomputed `vector` skip embedding inference
- Useful for bulk imports

---

## add_fields()

Add a typed document. Field text is read off the document via the configured
`FieldConfig.f` references or, if `config.fields` is null, from the fields
auto-discovered on the document type. `T` may be the document type directly or
`node<T>`; node resolution is transparent.

```gcl
fn add_fields(value: T)
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `value` | `T` | Typed document (or `node<T>`) to index |

**Example**

```gcl
type Article { title: String; body: String; tags: String?; }

var index = TextIndex<Article> {
    config: TextIndexConfig {
        fields: [
            FieldConfig { f: Article::title, weight: 3.0 },
            FieldConfig { f: Article::body,  weight: 1.0 },
            FieldConfig { f: Article::tags,  weight: 2.0 }
        ]
    }
};
index.add_fields(Article {
    title: "Machine Learning",
    body: "Algorithms for learning from data",
    tags: "ml ai data-science"
});
```

**Notes**

- When `config.fields` is null, `add_fields` auto-discovers every `String` /
  `String?` field on `T` at weight 1.0 on the first call.
- An empty (non-null) `config.fields` raises `"TextIndexConfig.fields is set but empty"`.
- A `T` with no `String` fields and no explicit `config.fields` raises
  `"add_fields: no String fields found on T"`.
- Per-field text is read directly off `IndexEntry.value` at query time;
  there is no separate side index.
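
The auto-discovery path described in the notes might look like the following. This is a sketch, assuming the `Article` type from the example above and assuming that `TextIndexConfig` can be constructed with `fields` left unset (that is, null). Any other configuration attributes are omitted here.

```gcl
// Assumption: leaving `fields` unset means config.fields is null,
// which triggers auto-discovery on the first add_fields() call.
var index = TextIndex<Article> {
    config: TextIndexConfig {}
};

// title, body, and tags are String / String? fields, so per the notes
// above each one is discovered and indexed at weight 1.0.
index.add_fields(Article {
    title: "Machine Learning",
    body: "Algorithms for learning from data",
    tags: "ml ai data-science"
});
index.build();
```

Explicit `FieldConfig` entries remain the better choice when fields should be weighted differently, since auto-discovery assigns every field the same weight.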

---

## remove()

Remove a document from the index.

```gcl
fn remove(key: String)
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `key` | `String` | Document key to remove |

**Example**

```gcl
index.remove("doc1");
```

**Notes**

- Unlinks all term associations for the document
- Updates statistics (`totalEntries`, `avgTokenCount`)
- IDF scores are not recomputed on removal; call `build()` again to restore exact IDF
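
Because IDF is only refreshed by `build()`, one possible pattern is to group removals and rebuild once afterwards. A minimal sketch, assuming `index` already contains documents stored under the keys `"doc1"` and `"doc2"` (hypothetical keys):

```gcl
// Term associations and statistics are updated by each remove() call.
index.remove("doc1");
index.remove("doc2");

// IDF scores still reflect the pre-removal corpus at this point.
// A single build() after all removals recomputes them.
index.build();
```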

---

## update()

Update a document's value (removes and re-adds).

```gcl
fn update(key: String, value: T)
```

**Parameters**

| Parameter | Type | Description |
|-----------|------|-------------|
| `key` | `String` | Document key |
| `value` | `T` | New value |

**Example**

```gcl
index.update("doc1", "new_value");
```

**Notes**

- Equivalent to `remove(key)` + `add(key, value)`
- Re-indexes document text
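
The equivalence stated in the notes can be written out as follows. This is a sketch, assuming `index` is a `TextIndex<String>` that already holds a document under the key `"doc1"`. Since `update()` is defined in terms of `remove()`, the IDF caveat from `remove()` presumably applies here too, so the sketch ends with a `build()`.

```gcl
// Short form
index.update("doc1", "new_value");

// Long form with the same documented effect
index.remove("doc1");
index.add("doc1", "new_value");

// Assumption: IDF is stale after the remove step, as described for remove().
index.build();
```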

---

## build()

Finalize the index after adding documents.

```gcl
fn build()
```

**Example**

```gcl
index.add("doc1", "value1");
index.add("doc2", "value2");
index.build();  // Compute IDF, detect auto stop words
```

**Notes**

- **Required before searching**
- Computes IDF scores for all terms
- Auto-detects stop words (if configured)
- Computes average document length
- Safe to call multiple times: re-running `build()` after additional `add()` calls recomputes IDF and refreshes derived structures (TF cache, posting arrays, trigram/edge n-gram/phonetic/trie indices)

---

## Internal build helpers

`build()` orchestrates several lower-level helpers that are exposed on `TextIndex<T>` for advanced use cases (e.g. incremental rebuilds of a single derived structure). Callers should normally use `build()`; only reach for these directly when you need fine-grained control.

| Method | Description |
|--------|-------------|
| `build_internal(isRebuild: bool)` | Core build pass: computes `avgTokenCount`, detects auto stop words, pre-normalizes synonyms, computes IDF and `maxTermScore`, populates posting arrays and the BM25 TF cache |
| `build_trie_index()` | (Re)builds the forward trie used by prefix and wildcard search |
| `build_reverse_trie_index()` | (Re)builds the reverse trie used for leading-wildcard patterns like `*ation` |
| `build_trigram_index()` | (Re)builds the trigram inverted index used by fuzzy pre-filtering |
| `index_edge_ngrams(token, entryNode)` | Indexes edge n-grams for a single token; called from `add()` and `build()` when `edgeNgram.enabled = true` |
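
As an illustration of the "single derived structure" use case, the sketch below refreshes only the trigram index after a full build. It assumes `index` is an existing `TextIndex<String>` and that a helper may be called on its own once `build()` has run at least once. The source does not spell out the preconditions for each helper, so treat this as a starting point rather than a guaranteed contract.

```gcl
index.add("Machine learning algorithms", "doc1");
index.build();  // full build: IDF, postings, and all derived structures

// Later: rebuild only the trigram index used by fuzzy pre-filtering,
// leaving the tries and other derived structures as they are.
index.build_trigram_index();
```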
