# Percolation (Reverse Search)

Percolation inverts the normal search workflow. Instead of running a query against a corpus of documents, you register a set of standing queries and then match incoming documents against them. When a new document arrives, the percolator returns the IDs of the stored queries that match it, up to a caller-specified limit.

This is useful for real-time alerting, content classification, monitoring pipelines, and rule-based routing.

## Core Concepts

In a standard search, you have:
- A fixed set of documents (the index)
- A query submitted at search time

In percolation, this is reversed:
- A fixed set of queries (registered in advance)
- A document submitted at match time

The percolator answers: "Which of my stored queries match this document?"

## PercolateIndex

The `PercolateIndex` type stores registered queries and provides methods to add, remove, and match against them.

```gcl
type PercolateIndex {
    config: TextIndexConfig;
    queries: nodeIndex<String, node<PercolatedQuery>>?;

    fn add_query(id: String, queryText: String, mode: PercolateMode);
    fn remove_query(id: String);
    fn percolate(text: String, k: int): Array<String>;
}
```

### Creating a PercolateIndex

A `PercolateIndex` requires a `TextIndexConfig` that controls how both queries and incoming documents are tokenized and normalized. The same config is used for both sides, ensuring consistent term matching.

```gcl
var _percolator = PercolateIndex {
    config: TextIndexConfig {
        stopWords: StopWordOptions {
            mode: StopWordMode::default,
            language: TextSearchLanguage::en
        }
    }
};
```

## PercolateMode

Two matching modes are available:

| Mode | Behavior | Use Case |
|------|----------|----------|
| `bm25` | At least one query term must appear in the document | Broad matching, topic alerts |
| `boolean` | Supports `AND` / `OR` operators for precise logic | Exact condition matching, compliance rules |

### bm25 Mode

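In `bm25` mode, a query matches if any of its terms appears in the incoming document. Despite the mode's name, `percolate()` does not rank matches by relevance (see the ordering note under Matching Documents). The result is broad, recall-oriented matching suited to topic-based alerts.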

```gcl
percolator.add_query("tech-news", "artificial intelligence machine learning", PercolateMode::bm25);
// Matches any document containing "artificial", "intelligence", "machine", or "learning"
```
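A standalone sketch of the any-term rule: one shared term is enough for a match, and a document sharing no terms with the query returns an empty array. It uses only the API shown on this page. The query ID and texts are illustrative, and the expected results assume the default tokenization (no stemming, case folding on).

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};
percolator.add_query("tech-news", "artificial intelligence machine learning", PercolateMode::bm25);

// One shared term ("learning") is enough for a match
var _hit = percolator.percolate("Deep learning models keep getting larger", 10);
// _hit == ["tech-news"]

// No shared terms: no match
var _miss = percolator.percolate("Local bakery wins regional award", 10);
// _miss == []
```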

### boolean Mode

In `boolean` mode, queries use `AND` and `OR` operators for precise logic. `AND` requires every non-operator term in the query to be present; `OR` requires at least one.

```gcl
// AND: both terms must appear
percolator.add_query("security-alert", "vulnerability AND critical", PercolateMode::boolean);

// OR: at least one term must appear
percolator.add_query("market-watch", "stocks OR bonds OR crypto", PercolateMode::boolean);
```

When no operator is present in a boolean-mode query, it falls back to any-term matching (same as `bm25` mode).
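To show the difference between an explicit `AND` and an operator-less boolean query, the sketch below registers one of each and percolates a document containing only one of the two terms. The IDs and texts are illustrative. The expected result follows the rules described above, assuming default English stop-word removal.

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

// Explicit AND: both "vulnerability" and "critical" must appear
percolator.add_query("strict", "vulnerability AND critical", PercolateMode::boolean);

// No operator: falls back to any-term matching, like bm25 mode
percolator.add_query("loose", "vulnerability critical", PercolateMode::boolean);

// The document contains "critical" but not "vulnerability"
var _matched = percolator.percolate("A critical update is now available", 10);
// _matched == ["loose"]
```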

## Lifecycle

### Adding Queries

Register queries with a unique string ID, the query text, and a matching mode. Query terms are tokenized and cached at registration time for efficient matching.

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

// Register standing queries
percolator.add_query("earthquake-alert", "earthquake seismic tremor", PercolateMode::bm25);
percolator.add_query("weather-severe", "hurricane AND warning", PercolateMode::boolean);
percolator.add_query("market-crash", "stock AND crash AND market", PercolateMode::boolean);
percolator.add_query("sports-scores", "touchdown goal score", PercolateMode::bm25);
```

If you add a query with an ID that already exists, it replaces the previous query.
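The replacement behavior can be checked by re-registering an ID with different text and a different mode. After the second `add_query`, only the new definition is active. This is a sketch with illustrative IDs and texts.

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

percolator.add_query("weather", "hurricane", PercolateMode::bm25);

// Re-registering "weather" replaces both the stored query text and its mode
percolator.add_query("weather", "blizzard AND snow", PercolateMode::boolean);

// The old query text no longer matches
var _oldText = percolator.percolate("Hurricane season starts in June", 10);
// _oldText == []

// The replacement query does
var _newText = percolator.percolate("Blizzard brings heavy snow to the plains", 10);
// _newText == ["weather"]
```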

### Removing Queries

Remove a stored query by its ID:

```gcl
percolator.remove_query("sports-scores");
```
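Once a query is removed, later `percolate()` calls no longer return its ID. A self-contained sketch with illustrative text:

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

percolator.add_query("sports-scores", "touchdown goal score", PercolateMode::bm25);
var _beforeRemoval = percolator.percolate("Late touchdown decides the game", 10);
// _beforeRemoval == ["sports-scores"]

percolator.remove_query("sports-scores");
var _afterRemoval = percolator.percolate("Late touchdown decides the game", 10);
// _afterRemoval == []
```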

### Matching Documents

The `percolate()` method takes the text of an incoming document and the maximum number of matching query IDs to return. It tokenizes the document with the same config used for the queries and checks it against every stored query.

```gcl
var _matchedIds = percolator.percolate("A major earthquake struck the coast today", 10);
// _matchedIds == ["earthquake-alert"]

var _matchedIds2 = percolator.percolate("Hurricane warning issued for the eastern seaboard", 10);
// _matchedIds2 == ["weather-severe"]

var _matchedIds3 = percolator.percolate("Stock market crash sends indexes plummeting", 10);
// _matchedIds3 == ["market-crash"]
```

The `k` parameter limits how many matching query IDs are returned. Matches are returned in storage order (the order queries were added), not ranked by relevance.
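To see how `k` truncates results, the sketch below registers three queries that all match the same document and then asks for at most two IDs. The IDs are hypothetical. The expected output follows the storage-order rule stated above.

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

// Three queries that all match the document below
percolator.add_query("alert-a", "outage", PercolateMode::bm25);
percolator.add_query("alert-b", "database", PercolateMode::bm25);
percolator.add_query("alert-c", "outage AND database", PercolateMode::boolean);

var doc = "Database outage reported by the on-call engineer";

// k = 2 caps the result at two IDs, taken in storage order
var _top = percolator.percolate(doc, 2);
// _top == ["alert-a", "alert-b"]

// A larger k returns every match
var _all = percolator.percolate(doc, 10);
// _all == ["alert-a", "alert-b", "alert-c"]
```

Because results are not scored, choose `k` as a cap on work or output size, not as a "top-k most relevant" cut-off.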

## Use Cases

### Real-Time Alerts

Monitor a document stream and trigger alerts when documents match predefined criteria:

```gcl
var alertEngine = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

// Set up alert rules
alertEngine.add_query("outage-alert", "outage AND production", PercolateMode::boolean);
alertEngine.add_query("security-breach", "unauthorized AND access", PercolateMode::boolean);
alertEngine.add_query("performance-degradation", "latency timeout slow", PercolateMode::bm25);

// Process incoming log entries; the engine is passed in explicitly
fn process_log_entry(engine: PercolateIndex, logText: String) {
    var alerts = engine.percolate(logText, 5);
    for (var i = 0; i < alerts.size(); i++) {
        info("ALERT triggered: ${alerts[i]} for log: ${logText}");
    }
}

process_log_entry(alertEngine, "Production outage detected in region us-east-1");
// Triggers: outage-alert

process_log_entry(alertEngine, "Unauthorized access attempt from IP 10.0.0.5");
// Triggers: security-breach

process_log_entry(alertEngine, "API response latency exceeding 5 seconds");
// Triggers: performance-degradation
```

### Content Classification

Classify incoming documents by matching them against category-defining queries:

```gcl
var classifier = PercolateIndex {
    config: TextIndexConfig {
        stopWords: StopWordOptions { mode: StopWordMode::default },
        tokenization: TokenizationOptions { stemming: true }
    }
};

// Define category queries
classifier.add_query("category:tech", "software programming algorithm computer", PercolateMode::bm25);
classifier.add_query("category:finance", "investment portfolio dividend stock", PercolateMode::bm25);
classifier.add_query("category:health", "clinical trial treatment patient", PercolateMode::bm25);
classifier.add_query("category:sports", "championship tournament athlete team", PercolateMode::bm25);

// Classify a document
var _categories = classifier.percolate(
    "New clinical trial shows promising results for cancer treatment in patients",
    10
);
// _categories == ["category:health"]
```
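Percolation naturally supports multi-label classification: a document matching several category queries returns all of their IDs, and a document matching none returns an empty array. How to handle the empty case (for example, labeling the document "unclassified") is left to the caller. The sketch below uses illustrative texts. It checks only the number of labels, not their order.

```gcl
var classifier = PercolateIndex {
    config: TextIndexConfig {
        stopWords: StopWordOptions { mode: StopWordMode::default },
        tokenization: TokenizationOptions { stemming: true }
    }
};

classifier.add_query("category:tech", "software programming algorithm computer", PercolateMode::bm25);
classifier.add_query("category:finance", "investment portfolio dividend stock", PercolateMode::bm25);

// Mentions both "software" and "investment", so both categories match
var _labels = classifier.percolate("Software startup closes a large investment round", 10);
// _labels contains "category:tech" and "category:finance"

// No category terms: empty result
var _none = classifier.percolate("Weekend weather looks mild", 10);
// _none == []
```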

### News Monitoring

Let users subscribe to topics and receive notifications when matching articles appear:

```gcl
var newsMonitor = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

// User subscriptions
newsMonitor.add_query("user:alice", "renewable energy AND solar", PercolateMode::boolean);
newsMonitor.add_query("user:bob", "electric vehicle tesla battery", PercolateMode::bm25);
newsMonitor.add_query("user:carol", "space AND exploration", PercolateMode::boolean);

// Match incoming article
var article = "Tesla announces new battery technology for electric vehicles";
var subscribers = newsMonitor.percolate(article, 100);
// subscribers == ["user:bob"]

// Notify matched users
for (var i = 0; i < subscribers.size(); i++) {
    info("Notify ${subscribers[i]}: new matching article");
}
```

### Document Routing

Route incoming documents to processing pipelines based on content:

```gcl
var router = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::none } }
};

// Define routing rules
router.add_query("pipeline:urgent", "critical AND urgent", PercolateMode::boolean);
router.add_query("pipeline:review", "review OR approval", PercolateMode::boolean);
router.add_query("pipeline:archive", "completed finished resolved", PercolateMode::bm25);

// Route an incoming document
var doc = "Critical issue requires urgent attention from the security team";
var _pipelines = router.percolate(doc, 5);
// _pipelines == ["pipeline:urgent"]
```
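For first-match-wins routing, one option is to call `percolate()` with `k = 1` and fall back to a default when nothing matches. This relies on the storage-order behavior described under Matching Documents. The `route` helper and the `"pipeline:default"` name below are hypothetical and not part of `PercolateIndex`.

```gcl
var router = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::none } }
};
router.add_query("pipeline:urgent", "critical AND urgent", PercolateMode::boolean);
router.add_query("pipeline:review", "review OR approval", PercolateMode::boolean);
router.add_query("pipeline:archive", "completed finished resolved", PercolateMode::bm25);

// Hypothetical helper: first matching rule wins, otherwise a default pipeline
fn route(engine: PercolateIndex, doc: String): String {
    var pipelines = engine.percolate(doc, 1);
    if (pipelines.size() == 0) {
        return "pipeline:default"; // fallback chosen by the caller, not by the index
    }
    return pipelines[0];
}

var _urgent = route(router, "Critical issue requires urgent attention from the security team");
// _urgent == "pipeline:urgent"

var _archived = route(router, "Ticket resolved by support");
// _archived == "pipeline:archive"

var _fallback = route(router, "Weekly newsletter draft");
// _fallback == "pipeline:default"
```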

## Configuration Considerations

The `TextIndexConfig` passed to `PercolateIndex` affects how both queries and documents are processed:

- **`tokenization.stemming`**: When enabled, terms are reduced to their stems on both sides, so "running" in a query matches "runs" in a document.
- **`stopWords.mode`**: Controls stop-word removal (for example `StopWordMode::default` or `StopWordMode::none`). Removal applies equally to queries and documents.
- **`stopWords.language`**: Determines which language-specific stop-word list is used.
- **`tokenization.minTermLength` / `tokenization.maxTermLength`**: Terms shorter than the minimum or longer than the maximum are discarded.
- **`tokenization.caseFold`**: Controls case folding during normalization (enabled by default), so "NASA" and "nasa" are treated as the same term.

To keep percolation results consistent with regular search, use the same config you would use for the corresponding `TextIndex`.

```gcl
// Example with stemming, English stop words, and a minimum term length
var percolator = PercolateIndex {
    config: TextIndexConfig {
        stopWords: StopWordOptions {
            mode: StopWordMode::default,
            language: TextSearchLanguage::en
        },
        tokenization: TokenizationOptions {
            stemming: true,
            minTermLength: 2
        }
    }
};

// "running" is stemmed to "run" at registration time
percolator.add_query("fitness", "running exercise training", PercolateMode::bm25);

// "runs" is also stemmed to "run" at match time -> matches
var _matches = percolator.percolate("She runs every morning as exercise", 10);
// _matches == ["fitness"]
```
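Case folding works the same way. Because it is enabled by default, a query registered in upper case matches a lower-case document. In the sketch below, the match can only come from "NASA"/"nasa". The query ID and texts are illustrative.

```gcl
var percolator = PercolateIndex {
    config: TextIndexConfig { stopWords: StopWordOptions { mode: StopWordMode::default } }
};

// Query registered with upper-case terms
percolator.add_query("space-agency", "NASA Artemis", PercolateMode::bm25);

// The document only shares "nasa", in lower case; folding makes them equal
var _folded = percolator.percolate("engineers at nasa delayed the flight", 10);
// _folded == ["space-agency"]
```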
