# Function Scoring & Curation

This document covers three complementary mechanisms for controlling search result ranking beyond basic BM25 relevance: **Function Scoring** (decay functions and field value factors), **Curation** (manual pinning, boosting, and suppression), and **Ranking Rules** (Meilisearch-style multi-factor sorting).

All three are applied as **post-processing steps** by calling the helper engines on the result of a regular `search_bm25()` / `search()` call. They are not driven through `SearchOptions` or `TextIndexConfig`.

## Function Scoring

The `FunctionScoreEngine` re-scores search results by applying decay functions and field value factors. This is useful when you want to incorporate numeric signals -- such as recency, popularity, or geographic distance -- into the final ranking.

### FunctionScoreConfig

The top-level configuration type that controls function scoring behavior:

```gcl
@volatile
type FunctionScoreConfig {
    decayFunctions: Array<DecayFunction>?;
    fieldValueFactors: Array<FieldValueFactor>?;
    scoreMode: ScoreMode?;    // How to combine multiple function scores (default: multiply)
    boostMode: BoostMode?;    // How to combine with base score (default: multiply)
}
```

### Decay Functions

Decay functions reduce a document's score based on how far a numeric field value is from an ideal origin point. Three decay curves are available, each producing a multiplier in [0.0, 1.0]. The `gaussian` and `exponential` curves are calibrated so that the multiplier equals `decayValue` (default `0.5`) at `distance == scale`; the `linear` curve instead reaches `0` at that distance:

| Decay Type | Formula | Behavior |
|------------|---------|----------|
| `gaussian` | `exp(ln(decayValue) * (distance/scale)^2)` | Bell curve; smooth falloff, most gradual near the origin |
| `linear` | `max(0, 1 - distance/scale)` | Straight-line decrease; drops to zero at `scale` distance |
| `exponential` | `exp(ln(decayValue) * distance/scale)` | Fast initial drop, long tail |

The `decayValue` parameter (clamped to `(0, 1)`, default `0.5`) is the multiplier the curve produces at `distance == scale`. Lowering it makes the curve fall off faster; raising it (closer to `1.0`) makes the curve flatter. With the default `0.5`:

- **Gaussian** simplifies to `exp(ln(0.5) * (d/s)^2)` ≈ `exp(-0.693 * (d/s)^2)`, i.e. the score halves at `distance == scale`.
- **Exponential** simplifies to `exp(ln(0.5) * d/s)` ≈ `exp(-0.693 * d/s)`, again halving at `distance == scale`.

The `linear` curve is the documented exception: it is simply `max(0, 1 - distance/scale)` and always reaches `0` at `distance == scale`, independent of `decayValue`.
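
As a worked illustration of the formulas above (plain arithmetic with the default `decayValue` of `0.5`, not output captured from the engine), the three curves can be compared at half the scale and at twice the scale. Here `d/s` stands for `distance/scale`, and `exp(ln(0.5) * x)` is written as `0.5^x`:

```math
\begin{aligned}
d/s = 0.5:&\quad \text{gaussian} = 0.5^{0.25} \approx 0.841,\quad \text{exponential} = 0.5^{0.5} \approx 0.707,\quad \text{linear} = 1 - 0.5 = 0.5 \\
d/s = 2:&\quad \text{gaussian} = 0.5^{4} = 0.0625,\quad \text{exponential} = 0.5^{2} = 0.25,\quad \text{linear} = \max(0,\ 1 - 2) = 0
\end{aligned}
```

Close to the origin the Gaussian curve penalizes least; far from it the exponential curve keeps the longest tail, while the linear curve has already reached zero.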

Configuration fields for `DecayFunction`:

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric (`float`/`int`) field on the document |
| `origin` | `float` | Ideal value where distance is zero |
| `scale` | `float` | Controls decay rate; at this distance, `gaussian` and `exponential` decay equals `decayValue`, while `linear` reaches 0 |
| `offset` | `float?` | Distance within which no decay is applied (default: 0) |
| `decayType` | `DecayType` | Curve type: `gaussian`, `linear`, or `exponential` |
| `decayValue` | `float?` | Target decay at `scale` distance (default: 0.5); has no effect on `linear` |

#### Example: Recency Decay

Boost recent documents using Gaussian decay on a timestamp field:

```gcl
type Article { title: String; body: String; timestamp: float; }

var _fsConfig = FunctionScoreConfig {
    decayFunctions: [
        DecayFunction {
            f: Article::timestamp,
            origin: 1700000000.0,    // Reference time in epoch seconds (typically "now")
            scale: 86400.0,          // 1 day in seconds
            offset: 3600.0,          // No decay within 1 hour
            decayType: DecayType::gaussian,
            decayValue: 0.5          // Score halves at 1 day distance
        }
    ]
};
```

#### Example: Distance Decay

Penalize documents far from a target value using linear decay:

```gcl
type Product { name: String; price: float; }

var _fsConfig = FunctionScoreConfig {
    decayFunctions: [
        DecayFunction {
            f: Product::price,
            origin: 100.0,           // Target price
            scale: 50.0,             // Score drops to 0 at $50 away
            decayType: DecayType::linear
        }
    ]
};
```

### Field Value Factors

Field value factors apply a multiplier based on a numeric field's value. This lets you boost documents by popularity, rating, or any stored numeric signal.

Configuration fields for `FieldValueFactor`:

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric (`float`/`int`) field on the document |
| `factor` | `float?` | Multiplicative factor applied to the field value (default: 1.0) |
| `modifier` | `FieldModifier?` | Transformation applied to the value (default: `none`) |
| `missing` | `float?` | Fallback value if field is absent (default: 1.0) |

### FieldModifier Transformations

Modifiers transform the raw field value before it is used as a score factor:

| Modifier | Formula | Use Case |
|----------|---------|----------|
| `none` | `val` | Use raw value directly |
| `log` | `log(val)` | Dampen large values (returns 0.0 for val <= 0) |
| `log1p` | `log(1 + val)` | Dampen large values, safe for zero (returns 0.0 for val < 0) |
| `sqrt` | `sqrt(val)` | Moderate dampening (returns 0.0 for val < 0) |
| `square` | `val * val` | Amplify differences between values |

#### Example: Popularity Boost

Boost by a popularity count, dampened with `log1p` so a document with 1,000,000 views does not completely dominate one with 100 views:

```gcl
type Post { content: String; view_count: float; }

var _fsConfig = FunctionScoreConfig {
    fieldValueFactors: [
        FieldValueFactor {
            f: Post::view_count,
            factor: 1.5,
            modifier: FieldModifier::log1p,
            missing: 1.0
        }
    ]
};
```
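
To see how strongly `log1p` dampens the gap, compare the two view counts mentioned above. This is plain arithmetic on the modifier formula alone, with `factor` left out; the numbers shown are natural-log values, but the ratio of two logarithms is the same in any base:

```math
\frac{\log(1 + 1{,}000{,}000)}{\log(1 + 100)} \approx \frac{13.816}{4.615} \approx 3.0
```

A 10,000-fold difference in raw views becomes roughly a 3-fold difference in the modified value.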

### ScoreMode -- Combining Multiple Functions

When multiple decay functions and/or field value factors are configured, `scoreMode` controls how their individual scores are combined into a single function score:

| ScoreMode | Behavior |
|-----------|----------|
| `multiply` | Product of all function scores (default) |
| `sum` | Sum of all function scores |
| `avg` | Arithmetic mean of all function scores |
| `max` | Highest individual function score |
| `min` | Lowest individual function score |
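
As a worked illustration with hypothetical values, suppose one decay function yields `0.5` and one field value factor yields `3.0` for the same document. Following the table above, the combined function score would be:

```math
\begin{aligned}
\text{multiply} &= 0.5 \times 3.0 = 1.5 \\
\text{sum} &= 0.5 + 3.0 = 3.5 \\
\text{avg} &= (0.5 + 3.0) / 2 = 1.75 \\
\text{max} &= 3.0 \\
\text{min} &= 0.5
\end{aligned}
```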

### BoostMode -- Combining with Base Score

After computing the combined function score, `boostMode` controls how it is merged with the original BM25 (or other) base score:

| BoostMode | Formula | Use Case |
|-----------|---------|----------|
| `multiply` | `baseScore * funcScore` | Default; function score scales relevance proportionally |
| `sum` | `baseScore + funcScore` | Additive boost; a document whose base score is 0 still receives the function score |
| `replace` | `funcScore` | Ignore base relevance entirely; sort purely by function score |
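
The following is a minimal sketch of a non-default combination. The `Recipe` type and its `saves` field are hypothetical, and the snippet assumes the enum member is spelled as in the table row (`BoostMode::sum`); only the `multiply` members appear verbatim in this document's examples.

```gcl
// Hypothetical document type for this sketch
type Recipe { title: String; saves: float; }

// Additive boost: a recipe with zero saves keeps its base score,
// because log1p(0) = 0 is added to it rather than multiplied with it.
var _additiveConfig = FunctionScoreConfig {
    fieldValueFactors: [
        FieldValueFactor {
            f: Recipe::saves,
            modifier: FieldModifier::log1p
        }
    ],
    boostMode: BoostMode::sum    // assumed spelling, taken from the table above
};
```

With `boostMode` left at its default of `multiply`, the same zero-saves recipe would end up with a final score of `0`, which is the main reason to consider `sum` for count-like signals.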

### Complete Function Scoring Example

Combine recency decay with popularity boosting:

```gcl
type Article { title: String; published_at: float; likes: float; }

// Configure function scoring
var fsConfig = FunctionScoreConfig {
    decayFunctions: [
        DecayFunction {
            f: Article::published_at,
            origin: 1700000000.0,
            scale: 604800.0,            // 1 week
            decayType: DecayType::exponential,
            decayValue: 0.3
        }
    ],
    fieldValueFactors: [
        FieldValueFactor {
            f: Article::likes,
            factor: 1.0,
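            // Caution: log1p(0) = 0, so with the multiply modes below an
            // article with zero (or missing) likes ends up with a score of 0.
            modifier: FieldModifier::log1p,
            missing: 0.0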
        }
    ],
    scoreMode: ScoreMode::multiply,
    boostMode: BoostMode::multiply
};

// Run a BM25 search, then re-score with function scoring
var results = index.search_bm25("machine learning", 20);

// Convert TextResult to BM25Result (the input shape FunctionScoreEngine expects).
// FunctionScoreEngine reads numeric fields directly off result.value via the
// typed field refs in DecayFunction.f / FieldValueFactor.f.
var entries = Array<BM25Result> {};
for (var i = 0; i < results.size(); i++) {
    var r = results[i];
    entries.add(BM25Result {
        key: r.key,
        value: r.value,
        score: r.score,
        matchedTerms: r.matchedTerms ?? Array<String> {}
    });
}

var rescored = FunctionScoreEngine::apply(entries, fsConfig);

// rescored is sorted by new combined score, descending
for (var i = 0; i < rescored.size(); i++) {
    info("${rescored[i].key}: ${rescored[i].score}");
}
```

## Curation

The `CurationHelper` provides manual control over search results by letting you suppress, boost, or pin specific documents. Curation rules are applied in a fixed order: suppress first, then boost (with re-sort), then pin.

### CurationRule

```gcl
@volatile
type CurationRule {
    documentKey: String;   // Document key to apply the rule to
    position: int?;        // Pin to this position (0-indexed)
    boost: float?;         // Multiply score by this factor
    suppress: bool?;       // Remove from results if true
}
```

### Rule Application Order

1. **Suppress**: Documents with `suppress: true` are removed from results.
2. **Boost**: Remaining documents with a `boost` value have their scores multiplied. Results are re-sorted by score.
3. **Pin**: Documents with a `position` value are placed at their specified positions. Other documents fill the remaining slots in score order.

### Example: Suppress Outdated Content

```gcl
var results = index.search_bm25("security best practices", 10);

var rules = Array<CurationRule> {};
rules.add(CurationRule { documentKey: "outdated-guide-2019", suppress: true });
rules.add(CurationRule { documentKey: "deprecated-policy", suppress: true });

var _curated = CurationHelper::apply_curation(results, rules);
```

### Example: Boost Sponsored Content

```gcl
var results = index.search_bm25("running shoes", 10);

var rules = Array<CurationRule> {};
rules.add(CurationRule { documentKey: "premium-shoe-listing", boost: 3.0 });

var _curated = CurationHelper::apply_curation(results, rules);
// "premium-shoe-listing" score is multiplied by 3.0, then results are re-sorted
```

### Example: Pin Featured Results

```gcl
var results = index.search_bm25("getting started", 10);

var rules = Array<CurationRule> {};
// Pin the quickstart guide at position 0 (first result)
rules.add(CurationRule { documentKey: "quickstart-guide", position: 0 });
// Pin the FAQ at position 1 (second result)
rules.add(CurationRule { documentKey: "faq", position: 1 });

var _curated = CurationHelper::apply_curation(results, rules);
// Positions 0 and 1 are locked; remaining results fill slots 2+ in score order
```

### Example: Combined Curation

```gcl
var results = index.search_bm25("laptop", 20);

var rules = Array<CurationRule> {};
// Remove recalled products
rules.add(CurationRule { documentKey: "recalled-model-x", suppress: true });
// Boost the editor's pick
rules.add(CurationRule { documentKey: "editors-choice-laptop", boost: 2.5 });
// Pin the sale item at position 0
rules.add(CurationRule { documentKey: "flash-sale-laptop", position: 0 });

var curated = CurationHelper::apply_curation(results, rules);
```
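
The three-step order can be traced by hand for the combined example. The base scores and the `budget-laptop` key below are hypothetical, and the loop assumes that curated entries expose a `key` field in the same way the search results do:

```gcl
// Hypothetical base scores before curation:
//   "budget-laptop"          4.5
//   "recalled-model-x"       4.0
//   "editors-choice-laptop"  2.0
//   "flash-sale-laptop"      1.0
//
// 1. Suppress: "recalled-model-x" is removed.
// 2. Boost:    "editors-choice-laptop" becomes 2.0 * 2.5 = 5.0; the re-sort gives
//              editors-choice-laptop (5.0), budget-laptop (4.5), flash-sale-laptop (1.0).
// 3. Pin:      "flash-sale-laptop" is placed at position 0.
//
// Expected order under these assumptions:
//   flash-sale-laptop, editors-choice-laptop, budget-laptop
for (var i = 0; i < curated.size(); i++) {
    info("${i}: ${curated[i].key}");   // assumes curated entries expose `key`
}
```

Because pinning runs last, the pinned item leads the list even though it has the lowest score.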

## Ranking Rules Engine

The `RankingRulesEngine` provides Meilisearch-style multi-factor ranking. Instead of a single numeric score, documents are sorted by an ordered list of ranking rules applied as cascading tie-breakers. The first rule is the primary sort key; the second rule breaks ties from the first, and so on.

### Available Ranking Rules

| Rule | Signal | Sort Direction | Description |
|------|--------|---------------|-------------|
| `words` | `matchedWordCount` | Descending | More matching query terms = better |
| `typo` | `typoCount` | Ascending | Fewer typos = better |
| `proximity` | `minProximity` | Ascending | Closer term proximity = better |
| `attribute` | `firstMatchPosition` | Ascending | Earlier first match = better |
| `sort` | `sortValue` | Descending | Higher custom sort value = better |
| `exactness` | `isExactMatch` | Exact first | Exact matches ranked above partial |

### RankingCandidate

Each document must be wrapped in a `RankingCandidate` with pre-computed ranking signals:

```gcl
@volatile
type RankingCandidate {
    key: String;
    value: any?;
    score: float;
    matchedTerms: Array<String>?;
    matchedWordCount: int?;      // For 'words' rule
    typoCount: int?;             // For 'typo' rule
    minProximity: int?;          // For 'proximity' rule
    firstMatchPosition: int?;    // For 'attribute' rule
    isExactMatch: bool?;         // For 'exactness' rule
    sortValue: float?;           // For 'sort' rule
}
```

### Example: Standard Ranking Pipeline

```gcl
// Define ranking rules in priority order
var rules = Array<RankingRule> {};
rules.add(RankingRule::words);
rules.add(RankingRule::typo);
rules.add(RankingRule::proximity);
rules.add(RankingRule::attribute);
rules.add(RankingRule::exactness);

// Build ranking candidates from search results
var candidates = Array<RankingCandidate> {};
candidates.add(RankingCandidate {
    key: "doc1",
    value: "Introduction to ML",
    score: 2.5,
    matchedWordCount: 2,
    typoCount: 0,
    minProximity: 1,
    firstMatchPosition: 0,
    isExactMatch: true
});
candidates.add(RankingCandidate {
    key: "doc2",
    value: "Advanced ML Topics",
    score: 2.3,
    matchedWordCount: 2,
    typoCount: 1,
    minProximity: 3,
    firstMatchPosition: 5,
    isExactMatch: false
});
candidates.add(RankingCandidate {
    key: "doc3",
    value: "Machine Learning Basics",
    score: 2.4,
    matchedWordCount: 1,
    typoCount: 0,
    minProximity: 999,
    firstMatchPosition: 2,
    isExactMatch: false
});

// Apply ranking rules
var _ranked = RankingRulesEngine::apply(candidates, rules);
// doc1 and doc2 tie on 'words' (both 2), then doc1 wins on 'typo' (0 vs 1)
// doc3 comes last (only 1 matching word)
```
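
In practice the candidates would be built from real search results rather than written by hand. The sketch below follows the same conversion pattern as the function scoring example and fills only the `words` signal. It assumes that the number of entries in `matchedTerms` is an acceptable stand-in for the matched word count; the other signals (`typoCount`, `minProximity`, and so on) would have to come from your own analysis and are left unset here.

```gcl
var results = index.search_bm25("machine learning", 20);

// Wrap each search result in a RankingCandidate
var fromSearch = Array<RankingCandidate> {};
for (var i = 0; i < results.size(); i++) {
    var r = results[i];
    var terms = r.matchedTerms ?? Array<String> {};
    fromSearch.add(RankingCandidate {
        key: r.key,
        value: r.value,
        score: r.score,
        matchedTerms: terms,
        matchedWordCount: terms.size()   // assumption: one entry per matched query word
    });
}

// Rank on the only signal that was populated
var wordsOnly = Array<RankingRule> {};
wordsOnly.add(RankingRule::words);
var _rankedFromSearch = RankingRulesEngine::apply(fromSearch, wordsOnly);
```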

### Example: Custom Sort by Rating

Use the `sort` rule to incorporate a custom signal such as user rating:

```gcl
var rules = Array<RankingRule> {};
rules.add(RankingRule::words);
rules.add(RankingRule::sort);   // Use custom sort value as secondary criterion

var candidates = Array<RankingCandidate> {};
candidates.add(RankingCandidate {
    key: "product-a",
    value: "Widget A",
    score: 1.0,
    matchedWordCount: 2,
    sortValue: 4.8    // Average rating
});
candidates.add(RankingCandidate {
    key: "product-b",
    value: "Widget B",
    score: 1.0,
    matchedWordCount: 2,
    sortValue: 3.2
});

var _ranked = RankingRulesEngine::apply(candidates, rules);
// Both tie on 'words'; product-a wins on 'sort' (higher rating)
```
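
Since the `sort` rule always prefers the higher `sortValue`, an ascending order (for example, cheapest first) can be obtained by negating the signal before storing it. The keys and prices below are hypothetical:

```gcl
var rules = Array<RankingRule> {};
rules.add(RankingRule::words);
rules.add(RankingRule::sort);

var candidates = Array<RankingCandidate> {};
candidates.add(RankingCandidate {
    key: "widget-basic",
    value: "Widget (basic)",
    score: 1.0,
    matchedWordCount: 2,
    sortValue: -19.99    // negated price: 19.99
});
candidates.add(RankingCandidate {
    key: "widget-pro",
    value: "Widget (pro)",
    score: 1.0,
    matchedWordCount: 2,
    sortValue: -49.99    // negated price: 49.99
});

var _ranked = RankingRulesEngine::apply(candidates, rules);
// Both tie on 'words'; widget-basic ranks first because -19.99 > -49.99
```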

## When to Use Each Feature

| Feature | Best For |
|---------|----------|
| **Decay Functions** | Time-sensitive ranking (recency), distance-based scoring, freshness |
| **Field Value Factors** | Popularity boosting, rating incorporation, any numeric signal |
| **CurationHelper** | Editorial control, featured content, hiding outdated results, merchandising |
| **RankingRulesEngine** | Meilisearch-style multi-signal ranking, search quality tuning with explicit priorities |

In many applications, you will combine these: use function scoring for automatic signal incorporation, curation for manual editorial overrides, and ranking rules when you need explicit control over tie-breaking priority order.
