# Function Scoring & Curation

This document covers three complementary mechanisms for controlling search result ranking beyond basic BM25 relevance: **Function Scoring** (decay functions and field value factors), **Curation** (manual pinning, boosting, and suppression), and **Ranking Rules** (Meilisearch-style multi-factor sorting).

All three are applied as **post-processing steps** by calling the helper engines on the result of a regular `search_bm25()` / `search()` call. They are not driven through `SearchOptions` or `TextIndexConfig`.

## Function Scoring

The `FunctionScoreEngine` re-scores search results by applying decay functions and field value factors. This is useful when you want to incorporate numeric signals -- such as recency, popularity, or geographic distance -- into the final ranking.

### FunctionScoreConfig

The top-level configuration type that controls function scoring behavior:

```gcl
@volatile
type FunctionScoreConfig {
    decayFunctions: Array?;
    fieldValueFactors: Array?;
    scoreMode: ScoreMode?;  // How to combine multiple function scores (default: multiply)
    boostMode: BoostMode?;  // How to combine with base score (default: multiply)
}
```

### Decay Functions

Decay functions reduce a document's score based on how far a numeric field value is from an ideal origin point. Three decay curves are available, each producing a multiplier in [0.0, 1.0]. The `gaussian` and `exponential` curves are calibrated so that the decay equals `decayValue` (default `0.5`) at `distance == scale`; the `linear` curve instead reaches `0` at that distance:

| Decay Type | Formula | Behavior |
|------------|---------|----------|
| `gaussian` | `exp(ln(decayValue) * (distance/scale)^2)` | Bell curve; smooth falloff, most gradual near the origin |
| `linear` | `max(0, 1 - distance/scale)` | Straight-line decrease; drops to zero at `scale` distance |
| `exponential` | `exp(ln(decayValue) * distance/scale)` | Fast initial drop, long tail |

The `decayValue` parameter (clamped to `(0, 1)`, default `0.5`) is the multiplier the `gaussian` and `exponential` curves produce at `distance == scale`.
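
As a language-neutral illustration, the three formulas in the table above can be written out in a few lines of Python. This is only a sketch of the documented formulas, not the `FunctionScoreEngine` implementation: the function name `decay` and its parameters are hypothetical, and `offset` handling is left out because the formulas above do not state how it enters the distance.

```python
import math


def decay(distance: float, scale: float, decay_type: str = "gaussian",
          decay_value: float = 0.5) -> float:
    """Model of the decay table: multiplier for a non-negative distance."""
    ratio = distance / scale
    if decay_type == "gaussian":
        return math.exp(math.log(decay_value) * ratio ** 2)
    if decay_type == "exponential":
        return math.exp(math.log(decay_value) * ratio)
    if decay_type == "linear":
        # Linear ignores decay_value and reaches 0 at distance == scale.
        return max(0.0, 1.0 - ratio)
    raise ValueError(f"unknown decay type: {decay_type}")


one_day = 86400.0
print(decay(one_day, one_day, "gaussian"))          # approximately 0.5
print(decay(one_day, one_day, "exponential", 0.3))  # approximately 0.3
print(decay(25.0, 50.0, "linear"))                  # 0.5
```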
Lowering `decayValue` makes the curve fall off faster; raising it (closer to `1.0`) makes the curve flatter. With the default `0.5`:

- **Gaussian** simplifies to `exp(ln(0.5) * (d/s)^2)` ≈ `exp(-0.693 * (d/s)^2)`, i.e. the score halves at `distance == scale`.
- **Exponential** simplifies to `exp(ln(0.5) * d/s)` ≈ `exp(-0.693 * d/s)`, again halving at `distance == scale`.

The `linear` curve is the documented exception: it is simply `max(0, 1 - distance/scale)` and always reaches `0` at `distance == scale`, independent of `decayValue`.

Configuration fields for `DecayFunction`:

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric (`float`/`int`) field on the document |
| `origin` | `float` | Ideal value where distance is zero |
| `scale` | `float` | Controls decay rate; at this distance, the decay equals `decayValue` (or `0` for `linear`) |
| `offset` | `float?` | Distance within which no decay is applied (default: 0) |
| `decayType` | `DecayType` | Curve type: `gaussian`, `linear`, or `exponential` |
| `decayValue` | `float?` | Target decay at `scale` distance (default: 0.5); ignored by `linear` |

#### Example: Recency Decay

Boost recent documents using Gaussian decay on a timestamp field:

```gcl
type Article {
    title: String;
    body: String;
    timestamp: float;
}

var _fsConfig = FunctionScoreConfig {
    decayFunctions: [
        DecayFunction {
            f: Article::timestamp,
            origin: 1700000000.0,  // Current epoch time
            scale: 86400.0,        // 1 day in seconds
            offset: 3600.0,        // No decay within 1 hour
            decayType: DecayType::gaussian,
            decayValue: 0.5        // Score halves at 1 day distance
        }
    ]
};
```

#### Example: Distance Decay

Penalize documents far from a target value using linear decay:

```gcl
type Product {
    name: String;
    price: float;
}

var _fsConfig = FunctionScoreConfig {
    decayFunctions: [
        DecayFunction {
            f: Product::price,
            origin: 100.0,  // Target price
            scale: 50.0,    // Score drops to 0 at $50 away
            decayType: DecayType::linear
        }
    ]
};
```

### Field Value Factors

Field value factors apply a
multiplier based on a numeric field's value. This lets you boost documents by popularity, rating, or any stored numeric signal.

Configuration fields for `FieldValueFactor`:

| Field | Type | Description |
|-------|------|-------------|
| `f` | `field` | Typed reference to the numeric (`float`/`int`) field on the document |
| `factor` | `float?` | Multiplicative factor applied to the field value (default: 1.0) |
| `modifier` | `FieldModifier?` | Transformation applied to the value (default: `none`) |
| `missing` | `float?` | Fallback value if field is absent (default: 1.0) |

### FieldModifier Transformations

Modifiers transform the raw field value before it is used as a score factor:

| Modifier | Formula | Use Case |
|----------|---------|----------|
| `none` | `val` | Use raw value directly |
| `log` | `log(val)` | Dampen large values (returns 0.0 for val <= 0) |
| `log1p` | `log(1 + val)` | Dampen large values, safe for zero (returns 0.0 for val < 0) |
| `sqrt` | `sqrt(val)` | Moderate dampening (returns 0.0 for val < 0) |
| `square` | `val * val` | Amplify differences between values |

#### Example: Popularity Boost

Boost by a popularity count, dampened with `log1p` so a document with 1,000,000 views does not completely dominate one with 100 views:

```gcl
type Post {
    content: String;
    view_count: float;
}

var _fsConfig = FunctionScoreConfig {
    fieldValueFactors: [
        FieldValueFactor {
            f: Post::view_count,
            factor: 1.5,
            modifier: FieldModifier::log1p,
            missing: 1.0
        }
    ]
};
```

### ScoreMode -- Combining Multiple Functions

When multiple decay functions and/or field value factors are configured, `scoreMode` controls how their individual scores are combined into a single function score:

| ScoreMode | Behavior |
|-----------|----------|
| `multiply` | Product of all function scores (default) |
| `sum` | Sum of all function scores |
| `avg` | Arithmetic mean of all function scores |
| `max` | Highest individual function score |
| `min` | Lowest individual function score |

### BoostMode -- Combining with Base Score

After computing the combined function score, `boostMode` controls how it is merged with the original BM25 (or other) base score:

| BoostMode | Formula | Use Case |
|-----------|---------|----------|
| `multiply` | `baseScore * funcScore` | Default; function score scales relevance proportionally |
| `sum` | `baseScore + funcScore` | Additive boost; ensures even zero-relevance documents get a function score |
| `replace` | `funcScore` | Ignore base relevance entirely; sort purely by function score |

### Complete Function Scoring Example

Combine recency decay with popularity boosting:

```gcl
type Article {
    title: String;
    published_at: float;
    likes: float;
}

// Configure function scoring
var fsConfig = FunctionScoreConfig {
    decayFunctions: [
        DecayFunction {
            f: Article::published_at,
            origin: 1700000000.0,
            scale: 604800.0,  // 1 week
            decayType: DecayType::exponential,
            decayValue: 0.3
        }
    ],
    fieldValueFactors: [
        FieldValueFactor {
            f: Article::likes,
            factor: 1.0,
            modifier: FieldModifier::log1p,
            missing: 0.0
        }
    ],
    scoreMode: ScoreMode::multiply,
    boostMode: BoostMode::multiply
};

// Run a BM25 search, then re-score with function scoring
var results = index.search_bm25("machine learning", 20);

// Convert TextResult to BM25Result (the input shape FunctionScoreEngine expects).
// FunctionScoreEngine reads numeric fields directly off result.value via the
// typed field refs in DecayFunction.f / FieldValueFactor.f.
var entries = Array {};
for (var i = 0; i < results.size(); i++) {
    var r = results[i];
    entries.add(BM25Result {
        key: r.key,
        value: r.value,
        score: r.score,
        matchedTerms: r.matchedTerms ?? Array {}
    });
}

var rescored = FunctionScoreEngine::apply(entries, fsConfig);

// rescored is sorted by new combined score, descending
for (var i = 0; i < rescored.size(); i++) {
    info("${rescored[i].key}: ${rescored[i].score}");
}
```

## Curation

The `CurationHelper` provides manual control over search results by letting you suppress, boost, or pin specific documents. Curation rules are applied in a fixed order: suppress first, then boost (with re-sort), then pin.

### CurationRule

```gcl
@volatile
type CurationRule {
    documentKey: String;  // Document key to apply the rule to
    position: int?;       // Pin to this position (0-indexed)
    boost: float?;        // Multiply score by this factor
    suppress: bool?;      // Remove from results if true
}
```

### Rule Application Order

1. **Suppress**: Documents with `suppress: true` are removed from results.
2. **Boost**: Remaining documents with a `boost` value have their scores multiplied. Results are re-sorted by score.
3. **Pin**: Documents with a `position` value are placed at their specified positions. Other documents fill the remaining slots in score order.
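
The three steps above can be modelled in plain Python to make the ordering concrete. This is a hedged sketch, not the `CurationHelper` implementation: results and rules are represented as plain dictionaries, the function name `apply_curation_model` is hypothetical, and two behaviours are assumptions of the sketch rather than documented facts (a pin position beyond the end of the list is clamped to the end, and a pin rule for a document that is not in the results is ignored).

```python
def apply_curation_model(results, rules):
    """Suppress, then boost and re-sort, then pin (model of the documented order)."""
    by_key = {rule["documentKey"]: rule for rule in rules}

    # 1. Suppress: drop documents whose rule sets suppress=True.
    kept = [dict(r) for r in results
            if not by_key.get(r["key"], {}).get("suppress")]

    # 2. Boost: multiply scores, then re-sort by score, descending.
    for r in kept:
        boost = by_key.get(r["key"], {}).get("boost")
        if boost is not None:
            r["score"] *= boost
    kept.sort(key=lambda r: r["score"], reverse=True)

    # 3. Pin: pull pinned documents out, then insert them at their positions.
    pinned, rest = [], []
    for r in kept:
        position = by_key.get(r["key"], {}).get("position")
        if position is None:
            rest.append(r)
        else:
            pinned.append((position, r))
    pinned.sort(key=lambda pair: pair[0])  # ascending, so earlier pins stay put
    for position, r in pinned:
        rest.insert(min(position, len(rest)), r)  # assumption: clamp to the end
    return rest


results = [
    {"key": "recalled-model-x", "score": 5.0},
    {"key": "other-laptop", "score": 3.0},
    {"key": "editors-choice-laptop", "score": 2.0},
    {"key": "flash-sale-laptop", "score": 1.0},
]
rules = [
    {"documentKey": "recalled-model-x", "suppress": True},
    {"documentKey": "editors-choice-laptop", "boost": 2.5},
    {"documentKey": "flash-sale-laptop", "position": 0},
]
curated = apply_curation_model(results, rules)
print([r["key"] for r in curated])
# ['flash-sale-laptop', 'editors-choice-laptop', 'other-laptop']
```

Because pinning runs last, the pinned document keeps its slot even though its score is the lowest, while the boosted document only moves as far as its new score carries it.
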
### Example: Suppress Outdated Content

```gcl
var results = index.search_bm25("security best practices", 10);

var rules = Array {};
rules.add(CurationRule { documentKey: "outdated-guide-2019", suppress: true });
rules.add(CurationRule { documentKey: "deprecated-policy", suppress: true });

var _curated = CurationHelper::apply_curation(results, rules);
```

### Example: Boost Sponsored Content

```gcl
var results = index.search_bm25("running shoes", 10);

var rules = Array {};
rules.add(CurationRule { documentKey: "premium-shoe-listing", boost: 3.0 });

var _curated = CurationHelper::apply_curation(results, rules);
// "premium-shoe-listing" score is multiplied by 3.0, then results are re-sorted
```

### Example: Pin Featured Results

```gcl
var results = index.search_bm25("getting started", 10);

var rules = Array {};
// Pin the quickstart guide at position 0 (first result)
rules.add(CurationRule { documentKey: "quickstart-guide", position: 0 });
// Pin the FAQ at position 1 (second result)
rules.add(CurationRule { documentKey: "faq", position: 1 });

var _curated = CurationHelper::apply_curation(results, rules);
// Positions 0 and 1 are locked; remaining results fill slots 2+ in score order
```

### Example: Combined Curation

```gcl
var results = index.search_bm25("laptop", 20);

var rules = Array {};
// Remove recalled products
rules.add(CurationRule { documentKey: "recalled-model-x", suppress: true });
// Boost the editor's pick
rules.add(CurationRule { documentKey: "editors-choice-laptop", boost: 2.5 });
// Pin the sale item at position 0
rules.add(CurationRule { documentKey: "flash-sale-laptop", position: 0 });

var curated = CurationHelper::apply_curation(results, rules);
```

## Ranking Rules Engine

The `RankingRulesEngine` provides Meilisearch-style multi-factor ranking. Instead of a single numeric score, documents are sorted by an ordered list of ranking rules applied as cascading tie-breakers.
The first rule is the primary sort key; the second rule breaks ties from the first, and so on.

### Available Ranking Rules

| Rule | Signal | Sort Direction | Description |
|------|--------|----------------|-------------|
| `words` | `matchedWordCount` | Descending | More matching query terms = better |
| `typo` | `typoCount` | Ascending | Fewer typos = better |
| `proximity` | `minProximity` | Ascending | Closer term proximity = better |
| `attribute` | `firstMatchPosition` | Ascending | Earlier first match = better |
| `sort` | `sortValue` | Descending | Higher custom sort value = better |
| `exactness` | `isExactMatch` | Exact first | Exact matches ranked above partial |

### RankingCandidate

Each document must be wrapped in a `RankingCandidate` with pre-computed ranking signals:

```gcl
@volatile
type RankingCandidate {
    key: String;
    value: any?;
    score: float;
    matchedTerms: Array?;
    matchedWordCount: int?;    // For 'words' rule
    typoCount: int?;           // For 'typo' rule
    minProximity: int?;        // For 'proximity' rule
    firstMatchPosition: int?;  // For 'attribute' rule
    isExactMatch: bool?;       // For 'exactness' rule
    sortValue: float?;         // For 'sort' rule
}
```

### Example: Standard Ranking Pipeline

```gcl
// Define ranking rules in priority order
var rules = Array {};
rules.add(RankingRule::words);
rules.add(RankingRule::typo);
rules.add(RankingRule::proximity);
rules.add(RankingRule::attribute);
rules.add(RankingRule::exactness);

// Build ranking candidates from search results
var candidates = Array {};
candidates.add(RankingCandidate {
    key: "doc1",
    value: "Introduction to ML",
    score: 2.5,
    matchedWordCount: 2,
    typoCount: 0,
    minProximity: 1,
    firstMatchPosition: 0,
    isExactMatch: true
});
candidates.add(RankingCandidate {
    key: "doc2",
    value: "Advanced ML Topics",
    score: 2.3,
    matchedWordCount: 2,
    typoCount: 1,
    minProximity: 3,
    firstMatchPosition: 5,
    isExactMatch: false
});
candidates.add(RankingCandidate {
    key: "doc3",
    value: "Machine Learning Basics",
    score: 2.4,
    matchedWordCount: 1,
    typoCount: 0,
    minProximity: 999,
    firstMatchPosition: 2,
    isExactMatch: false
});

// Apply ranking rules
var _ranked = RankingRulesEngine::apply(candidates, rules);
// doc1 and doc2 tie on 'words' (both 2), then doc1 wins on 'typo' (0 vs 1)
// doc3 comes last (only 1 matching word)
```

### Example: Custom Sort by Rating

Use the `sort` rule to incorporate a custom signal such as user rating:

```gcl
var rules = Array {};
rules.add(RankingRule::words);
rules.add(RankingRule::sort);  // Use custom sort value as secondary criterion

var candidates = Array {};
candidates.add(RankingCandidate {
    key: "product-a",
    value: "Widget A",
    score: 1.0,
    matchedWordCount: 2,
    sortValue: 4.8  // Average rating
});
candidates.add(RankingCandidate {
    key: "product-b",
    value: "Widget B",
    score: 1.0,
    matchedWordCount: 2,
    sortValue: 3.2
});

var _ranked = RankingRulesEngine::apply(candidates, rules);
// Both tie on 'words'; product-a wins on 'sort' (higher rating)
```

## When to Use Each Feature

| Feature | Best For |
|---------|----------|
| **Decay Functions** | Time-sensitive ranking (recency), distance-based scoring, freshness |
| **Field Value Factors** | Popularity boosting, rating incorporation, any numeric signal |
| **CurationHelper** | Editorial control, featured content, hiding outdated results, merchandising |
| **RankingRulesEngine** | Meilisearch-style multi-signal ranking, search quality tuning with explicit priorities |

In many applications, you will combine these: use function scoring for automatic signal incorporation, curation for manual editorial overrides, and ranking rules when you need explicit control over tie-breaking priority order.
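
The tie-breaking priority order used by the ranking rules can be pictured as a lexicographic sort: each rule contributes one component of a sort key, in the order the rules are listed. The Python sketch below models that idea using the signals and sort directions from the ranking rules table. It is an illustration, not the `RankingRulesEngine` implementation: the names `RULE_KEYS` and `rank` are hypothetical, and the defaults used for missing signals are assumptions.

```python
# One key function per rule; smaller key values sort first.
RULE_KEYS = {
    "words": lambda c: -c.get("matchedWordCount", 0),        # descending
    "typo": lambda c: c.get("typoCount", 0),                 # ascending
    "proximity": lambda c: c.get("minProximity", 0),         # ascending
    "attribute": lambda c: c.get("firstMatchPosition", 0),   # ascending
    "sort": lambda c: -c.get("sortValue", 0.0),              # descending
    "exactness": lambda c: 0 if c.get("isExactMatch") else 1,  # exact first
}


def rank(candidates, rules):
    """Sort candidates by a tuple key; later rules only break earlier ties."""
    return sorted(candidates,
                  key=lambda c: tuple(RULE_KEYS[rule](c) for rule in rules))


candidates = [
    {"key": "doc3", "matchedWordCount": 1, "typoCount": 0, "minProximity": 999,
     "firstMatchPosition": 2, "isExactMatch": False},
    {"key": "doc2", "matchedWordCount": 2, "typoCount": 1, "minProximity": 3,
     "firstMatchPosition": 5, "isExactMatch": False},
    {"key": "doc1", "matchedWordCount": 2, "typoCount": 0, "minProximity": 1,
     "firstMatchPosition": 0, "isExactMatch": True},
]
rules = ["words", "typo", "proximity", "attribute", "exactness"]
ranked = rank(candidates, rules)
print([c["key"] for c in ranked])  # ['doc1', 'doc2', 'doc3']
```

Reordering the `rules` list changes which signal dominates, which is the practical difference from function scoring, where all signals are folded into a single number.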