Sitefinity indexes all fields as text or date by default. This post explains how to use Elasticsearch ingest pipelines and index templates to convert fields to numeric types at storage time - even when you have no control over how Sitefinity calls the search provider.
When you re-index in Sitefinity, all fields are stored as either text or date - there is no built-in way to index numeric fields as numeric types in the search engine.
Our first approach was to replace the DynamicContentInboundPipe, which quickly turned out to be a dead end. So we started digging into the Elasticsearch API instead.
We found that Elasticsearch has a concept called index templates: you can define a template for an index that does not exist yet, and its custom mappings are applied when that index is created. This helped somewhat, as Kibana was able to detect our new fields, but queries then started reporting shard failures. After lengthy research we found the culprit: fielddata is disabled by default on text fields, so aggregating or sorting on them fails. Further investigation revealed that even when the mapping shows the fields as numeric, the values are still stored as text in the underlying documents.
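For illustration, a template of the kind described above might look like the following sketch. It only builds the JSON body for a `PUT _template/<name>` request against the Elasticsearch 6.x API; the template name, index pattern, field names and the `_doc` type name are hypothetical placeholders, not what Sitefinity actually uses.

```python
import json

# Hypothetical placeholders: use the index name Sitefinity creates on re-index
# and the fields you want mapped as numbers.
TEMPLATE_NAME = "sitefinity-numeric"
INDEX_PATTERN = "sitefinity-products*"
DOC_TYPE = "_doc"  # assumption: 6.x mappings are keyed by a type name
NUMERIC_FIELDS = {"price": "float", "quantity": "integer"}


def numeric_template(index_pattern, numeric_fields):
    """Build a 6.x index template body that maps the given fields as numbers.

    The template only takes effect when a matching index is created, so it
    must be in place before Sitefinity (re)creates the index.
    """
    return {
        "index_patterns": [index_pattern],
        "mappings": {
            DOC_TYPE: {
                "properties": {
                    field: {"type": es_type}
                    for field, es_type in numeric_fields.items()
                }
            }
        },
    }


template = numeric_template(INDEX_PATTERN, NUMERIC_FIELDS)
print(f"PUT _template/{TEMPLATE_NAME}")
print(json.dumps(template, indent=2))
```

A mapping like this controls how values are indexed, but Elasticsearch does not rewrite the original JSON kept in `_source`, so the documents still carry the strings that were sent.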
To overcome this, we turned to the Elasticsearch ingest pipeline API. It lets you define a pipeline that documents pass through as they are being stored (keep "as they are being stored" in mind; it matters). We used the _simulate endpoint to verify that the fields would be converted to numbers in the documents, and indeed they were. Then we triggered a re-index in Sitefinity... and it still wasn't quite right.
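A minimal sketch of such a pipeline and of the `_simulate` check, again only building the 6.x request bodies. The pipeline ID, field names and sample values are hypothetical; `convert` is one of the built-in ingest processors, and `ignore_missing` lets documents without the field pass through unchanged.

```python
import json

PIPELINE_ID = "sitefinity-numeric-fields"  # hypothetical pipeline name
NUMERIC_FIELDS = {"price": "float", "quantity": "integer"}  # hypothetical fields


def convert_pipeline(numeric_fields):
    """Body for PUT _ingest/pipeline/<id>: one convert processor per field."""
    return {
        "description": "Convert Sitefinity text fields to numeric types",
        "processors": [
            {"convert": {"field": field, "type": es_type, "ignore_missing": True}}
            for field, es_type in numeric_fields.items()
        ],
    }


def simulate_body(sample_sources):
    """Body for POST _ingest/pipeline/<id>/_simulate: sample docs to run through."""
    return {"docs": [{"_source": source} for source in sample_sources]}


pipeline = convert_pipeline(NUMERIC_FIELDS)
# Hypothetical sample document in which the numbers arrive as strings.
simulation = simulate_body([{"title": "Widget", "price": "19.99", "quantity": "3"}])

print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline, indent=2))
print(f"POST _ingest/pipeline/{PIPELINE_ID}/_simulate")
print(json.dumps(simulation, indent=2))
```

In the simulate response, each document's `_source` should show `price` and `quantity` as JSON numbers rather than strings.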
The documents still contained text fields. If you dig deep enough into the Elasticsearch documentation, you'll find that the client sending the indexing request must specify which pipeline the documents should pass through. This makes sense in principle, but it doesn't work in our case: we cannot influence how Sitefinity calls the underlying search provider implementation. Inbound pipes operate at an abstraction layer that doesn't know which search engine will be used, so there are limits to what you can control there.
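To make the limitation concrete: without an index-level default, the pipeline is chosen per request through the `pipeline` query parameter, as in `PUT <index>/_doc/<id>?pipeline=<pipeline-id>` (the `_bulk` endpoint accepts the same parameter). The small sketch below only builds such a request path; the index, type and pipeline names are hypothetical. The point is that the parameter has to come from the code issuing the request, which in our setup is Sitefinity's search provider.

```python
from urllib.parse import urlencode


def index_request_path(index, doc_type, doc_id, pipeline=None):
    """Build the path for indexing one document (6.x style, type in the URL).

    The ingest pipeline is selected by the caller via ?pipeline=...; if the
    caller omits it, no pipeline runs unless the index defines a default one.
    """
    path = f"/{index}/{doc_type}/{doc_id}"
    if pipeline is not None:
        path += "?" + urlencode({"pipeline": pipeline})
    return path


# Hypothetical names; Sitefinity's provider builds its own requests,
# so in practice we cannot add this parameter.
print(index_request_path("sitefinity-products", "_doc", "1"))
# → /sitefinity-products/_doc/1
print(index_request_path("sitefinity-products", "_doc", "1", pipeline="sitefinity-numeric-fields"))
# → /sitefinity-products/_doc/1?pipeline=sitefinity-numeric-fields
```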
The solution was to tell Elasticsearch to automatically send every document written to a specific index, one that does not exist yet but whose name we know in advance, through our custom pipeline. And that's where index templates come back in.
We created an index template that sets index.default_pipeline for the target index, plus the pipeline itself, which converts the documents as they arrive. That combination solved it.
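As a rough sketch of that combination against the 6.x API: create the pipeline first, then a template whose `index.default_pipeline` setting names it (the setting is available from Elasticsearch 6.5 onward). Every name below (template, index pattern, pipeline, fields, `_doc` type) is a hypothetical placeholder, and the script only prints the two request bodies.

```python
import json

# Hypothetical placeholders: match these to the index Sitefinity creates on
# re-index and to the fields you want stored as numbers.
TEMPLATE_NAME = "sitefinity-numeric"
INDEX_PATTERN = "sitefinity-products*"
PIPELINE_ID = "sitefinity-numeric-fields"
DOC_TYPE = "_doc"  # assumption: 6.x mappings are keyed by a type name
NUMERIC_FIELDS = {"price": "float", "quantity": "integer"}

# Step 1 (PUT _ingest/pipeline/<id>): convert string values to numbers.
pipeline = {
    "description": "Convert Sitefinity text fields to numeric types",
    "processors": [
        {"convert": {"field": field, "type": es_type, "ignore_missing": True}}
        for field, es_type in NUMERIC_FIELDS.items()
    ],
}

# Step 2 (PUT _template/<name>): when a matching index is created, route every
# incoming document through the pipeline and map the fields as numbers.
template = {
    "index_patterns": [INDEX_PATTERN],
    "settings": {"index.default_pipeline": PIPELINE_ID},
    "mappings": {
        DOC_TYPE: {
            "properties": {
                field: {"type": es_type} for field, es_type in NUMERIC_FIELDS.items()
            }
        }
    },
}

calls = [
    (f"PUT _ingest/pipeline/{PIPELINE_ID}", pipeline),
    (f"PUT _template/{TEMPLATE_NAME}", template),
]
for request_line, body in calls:
    print(request_line)
    print(json.dumps(body, indent=2))
```

Because the default pipeline is attached to the index rather than to the request, Sitefinity's ordinary indexing calls are routed through it without any change on the Sitefinity side. The template has to exist before the re-index creates the index, since templates are applied only at index creation.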
Here is an example gist with the requests to get you started:
Elasticsearch is very flexible, and the ingest pipeline API is powerful: besides out-of-the-box processors such as converters and field setters, it lets you run scripts. Using this API you can influence how documents are stored even when they come from software you have no control over.
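As an illustration of that range, here is a hypothetical pipeline body mixing built-in converters, a field setter (`set`) and a Painless `script` processor. The pipeline name and field names are invented for the example, and the Painless snippet is deliberately simple; treat it as a sketch to adapt, not a tested pipeline.

```python
import json

# Hypothetical pipeline; all field names are placeholders.
pipeline = {
    "description": "Illustrative mix of ingest processors",
    "processors": [
        # Built-in converters: turn the string values into numbers.
        {"convert": {"field": "price", "type": "float", "ignore_missing": True}},
        {"convert": {"field": "quantity", "type": "integer", "ignore_missing": True}},
        # Field setter: add a constant field to every document.
        {"set": {"field": "indexed_by", "value": "sitefinity"}},
        # Painless script: runs after the converters, so the values are already
        # numeric; derives a new field when both inputs are present.
        {
            "script": {
                "lang": "painless",
                "source": "if (ctx.price != null && ctx.quantity != null) "
                          "{ ctx.total = ctx.price * ctx.quantity; }",
            }
        },
    ],
}

print("PUT _ingest/pipeline/sitefinity-enrich")
print(json.dumps(pipeline, indent=2))
```

Processors run in list order, which is why the script is placed after the converters.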
Sitefinity, on the other hand, provides a clean API, but it operates at an abstraction level: it doesn't know whether documents are going to Elasticsearch, a Lucene index, Azure Search or something else. Our guess is that the Sitefinity team chose the lowest common denominator across all search engines, which means text and date fields only.
This leads to an important conclusion: if your task requires knowledge of which specific search engine will be used, you probably won't be able to solve it through Sitefinity's publishing extensibility model. You'll need to solve it on the receiving side - the search engine API itself.
Note: This is experimental and was tested against Elasticsearch 6.6.