Sitefinity: indexing numeric fields

Problem
When you reindex in Sitefinity, every field ends up as either text or date. There are no numeric fields.

Solution
At first we tried to replace the DynamicContentInboundPipe. This quickly turned out to be a no-go, so we started digging into the Elastic Search API.

We found that Elastic Search has "index templates", which you can define for an index that does not exist yet. The template's mappings are applied when that index is created. This helped a bit, as Kibana was able to detect our new fields, but the shards began to fail. After lengthy research, we realized the cause was the famous "Fielddata is disabled on text fields by default" error. Digging further, we found that even though the mapping shows the fields as numeric, they are still stored as text (i.e. the documents store them as text).
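A minimal sketch of such a template body follows. It is built in Python so it can be printed and pasted into Kibana Dev Tools. The index name, the field names, the template name and the "_doc" type name are hypothetical placeholders. ES 6.x still keys mappings by a type name, so use whatever type your Sitefinity index actually uses. On its own, this template was not enough for us.

```python
import json

# Hypothetical placeholders: the index name Sitefinity will create and the
# fields that should be mapped as numbers.
INDEX_NAME = "my-sitefinity-index"
DOC_TYPE = "_doc"  # assumption: ES 6.x mappings are keyed by a type name
NUMERIC_FIELDS = {"Price": "float", "Quantity": "integer"}


def build_numeric_template(index_name, doc_type, numeric_fields):
    """Return a body for PUT _template/<name> in the Elasticsearch 6.x format.

    The template only applies to indices created after it exists; it maps
    each listed field to the given numeric type.
    """
    properties = {
        field: {"type": es_type} for field, es_type in numeric_fields.items()
    }
    return {
        "index_patterns": [index_name],
        "mappings": {doc_type: {"properties": properties}},
    }


if __name__ == "__main__":
    body = build_numeric_template(INDEX_NAME, DOC_TYPE, NUMERIC_FIELDS)
    print("PUT _template/sitefinity-numeric")  # hypothetical template name
    print(json.dumps(body, indent=2))
```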

To overcome this problem, we used the ingest pipeline API. It lets you define a pipeline that documents go through as they are being stored. Keep the "as they are being stored" in mind, it's important :). We ran _simulate to check whether the fields would be converted to numbers in the document, and indeed they were. Reindex in Sitefinity aaaaaand... not quite.
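Here is a sketch of that kind of pipeline, with one convert processor per field, plus a _simulate request to try it on a sample document. The pipeline ID, the field names and the sample values are hypothetical. The "integer" and "float" types are standard convert processor types.

```python
import json

# Hypothetical placeholders.
PIPELINE_ID = "sitefinity-numeric"
NUMERIC_FIELDS = {"Price": "float", "Quantity": "integer"}


def build_convert_pipeline(numeric_fields):
    """Body for PUT _ingest/pipeline/<id>: one convert processor per field."""
    processors = [
        {
            "convert": {
                "field": field,
                "type": es_type,         # e.g. "integer" or "float"
                "ignore_missing": True,  # skip documents without this field
            }
        }
        for field, es_type in numeric_fields.items()
    ]
    return {
        "description": "Convert Sitefinity text fields to numbers",
        "processors": processors,
    }


def build_simulate_request(sample_sources):
    """Body for POST _ingest/pipeline/<id>/_simulate."""
    return {"docs": [{"_source": source} for source in sample_sources]}


if __name__ == "__main__":
    print("PUT _ingest/pipeline/" + PIPELINE_ID)
    print(json.dumps(build_convert_pipeline(NUMERIC_FIELDS), indent=2))
    print("POST _ingest/pipeline/" + PIPELINE_ID + "/_simulate")
    sample = {"Title": "Chair", "Price": "49.90", "Quantity": "3"}
    print(json.dumps(build_simulate_request([sample]), indent=2))
```

In the simulate response, each entry's doc._source should show the converted numeric values. As the post explains, though, a successful simulation does not mean Sitefinity's own indexing requests will use the pipeline.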

The documents still showed the fields as text. If you dig deep enough into the ES documentation, you will find that the client doing the indexing is expected to specify the pipeline the documents go through (via the pipeline parameter of the index or bulk request). This makes sense, but it does not work in our case, because we cannot influence how Sitefinity calls the actual implementation of the search provider. Inbound pipes are the abstract layer, which does not know which search engine will be used, hence the limitations. So we had to find a way to tell Elastic Search to push the documents going to a not-yet-existing index (we know its name) through our new pipeline. Remember the index templates?

So we did just that. We created a pipeline that converts the documents as they come in, and an index template that sets this pipeline as the default_pipeline of the not-yet-existing index.
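A sketch of such a template, assuming the hypothetical names used in the earlier examples. The index.default_pipeline setting has been available since Elastic Search 6.5, so it works on 6.6. It routes every document indexed into a matching index through the pipeline unless the request names a pipeline itself. That is exactly the case with Sitefinity's requests.

```python
import json

# Hypothetical placeholders: the index name Sitefinity creates, the ID of
# the convert pipeline, and the ES 6.x mapping type name.
INDEX_NAME = "my-sitefinity-index"
PIPELINE_ID = "sitefinity-numeric"
DOC_TYPE = "_doc"
NUMERIC_FIELDS = {"Price": "float", "Quantity": "integer"}


def build_template_with_default_pipeline(index_name, pipeline_id, doc_type, numeric_fields):
    """Body for PUT _template/<name>.

    index.default_pipeline makes documents indexed into a matching index go
    through the given pipeline when the request does not name one itself.
    """
    properties = {
        field: {"type": es_type} for field, es_type in numeric_fields.items()
    }
    return {
        "index_patterns": [index_name],
        "settings": {"index.default_pipeline": pipeline_id},
        "mappings": {doc_type: {"properties": properties}},
    }


if __name__ == "__main__":
    body = build_template_with_default_pipeline(
        INDEX_NAME, PIPELINE_ID, DOC_TYPE, NUMERIC_FIELDS
    )
    print("PUT _template/sitefinity-numeric")  # hypothetical template name
    print(json.dumps(body, indent=2))
```

Create the pipeline before documents arrive. Templates are only applied when an index is created, so the index must not exist yet when Sitefinity starts indexing into it.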

Conclusion
Elastic Search is very flexible, and the ingest pipeline API is pretty powerful as well. In addition to out-of-the-box processors (converters, field setters and so on), it lets you write scripts. Using this API, you can influence how documents are stored, even if they come from software you have no control over. Sitefinity, on the other hand, gives you a very nice API, but it is abstract. It doesn't know whether the documents will be stored in Elastic Search, Lucene indexes, Azure or something else. So our guess is that the team used the lowest common denominator of all search engines (text and date fields only).
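When the built-in processors are not enough, a script processor can run Painless code against each document's fields, which are exposed as ctx. Here is a small sketch of a processor that parses a hypothetical Price field into a number, but only while it is still a string. The field name and the script are illustrative assumptions, not what we ran.

```python
import json

# Hypothetical field name; the Painless source is an illustrative sketch.
PAINLESS_TO_DOUBLE = (
    "if (ctx.Price instanceof String) {"
    " ctx.Price = Double.parseDouble(ctx.Price);"
    " }"
)


def build_script_processor(source):
    """A script processor entry for an ingest pipeline's "processors" array."""
    return {"script": {"lang": "painless", "source": source}}


def build_script_pipeline(source):
    """Body for PUT _ingest/pipeline/<id> with a single script processor."""
    return {
        "description": "Parse Price with a Painless script",
        "processors": [build_script_processor(source)],
    }


if __name__ == "__main__":
    print(json.dumps(build_script_pipeline(PAINLESS_TO_DOUBLE), indent=2))
```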

This leads to another very important conclusion. If the task requires you to know which search engine will be used, you probably will not be able to use the Publishing extensibility model. You will have to solve the problem on the receiving side (the search engine API).


Last but not least: this is experimental, and we're running it against Elastic Search 6.6!

Here is an example Gist with the requests that should get you started: