Model Setup
Model Training
All models are checked automatically every day and fully retrained every 7 days (whenever the underlying data has changed).
To ensure that the model regularly trains on more and better data, we recommend resolving knowledge gaps with Instant Answers or Existing Sources.
Note
Adding questions with Rules Engine has an immediate effect: the boosted pages come up as top results in searches.
Question Generation
Model re-training ensures that the model relearns with new or updated data (both answers and questions) and auto-generates search-improving questions behind the scenes.
GDPR Compliance
Raffle is GDPR-compliant: the search does not use, set or store cookies, and anonymizes content in search questions that are passed through any Raffle widget.
When a user searches for a phrase, the original question is used to find relevant answers. The question is then anonymized with “tags” using Raffle’s anonymization widget before it is saved for search analytics. Simply put, Raffle does not keep questions that contain sensitive information in the database. The anonymized questions are what appear in the Trending Questions and Content Gaps lists.
Question tags:
- CPR/SSN/ID numbers are replaced with “<ID>”
- E-mails are replaced with “<EMAIL>”
- Phone numbers are replaced with “<NUMBER>”
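The tag replacement listed above can be illustrated with a minimal Python sketch. The regular expressions and the function name `anonymize` are hypothetical and chosen only for illustration; Raffle's actual detection rules are not described in this article and are likely more thorough.

```python
import re

# Hypothetical patterns for illustration only; not Raffle's actual rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed ID formats: Danish CPR (DDMMYY-XXXX, hyphen optional) or US SSN (XXX-XX-XXXX).
ID_RE = re.compile(r"\b(?:\d{6}-?\d{4}|\d{3}-\d{2}-\d{4})\b")
# Loose phone pattern: optional "+", then 8 to 15 digits separated by optional spaces or hyphens.
PHONE_RE = re.compile(r"(?<!\w)\+?\d(?:[ -]?\d){7,14}(?!\w)")


def anonymize(question: str) -> str:
    """Replace sensitive values in a search question with tags."""
    # E-mails and IDs are replaced first so that the looser phone
    # pattern cannot match the digits they contain.
    question = EMAIL_RE.sub("<EMAIL>", question)
    question = ID_RE.sub("<ID>", question)
    question = PHONE_RE.sub("<NUMBER>", question)
    return question


if __name__ == "__main__":
    print(anonymize("My CPR is 010190-1234, call me on +45 12 34 56 78"))
```

In this sketch the order of substitutions matters: the more specific patterns run before the general phone pattern, so an ID number is not mislabeled as a phone number.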
Note
Only the content of questions is anonymized; content in answers is not included in the anonymization process.
Data Clusters
Clusters are groups organized as a folder structure so that customers can explore data (per widget) at different levels.
User searches in the Trending Questions and Content Gaps lists are clustered according to relevance, which means that a question has to be related to other questions in order to be part of a cluster and displayed on the list.
Trending Question clusters are sorted according to the frequency of asked questions, while Knowledge Gaps are sorted according to the similarity between the question and answers.
Cluster titles are based on the most common topics of a group, taking into account all data Raffle has collected. For example, if a widget has been live for a year, the titles are based on all the data from that year. This also means that some titles may include words that are not currently present in the list of questions for the selected time frame.
Question Generation
Once a page has been scraped, it is divided into sections, which the Raffle model searches through for relevant answers. The model then picks a section and shows it as an answer snippet in the list of search answers. Questions are auto-generated after model training and can be viewed only in the Raffle backstage.
Synonyms and Misspellings
Synonyms and misspellings are automatically handled by the semantic aspect of Raffle Search. Raffle understands the context of a search query, and returns answers accordingly.
To train the Raffle model associated with your account on specific keywords or phrases, custom synonyms, or abbreviations, refer to Rules Engine or provide a list of search phrases (with their target URLs or pages) to Raffle Support.
Answer Sections
The Raffle model searches through all connected indexes, finds and displays the most important and relevant section in a page, and ranks answers accordingly.
Note that the displayed section in the snippet may or may not contain the exact match of words in the search phrase.
Order and Ranking
Raffle Search results are model-based. A page can be manually boosted to come up as the top result, but the exact section displayed and the order in which the other answers are shown cannot be set manually, as these are decided by the Raffle model itself.
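As a rough illustration of how a boost interacts with model-based ranking, the Python sketch below places boosted pages first and leaves everything else in model order. The `Answer` class, the `score` field, and `rank_answers` are hypothetical names invented for this example; they do not describe Raffle's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    url: str
    score: float  # hypothetical relevance score assigned by the model


def rank_answers(answers, boosted_urls=frozenset()):
    """Return answers with boosted pages first, otherwise in model order."""
    # The sort key is a tuple: False sorts before True, so boosted pages
    # lead; within each group, higher model scores come first.
    return sorted(answers, key=lambda a: (a.url not in boosted_urls, -a.score))


if __name__ == "__main__":
    results = [Answer("/pricing", 0.9), Answer("/faq", 0.7), Answer("/returns", 0.4)]
    for answer in rank_answers(results, boosted_urls={"/returns"}):
        print(answer.url)
```

The point of the sketch is that a boost only moves a page to the top; the relative order of the remaining answers still comes from the model's scores.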
Search Data
Raffle Search, Chat and Summary use the connected indexes as the sole basis for the displayed answers in a widget.
Additional Notes:
- Chat and Summary both use the top answers in the search results (done in the background) as references for the generated answers
- The generated answers cannot be manually changed but can be influenced with Rules Engine
- The reference limit can be further adjusted to refine the generated response
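The effect of the reference limit can be pictured with a short Python sketch, assuming the limit simply caps how many top search answers are passed on as references. The function name `select_references` and the default value of 3 are assumptions made for illustration, not documented Raffle settings.

```python
def select_references(ranked_answers, reference_limit=3):
    """Keep only the top answers to use as references for a generated answer."""
    if reference_limit < 1:
        raise ValueError("reference_limit must be at least 1")
    # ranked_answers is assumed to be ordered best-first by the search step.
    return ranked_answers[:reference_limit]


if __name__ == "__main__":
    search_results = ["/pricing", "/faq", "/returns", "/contact"]
    print(select_references(search_results, reference_limit=2))
```

Under this assumption, a lower limit keeps the generated answer focused on the strongest results, while a higher limit lets it draw on more pages.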