In this post we are going to discuss SharePoint search architecture, SharePoint search components, SharePoint search databases, and the SharePoint search topology.
The search architecture contains:
- Search components
- Search databases
Overview of search components and search databases:
There are some major changes in SharePoint Server 2013 compared to SharePoint Server 2010. SharePoint 2013 search consists of the following six components:
- Crawl Component
- Content Processing Component
- Indexing Component
- Query Processing Component
- Analytics Processing Component
- Search Administration Component
Crawl component:
- This component crawls the content sources, collects crawled properties and metadata, and then passes the crawled items to the content processing component.
- The crawl component uses one or more crawl databases to temporarily store information about crawled items and to track crawl history.
Crawl database:
- The crawl database contains detailed tracking and historical information about crawled items.
- It holds information such as the last crawl time, the last crawl ID, and the type of update during the last crawl.
- It is also used to manage crawl operations.
- Each crawl database can have one or more crawl components associated with it.
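To make the crawl database's role more concrete, here is a minimal in-memory sketch of the kind of history it tracks: crawl ID, crawl time, and update type. The class and field names are hypothetical. A real crawl database is a SQL Server database, not a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class UpdateType(Enum):
    # Hypothetical labels for the "type of update" recorded for a crawl
    FULL = "full"
    INCREMENTAL = "incremental"


@dataclass
class CrawlRecord:
    crawl_id: int
    content_source: str
    update_type: UpdateType
    finished_at: datetime


@dataclass
class CrawlHistory:
    """Toy in-memory stand-in for the tracking data a crawl database keeps."""
    records: list = field(default_factory=list)
    next_id: int = 1

    def record_crawl(self, content_source: str, update_type: UpdateType,
                     finished_at: datetime) -> CrawlRecord:
        record = CrawlRecord(self.next_id, content_source, update_type, finished_at)
        self.next_id += 1  # each crawl gets a new, increasing crawl ID
        self.records.append(record)
        return record

    def last_crawl(self, content_source: str) -> Optional[CrawlRecord]:
        # Most recent crawl of this content source, or None if it was never crawled
        matches = [r for r in self.records if r.content_source == content_source]
        return max(matches, key=lambda r: r.finished_at, default=None)


history = CrawlHistory()
history.record_crawl("http://intranet", UpdateType.FULL, datetime(2024, 1, 1, 2, 0))
history.record_crawl("http://intranet", UpdateType.INCREMENTAL, datetime(2024, 1, 2, 2, 0))
last = history.last_crawl("http://intranet")
print(last.crawl_id, last.update_type.value)  # 2 incremental
```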
Content processing component:
- This component receives crawled items from the crawl component, processes them, and sends them to the indexing component. It also interacts with the analytics processing component and is responsible for mapping crawled properties to managed properties.
- The content processing component writes information about links and URLs to the link database.
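The mapping of crawled properties to managed properties can be sketched as a simple lookup. This sketch assumes that crawled items arrive as dicts with `properties` and `links` keys. The mapping table is illustrative only and does not reflect SharePoint's object model.

```python
# Hypothetical crawled-to-managed property mappings. In SharePoint these are
# configured by an administrator; the names here are only illustrative.
PROPERTY_MAPPINGS = {
    "ows_Title": "Title",
    "ows_Author": "Author",
    "ows_Modified": "LastModifiedTime",
}


def process_item(crawled_item: dict) -> tuple[dict, list[str]]:
    """Map crawled properties to managed properties and extract outgoing links."""
    managed = {}
    for crawled_name, value in crawled_item["properties"].items():
        managed_name = PROPERTY_MAPPINGS.get(crawled_name)
        if managed_name is not None:  # unmapped crawled properties are dropped in this sketch
            managed[managed_name] = value
    # Links and URLs would be written to the link database for later analytics
    links = list(crawled_item.get("links", []))
    return managed, links


item = {
    "url": "http://intranet/docs/plan.docx",
    "properties": {"ows_Title": "Q3 plan", "ows_Author": "Dana", "ows_Internal": "x"},
    "links": ["http://intranet/docs/budget.xlsx"],
}
managed, links = process_item(item)
print(managed)  # {'Title': 'Q3 plan', 'Author': 'Dana'}
```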
Indexing Component:
- This component receives the processed items from the content processing component and writes them to the search index. The index files are stored on a disk of the server that hosts the index component.
- It also receives queries from the query processing component and returns the results.
- The search index can be divided into separate portions, called index partitions.
- Each index partition holds one or more index replicas (copies) that contain the same information.
Index partition vs index replicas:
An index replica is a copy of an index partition. Replicas are commonly used for availability. For example, you can place replicas of the index on more than one server so that queries can be served by more than one server.
A partition is a chunk of the index. Partitions are added for scale, and the recommendation is about 10 million items per partition. If you are indexing a lot of content, you may therefore need multiple partitions.
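A toy model makes the difference between partitions and replicas easier to see. Items are split across partitions, and each partition is copied to its replicas. A query must reach every partition, but only one replica of each. The hash-based routing and the `ToyIndex` class are assumptions made for illustration. The 10 million figure is the guideline from the paragraph above.

```python
import zlib

ITEMS_PER_PARTITION = 10_000_000  # the ~10M items-per-partition guideline


def partitions_needed(total_items: int) -> int:
    """Smallest number of partitions that keeps each one at or under the guideline."""
    return max(1, (total_items + ITEMS_PER_PARTITION - 1) // ITEMS_PER_PARTITION)


class ToyIndex:
    """Partitions hold disjoint slices of the items; replicas are full copies of a partition."""

    def __init__(self, num_partitions: int, replicas_per_partition: int):
        # partitions[p][r] is replica r of partition p: {doc_id: text}
        self.partitions = [[{} for _ in range(replicas_per_partition)]
                           for _ in range(num_partitions)]

    def _partition_for(self, doc_id: str) -> int:
        # Assumed routing: a stable hash so an item always lands in the same partition
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.partitions)

    def add(self, doc_id: str, text: str) -> None:
        # Every replica of the owning partition gets the same copy
        for replica in self.partitions[self._partition_for(doc_id)]:
            replica[doc_id] = text

    def query(self, term: str, replica_choice: int = 0) -> list[str]:
        # A query must visit every partition, but only one replica of each
        hits = []
        for replicas in self.partitions:
            replica = replicas[replica_choice % len(replicas)]
            hits.extend(doc_id for doc_id, text in replica.items() if term in text.split())
        return sorted(hits)


print(partitions_needed(25_000_000))  # 3
index = ToyIndex(num_partitions=3, replicas_per_partition=2)
index.add("a", "sales report")
index.add("b", "travel report")
index.add("c", "sales plan")
print(index.query("report"))  # ['a', 'b']
```

Because every replica of a partition holds the same items, any replica can answer for that partition. This is what lets replicas on different servers share the query load or take over when a server fails.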
Query Processing Component:
This component handles incoming query requests and sends them to the indexing component for results. It also takes care of query optimization.
- Query Processing Component analyzes and processes search queries and results.
- When the query processing component receives a query from the search front-end, it first performs linguistic processing (such as word breaking and stemming). It then analyzes and further processes the query to optimize precision, recall, and relevance. Finally, it submits the processed query to the index component.
- The index component returns a result set based on the processed query back to the query processing component.
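A toy version of the linguistic processing step shows why it helps recall. Once the query is broken into words and stemmed, "Reports" can match an item indexed as "report", provided the indexed text is normalized the same way. Real SharePoint word breakers and stemmers are language-specific. The ASCII-only suffix stripper below is purely an illustrative assumption.

```python
import re


def word_break(query: str) -> list[str]:
    # Toy ASCII-only word breaker: lowercase, then split on anything that is not a letter or digit
    return [token for token in re.split(r"[^0-9a-z]+", query.lower()) if token]


def stem(token: str) -> str:
    # Toy English suffix stripping; keeps at least three characters of the stem
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process_query(query: str) -> list[str]:
    """Linguistic processing step: word breaking followed by stemming."""
    return [stem(token) for token in word_break(query)]


print(process_query("Sales Reports, 2013"))  # ['sale', 'report', '2013']
```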
Analytics Processing Component:
- The analytics processing component performs two types of analyses: search analytics and usage analytics.
- This component uses information from these analyses to improve search relevance, create search reports, and generate recommendations and deep links.
- The results from the analyses are added to the items in the search index. In addition, results from usage analytics are stored in the analytics reporting database.
Search analytics vs. usage analytics:
Search analytics extracts information from the link database, such as links, the number of times an item is clicked, anchor text, people-related data, and metadata. This information is important for relevance.
Usage analytics analyzes the usage log information received from the front-end via the event store, and generates usage and statistics reports.
About the link database:
The link database stores information extracted by the content processing component. It also stores information about search clicks, that is, the number of times people click a result on the search results page. This information is stored unprocessed and is later analyzed by the analytics processing component.
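The link database keeps clicks in raw form, and the analytics processing component turns them into data that is useful for relevance. The sketch below is a minimal version of that step. It assumes a hypothetical row format of (query, clicked URL) and represents index items as plain dicts. It only counts clicks and attaches each count to its item. It does not model how SharePoint actually weights clicks in ranking.

```python
from collections import Counter

# Hypothetical raw click rows, stored unprocessed: (query text, clicked result URL)
raw_clicks = [
    ("expense policy", "http://intranet/hr/expenses"),
    ("expenses", "http://intranet/hr/expenses"),
    ("expense policy", "http://intranet/finance/faq"),
]


def aggregate_clicks(rows) -> Counter:
    """Search-analytics step: count how often each result URL was clicked."""
    return Counter(url for _query, url in rows)


def enrich_index(index_items: dict, click_counts: Counter) -> None:
    # Attach the analysis result to each indexed item (items never clicked get 0)
    for url, fields in index_items.items():
        fields["click_count"] = click_counts[url]


index_items = {
    "http://intranet/hr/expenses": {"Title": "Expense policy"},
    "http://intranet/finance/faq": {"Title": "Finance FAQ"},
    "http://intranet/it/vpn": {"Title": "VPN setup"},
}
enrich_index(index_items, aggregate_clicks(raw_clicks))
print({url: f["click_count"] for url, f in index_items.items()})
```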
About the analytics reporting database:
The analytics reporting database stores the results of usage analytics, along with statistics from the analyses. SharePoint uses this information to create Excel reports that show different statistics.
About the event store:
- The event store holds usage events that are captured on the front-end, such as the number of times an item is viewed.
- These usage events are stored as log files on the application server that hosts the analytics processing component.
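Usage analytics turns these raw events into reports. The sketch below counts views per item per day. It assumes a made-up CSV log format of `day,event_type,item_url`, because the real event store file format is not described here.

```python
import csv
import io
from collections import defaultdict


def usage_report(log_text: str) -> dict:
    """Count 'view' events per item per day from a hypothetical CSV usage log."""
    views = defaultdict(lambda: defaultdict(int))
    for row in csv.reader(io.StringIO(log_text)):
        if not row:  # skip blank lines
            continue
        day, event_type, item_url = row
        if event_type == "view":
            views[item_url][day] += 1
    # Convert nested defaultdicts to plain dicts for reporting
    return {url: dict(per_day) for url, per_day in views.items()}


sample_log = """2024-03-01,view,http://intranet/hr/expenses
2024-03-01,view,http://intranet/hr/expenses
2024-03-02,view,http://intranet/hr/expenses
2024-03-02,view,http://intranet/it/vpn
"""
print(usage_report(sample_log))
```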
Search Administration Component:
- The search administration component is responsible for running a number of system processes that are essential to search.
- This component manages administrative processes as well as changes to the search topology, such as adding or removing search components and servers.
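A search topology can be pictured as a mapping from servers to the components they host. The sketch below assumes that picture. The class and component names are hypothetical and are not PowerShell cmdlet or object-model names. The "fewer than two servers" check applies the same availability reasoning used for index replicas. It is not a specific SharePoint rule.

```python
from collections import defaultdict

# The six component types listed earlier (names are illustrative, not cmdlet names)
COMPONENT_TYPES = frozenset({
    "Crawl", "ContentProcessing", "Index",
    "QueryProcessing", "AnalyticsProcessing", "Admin",
})


class SearchTopology:
    """Toy model of a search topology: which components run on which servers."""

    def __init__(self):
        self.servers = defaultdict(set)

    def add_component(self, server: str, component: str) -> None:
        if component not in COMPONENT_TYPES:
            raise ValueError(f"unknown component type: {component}")
        self.servers[server].add(component)

    def remove_server(self, server: str) -> None:
        # Removing a server removes every component it hosted
        self.servers.pop(server, None)

    def single_points_of_failure(self) -> list[str]:
        # Component types hosted on fewer than two servers
        host_counts = {c: 0 for c in COMPONENT_TYPES}
        for components in self.servers.values():
            for c in components:
                host_counts[c] += 1
        return sorted(c for c, n in host_counts.items() if n < 2)


topology = SearchTopology()
for server in ("search1", "search2"):
    for component in COMPONENT_TYPES:
        topology.add_component(server, component)
print(topology.single_points_of_failure())  # []
topology.remove_server("search2")
print(topology.single_points_of_failure())
```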
Search administration database:
Stores search configuration data, such as the topology, crawl rules, query rules, and the mappings between crawled and managed properties. There is only one search administration database per Search service application.