Site Search is configured through build triggers. A build trigger represents the connection between a DatoCMS project and a specific frontend, hosted on a particular platform (Netlify, Vercel, etc.).
Since the content of a DatoCMS project can be read and used on multiple frontends, multiple build triggers can be created in a single project.
Once a build trigger is configured, you can:
Trigger a rebuild of the frontend directly from the DatoCMS interface;
Activate Site Search, so that each time the frontend is rebuilt, the site is crawled and its pages are re-indexed.
How a build trigger is configured depends on the hosting solution you chose for the frontend, so please refer to the relevant guide in our Marketplace.
Once you have created and properly configured a build trigger, you can activate Site Search:
Go to the Settings > Build triggers section of your project and select a build trigger;
Check the Site search option and specify your Website frontend URL: that's the address from which crawling will begin;
Press the Publish changes button. This starts a rebuild of the frontend, followed by a spidering of the website.
You can also trigger a respidering of your frontend at any time using a dedicated Content Management API (CMA) endpoint.
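As a rough illustration of calling the respidering endpoint mentioned above, the sketch below only prepares the HTTP request, without sending it. The base URL, the `/build-triggers/{id}/reindex` path, the `PUT` method and the headers are assumptions here and should be checked against the CMA reference; `build_reindex_request`, the trigger ID and the token are hypothetical names used for illustration.

```python
import urllib.request

# Assumption: base URL, path and headers must be verified against the CMA reference.
CMA_BASE_URL = "https://site-api.datocms.com"


def build_reindex_request(build_trigger_id: str, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a request asking for a respidering of a build trigger."""
    url = f"{CMA_BASE_URL}/build-triggers/{build_trigger_id}/reindex"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Accept": "application/json",
        "X-Api-Version": "3",
    }
    return urllib.request.Request(url, method="PUT", headers=headers)


# Sending the request performs a real network call, so it is left commented out:
# request = build_reindex_request("1234", "YOUR_API_TOKEN")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before using a real API token.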
Once the website has been published, the Settings > Deployment > Activity log section will show that DatoCMS has started spidering your website. When the spidering ends (it may take a while, depending on the size of your website), you'll see a Site spidering completed with success event in your log.
Clicking the Show details link displays the complete list of spidered pages.
The spidering starts from the URL you configure as Website frontend URL in your build trigger settings, and recursively follows all the hyperlinks pointing to your domain. If your website has a Sitemap file (sitemap.xml under the root of your domain), we'll use it as well. Sitemap Index files are also supported.
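To make the "recursively follows all the hyperlinks pointing to your domain" behaviour concrete, here is a minimal sketch of a same-domain, breadth-first link walk. It is not the actual DatoCMS crawler: `crawl`, `LinkCollector` and the `fetch` callable are hypothetical, the pages are served from an in-memory dictionary instead of the network, and Sitemap handling is not modeled.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first walk over same-domain links, starting from start_url.

    `fetch` is a hypothetical callable returning the HTML of a URL,
    or None when the page cannot be retrieved.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# In-memory stand-in for a website: one external link, one fragment link.
site = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Ext</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/blog#top">Blog</a>',
    "https://example.com/blog": "<p>No links</p>",
}
pages = crawl("https://example.com/", site.get)
```

In this sketch the external link to `other.org` is never visited, and a page that no same-domain page links to would be missed, which is where a Sitemap file helps.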
We detect the language of every spidered page from the HTML global lang attribute present on the page, or through language-detection heuristics if the attribute is missing, so that indexing happens with proper stemming. That is, if the visitor searches for "cats", we'll also return results for "cat", "catlike", "catty", etc.
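If you want to check what language a page declares before it gets spidered, a small sketch like the following reads the `lang` attribute of the `<html>` element. The helper names are hypothetical and the code only covers the declared attribute; it does not reproduce the fallback heuristics or the stemming performed by DatoCMS.

```python
from html.parser import HTMLParser


class LangAttributeParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            value = dict(attrs).get("lang")
            if value and value.strip():
                self.lang = value.strip()


def declared_language(html):
    """Return the declared page language, or None when it is missing."""
    parser = LangAttributeParser()
    parser.feed(html)
    return parser.lang
```

A `None` result marks the pages for which the language would have to be guessed from the content.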
The crawler does not execute JavaScript on the spidered pages; it only parses plain HTML. If your website is a Single Page App, you'll need to set up pre-rendering to make it readable by our bot. The User-Agent used by our crawler is DatoCmsSearchBot.
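On the server side, the crawler can be recognized from its User-Agent, for example to decide when to serve pre-rendered HTML. The sketch below assumes that the `DatoCmsSearchBot` token appears somewhere in the `User-Agent` header and matches it case-insensitively; the function name is hypothetical and the full header string sent by the crawler is not specified here.

```python
CRAWLER_TOKEN = "DatoCmsSearchBot"


def is_dato_search_bot(user_agent):
    """Return True when the User-Agent header contains the crawler token.

    Assumption: a case-insensitive substring match is sufficient.
    """
    if not user_agent:
        return False
    return CRAWLER_TOKEN.lower() in user_agent.lower()
```

A request handler could call `is_dato_search_bot(request_headers.get("User-Agent"))` and route matching requests to the pre-rendered version of the page.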
The time needed to finish the spidering operation depends on the number of pages in your website and on your hosting's performance, but it normally proceeds at about 20 indexed pages/sec.
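As a back-of-the-envelope illustration of that rate, the sketch below turns a page count into an estimated duration. The function name is hypothetical, and the default of 20 pages per second is only the typical figure quoted above, not a guarantee.

```python
def estimated_spidering_seconds(page_count, pages_per_second=20.0):
    """Rough spidering time in seconds, assuming a constant indexing rate."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return page_count / pages_per_second


# A 6,000-page website at the typical rate: 6000 / 20 = 300 seconds (5 minutes).
seconds = estimated_spidering_seconds(6000)
```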