Getting Started (CLI)
You can also set up the crawler directly on the Hub.
Create a Findkit Hub account and organization at hub.findkit.com and install our CLI:
npm install --global findkit
You should use Node.js v18 or later.
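To confirm the Node.js requirement before installing, a small shell check like the following can help. This is only a sketch: `node_major` and `check_node` are hypothetical helper names, not part of the Findkit CLI, and the check assumes the usual `v18.17.0` style output of `node --version`.

```shell
# Hypothetical helpers (not part of Findkit) for checking the Node.js version.

# Print the major version from a string such as "v18.17.0".
node_major() {
  version="${1#v}"          # drop the leading "v"
  echo "${version%%.*}"     # keep everything before the first dot
}

# Print "ok" when the given version is v18 or later, "too old" otherwise.
check_node() {
  major="$(node_major "$1")"
  if [ "$major" -ge 18 ]; then
    echo "ok"
  else
    echo "too old"
  fi
}
```

With Node.js on your PATH you would call it as `check_node "$(node --version)"`.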
Authenticate the CLI with the Hub
This generates the required access keys and tells you how to add them to the Hub.
Initialize a Findkit project. You can do this within your existing website project.
It will create a findkit.toml file for you, which is meant to be versioned within your website sources. Read through the file comments and update the options as needed, but do not set up the crawl schedules yet. We'll set them up later.
The file looks like this:
## Project Id / Public Token
id = "p4w0dDvmO"
name = "Example Project"
## Uncomment to enable automatic scheduled crawling
#schedule_full_crawl = "weekly"
host = "www.example.com"
use_sitemap = true
## The page content is automatically extracted, but you can
## also use explicit content selectors. https://findk.it/selectors
#content_selector = ".content"
## You can add multiple domains to a single project
#host = "www.another.example"
#use_sitemap = true
Deploy and Test
Once ready, deploy the project:
You must run the deploy every time you make changes to the .toml file.
This creates the project on the Hub but does not run any crawls yet, because the crawl schedules are not enabled. You can now manually crawl a single page to try things out:
findkit crawl start --manual 'https://www.example.com/page'
After a few seconds you can search for the page:
findkit search 'keyword'
It should return the crawled URL if the keyword appears on the page.
If it does not, you can inspect the page status and the crawled keywords in the index:
findkit inspect 'https://www.example.com/page'
When you update the findkit.toml file you must run findkit deploy again.
You can also inspect the index using the "Inspect Index" feature on the Hub.
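The manual crawl and inspect steps above can be repeated for several pages with a small wrapper. The sketch below uses only the `findkit crawl start --manual` and `findkit inspect` commands shown earlier; the function name `crawl_and_inspect` and the `FINDKIT_BIN` and `WAIT_SECONDS` variables are hypothetical conveniences, and the default five second wait is a guess based on "a few seconds".

```shell
# Hypothetical wrapper around the commands shown above.
# FINDKIT_BIN: binary to run (defaults to findkit; set to echo for a dry run).
# WAIT_SECONDS: pause between crawling and inspecting (assumed default: 5).
crawl_and_inspect() {
  # Start a manual crawl for every URL given as an argument.
  for url in "$@"; do
    "${FINDKIT_BIN:-findkit}" crawl start --manual "$url"
  done

  # Give the crawler a moment to index the pages.
  sleep "${WAIT_SECONDS:-5}"

  # Inspect the index status of each crawled page.
  for url in "$@"; do
    "${FINDKIT_BIN:-findkit}" inspect "$url"
  done
}
```

For example, `crawl_and_inspect 'https://www.example.com/page' 'https://www.example.com/other'` crawls both pages and then prints their index status.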
You should try indexing a few different pages manually like this. Once everything seems good, you can start a full crawl on your site.
findkit crawl start
You can view the crawl status with
and view the crawl logs with
When everything looks good, you can enable the crawl schedule in the TOML file and redeploy it so that the search indices are kept up to date with the site content.
For a real-life example, you can check out this documentation site's
Now you can set up the UI for your website.
You can test the findkit.toml changes without deploying them with the findkit crawl test command.
findkit crawl test 'https://www.example.com/page'
This will run the crawler on the specified page using the local findkit.toml file and print the crawler results.
The first test run in particular might take a while as the backend warms up.
Once you are happy with the results, you can deploy the changes with findkit deploy, and the following crawls will use the changes.
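The test-then-deploy workflow described above can be sketched as a single shell function. The name `test_then_deploy` and the `FINDKIT_BIN` variable are hypothetical, and the sketch assumes that `findkit crawl test` exits with a non-zero status when something goes wrong, which is not confirmed here; you should still read the printed crawler results yourself before relying on it.

```shell
# Hypothetical helper: run `findkit crawl test` on each URL, then deploy.
# FINDKIT_BIN: binary to run (defaults to findkit; set to echo for a dry run).
test_then_deploy() {
  for url in "$@"; do
    # Assumption: a failed test run returns a non-zero exit status.
    # Stop here without deploying if that happens.
    "${FINDKIT_BIN:-findkit}" crawl test "$url" || return 1
  done

  # All test runs finished, so deploy the local findkit.toml changes.
  "${FINDKIT_BIN:-findkit}" deploy
}
```

A call such as `test_then_deploy 'https://www.example.com/page'` runs the test crawl first and only deploys if it completes.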