What (and how) are you using for free, local web-search and web-fetching with LLM agents?
I am relatively new to self-hosted agentic LLMs and want to figure out what the most popular and high-quality tools are that I can provide or connect to a self-hosted agent to search for information on-demand on the web or read provided links (web-fetching).
I've heard about the ability to self-host SearXNG, but I have a few questions:
* How do I provide access to it for the agent? Should I write/download an MCP (Model Context Protocol) server for it, some Skill with scripts or use a custom script and put it in a harness like Pi.dev?
* How should I deal with extracting useful data from HTML? Should I use something like microsoft/markitdown to feed only the useful text to the LLM as a result?
* How do I handle bot detection? I know some websites (especially those protected by Cloudflare) reject "robots" visiting their sites, meaning I might need to use headless browsers to simulate human behavior. But how do I deal with CAPTCHAs from search engines, Cloudflare, or Google?
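To make the first question concrete, this is roughly the kind of custom script I have in mind, as a minimal sketch rather than a tested setup. It assumes a SearXNG instance at `http://localhost:8080` (a placeholder address) with the `json` output format enabled in its `settings.yml`. The function names `build_search_url`, `format_results`, and `web_search` are my own, not part of any tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local SearXNG instance; adjust host/port to your setup.
SEARXNG_URL = "http://localhost:8080"


def build_search_url(query, base_url=SEARXNG_URL):
    """Build a SearXNG search URL that asks for JSON output."""
    return f"{base_url}/search?{urlencode({'q': query, 'format': 'json'})}"


def format_results(payload, limit=5):
    """Reduce a SearXNG JSON response to compact text for the LLM."""
    lines = []
    for i, result in enumerate(payload.get("results", [])[:limit], start=1):
        title = result.get("title", "")
        url = result.get("url", "")
        snippet = result.get("content", "")
        lines.append(f"{i}. {title}\n   {url}\n   {snippet}")
    return "\n".join(lines)


def web_search(query, limit=5):
    """Query SearXNG over HTTP and return formatted results (needs a running instance)."""
    with urlopen(build_search_url(query), timeout=10) as resp:
        payload = json.load(resp)
    return format_results(payload, limit)
```

A function like `web_search` could then be exposed to the agent as a tool, whether through an MCP server or a plain script in the harness. Which of those is the better choice is exactly what I am unsure about.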
I recently came across an advertised project that bundles solutions for these problems: Johell1NS/browser-search. Has anyone tried it?
If you know of any other tool setups or approaches to handle web-search and web-fetch locally (preferably via docker-compose), I would be glad to hear them.