Introduction
polymath is a highly modular open source web crawler. It is designed so that each of its processing phases is modular. It also has other qualities:
- Supports PDF by default
- Can be used via a CLI or Kafka*
- Low CPU and RAM usage, thanks to Rust
* An HTTP API is planned, but it will not be enabled by default.
License
The polymath source code and documentation are released under the Mozilla Public License v2.0.