Introduction

polymath is an open source web crawler designed to be modular at each of its processing phases. It also offers:

  • PDF support by default
  • Usable through a CLI or Kafka*
  • Low CPU and RAM usage, thanks to being written in Rust

* An HTTP API is planned, but it would not be enabled by default.

License

The polymath source code and documentation are released under the Mozilla Public License v2.0.