LinkTrace
LinkTrace is a document-oriented crawler. Every crawled page becomes a rich Document object containing metadata, content, and discovered relationships.
Perfect for: Site structure analysis, link tracking, concurrent page fetching, HTML document transformation.
Not: A Scrapy replacement. Scrapy is a powerful, full-featured framework; LinkTrace is deliberately lightweight, with no pipelines, middleware, or project scaffolding to configure. If you want crawl results in minutes rather than after hours of setup, and a gentler learning curve, LinkTrace is for you.
Quick Start
pip install linktrace
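import asyncio

from linktrace import Spider


async def main():
    # Start at example.com and limit the crawl depth to 2.
    spider = Spider(start_url="https://example.com", max_depth=2)
    documents = await spider.run_async()
    for doc in documents:
        # Each result is a Document; print its title and internal link count.
        print(doc.title, len(doc.internal_links))


asyncio.run(main())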
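The Quick Start prints the number of internal links per document. To illustrate what "internal" can mean, here is a minimal standard-library sketch that splits the links on a page into same-host and other-host URLs. It does not use LinkTrace and is not its implementation: `LinkCollector` and `split_links` are hypothetical names, and the same-host rule is an assumption, so LinkTrace's actual classification may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(page_url, html):
    """Return (internal, external) lists of absolute http(s) URLs.

    Assumption: a link is "internal" when its host matches the page's host.
    """
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parsed.netloc == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


html = (
    '<a href="/about">About</a> '
    '<a href="https://other.org/x">Other</a> '
    '<a href="mailto:a@b.c">Mail</a>'
)
internal, external = split_links("https://example.com/", html)
print(internal)
print(external)
```

Relative links are resolved against the page URL before comparison, so `/about` counts as internal. Subdomains are treated as different hosts under this rule.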
Documentation
- Getting Started — installation and your first crawl
- Core Concepts — Spider, Crawler, and the Document model
- API Reference — classes, parameters, and return types
- Examples — common crawling recipes
- Troubleshooting — fixes for common issues