Mithril Search Engine
About
Mithril is a distributed web search engine built from scratch by students at the University of Michigan. This is our capstone project for EECS 498, System Design of a Search Engine, under Professor Nicole Hamilton.
What is our crawler doing?
If you’ve seen our crawler hitting your site, we’re collecting web pages for academic purposes. Our crawler:
- Respects robots.txt directives
- Maintains a reasonable request rate per domain
- Identifies itself with a clear user-agent
We’re not scraping for commercial purposes or training an LLM - just building a search engine for a class project.
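For the curious, a per-domain request limit like the one mentioned above can be sketched roughly as follows. This is a minimal illustration, not our actual crawler code: the class name `DomainRateLimiter`, the method `try_acquire`, and the fixed minimum delay are all assumptions made for the example, and a fixed delay is simpler than adaptive rate limiting.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>

// Hypothetical per-domain politeness gate. Names and the fixed-delay policy
// are illustrative assumptions, not Mithril's real implementation.
class DomainRateLimiter {
public:
    using Clock = std::chrono::steady_clock;

    explicit DomainRateLimiter(std::chrono::milliseconds min_delay)
        : min_delay_(min_delay) {
        assert(min_delay.count() >= 0);
    }

    // Returns true and records the request if `domain` may be fetched at
    // `now`; returns false if the previous request was too recent.
    bool try_acquire(const std::string& domain, Clock::time_point now) {
        auto it = next_allowed_.find(domain);
        if (it != next_allowed_.end() && now < it->second) {
            return false;
        }
        next_allowed_[domain] = now + min_delay_;
        return true;
    }

private:
    std::chrono::milliseconds min_delay_;
    // Earliest time at which each domain may be requested again.
    std::unordered_map<std::string, Clock::time_point> next_allowed_;
};
```

A fetch loop would call `try_acquire(host, Clock::now())` before each request and requeue the URL when it returns false. The sketch is single-threaded; a multi-threaded crawler would need to guard the map with a mutex or shard it by domain.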
The Team
We’re “The Fellowship of the Ring Buffer” - seven senior and sophomore CS and Math students interested in distributed systems, performance optimization, and systems programming.
Technical Details
Our engine consists of several components:
- Crawler: Multi-threaded with connection pooling and adaptive rate limiting
- Index: SPIMI-based inverted index with variable-byte encoding and position data
- Query Engine: Boolean and phrase queries with optimized constraint solving
- Ranking: [In development]
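To give a flavor of the query engine component listed above, a Boolean AND over two terms can be answered by intersecting their posting lists. The sketch below assumes each posting list is a vector of document IDs sorted in ascending order; the `DocId` alias and the `intersect` function are hypothetical names for illustration, and the real engine works on compressed on-disk data rather than plain vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using DocId = std::uint32_t;  // assumed document ID type

// Intersects two posting lists sorted ascending by document ID using a
// linear two-pointer merge. Runs in O(|a| + |b|) time.
std::vector<DocId> intersect(const std::vector<DocId>& a,
                             const std::vector<DocId>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<DocId> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                    // a is behind, advance it
        } else if (b[j] < a[i]) {
            ++j;                    // b is behind, advance it
        } else {
            out.push_back(a[i]);    // document contains both terms
            ++i;
            ++j;
        }
    }
    return out;
}
```

Phrase queries extend the same idea: after matching a document in both lists, the term positions within that document are compared to check that the words appear next to each other.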
The entire system is written in C++ with a focus on performance. Our inverted index uses memory-mapped files, aggressive compression, and our own custom data structures.
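As an illustration of the compression idea, here is a minimal sketch of variable-byte encoding applied to the gaps between sorted document IDs. It assumes one common convention (7 data bits per byte, least significant group first, high bit set when more bytes follow); our on-disk format may differ, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` to `out` as 7-bit groups, least significant group first.
// The high bit of a byte is set when more bytes follow (assumed convention).
void vbyte_encode(std::uint32_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decodes one value starting at `pos` and advances `pos` past it.
std::uint32_t vbyte_decode(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint32_t value = 0;
    int shift = 0;
    while (true) {
        assert(pos < in.size() && shift < 32);  // reject truncated or overlong input
        std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;          // high bit clear: last byte
        shift += 7;
    }
    return value;
}

// Encodes an ascending list of document IDs as variable-byte gaps.
std::vector<std::uint8_t> encode_postings(const std::vector<std::uint32_t>& doc_ids) {
    std::vector<std::uint8_t> out;
    std::uint32_t prev = 0;
    for (std::uint32_t id : doc_ids) {
        assert(id >= prev);                     // input must be sorted
        vbyte_encode(id - prev, out);
        prev = id;
    }
    return out;
}

// Reverses encode_postings by summing the decoded gaps.
std::vector<std::uint32_t> decode_postings(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> ids;
    std::size_t pos = 0;
    std::uint32_t prev = 0;
    while (pos < bytes.size()) {
        prev += vbyte_decode(bytes, pos);
        ids.push_back(prev);
    }
    return ids;
}
```

Storing gaps instead of absolute IDs keeps most values small, so frequent terms with densely packed postings tend to need only one byte per entry.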
Contact
If you have concerns about our crawler or would like your host(s) excluded from crawling, please contact:
- mithril498@umich.edu
Built by the “Fellowship of the Ring Buffer” at the University of Michigan, 2025