Joho the Blog » Ethanz on Google Coop

Ethanz on Google Coop

Ethan Zuckerman discovers that Google Coop's roll-your-own search engine has high precision but poor recall: it returns few irrelevant results, but it misses pages it should find.

A little poking solves the mystery pretty quickly. Google Coop Search works by searching against the main Google search catalog, retrieving 1000 results and filtering them against the sites you’ve included in your catalog. This makes sense, computationally – these searches are fast, almost as fast as normal Google searches. Rather than conducting 3000 “site:” searches and collating and reranking the results, Google is sacrificing recall, getting 1000 results and discarding those not in your set of chosen sites, which requires one call to the index and a really big regular expression match.

…In other words, the little engine I’ve built is useful only if the sites I’ve chosen are relatively high ranking and authoritative sites on the topics I’m searching on.
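To make the recall problem concrete, here is a minimal sketch of the retrieve-then-filter approach Ethan describes. It is not Google's code. The ranked result list, the `.example` site names, and the helpers `coop_search`, `host_of`, `precision`, and `recall` are all hypothetical. The cutoff is shrunk from 1000 to 3 so the effect is visible. Where Ethan guesses at "a really big regular expression match", the sketch swaps in a set lookup on hostnames.

```python
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Lowercase hostname of a URL, with any leading 'www.' stripped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def coop_search(ranked_results, allowed_sites, top_n=1000):
    """Retrieve-then-filter (hypothetical): take only the first top_n URLs
    from the main ranking, then drop any whose host is not in allowed_sites.
    Pages on allowed sites that rank below top_n are never seen."""
    allowed = {site.lower() for site in allowed_sites}
    return [url for url in ranked_results[:top_n] if host_of(url) in allowed]


def precision(found, relevant):
    """Share of returned URLs that are relevant (1.0 if nothing returned)."""
    if not found:
        return 1.0
    relevant = set(relevant)
    return sum(url in relevant for url in found) / len(found)


def recall(found, relevant):
    """Share of relevant URLs that were actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 1.0
    return len(relevant & set(found)) / len(relevant)


if __name__ == "__main__":
    # Hypothetical main-index ranking, best match first.
    ranked = [
        "https://www.popular.example/a",
        "https://chosen-a.example/post1",
        "https://www.popular.example/b",
        "https://other.example/x",
        "https://chosen-b.example/deep-post",  # on a chosen site, but ranked low
    ]
    sites = ["chosen-a.example", "chosen-b.example"]
    # Simplifying assumption: every page on a chosen site counts as relevant.
    relevant = [u for u in ranked if host_of(u) in sites]

    hits = coop_search(ranked, sites, top_n=3)  # cutoff shrunk from 1000 to 3
    print(hits)                        # ['https://chosen-a.example/post1']
    print(precision(hits, relevant))   # 1.0
    print(recall(hits, relevant))      # 0.5
```

Everything that survives the filter is on a chosen site, so precision stays perfect. The low-ranked page on `chosen-b.example` falls outside the cutoff and is silently lost. The approach needs only one query against the main index. Running one `site:` query per chosen site would recover that page, but at many times the cost.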
