How Big Is the Web?
An article in the Boston Globe today on the death of the floppy points to the “information explosion” as a contributory cause. Citing a UC Berkeley study, the article says:
The Web now has more than a half-billion pages, 95 percent of them accessible to the public…and the volume grows by more than 7 million pages daily.
Google has indexed over 3 billion. I thought the common wisdom was that there are over 20 billion pages, although I haven’t been able to track that figure down. But a half-billion is clearly wrong.
Given my own dysfactia, I don’t report this to make fun of the author for making a mistake. Talk about your kettle talking to a frying pan! But usually you can see how a mistake happens. Maybe the author dropped a zero. Maybe the number actually represents a year’s growth, not the total number of pages. Any suggestions? Just curious.
And, more important, any links to recent studies of the size of the Web?
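A back-of-envelope check of the two guesses, using only the figures quoted above (plus the assumption of a 365-day year). At 7 million new pages a day, one year's growth comes to roughly 2.6 billion pages, so the half-billion doesn't look like a year's growth. Restoring a dropped zero gives 5 billion, which is at least above the 3 billion Google has indexed. This is a sketch of the arithmetic, not a claim about how the Globe's figure was actually produced.

```python
# Back-of-envelope check of two guesses about the half-billion figure.
# Inputs are the numbers quoted in the post; the 365-day year is an assumption.

QUOTED_TOTAL_PAGES = 0.5e9    # "more than a half-billion pages"
QUOTED_DAILY_GROWTH = 7e6     # "more than 7 million pages daily"
GOOGLE_INDEXED = 3e9          # "Google has indexed over 3 billion"
DAYS_PER_YEAR = 365           # assumption: ignore leap years

# Guess 1: a zero was dropped, so the real total is ten times larger.
dropped_zero_total = QUOTED_TOTAL_PAGES * 10

# Guess 2: the figure is really one year's growth, not the total.
one_year_growth = QUOTED_DAILY_GROWTH * DAYS_PER_YEAR

print(f"total with a zero restored: {dropped_zero_total / 1e9:.1f} billion")
print(f"one year at 7M pages/day:   {one_year_growth / 1e9:.1f} billion")
print(f"restored total >= Google's index? {dropped_zero_total >= GOOGLE_INDEXED}")
```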
Categories: Uncategorized dw
Maybe the writer just called out to a colleague, “How many web pages are there?” and said colleague, relying on memories of old data, answered, “About a half billion.” That’s the way I accumulate most of my erroneous facts: asking someone else who doesn’t know.
It’s a tough one to answer – measuring the web is such an inexact science.
I would have thought the Nua Surveys chaps would have an answer. They provide all sorts of wonderful information about the size of the world’s online communities, split into demographic chunks, but their biggest survey still doesn’t tell us how big the web is, only how many people are estimated to be online (http://www.nua.ie/surveys/how_many_online/index.html).
So then I thought Jupiter’s CyberAtlas might help – but it focuses more on traffic patterns and infrastructure. (http://cyberatlas.internet.com/)
Good old Google brings up some useful stuff though, including this fascinating nugget:
“Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web. The deep Web contains 7,500 terabytes of information, compared to 19 terabytes of information in the surface Web. The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web. More than an estimated 100,000 deep Web sites presently exist. Sixty of the largest deep Web sites collectively contain about 750 terabytes of information – sufficient by themselves to exceed the size of the surface Web by 40 times.”
Lifted from here: http://www.brightplanet.com/deepcontent/deep_web_faq.asp#Anchor_dwfaq5
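The quoted figures hang together reasonably well. A quick sketch of the ratios, using only the numbers in the quote: 7,500 TB against 19 TB is roughly 395 times by volume, and 550 billion documents against 1 billion is 550 times by count, which is consistent with the “400 to 550 times” range. The 60 largest sites at 750 TB come out at about 39.5 times the surface Web, i.e. the quoted “40 times.” How BrightPlanet actually derived its range isn’t stated, so treat this as a consistency check only.

```python
# Consistency check of the ratios in the BrightPlanet deep Web FAQ quote.
# Every input is taken from the quote; nothing here is independently measured.

DEEP_WEB_TB = 7_500        # deep Web size, terabytes
SURFACE_WEB_TB = 19        # surface Web size, terabytes
DEEP_WEB_DOCS = 550e9      # deep Web documents
SURFACE_WEB_DOCS = 1e9     # surface Web documents
TOP_60_SITES_TB = 750      # the 60 largest deep Web sites, terabytes


def times_larger(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b


by_volume = times_larger(DEEP_WEB_TB, SURFACE_WEB_TB)
by_documents = times_larger(DEEP_WEB_DOCS, SURFACE_WEB_DOCS)
top_60_vs_surface = times_larger(TOP_60_SITES_TB, SURFACE_WEB_TB)

print(f"deep/surface by volume:    {by_volume:.0f}x")
print(f"deep/surface by documents: {by_documents:.0f}x")
print(f"top 60 sites/surface:      {top_60_vs_surface:.1f}x")
```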
These mind-numbingly large numbers don’t surprise me in the least. A couple of years ago I worked with some Compaq guys to help publicise their collaboration with Celera on the Human Genome Project. At peak, this project was adding 7 terabytes of new data PER DAY to the ‘Deep Web’. That’s over $1 million worth of storage every single day.
At this point, IT really does become something much more like a utility – like flicking a switch to get more light into your living room, without for a moment bothering to think about the incredibly deep infrastructure necessary to allow you to not think about it.
Thinking further, your question “How Big is the Web?” is one that can probably never be answered. It ends up being something like the Newfie clock that made the benign f2f spam rounds a few weeks back: http://www.yugop.com/ver3/stuff/03/fla.html
If just one company is responsible for yakking up 7 new terabytes of Web data every day – it’s impossible to pin the thing down to an accurate size for more than a nanosecond.
The only reasonable answer, then, is something like: “real freakin’ big, and getting bigger…”
/m
If you don’t mind a late comment, I’m wondering whether the writer not only got mixed up with his facts (sites versus pages, or a zero or two lost) but also managed to bring in that Bright Planet survey – the very next FAQ after the one cited above states: “A full 95% of the deep Web is publicly accessible information – not subject to fees or subscriptions.”
95% – publicly accessible – has a ring.
Amazing what you can do with facts, and even more amazing what you can do with figures!
haha. odd, ain’t it? first of all they call it deep web when 95% of it is accessible resources.
How large is the world wide web?
It would be pretty time-consuming to sit down and try to visit every page on the web. It’s even difficult for a computer to do this automatically (this process is called crawling, and the computer, in this case, is called…