Everyday Chaos
Too Big to Know
Cluetrain 10th Anniversary
Everything Is Miscellaneous
Small Pieces Loosely Joined
Cluetrain Manifesto

August 14, 2009

Search Pidgin

I know I’m not the only one who’s finding WolframAlpha sometimes frustrating because I can’t figure out the magic words that invoke the genie. To give just one example, I can’t figure out how to see the frequency of the surnames Kumar and Weinberger compared side-by-side in WolframAlpha’s signature fashion. It’s a small thing because “surname Kumar” and “surname Weinberger” will get you info about each individually. But over and over, I fail to guess the way WolframAlpha wants me to phrase the question.

Search engines are easier because they have already trained us how to talk to them. We know that we generally get the same results whether or not we include stop words such as “when” and “the,” or question marks. We eventually learn that quoting a phrase searches for exactly that phrase. We may even learn that in many engines, putting a dash in front of a word excludes pages containing it from the results, or that we can do marvelous and magical things with prefixes that end in a colon, such as site: and define:. We also learn the semantics of searching: If you want to find out the name of that guy who’s Ishmael’s friend in Moby-Dick, you’ll do best to include some words likely to be on the same page, so “What was the name of that guy in Moby-Dick who was the hero’s friend?” is way worse than “Moby-Dick harpoonist.” I have no idea what the curve of query sophistication looks like, but most of us have been trained to one degree or another by the search engines who are our masters and our betters.
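To make the pidgin concrete, here is a minimal sketch of a parser for the conventions just described: quoted phrases, a leading dash to exclude a word, colon-ended prefixes like site:, and dropped stop words. It is an illustration under stated assumptions, not how Google or any other engine actually parses queries; the stop-word list and the function name parse_query are hypothetical.

```python
import re

# A tiny, illustrative stop-word list (an assumption, not any engine's real list).
STOP_WORDS = {"the", "when", "what", "was", "of", "a", "in", "who", "that"}

# Alternatives, tried in order at each position: a quoted phrase, an
# operator such as site: or define: with its value, or any other token.
TOKEN = re.compile(r'"([^"]+)"|(\w+):(\S+)|(\S+)')


def parse_query(query):
    """Split a query into phrases, operators, excluded words, and terms."""
    parsed = {"phrases": [], "operators": {}, "excluded": [], "terms": []}
    for phrase, op, value, word in TOKEN.findall(query):
        if phrase:
            parsed["phrases"].append(phrase)
        elif op:
            parsed["operators"][op] = value
        elif word.startswith("-") and len(word) > 1:
            parsed["excluded"].append(word[1:].lower())
        else:
            # Drop trailing punctuation (e.g. question marks) and stop words.
            cleaned = word.strip("?.,!").lower()
            if cleaned and cleaned not in STOP_WORDS:
                parsed["terms"].append(cleaned)
    return parsed
```

Run on the natural-language Moby-Dick question, the sketch is left with generic terms such as "name," "guy," and "friend," which is one way to see why “Moby-Dick harpoonist” does better.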

In short, we’re being taught a pidgin language: a simplified language for communicating across cultures. In this case, the two cultures are humans and computers. I only wish the pidgin were more uniform and useful. Google has enough dominance in the market that its syntax influences other search engines. Good! But we could use some help taking the next step: formulating more complex natural-language queries in a pidgin that crosses application boundaries and that isn’t designed for standard database queries.

Or does this already exist?

Tags: search pidgin nlp natural_language_processing google everything_is_miscellaneous


Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • google • metadata • natural_language_processing • nlp • pidgin • search Date: August 14th, 2009 dw

4 Comments »

August 9, 2009

Twitterelevancy

With its new Fresh view, Delicious builds on the TweetNews idea of using links in Tweets (and other measures) as a way to find what’s newest and most interesting. As the blog post about it says:

Underneath the hood, Fresh factors several features into the ranking like related bookmark and tweet counts, “eats our own dogfood” by leveraging BOSS to filter for high quality results, as well as stitches tweets to related articles even if the tweets do not provide matching URLs (as ~81% of tweets do not contain URLs). Try clicking the ‘x Related Tweets’ link for any given story to see the Twitter conversation appear instantly inline.

It’s a welcome reslicing, not a whole new beast, but it seems useful.
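The quoted post doesn’t say how Fresh stitches URL-less tweets to articles, so the following is only a stand-in: a minimal sketch that matches each tweet to the headline it shares the most words with (Jaccard overlap), then ranks stories by related-tweet count plus bookmark count. The threshold, the equal weights, and the names stitch and rank are all hypothetical, not Delicious’s method.

```python
import re


def words(text):
    """Lowercased word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words in either set."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def stitch(tweets, articles, threshold=0.2):
    """Attach each tweet to the article headline it overlaps most.

    articles maps a URL to its headline. Tweets whose best overlap falls
    below the (arbitrary) threshold are left unmatched.
    """
    related = {url: [] for url in articles}
    headline_words = {url: words(title) for url, title in articles.items()}
    for tweet in tweets:
        tweet_words = words(tweet)
        best_url, best_score = None, 0.0
        for url, hw in headline_words.items():
            score = jaccard(tweet_words, hw)
            if score > best_score:
                best_url, best_score = url, score
        if best_url is not None and best_score >= threshold:
            related[best_url].append(tweet)
    return related


def rank(related, bookmarks):
    """Order URLs by related-tweet count plus bookmark count.

    Equal weights are chosen purely for illustration.
    """
    return sorted(related,
                  key=lambda url: len(related[url]) + bookmarks.get(url, 0),
                  reverse=True)
```

Word overlap is the simplest thing that could work; a real system would at least discount common words and handle near-duplicates.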

[Tags: delicious everything_is_miscellaneous twitter news ]


Categories: Uncategorized Tagged with: delicious • everythingIsMiscellaneous • everything_is_miscellaneous • metadata • news • social networks • tagging • twitter Date: August 9th, 2009 dw

1 Comment »

August 7, 2009

Tags again

Jeez, it would save me a lot of time if Keynote (or PowerPoint, if you insist) let me tag slides and objects in slides (especially images). I spend way too much time looking for that slide of a “smart room” or the one that shows business vs. end-user use of Web 2.0, or that photo of an old broadcast tower. (Later that day: Maybe I should add, having just rewritten the Wikipedia entry on Interleaf, that back in the early 1990s, Interleaf gave us exactly that capability.)

Instead, I have two hacks, both a pain in the butt. First, I keep a humungous file of slides I think I’ll want to use again. Second, I’ve started adding tags to the speaker notes, in brackets. But I use the speaker notes to speak from, so larding them up with tags is sub-optimal.

And it’d be a snap for third parties to build tools that extract the tags and manage them, especially if you save Keynote files in the pre-2009 multi-file format. (I have a fussy home-made utility that extracts the text from the speaker notes and builds an editable file of them. If you want it, let me know.)
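The bracket hack can at least be mined once the notes are out of the file. This is a minimal sketch, not the home-made utility mentioned above: it assumes the speaker-note text has already been extracted into a dict keyed by slide number (reading Keynote’s file format is out of scope here), and that tags sit in square brackets, one or more per bracket. It builds a tag index and also returns clean notes to speak from. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumes tags are written in square brackets, e.g. "[smart_room]" or
# "[web2.0, business]", several to a bracket separated by spaces or commas.
TAG_GROUP = re.compile(r"\[([^\]]*)\]")


def index_tags(notes_by_slide):
    """Map each tag to the sorted slide numbers whose notes contain it."""
    index = defaultdict(set)
    for slide, notes in notes_by_slide.items():
        for group in TAG_GROUP.findall(notes):
            for tag in re.split(r"[,\s]+", group):
                if tag:
                    index[tag.lower()].add(slide)
    return {tag: sorted(slides) for tag, slides in index.items()}


def strip_tags(notes):
    """Return the notes with bracketed tags removed, for speaking from."""
    return re.sub(r"\s+", " ", TAG_GROUP.sub("", notes)).strip()
```

Looking up index_tags(notes)["smart_room"] then gives every slide tagged that way, which is exactly the search the humungous slide file is standing in for.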

Tags are easy! Tags are useful! Let tags be tags!

[Tags: tags everything_is_miscellaneous keynote powerpoint metadata whines ]


Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • keynote • metadata • powerpoint • tagging • tags • whines Date: August 7th, 2009 dw

2 Comments »

July 11, 2009

Reslicing publications

The OCLC has an experimental site up that provides classification information for books and pubs. You type in the book’s title and author (or ISBN number, or other such ID), and it returns info about the various editions and how they’re classified in the OCLC’s Dewey Decimal Classification System or by the Library of Congress. You can then see the other books that share its Dewey Decimal number (for example, here’s Everything Is Miscellaneous, #303.4833>>Social sciences>>Social sciences, sociology & anthropology>>Social processes), at the OCLC’s useful Dewey Browser. Alas, when you click on the Library of Congress number, you get taken to a demand by the LC that you subscribe to Classification Web, instead of to the free LC Catalog (where my Misc book is listed like this).

Lots of metadata about the metadata…Gotta love it!
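The breadcrumb for Everything Is Miscellaneous (303.4833, Social sciences, then Social sciences, sociology & anthropology, then Social processes) falls straight out of Dewey’s notation: the first digit picks a main class, the first two a division, the first three a section. Here is a minimal sketch that reads those levels off a class number and reports how many levels two books share; it uses only the notation, not OCLC’s tables or captions, and the function names are hypothetical.

```python
def ddc_levels(number):
    """Return (main class, division, section) for a Dewey Decimal number.

    Reads only the three digits before the decimal point; everything
    after the point narrows the subject further.
    """
    digits = number.split(".")[0]
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"expected a three-digit class number, got {number!r}")
    return (digits[0] + "00", digits[:2] + "0", digits)


def shared_level(a, b):
    """How many of the top three levels two class numbers share (0 to 3)."""
    shared = 0
    for x, y in zip(ddc_levels(a), ddc_levels(b)):
        if x != y:
            break
        shared += 1
    return shared
```

For 303.4833 the sketch yields 300, 300, 303, matching the three steps in the breadcrumb; a browser like the Dewey Browser can let you slide up and down those levels to widen or narrow the set of neighboring books.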

[Tags: everything_is_miscellaneous dewey_decimal oclc libraries books metadata ]


Categories: Uncategorized Tagged with: books • dewey_decimal • everythingIsMiscellaneous • everything_is_miscellaneous • libraries • metadata • oclc • taxonomy Date: July 11th, 2009 dw

4 Comments »

July 9, 2009

Real photographs

A few years ago, I sat next to an AP photographer on a press bus as he deftly photoshopped an image he’d just taken. I asked him if he was allowed to do that, and he said the rule was that he could do anything with Photoshop that he could have done in a darkroom.

I thought of him when I saw the NY Times’ embarrassed retraction of a photo essay it had published. It turns out that the photographer had “digitally manipulated” the photos without telling his editor. Unfortunately, the NYT removed all of the photos rather than keeping them up with metadata noting that the digital manipulation had gone beyond its editorial guidelines, and it didn’t tell us what those guidelines are. For all photos are manipulated. The photographer frames them, decides what to focus on and how much of the photo should be in focus, etc., and then completes the manipulation in the darkroom, whether it’s analog or digital. To think otherwise is to fall prey to the fallacy of photographic realism that Susan Sontag warned us against.

Wouldn’t it be interesting to see what the NYT’s guidelines are and then hold a contest to see who can create the most deceptive photo while staying within those guidelines?

Scott Rosenberg, a founder of Salon and the author of a terrific new history of blogging (Say Everything), provides us with reflections on what could be one of the entries, based on stories he did for the San Francisco Examiner and Wired about the photographer Pedro Meyer. Really interesting. (Embarrassingly, Scott cites me at the very end.)

[Tags: photography realism journalism media everything_is_miscellaneous ]


Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • journalism • media • metadata • photography • realism Date: July 9th, 2009 dw

6 Comments »

June 9, 2009

Meaning-mining Wikipedia

DBpedia extracts information from Wikipedia, building a database that you can query. This isn’t easy because much of the information in Wikipedia is unstructured. On the other hand, there’s an awful lot that’s structured enough that an algorithm can reliably deduce the semantic content from the language and the layout. For example, the boxed info on bio pages is pretty standardized, so your algorithm can usually assume that the text that follows “Born: ” is a date and not a place name. As the DBpedia site says:

The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories.

Over time, the site will get better and better at extracting info from Wikipedia. And as it does so, it’s building a generalized corpus of query-able knowledge.
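The infobox trick can be sketched in a few lines. In the page source, the box labeled “Born” is a template whose fields look like “| birth_date = …”, so a field name tells you what kind of value follows. This is a deliberately naive sketch over a simplified sample (real infoboxes nest templates, and DBpedia’s actual extraction framework is far more involved); the function names and the flat sample values are assumptions for illustration.

```python
import re

# Matches one "| key = value" line of an infobox template.
FIELD = re.compile(r"^\|\s*([\w ]+?)\s*=\s*(.*?)\s*$")


def infobox_fields(wikitext):
    """Pull key/value pairs out of the first {{Infobox ...}} block.

    Assumes one field per line; nested templates spanning lines
    are not handled.
    """
    fields = {}
    inside = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("{{infobox"):
            inside = True
            continue
        if inside and stripped == "}}":
            break
        if inside:
            match = FIELD.match(stripped)
            if match and match.group(2):  # skip empty fields
                fields[match.group(1)] = match.group(2)
    return fields


def to_triples(subject, fields):
    """Express each field as a (subject, predicate, object) triple."""
    return [(subject, key, value) for key, value in fields.items()]
```

Fed a simplified box such as “{{Infobox writer / | name = Herman Melville / | birth_date = August 1, 1819 / | birth_place = New York City / }}”, the sketch yields three triples about Herman_Melville, the same subject-predicate-object shape as the 274 million RDF triples quoted above.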

As of now, querying the knowledge requires some familiarity with building database queries. But the world has accumulated lots of facility with putting front-ends onto databases. DBpedia is working on something different: accumulating an encyclopedic database, open to all and expressed in the open language of the Semantic Web.
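For a sense of what “building database queries” means here: DBpedia is queried in SPARQL, the Semantic Web’s query language. This minimal sketch only builds the request URL for DBpedia’s public SPARQL endpoint and asks for JSON results; it does not touch the network. The endpoint address and the dbo:birthDate property reflect DBpedia as it is published today, which is an assumption relative to 2009, and property coverage varies from resource to resource.

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint (check the current address before relying on it).
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for Herman Melville's birth date, using the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?birthDate WHERE {
  dbr:Herman_Melville dbo:birthDate ?birthDate .
}
"""


def query_url(query, endpoint=ENDPOINT):
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)
```

Passing query_url(QUERY) to urllib.request.urlopen would fetch the answer. A friendlier front-end would hide exactly this layer, which is the point about putting front-ends onto databases.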

(Via Mirek Sopek.) [Tags: wikipedia semantic_web everything_is_miscellaneous ]


Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • knowledge • metadata • semantic_web • web 2.0 • wikipedia Date: June 9th, 2009 dw

5 Comments »

May 8, 2009

WolframAlpha vs. Google

David Talbot at Technology Review has run the same queries through Google and WolframAlpha. (WA isn’t yet open to the general public, i.e., to you and me.) The queries tend to be of the sort that WA will be better at: comparisons and computations. WA comes out well, but be sure to read David’s writeup of comments on his article.

The overall conclusion is, I think, that it’s going to take a while for WA to train us on the sorts of questions it can answer and how best to ask those questions.

(Some me-centric links: Live blog of Wolfram’s presentation at Harvard. Video of that presentation. My podcast interview with him. My too-early assessment of WA.)

[Tags: wolfram everything_is_miscellaneous google wolframalpha ]


Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • google • metadata • wolfram • wolframalpha Date: May 8th, 2009 dw

1 Comment »

May 7, 2009

Wolfram podcast

My interview with Stephen Wolfram about WolframAlpha is now available. Some other me-based resources:

The unedited version weighs in at a full 55 minutes. The edited version will spare you some of my throat-clearing, and some dumb questions.

A post about what I think the significance of WolframAlpha will be.

Live blog of Wolfram’s presentation at Harvard.

Wolfram’s presentation at Harvard.

[Tags: wolfram wolframalpha search google metadata science ]


Categories: Uncategorized Tagged with: education • everythingIsMiscellaneous • expertise • google • knowledge • metadata • science • search • taxonomy • web 2.0 • wolfram • wolframalpha Date: May 7th, 2009 dw

Be the first to comment »

May 4, 2009

How important is WolframAlpha?

The Independent calls WolframAlpha “An invention that could change the Internet forever.” It concludes: “Wolfram Alpha has the potential to become one of the biggest names on the planet.”

Nova Spivack, a smart Semantic Web guy, says it could be as important as Google.

Ton Zijlstra, on the other hand, who knows a thing or two about knowledge and knowledge management, feels like it’s been overhyped. After seeing the video of Wolfram talking at Harvard, Ton writes:

No crawling? Centralized database, adding data from partners? Manual updating? Adding is tricky? Manually adding metadata (curating)? For all its coolness on the front of WolframAlpha, on the back end this sounds like it’s the mechanical turk of the semantic web.

(“The mechanical turk of the semantic web.” Great phrase. And while I’m in parentheses, ReadWriteWeb has useful screenshots of WolframAlpha, and here’s my unedited 55-minute interview with Wolfram.)

I am somewhere in between, definitely over in the Enthusiastic half of the field. I think WolframAlpha [WA] will become a standard part of the Internet’s tool set, but it will not be transformative.

WA works because it’s curated. Real human beings decide what topics to include (geography but not 6 Degrees of Courtney Love), which data to ingest, what metadata is worth capturing, how that metadata is interrelated (= an ontology), which correlations to present to the user when she queries it (daily tonnage of fish captured by the French compared to daily production of garbage in NYC), and how that information should be presented. Wolfram insists that an expert be present in each data stream to ensure the quality of the data. Given all that human intervention, WA then performs its algorithmic computations … which are themselves curated. WA is as curated as an almanac.

Curation is a source of its strength. It increases the reliability of the information, it enables the computations, and it lets the results pages present interesting and relevant information far beyond the simple factual answer to the question. The richness of those pages will be a big factor in the site’s success.

Curation is also WA’s limitation. If it stays purely curated, without areas in which the Big Anyone can contribute, it won’t be able to grow at Internet speeds. Someone with a good idea (provide info on meds and interactions, or add recipes so ingredients can be mashed up with nutritional and ecological info) will have to suggest it to WolframAlpha, Inc. and hope they take it up. (You could do this sorta kinda through the API, but you wouldn’t get the scaling effects of actually adding data to the system.) And WA will suffer from the perspectival problems inevitable in all curated systems: WA reflects Stephen Wolfram’s interests and perspective. It covers what he thinks is interesting. It covers it from his point of view. It will have to make decisions on topics for which there are no good answers: Is Pluto a planet? Does Scientology go on the list of religions? Does the page on rabbits include nutritional information about rabbit meat? (That, by the way, was Wolfram’s example in my interview of him. If you look at the site from Europe, a “rabbit” query does include the nutritional info, but not if you log in from a US IP address.) But WA doesn’t have to scale up to Internet Supersize to be supersized useful.

So, given those strengths and limitations, how important is WA?

Once people figure out what types of questions it’s good at, I think it will become a standard part of our tools, and for some areas of inquiry, it may be indispensable. I don’t know those areas well enough to give an example that will hold up, but I can imagine WA becoming the first place geneticists go when they have a question about a gene sequence, or chemists when they want to know about a molecule. I think it is likely to be so useful within particular fields that it becomes the standard place to look first, like IMDb for movies, except across broad, multiple fields, with the ability to cross-compute.

But more broadly, is WA the next Google? Does it transform the Internet?

I don’t think so. Its computational abilities mean it does something not currently done (or not done well enough for a crowd of users), and the aesthetics of its responses make it quite accessible. But how many computational questions do you have a day? If you want to know how many tons of fish France catches, WA will work as an almanac. But that’s not transformational. If you want to know that tonnage divided by the average weight of a French person, WA is for you. But the computational uses that are distinctive of WA, and for which WA will frequently be an astounding tool, are not frequent enough for WA to be transformational on the order of a Google or Wikipedia.
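What separates the computational question from the almanac question is the cross-computation: two figures from different curated datasets, put into compatible units, then combined. A minimal sketch of that one step follows. The post gives no figures, so both inputs are left as parameters, and treating “tons” as metric tonnes is an assumption; the function name is hypothetical.

```python
KG_PER_TONNE = 1000  # metric tonne; assumes the catch is reported in tonnes


def catch_in_average_people(catch_tonnes, avg_weight_kg):
    """Express a fish catch as a multiple of an average person's weight.

    Both inputs are placeholders: a real answer needs a sourced figure
    for the catch and another for the average weight.
    """
    if avg_weight_kg <= 0:
        raise ValueError("average weight must be positive")
    return catch_tonnes * KG_PER_TONNE / avg_weight_kg
```

The arithmetic is trivial; the hard part, and the curated part, is having both numbers on hand, vetted, and in known units.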

There are at least two other ways it could be transformational, however.

First, its biggest effect may be on metadata. If WA takes off, as I suspect it will, people and organizations will want to get their data into it. But to contribute their data, they will have to put it into WA’s metadata schema. Those schemas then become a standard way we organize data. WA could be the killer app of the Semantic Web … the app that gives people both a motive for putting their data into ontologies and a standardized set of ontologies that makes it easy to do so.
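What “putting it into WA’s metadata schema” would involve for a contributor can be sketched generically: map each of your field names onto a property in the shared vocabulary and emit subject-predicate-object triples. WolframAlpha’s real schema isn’t described here, so the vocabulary below (the ex: names) and the sample record are entirely hypothetical.

```python
# A hypothetical shared vocabulary. None of these are WolframAlpha's
# real property names.
SCHEMA = {
    "name": "ex:name",
    "country": "ex:country",
    "catch_tonnes": "ex:annualCatchTonnes",
}


def to_schema(record_id, record, schema=SCHEMA):
    """Rewrite a contributor's record as triples in the shared vocabulary.

    Fields the schema doesn't know are returned separately rather than
    silently dropped, since deciding how to map them is the human part
    of the job.
    """
    triples, unmapped = [], []
    for field, value in record.items():
        if field in schema:
            triples.append((record_id, schema[field], value))
        else:
            unmapped.append(field)
    return triples, unmapped
```

The unmapped list is where the standardizing pressure shows up: either the contributor reshapes their data to fit, or they ask the curators to extend the schema.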

Second, a robust computational engine with access to a very wide array of data is a new idea on the Internet. (Ok, nothing is new. But WA is going to bring this idea to mainstream awareness.) That transforms our expectations, just as Wikipedia is important not just because it’s a great encyclopedia but because it proved the power of collaborative crowds. But, WA’s lesson — there’s more that can be computed than we ever imagined — isn’t as counter-intuitive as Wikipedia’s, so it is not as apple-cart-upsetting, so it’s not as transformational. Our cultural reaction to Wikipedia is to be amazed by what we’ve done. With WA, we are likely to be amazed by what Wolfram has done.

That is the final reason why I think WA is not likely to be as big a deal as Google or Wikipedia, and I say this while being enthusiastic — wowed, even — about WA. WA’s big benefit is that it answers questions authoritatively. WA nails facts down. (Please take the discussion about facts in a postmodern age into the comments section. Thank you.) It thus ends conversation. Google and Wikipedia aim at continuing and even provoking conversation. They are rich with links and pointers. Even as Wikipedia provides a narrative that it hopes is reliable, it takes every opportunity to get you to go to a new page. WA does have links — including links to Wikipedia — but most are hidden one click below the surface. So, the distinction I’m drawing is far from absolute. Nevertheless, it seems right to me: WA is designed to get you out of a state of doubt by showing you a simple, accurate, reliable, true answer to your question. That’s an important service, but answers can be dead-ends on the Web: you get your answer and get off. WA as question-answerer bookends WA’s curated creation process: A relatively (not totally) closed process that has a great deal of value, but keeps it from the participatory model that generally has had the biggest effects on the Net.

Providing solid, reliable answers to difficult questions is hugely valuable. WolframAlpha’s approach is ambitious and brilliant. Wolfram is a genius. But that’s not enough to fundamentally alter the Net.

Nevertheless, I am wowed.

[Tags: wolfram wolframalpha wikipedia google search metadata semantic_web ]


Categories: Uncategorized Tagged with: digital culture • education • everythingIsMiscellaneous • expertise • google • infohistory • knowledge • libraries • metadata • science • search • web 2.0 • wikipedia • wolfram • wolframalpha Date: May 4th, 2009 dw

19 Comments »

April 30, 2009

The syntax of retweeting

Joi Ito posts about whether we’ve agreed upon the syntax of retweeting: If I want to twitter one of your tweets and add my own comment, do I do it as “RT @you: your comment Me: My comment” or as “RT @you:your comment [Me: my comment]” or what? Of course, there was a bunch of twittering about this, which Joi captures.

It’s fun to watch syntax emerge. As Ethanz tweets: “Microformat development in 140 chars or less…”
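Once a convention settles, software can read it. Here is a minimal sketch of a parser that accepts both of the forms Joi lists, “RT @you: your comment Me: my comment” and “RT @you: your comment [Me: my comment]”, and splits out who was retweeted, what they said, and what was added. It covers only those two forms; the regex and the name parse_retweet are illustrative, not any Twitter client’s actual rule.

```python
import re

# The two conventions from Joi's post:
#   RT @you: your comment Me: my comment
#   RT @you: your comment [Me: my comment]
RT = re.compile(
    r"^RT @(?P<user>\w+):\s*(?P<original>.*?)"
    r"(?:\s*\[Me:\s*(?P<bracketed>[^\]]*)\]|\s+Me:\s*(?P<inline>.*))?$"
)


def parse_retweet(text):
    """Split a retweet into (user, original text, added comment or None)."""
    match = RT.match(text.strip())
    if not match:
        return None
    comment = match.group("bracketed") or match.group("inline")
    return match.group("user"), match.group("original").strip(), comment
```

The bracketed form is the easier one for a machine: the closing bracket marks exactly where the comment ends, whereas the inline form breaks if the original tweet itself happens to contain “ Me:”.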

[Tags: twitter standards ethan_zuckerman joi_ito everything_is_miscellaneous ]


Categories: Uncategorized Tagged with: digital culture • ethan_zuckerman • everythingIsMiscellaneous • everything_is_miscellaneous • joi_ito • metadata • standards • twitter Date: April 30th, 2009 dw

Be the first to comment »



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name (David Weinberger) and link to it), and don't use it commercially without my permission.

Joho the Blog uses WordPress blogging software.
Thank you, WordPress!