
May 5, 2009

Scanning in a book

Over the past year, I’ve digitized a bunch of my family’s old photo albums by photographing the pages with a digital camera. This is far faster than scanning them, and the quality is good enough and infinitely better than not having any digitized versions.

Now I’m contemplating using the same technique to make a digital copy of my 1978 doctoral dissertation. The object consists of 350 typed, double-spaced 8.5″x11″ pages, bound. At 15 secs per page, that’s about 1.5 hours of time (= 4 Daily Shows, or 3 SNLs with the dross fast-forwarded).

I’d appreciate advice about the digital side of it, given that I’d like the “scans” to be readable online and, ultimately, be OCR-able.

1. My camera goes up to 10 megapixels, which I assume is way more than I need for this project. I don’t care about reproducing the pages as physical artifacts. I’m only interested in the text on them. How many mpixies should I be shooting at?

2. What would be the most convenient way to post these from a reader’s point of view? Anything other than PDF? (Google Books lets you submit your books in PDF format, so I’d like to produce a PDF version in any case.)

3. Depending on your answer to #2, do you have any suggestions of tools to use? (I’m doing this on a Mac.)

4. Any other advice?

Thanks!

[Tags: scanning pdf ]

Categories: misc Tagged with: misc • pdf • scanning Date: May 5th, 2009 dw

14 Comments »

May 2, 2009

And now, medieval music for cockatoos

I love the dancing cockatoos.

And I love the research that’s found a connection between animals with language skills and animals that can dance.

But I wonder what it means that cockatoos probably can’t dance to most of the music our culture has created throughout its history.



Canconier by Canconier

Did it take until the 20th century for us to de-evolve music to the point that cockatoos could dance to it?

(Note: I also love Magnatune, and the occasional Monty Python reference.)

[Tags: birds cockatoos language dancing monty_python ]

Categories: misc Tagged with: birds • cockatoos • dancing • language • misc • monty_python Date: May 2nd, 2009 dw

3 Comments »

April 27, 2009

Encarta nostalgia: SGML and the Semantic Web

I’m not going to much mourn Encarta’s demise. Wikipedia is too big, too fast, too useful, too much fun. But Encarta was an ambitious project that broke some ground. So, pardon me if I sigh wistfully for a moment, and have a little moment of Encarta appreciation. Ahhhh.

When Encarta began, it was taken as validating this whole crazy CD-ROM approach to knowledge. It was searchable. It had multimedia. It let you do some slicing and dicing. It was breezy, at least compared to its hundred-pound competitors. But for my circle, the big news was below the surface: Encarta used SGML. It was, in fact, one of the first commercial SGML projects delivered into the hands of average customers.

SGML (Standard Generalized Markup Language) was the Semantic Web of its time: roughly the same arguments in its favor, roughly the same approach. This isn’t entirely accidental, for two reasons: 1. HTML is an application of SGML. 2. SGML got a lot of things right.

SGML was a way of specifying the structural elements of a document. In the case of an encyclopedia, elements might include volumes, articles, article titles, subheadings, body text, illustrations, captions, references, and see-also’s. You could also specify the metadata for each element: this illustration is of a dress, its topic is “clothing,” its era is 1920-1930. SGML also let you specify rules about what constitutes a valid instance of a document. For example, the rules might say that a valid encyclopedia article has to have one and only one title, it can have any number of illustrations, and every illustration has to have a caption. Once you have created a valid set of documents, you can then use your fancy-dancy computers to assemble views at will: Show me all the illustrations whose “topic” is “clothing” from the era 1920-1930. Etc. Incredibly useful.
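
To make that concrete, here is a minimal sketch in Python rather than in actual SGML. The dictionary layout, the function names, and the sample articles are all invented for illustration; a real SGML system would state the rules in a DTD and check them with a validating parser.

```python
# Hypothetical stand-in for SGML documents: articles as plain dictionaries.
articles = [
    {
        "titles": ["Flapper"],
        "illustrations": [
            {"caption": "A beaded evening dress", "topic": "clothing", "era": (1920, 1930)},
        ],
    },
    {
        "titles": ["Zeppelin"],
        "illustrations": [
            {"caption": "An airship over a city", "topic": "aviation", "era": (1920, 1930)},
        ],
    },
]


def validation_errors(article):
    """Return a list of rule violations; an empty list means the article is valid."""
    errors = []
    # Rule: one and only one title.
    if len(article.get("titles", [])) != 1:
        errors.append("article must have exactly one title")
    # Rule: any number of illustrations, but each one needs a caption.
    for i, illustration in enumerate(article.get("illustrations", [])):
        if not illustration.get("caption"):
            errors.append(f"illustration {i} has no caption")
    return errors


def find_illustrations(articles, topic, era):
    """Assemble a view: every illustration with the given topic and era."""
    return [
        illustration
        for article in articles
        if not validation_errors(article)  # only query valid documents
        for illustration in article["illustrations"]
        if illustration["topic"] == topic and illustration["era"] == era
    ]


clothing_1920s = find_illustrations(articles, "clothing", (1920, 1930))
```

The point of the sketch is the ordering: the rules are checked first, and only then does the "show me all the illustrations whose topic is clothing" query become trustworthy.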

You haven’t heard about SGML (at least not much) for a few reasons.

First, industries that wanted to be able to share data wrapped themselves in knots trying to tie down the specific specifications for their documents. Endless and endlessly geeky arguments ensued about how exactly to encode a table of parts.

Second, outside of technical documentation designers, most people don’t think about documents in terms of their structural elements. Rather, they think of documents as a series of formatting decisions. SGML was not designed to capture format. From SGML’s point of view, the title of an article is simply an element called “title” and it’s up to someone else to decide whether titles are bolded, underlined, or printed in red. Now, let me hasten to add that people actually do think of documents in terms of their structure: We decide to make this piece of text bold because it’s the title. But we seem to be reluctant to note those decisions in terms of structure; we’d rather just drag-select the text and hit the bold-it key. That’s why Microsoft Word over the years has made “procedural markup” (drag-select-bold) more prominent in its UI than “declarative markup” (declare this paragraph to be a Title element, and then tell it how to format Titles).
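
The difference can be sketched in a few lines of Python. Nothing here is Word's or SGML's actual machinery; the element names, the `STYLES` table, and the `*`/`_` notation for bold and underline are made up for illustration.

```python
# Procedural markup: the formatting decision is baked into the text itself.
procedural = "*Scanning in a book*"  # someone drag-selected and hit bold

# Declarative markup: the text only records what each piece *is*.
declarative = [
    ("title", "Scanning in a book"),
    ("body", "Over the past year, I've digitized a bunch of photo albums."),
]

# Someone else decides, in one place, how each kind of element looks.
STYLES = {
    "title": lambda text: f"*{text}*",  # titles are bold today
    "body": lambda text: text,
}


def render(elements, styles):
    """Apply the style for each element's role; unknown roles render as plain text."""
    return "\n".join(styles.get(role, str)(text) for role, text in elements)


bold_titles = render(declarative, STYLES)

# Changing one rule restyles every title without touching the document.
underlined = render(declarative, {**STYLES, "title": lambda text: f"_{text}_"})
```

With the procedural string, switching titles from bold to underlined means finding and re-editing every title by hand; with the declarative list, it is a one-line change to the style table.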

Third, HTML swept the world. HTML is a set of SGML elements and rules specified by a certain Sir Tim. Because HTML is designed not for encyclopedia articles or for shopping lists but for anything that might be put on the Web, it has highly generic elements that do not reflect the content of particular types of pages: It has six levels of headings, two types of lists, one type of image, etc. The SGML folks initially sneered at this. It looked “brain dead” to them. The documents were too generic. There wasn’t enough semantics: That something is a second-level heading expresses its place in the document’s structure, but not the fact that it’s the name of a repair procedure or a list of ingredients. And, HTML seemed too interested in capturing formatting. That’s why newer versions of HTML want you to use <em> (em=emphasis) instead of the original <i>: the old way had you making a formatting decision (“Italicize it”) rather than a structural one (“The role this point plays is that of being emphatic, which the browser should visually express in the way it feels is proper”).

The other side of the coin is that HTML is way way way easier to use than having to design and then follow a set of SGML design rules, with specific elements, for every different sort of document you want to create. Its simplicity meant that people actually succeeded at it. Furthermore, it was in the interest of the browsers to forgive all errors: If browser X rejects a page because it didn’t follow HTML’s rules, you would be driven to see if browser Y could display the page. If Y could, you’d consider X, not the page, to be broken. The browser economics favored sloppiness and forgiveness, neither of which was a hallmark of SGML’s discipline-based culture.
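
You can still see the two cultures side by side in Python's standard library. This is only an analogy: `html.parser` stands in here for a forgiving browser, and the strict XML parser stands in for SGML-style discipline (XML is a descendant of SGML, not SGML itself). The sample markup and the `TextCollector` class are invented for the illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Markup that breaks the rules: nothing is ever closed.
sloppy = "<p>An unclosed paragraph<p>And <b>unclosed bold"


class TextCollector(HTMLParser):
    """Collects whatever text the lenient parser can find."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# The forgiving route: no complaint, and the text comes through.
collector = TextCollector()
collector.feed(sloppy)
collector.close()
recovered_text = "".join(collector.chunks)

# The strict route: the same input is rejected outright.
try:
    ET.fromstring(sloppy)
    strict_accepted = True
except ET.ParseError:
    strict_accepted = False
```

The forgiving parser hands back the text of the page; the strict one raises an error and hands back nothing, which is exactly the behavior that would have sent readers off to browser Y.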

Now, as the great dialectical pendulum swings, the Semantic Web has arisen to remind us of the value of metadata. If it can avoid the perfectionism and discipline that left SGML as a tool for the few, it will add back in some of the smarts the loose ‘n’ low-hangin’ HTML usefully took out. As the name implies, the Semantic Web is more about expressing the structure of meaning and concepts in a field than about expressing the structure of documents. For an encyclopedia, you wouldn’t want to wait for the Semantic Web to create the entire web of meaning, because that web would have to be as wide as the topical coverage of the encyclopedia itself. You might instead want to come up with a set of standard document elements, perhaps applied somewhat loosely, with the ability to slather on rich layers of metadata, and then watch webs of semantics get spun. Which is pretty much exactly what we’re seeing at Wikipedia.

Meanwhile, Encarta remains an example — along with the Oxford English Dictionary and others — of the value of rigorously structured and metadated documents.

[Tags: encarta sgml semantic_web ]

Categories: misc Tagged with: encarta • misc • sgml Date: April 27th, 2009 dw

1 Comment »

April 16, 2009

WolframAlpha alpha

Seb Schmoller went to a webinar put on by Stephen Wolfram about the upcoming WolframAlpha search engine (well, answering engine) and came away impressed…

[Tags: wolfram wolframalpha google search ]

Categories: misc Tagged with: google • misc • search • wolfram • wolframalpha Date: April 16th, 2009 dw

2 Comments »

April 10, 2009

This election brought to you by Starbucks

Well, not exactly. Starbucks is offering a free cup of coffee to everyone who votes in the Indonesian elections. (Via Mong Palatino at GlobalVoices.)

[Tags: elections starbucks indonesia globalvoices ]

Categories: misc Tagged with: elections • globalvoices • indonesia • misc • starbucks Date: April 10th, 2009 dw

2 Comments »

April 4, 2009

Obama’s exceptionalism

James Fallows has an insightful analysis of a seemingly simple answer Obama gave to a question about American exceptionalism. And in it, you can see the workings of Fallows’ writerly mind.

[Tags: obama exceptionalism james_fallows ]

Categories: misc Tagged with: exceptionalism • misc • obama Date: April 4th, 2009 dw

2 Comments »

April 1, 2009

When the tubes were tubes

The always insight/delight/ful Molly Wright Steenson describes 19th century pneumatic tube cylinder delivery systems. Quite astounding…and all in 5 minutes and 20 slides. From the O’Reilly Ignite series.

[Tags: tubes pneumatic_tubes packet_switch_networks networks molly_wright_steenson ]

Categories: misc Tagged with: culture • infohistory • misc • networks • tubes Date: April 1st, 2009 dw

1 Comment »

Veggie in New Orleans

Last night in New Orleans I had a delicious veggie meal at Bennachin (1212 Royal St), an African restaurant. The restaurant is humble, inexpensive, and friendly. It’s not a veggie restaurant, but they have lots of dishes that are, or can be made, animal free. It’s next door to Mona’s, which has some veggie, cheesy Italian dishes. Since New Orleans is not the veggie-friendliest of cities (they like their seafood here!), I thought I’d make a note of it…

[Tags: new_orleans vegetarian restaurants veggie ]

Categories: misc Tagged with: misc • restaurants • vegetarian • veggie Date: April 1st, 2009 dw

Be the first to comment »

March 21, 2009

The wisdom of snake mobs

Amboseli baboons engage in what’s called “snake mobbing”: Rather than fleeing from a predatory snake, they approach it and sound the alarm or even, at times, attack it. (“The Information Continuum,” Barbara J. King, p.43)

Surely this must be a metaphor for something.

[Tags: baboons snakes metaphors ]

Categories: misc Tagged with: baboons • culture • metaphors • misc • snakes Date: March 21st, 2009 dw

1 Comment »

March 18, 2009

Animated graffiti


By blu.

Awesome.

[Tags: graffiti animation awesome ]

Categories: misc Tagged with: animation • awesome • graffiti • misc Date: March 18th, 2009 dw

2 Comments »


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name (David Weinberger) and link to it), and don't use it commercially without my permission.

Joho the Blog uses WordPress blogging software.
Thank you, WordPress!