The Semantic Web and SGML

Frank thinks that Clay‘s fogged the issues around the Semantic Web. Frank points to places where the careful construction of industry metadata has resulted in integrated systems that work well.

I don’t think Clay is arguing that all metadata is bad. Rather, he’s saying that it doesn’t scale. Yes, the insurance industry might be able to construct a taxonomy that works for it, but the Semantic Web goes beyond the local. It talks about how local taxonomies can automagically knit themselves together. The problem with the Semantic Web is, from my point of view, that it can’t scale because taxonomies are tools, not descriptions, and thus don’t knit real well.

We’ve been through this before with SGML. (I’ve been working on a long piece on SGML and the Semantic Web for months now.) We know how hard it is to come up with a Document Type Definition (akin to a taxonomy) for an industry, and we have no expectation that industry DTDs will somehow knit themselves together into a universal DTD. For exactly the same reasons, the Semantic Web won’t scale.
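
To make the point concrete, here is a minimal sketch in Python of why two taxonomies that use the “same” term don’t automatically knit together. The vocabularies and field names are invented for illustration:

    # Two hypothetical industry vocabularies define "policy" differently.
    insurance = {"policy": ["holder", "coverage", "premium"]}
    security = {"policy": ["rule", "effective_date"]}

    # A naive "universal" merge doesn't reconcile the meanings;
    # one of them silently wins:
    universal = {**insurance, **security}
    print(universal)  # {'policy': ['rule', 'effective_date']}

The collision isn’t syntactic. Each taxonomy is a tool built for its own users’ purposes, so no mechanical merge recovers a shared description.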

And the ironic thing is that even the desire to have a Semantic Web is a failure to learn from the failure of SGML to establish itself as a universal document standard…except in the form of HTML, against which the Semantic Web is a reaction.

IMO.


16 Responses to “The Semantic Web and SGML”

  1. Exactly, Dr. Weinberger (IMO).

    Maybe saves me a trip over to “Sam’s Club”, but I had other things to discuss.

    FP: “these large pieces are institutional and inward looking and define limits to behavior and limits to language and process that the Internet itself strips away for individuals.”

    Unfortunately, no.

    The small pieces, each unique individual which you describe so well, Frank, are likewise “institutions” that are “inward looking and define limits to behavior”.

    Some do better than others. For example, your post is illuminating in many respects, but due to the “egalitarian” nature of Blogaria, the only way to detect that is by quantitative analysis of how many people link to and read your article. As opposed to the lame views that any Tom, Dick, or Harry can regurgitate, since all it costs to reconstitute someone else’s ideas is minimal time and zero cash outlay.

    (Blech.)

    Benjamin Whorf was somewhat correct, but nails this one down pat: “The fact of the matter is that the ‘real world’ is to a large extent unconsciously built up on the language habits of the group.”

    What “bugs” me about a lot of the discussion surrounding the Semantic Web is this:

    1) EDI was and is an extremely effective system, built upon millions, if not billions, of “man”-hours of development and learning-to-use. It gets ditched in favor of XML.

    2) XML (let alone the Semantic Web) doesn’t solve this basic recurring problem effectively: Person Alpha puts “I wanna order 1000 widgets” into a field that is EDI-defined as text, rather than the quantity field. Assuming it was unintentional, an email or phone call or some form of communication ensues between Person Alpha and Person Gamma. Adding a level of complexity or two on top of the problem just allows hackers/crackers/whatever to figure out how they can get a $1M check cut to them. (See the sketch at the end of this comment.)

    3) The Semantic Web may be an idea whose time hasn’t come, or may be an idea whose time is not going to come. We will know this, probably differently, at different points in the future. But the argument that it is here now is rather disingenuous. And if I were Mr. Ayers, or Mr. Bricklin, it would be a matter of great concern (and alarm) that The Philosopher Xian agrees with my views 100%, but maybe that’s just me.
    http://66.70.191.189/cgi-bin/mt-comments.cgi?entry_id=2016

    If the Semantic Web were anywhere even close to half as good as the claims being made for it, it would ALREADY have gained massive widespread acceptance. Meaning VAST implementation.

    That hasn’t happened. Which is not to say it is not GOING to happen, but that the claims of its utility are wildly exaggerated. (I’ve emphasized enough words, so leave “wildly” in lower case.)

    Mr. Shirky’s point, as I understand it, is that the Semantic Web doesn’t currently handle “the easy stuff”: things that are easily and logically deducible. So how about seeing some progress on that front, before claiming this thing is going to “think” and start taking on a will of its own and debug itself, to boot!

    It will still come down to WHOSE will is going to be expressed by the program code, and that’s taking the optimistic pov in the debate, that a consciously thinking computer is in fact do-able.

    And the discussion of what the Semantic Web is going to be obfuscates the FACT that a LOT of the claimed benefits already exist, and many more are “relatively” easily derivable with existing toolsets and mindsets.

    The Semantic Web already exists, so much of the “vision of the future” has, in actual fact, already happened but gone unnoticed, at least imv.

    4) I am, by your definition Frank, a “raw-meat” kind of guy. However, I do not (repeat “not”) have a perspective which is “so occluded by this disfunction that he tends to trivialize the complexities of large scale projects, particularly in the area of relationships and creating common goals.”

    In fact, I have a very keen perspective of how complex social relationships are, how incredibly complex and beautiful people are, and how that beauty tends to shine through, by way of using simplicity (without over-simplifying) as a vehicle.

    I am not all that different than Dave Winer or, perhaps, a hundred or a hundred-thousand other programmers.

    When speaking of inability to work well with others, I believe the point also applies equally, if not moreso, to those backing the Semantic Web.

    After FOAF has been in place for.. how long?.. most of the elements I looked at were “unstable”, and only on 9-11 of this year was the “Gender” element finally added to the spec!?! I appreciate FOAF is a GOOD START, but I’ve worked on ERP systems that held info on faceless vendors which captured far more suitable, and far more accurate, data elements.

    Btw, EVERY bit of logic I have (such as it is…;-) has been derived from programming Business Systems, so it is immediately suspect for that sole reason, according to many of the posts I’ve read by the high-and-mighty. (That’s a low blow, perhaps, but it comes after the number of posts I’ve read slamming “the lowly coder”, with THE lowest form of such life in the Universe, btw, being a coder who works in the Report Program Generator language; I’ve done a fair bit of systems programming in RPGII, RPGIII, RPGIV and ILE RPG.)

    Whatever…

    Ah..

    ..whatever will be, will be. I s’pose. (I will be retiring for a nap, for one thing.)
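
    (Here, as promised in point 2, is a minimal sketch in Python of the field-typing problem. It is a hypothetical illustration only; the function name and messages are invented, and real EDI validation is of course far more involved.)

        # Free text lands in a field the interchange format defines as a quantity.
        def parse_quantity(raw: str) -> int:
            """Reject free text placed where a structured quantity belongs."""
            if not raw.strip().isdigit():
                raise ValueError(f"quantity field must be numeric, got {raw!r}")
            return int(raw.strip())

        print(parse_quantity("1000"))  # 1000
        try:
            parse_quantity("I wanna order 1000 widgets")
        except ValueError as err:
            print(err)  # and a human follow-up (email or phone call) still ensues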

  2. Sorry, Mr. Paynter.

    Reading someone’s blog gives the false impression of familiarity, which I’ve been trying to avoid.

  3. JayT,
    No offense to the raw meat men was intended, although I’ll have to admit to a less than admirable need to aim a low blow at one of those guys whose work I admire but with whom I have difficulty enjoying a casual, cheerful relationship. In any event, those who code make this whole thing happen and so – my thanks.

    As re. X12 EDI, this is a historical matter. X12 is a remnant of the days of leased lines, slow point-to-point connections, mainframes talking to mainframes. It had its day, and the evolution of the Internet backbone, routing, VPN and IPSec, combined with the evolution of document interchange standards, created a new playing field for “Electronic Data Interchange” groups and their applications. Some of this is economics, the difference in price between assuring secure bandwidth on the backbone versus leasing point-to-point circuits. Some of it is strictly evolutionary. There are still lots of those System 36 and AS400 class machines out there pumping business transactions. Why euthanize them before their time?

    But I agree with you to a large extent about the Semantic Web. It arrived about the time Dave Winer was gifting us with his interoperability approaches, including OPML, the horse he’s riding to a lather now. I think a lot of the discussion revolves around AI, and that’s an interesting separate set of questions that Clay Shirky may have mixed in here to add to my (and others’) confusion.

  4. {{MESSAGE}}

  5. But the literature of the Semantic Web is littered with scary AI references, so I think Clay is right to call ’em on this. See TBL’s original “roadmap.”

  6. USERNAME, nice meta-message.

  7. FP,

    I’m juxtaposing “raw meat men”, that verbiage along with a not-too-subtle link to Dave Winer’s Scripting News, alongside “No offense to the raw meat men was intended”. Having finally gotten around to reading Mark Pilgrim’s (imo) outstanding article linked to by FP-AKMA..

    ..Well, I have my “ultra liberal parser” working in high gear. (snicker)

    Also, you are confusing the transport mechanism for EDI with the specification. EDI is VERY heavily implemented and transported over the Net vs. expensive VANs. No doubt millions of USD have been wasted converting this to transmit XML for no particular benefit. The Net provided the benefit, not XML.

    And, as a matter of fact, there ARE still a few S/36s around because IBM over-engineered their systems and built them to last. But the AS/400 is identical to the iSeries. It was just a change in branding strategy, so yes again, there are tons of 400s pumping transactions!

    Yet there are people trying to euthanize the iSeries still, as they have been for 2 decades (the AS/400 being an incremental advancement over the S/38).

    Mark my words, eventually they will succeed, or just turn the 400 INTO a *nix box when it runs AIX 5L native (next year). Because “eventually” is a long time, eventually everything comes to pass.

    (half-grin)

    Forgot to write in prior post: I would add that many “raw meat men” (and ladies) are indie sub-contractors.

    Contrary to the stereotype, getting along with people is not optional. Doesn’t matter how congenial you are off-the-job, if you don’t get along with people at work then you are OUTTA work. Contractors don’t have any of the recourse that employees have, of course.

    In the old days, you could get away with an attitude if you were THAT good, and you could find short-term engagements. While there’s still a lotta BS in the industry, I don’t find so many BS artists coding in my neck-a-the-woods as I used to. In fact, not many at all.

    Thanks for taking time to reply, FP.

  8. JayT – the EDI discussion is interesting… food for thought. We should return to it another time.

    DW – I think the whole AI thing is less than well bounded, and shame on Tim Berners-Lee for casting such a long shadow on what is a difficult enough problem to solve without the metaphysical nonsense associated with machine intelligence. Logical Inference engines and Expert Systems are old hat, and the earliest implementations suffered for lack of platform capacity and performance. By now, I would think those LISP applications should be doable if anyone is still interested. AI is this scary catch-all term that includes everything from agents and bots to inference engines and complex context searches. I wish it weren’t part of the discussion. It complicates things when simplicity is called for. “Rules based computing” might be as good a term as AI, though it lacks cachet.
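
    To show how modest “rules based computing” can be, here is a minimal sketch in Python of a single forward-chaining inference step, of the syllogistic kind Shirky’s essay pokes at. The facts and the rule are invented for illustration:

        # One forward-chaining step: apply a rule of the form
        # "if ?x is_a man, then ?x is mortal" to the known facts.
        facts = {("socrates", "is_a", "man")}
        rules = [(("?x", "is_a", "man"), ("?x", "is", "mortal"))]

        for subject, predicate, obj in list(facts):
            for condition, conclusion in rules:
                if (predicate, obj) == (condition[1], condition[2]):
                    facts.add((subject, conclusion[1], conclusion[2]))

        print(facts)  # now also contains ('socrates', 'is', 'mortal')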

    What I would hope all sides of this issue would attempt is a simplification that will leverage the efforts of everyone who is involved… that sounds like bureaucratic baffle-gab, let me try again…

    XML is where it’s at, Web-wise. Shirky points out that efforts are uncoordinated and the semantic web is likely to grow organically from below, rather than as some top down monster-project. This seems to me to be most likely. The NSF web project could hold surprises, but I don’t even have a peek into that toybox, so I don’t know what gizmos they’ve got going there that might have some overarching influence.

    I think Moore’s law requires oodles more iterations before we get to the thinking machine phase. Maybe THAT will never be scalable. But the semantic web already exists, and it’s growing in an organic way, much like the WWW did and does. Shirky’s strawman approach to erroneously postulated syllogisms aside, simplicity is required as the foundation for this work, and cooperation and collaboration, and a standards-respectful group of workers, and stakeholders that can provide some broad implementation context… markets.

    The semantic web is the evolving matrix for interoperability and context sharing across platforms, between agencies and companies and individuals… all the E-thisandthat promised in the nineties will be polished and enhanced in functionality by semantic web applications that we see growing around us.

    I think this article by Ed Dumbill reporting on TBL last summer at www2003, basically arguing that web services and the semantic web apps are complementary and not competing, provides a view that I can agree with:

    http://www.xmlhack.com/read.php?item=1978

    Dumbill says, “Berners-Lee described the two technologies in the context of system integration: characterizing the Semantic Web as data integration, and Web Services as program integration. He also identified areas where the two could work together: discovery mechanisms such as UDDI and WSDL are ideally placed to be implemented using semantic web technology; RDF could be sent as a SOAP payload, remote RDF query and update should use SOAP; semantic web business rules engines could interact using SOAP.”

    In this more recent article,

    http://www.xml.com/pub/a/2003/09/17/udell.html

    Jon Udell says, “I don’t think the Semantic Web will come from a specification that tells us how to name and categorize everything. But it could arise, I suspect, from our linguistic instincts and from the social contexts that nurture them. If that’s true, then we need to be able to

    * Speak easily and naturally.
    * Hear what we are saying.
    * Imitate and be imitated.”

    Udell’s discussion provides a lot of oh-wow space in my head… lots of affirmative nodding here.

    I have a sense that we – humans – have that tendency to create binary situations… it’s either semantic webular or it’s web serviceable… can’t be both – unh-unh. But indeed, it must be both and it must and will be more, and I think Clay Shirky was firing a round for effect, but I’d be surprised if someone as well informed as he obviously is couldn’t find the integrative approach if he really wanted to look for it.

  9. The problem I have with the SW is that as we “speak easily and naturally” (Udell’s words), computers get worse at understanding us. The single “information space” (TBL’s words) that I take as being central to the SW project already exists. It’s called “language.” The SW can’t scale because computers profoundly don’t understand language. Computers work with schemas, which are language sets reduced precisely so that computers can work with them. That’s the problem, IMO.
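
    A toy illustration of that reduction, with an invented vocabulary: a schema accepts only the values it enumerates, while language keeps producing values the schema can’t see.

        # A schema is language reduced to what a machine can check.
        ALLOWED_GENRES = {"fiction", "nonfiction"}

        def valid(genre: str) -> bool:
            return genre in ALLOWED_GENRES

        print(valid("fiction"))         # True
        print(valid("belles-lettres"))  # False: meaningful to a reader,
                                        # invisible to the schema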

  10. I’ve been thinking about what “can’t scale” means. Hopefully I’ll be able to share something objective about that. Until I understand it better, I can only agree that right now the binary challenges are daunting. Essentially we poor humans are faced with the task of coding every one of those unforgiving ones and zeros.

  11. A Semantic Conversation

    When Clay Shirky’s paper on Semantic Weblogging first came out and I saw the people referencing it, I thought, “Oh boy! Fun conversation!” But that was before I saw that many of the links to Clay’s paper were from what are called ‘b-links’ I believe –…

  12. shirky touches off a storm of semantic web posts

    Clay’s latest essay, The Semantic Web, Syllogism, and Worldview, has resulted in quite the flurry of interesting responses. Mark Pilgrim has a number of these responses collected in his “B-Links” sidebar, but I’m going to put th…

  13. Semantic Web aphorisms

    from the much-discussed Semantic Web essay In order for the Semantic web to work, you would need “a world where

  14. Just on the point of DTDs, the scalability issues come from the way in which they are in effect individually hard-wired to stovepipe applications/application domains.

    The same problem has carried through into vanilla XML: although data can be mixed at a syntax level using namespaces, the semantics need to be defined case-by-case elsewhere.

    This is the problem that RDF at least in part solves – the comparatively low-level semantics of relationships between entities (resources) in the languages are shared using the common framework.

    As Ken MacLeod put it, it’s N × M vs. N + M.

    This moves the interop problem up a level of abstraction and allows partial interpretation across domains where before there may have been none.
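
    A back-of-the-envelope sketch in Python of that N × M point, with invented vocabulary names (Ken MacLeod’s framing is N producers and M consumers; with a single pool of vocabularies it reduces to pairs versus singletons):

        vocabularies = ["insurance", "shipping", "retail", "medical"]
        n = len(vocabularies)

        pairwise = n * (n - 1) // 2  # one hand-built translator per pair of vocabularies
        shared = n                   # one mapping from each vocabulary into the framework

        print(pairwise, shared)  # 6 vs. 4, and the gap widens as vocabularies multiply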

  15. As far as “The description of the meaning of the content in the Web” is concerned, the description itself is just more data (or information), i.e. metadata (or metainformation). In any case, the proper software tools have to be built to ‘understand’ the metadata/metainformation.

    As far as “The automatic manipulation of these meanings” is concerned, I am a bit skeptical, because to the machines, as I’ve tried to explain elsewhere here and here, those descriptions are just data they can manipulate, not meanings.

    It’s not that I believe metadata and metainformation won’t be able to provide a level of quality in the process of information seeking and access to information; I’m just a bit skeptical about the hype and the high level of optimism that the semantic web will deliver us from the chaos of the web.

    An interesting parallel is the natural languages. Each language is composed of words and phrases that have certain meaning(s) and/or concepts attached to them. To be able to navigate within the conceptual space of the language (i.e. understand the language) one needs to learn what each of the words represents, because each word or phrase is metadata/metainformation for the actual concept in the particular language. So, it is good to be optimistic that eventually we’ll come around to being able to represent the vast and chaotic multitude of information on the web with a set of metadata/metainformation and ontologies that all software will ‘understand’.

    Well… Esperanto hasn’t yet become the world language it was meant to be… And it does not seem that it will become one anytime soon… And even if it does, there will still be multiple meanings for various phrases…

  16. A Semantic Conversation

    Where Shelley talks about the semantic web equivalent of the BLINK tag, and the world gets down and kisses her feet.

