
July 7, 2012

[2b2k] Big Data needs Big Pipes

A post by Stacey Higginbotham at GigaOm talks about the problems of moving Big Data across the Net so that it can be processed. She draws on an article by Mari Silbey at SmartPlanet. Mari’s example is a telescope being built on Cerro Pachón, a mountain in Chile, that will ship many high-resolution sky photos every day to processing centers in the US.

Stacey discusses several high-speed networks, and the possibility of compressing the data in clever ways. But a person on a mailing list I’m on (who wishes to remain anonymous) pointed to GLIF, the Global Lambda Integrated Facility, which rather surprisingly is not a cover name for a nefarious organization out to slice James Bond in two with a high-energy laser pointer.

The title of its “informational brochure” [pdf] is “Connecting research worldwide with lightpaths,” which helps some. It explains:

GLIF makes use of the cost and capacity advantages offered by optical multiplexing, in order to build an infrastructure that can take advantage of various processing, storage and instrumentation facilities around the world. The aim is to encourage the shared use of resources by eliminating the traditional performance bottlenecks caused by a lack of network capacity.

Multiplexing is the carrying of multiple signals at different wavelengths on a single optical fiber. And these wavelengths are known as … wait for it … lambdas. Boom!

My mailing list buddy says that GLIF provides “100 gigabit optical waves,” which compares favorably to your pathetic earthling (um, American) 3-20 megabit broadband connection (maybe 50 megabits if you have FiOS), and he notes that GLIF is available in Chile.
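To make those numbers concrete, here’s a quick back-of-the-envelope calculation in Python. The nightly data volume is my assumption (roughly the figure that’s been projected for the Cerro Pachón telescope), not something from Stacey’s or Mari’s articles:

```python
# Rough transfer-time comparison: how long to move one night of telescope data?
# ASSUMPTION: ~15 terabytes per night, a commonly projected figure for the
# planned Cerro Pachon survey telescope; substitute your own number as needed.

NIGHTLY_DATA_BITS = 15e12 * 8  # 15 TB expressed in bits

def transfer_hours(bits_per_second):
    """Hours needed to move one night's data at a given link speed."""
    return NIGHTLY_DATA_BITS / bits_per_second / 3600

links = [
    ("20 Mbps US broadband", 20e6),
    ("50 Mbps FiOS", 50e6),
    ("100 Gbps GLIF lightpath", 100e9),
]
for label, bps in links:
    print(f"{label}: {transfer_hours(bps):,.1f} hours")
```

At 20 Mbps, the night’s images would take about ten weeks to arrive; on a 100 Gbps lightpath, about 20 minutes. That’s the whole argument in three lines of arithmetic.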

To sum up: 1. Moving Big Data is an issue. 2. We are not at the end of innovating. 3. The bandwidth we think of as “high” in the US is a miserable joke.


By the way, you can hear an uncut interview about Big Data I did a few days ago for Breitband, a German radio program that edited, translated, and broadcast it.


Categories: broadband, science, too big to know Tagged with: 2b2k • big data • broadband Date: July 7th, 2012 dw

2 Comments »

July 3, 2012

[2b2k] The inevitable messiness of digital metadata

This is cross-posted at the Harvard Digital Scholarship blog.

Neil Jeffries, research and development manager at the Bodleian Libraries, has posted an excellent op-ed at the Wikipedia Signpost about how best to represent scholarly knowledge in an imperfect world.

He sets out two basic assumptions: (1) Data has meaning only within context; (2) We are not going to agree on a single metadata standard. In fact, we could connect those two points: Contexts of meaning are so dependent on the discipline and the user's project and standpoint that it is unlikely that a single metadata standard could suffice. In any case, the proliferation of standards is simply a fact of life at this point.

Given those constraints, he asks, what’s the best way to increase the interoperability of the knowledge and data that are accumulating online at a pace that provokes extremes of anxiety and joy in equal measure? He sees a useful consensus emerging on three points: (a) There are some common and basic types of data across almost all aggregations. (b) There is increasing agreement that these data types have some simple, common properties that suffice to identify them and to give us humans an idea about whether we want to delve deeper. (c) Aggregations themselves are useful for organizing data, even when they are loose webs rather than tight hierarchies.

Neil then proposes RDF and linked data as appropriate ways to capture the very important relationships among ideas, pointing to the Semantic MediaWiki as a model. But, he says, we need to capture additional metadata that qualifies the data, including who made the assertion, links to differences of scholarly opinion, omissions from the collection, and the quality of the evidence. “Rather than always aiming for objective statements of truth we need to realise that a large amount of knowledge is derived via inference from a limited and imperfect evidence base, especially in the humanities,” he says. “Thus we should aim to accurately represent the state of knowledge about a topic, including omissions, uncertainty and differences of opinion.”
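Here’s a minimal sketch of what that qualified metadata could look like, using Python’s rdflib and RDF reification. The namespace, resource names, and properties (assertedBy, evidenceQuality) are hypothetical illustrations, not anything from Neil’s post:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # hypothetical namespace for the example

g = Graph()
stmt = EX.assertion1

# Reify the assertion "manuscript42 was attributed to the workshop of Rembrandt"
# so that we can attach metadata about the assertion itself.
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.manuscript42))
g.add((stmt, RDF.predicate, EX.attributedTo))
g.add((stmt, RDF.object, EX.workshopOfRembrandt))

# Qualifying metadata: who made the assertion, and how good the evidence is.
g.add((stmt, EX.assertedBy, EX.some_scholar))
g.add((stmt, EX.evidenceQuality,
       Literal("inferred from stylistic evidence; contested")))

print(g.serialize(format="turtle"))
```

The point isn’t the particular vocabulary; it’s that the assertion and the doubts about it travel together as linked data.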

Neil’s proposals have the strengths of acknowledging the imperfection of any attempt to represent knowledge, and of recognizing that the value of representing knowledge lies mainly in getting it linked to its sources, its context, its controversies, and to other disciplines. It seems to me that such a system would not only have tremendous pragmatic advantages but, for all its messiness and lack of coherence, would in fact be a more accurate representation of knowledge than a system that is fully neatened up and nailed down. That is, messiness is not only the price we pay for scaling knowledge aggressively and collaboratively; it is a property of networked knowledge itself.

 


Categories: everythingIsMiscellaneous, too big to know Tagged with: 2b2k • linked data • metadata • semantic web Date: July 3rd, 2012 dw

3 Comments »

June 29, 2012

[aspen][2b2k] Ideo’s Tim Brown

Tim Brown of Ideo is opening his Aspen Ideas Festival talk with a slide presentation called “From Newton to Design”. He says he’s early in thinking it through.

He points to a problem in how we’ve thought about design, trained designers, and have practiced design. The great thing about designing simple products is that you can know almost everything about them: who made them, who they’re for, how they were produced, etc. But as products get more complicated, it gets harder even for a team of designers to really understand what’s going on. They get so complicated that there are lots of places design can fail.

When we go out to urban planning, that becomes even more obvious, he says. He shows Union Sq. when it was designed and how wildly NYC has grown around it. Or, at the Courtyard Marriott chain, every element of the user’s experience has been thought through. He shows a script that specifies every interaction. But you can’t anticipate everything. E.g., JetBlue is one of the best-designed customer experiences, and even they got it wrong a couple of winters ago.

What’s going on? It’s all about complexity. Henri Poincaré in the 19th century tried to solve the three body problem that had been set by the French govt as an open source competition. HP couldn’t solve it. It sounds like a simple problem, but it’s very hard. [BTW, there’s a fascinating history of three French aristocrats hand-computing the movement of Halley’s Comet, which depended on calculating the gravitational influences of multiple bodies. Can’t find the ref at the moment]

Our basic ideas about design have been based on Newton, says Tim. Design assumes the ability to predict the future based on the present. We need to think more like Darwin: design as an evolutionary process. Design is more about emergence, never finished.

He presents a few principles of Darwinian design that he’s been exploring.

1. Design behaviors, not objects — the behaviors that come from our interactions with objects. If you’ve traveled on the high-speed trains in Europe, you’ve seen signs urging men to be more accurate when peeing. But at Schiphol Airport, they print a fly at the right spot in the urinal; men became 80% more accurate. That’s designing behavior; the actual object doesn’t matter.

2. Design for information flow. Nicholas Christakis has looked at how networks affect behavior. Tesco uses its loyalty card — which cost them 20% of their margins — to increase sales.

3. Faster iteration = faster evolution. Viruses evolve faster than we do because they iterate faster than we do. E.g., State Farm tried out a new idea for how to build relationships with a new generation. They built one storefront for this, and learned from it. “Launch to learn.”

4. Use selective emergence. This intrigues him, although he doesn’t know how useful it will be in design. Rather than relying on random mutations, you choose what might be interesting and design things that get us there through many iterations. I.e., genetic algorithms. (A minimal sketch of the technique appears after this list.) E.g., the Strandbeest walks along beaches with a hip joint unlike any in nature because the artist used genetic algorithms.

5. Take an experimental approach. I.e., testing hypotheses. Cf. Eric Ries, the Lean Startup (build, measure, learn). E.g., Ideo.org has been working on sanitation in Ghana. Where you can’t dig septic pits, Ideo has been experimenting with low cost receptacle toilets (with bio-digesters). But people didn’t want to pay for the service. So, they gave some to families and went away for three days. All the families changed their minds and said they are willing to pay for the service (which is provided by a local franchise).

6. Focus on simple rules. This comes from emergence theory. E.g., complex bird-flocking patterns are based on simple rules. [Canonical example: Termite mounds.] E.g., Bi-Rite stores in SF use simple rules: if an employee is within 10′ of a customer, the employee looks the customer in the eye; if within 4′, the employee talks with them. This creates a wonderful service experience.

7. Design is never done. E.g., World of Warcraft is constantly being designed by its players.

8. The power of purpose. This creates the self-governance by which these complex environments succeed. Arab Spring and Occupy Wall Street are examples. Companies are experimenting with new ways of thinking about their business and products. E.g., Patagonia tells you not to buy its products because it also wants to preserve the environment.
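Here’s the minimal sketch promised in point 4 (selective emergence): a toy genetic algorithm in Python. The fitness function is a made-up stand-in for a designer’s chosen goal; this is not Theo Jansen’s actual Strandbeest code, which evolves leg-linkage proportions:

```python
import random

# Toy genetic algorithm: the designer picks what counts as "interesting"
# (the fitness function), then lets mutation and selection iterate toward it.
# ASSUMPTION: matching a target list of numbers stands in for a real design
# goal such as "a leg linkage that walks smoothly on sand."

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]

def fitness(candidate):
    # Higher is better: negative total distance from the designer's target.
    return -sum(abs(c - t) for c, t in zip(candidate, TARGET))

def mutate(candidate):
    # Nudge one randomly chosen "gene" up or down.
    child = list(candidate)
    i = random.randrange(len(child))
    child[i] += random.choice([-1, 1])
    return child

population = [[random.randint(0, 9) for _ in TARGET] for _ in range(20)]
for generation in range(300):
    # Keep the fittest half, then refill the population with their mutants.
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = max(population, key=fitness)
print("best design:", best, "fitness:", fitness(best))
```

Selection is the designer’s hand on the process; the many iterations do the rest.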

The prototypical design artefact is a blueprint. Once you created the blueprint, the design was done. It was the instruction set for someone to make it. That’s how we think about design: finished and done. What replaces it: code. It might be DNA (and Tim has people researching this), but more often it’s programming code. It’s an instruction set that can continue to evolve.

Now James Fallows [swoon] interviews him.

JF: You embody your principles. The rules are different from a prior version. [ACK! Crash. Missed about 2 minutes]

TB: We’ve just finished designing the prototype experience for the new health care exchanges. It will affect how people choose which health care insurance to choose. Today it’s done with paper. Under the new health care laws, lots of people will get to make these choices. We worked with the CA Healthcare Foundation to prototype the user experience. What are the key pieces are parts? How can we keep the choices reasonably simple? Then each state will use this a platform to develop their own.

JF: And the govt had the wit to come to you to do this?

TB: The CA Health Care Foundation…

JF: What are the barriers? Does it cost more to do it your way?

TB: It’s often less costly. Most often they don’t have a good understanding of what their customers go through. When a health care org comes to us, relatively frequently we find out that a senior exec had to go through the health care experience. It’s true of all organizations. We don’t ask the right questions. The urgency to change is not there, and the resistance to change is always huge.

JF: Has the TSA come to you?

TB: Yes, but … well, we learned a lot. In the previous admin, we worked with them to find areas of change. Although going through the scanners has to improve, a lot of it has to do with the behavior of the people. They looked at a training program that was intended to take away some of the rule-based system they used. The more rules you apply, the less sensitive the system is. You need to give the people in that system much more independence to make judgments.

JF: Who do you hire?

TB: We look for a wide range of people. Many disciplines. We look for deep skills, and for empathy. It’s hard to solve problems for others without that. Also, most of what we do is too complex for individuals, so we work in teams, and thus people need an enthusiasm for empathy.

JF: Any unusual interview techniques?

TB: We put people into a situation in which they’re practicing design. E.g., intern program. Also, competitions. And we use Open Ideo as a way of seeing how people work.

JF: Beyond the toilet, what else are you doing for “design for poverty”?

TB: I got excited when I saw the opportunities for design in some social design work. At Open Ideo we’re working on clean water, early ed programs, etc. Ideo.org is a non-profit org. We want it to be sustainable and scalable so we look for external funding for it.

JF: How do you approach environmental sustainability?

TB: We try to build that into every project. Every project affects the environment. We try to bring sustainable thinking around systems, materials, energy flows, etc.

JF: What projects are you proudest of?

TB: The work we do in health care, including with Kaiser Permanente. Also, consumer-facing, post-crash financial services. PNC digital wallet. “Keep the change.” Etc. This is not an area where design has had much of a role.

JF:

TB: For physical objects, it peaked maybe 20-30 years ago (with Apple as an exception). But we’re in ascendance for behavior-based design. We get 25,000 applications a year for 100 openings. We’re a 600-person company. Etsy, Kickstarter, software designed better than ever before…great things are happening. Soon, if not already, the number of digital designers will be greater than all other designers combined.

Q & A

Q: Your principles are so close to Buckminster Fuller’s [says the guy from the Fuller institute]. But the boundary between social and evolutionary systems is illusory.

TB: Yes, Fuller figured this out a long time ago. We’re perhaps resurrecting ideas, as every generation does. Design has operated as a priesthood for too long. When I started, I was only interested in how beautiful something is. That’s so much simpler. Opening design up to many more will convince us all that we’re all part of this big design ecosystem and have a responsibility to be thoughtful about the contributions we’re making to the world around us. I hope professional designers learn to enable that, more than controlling it. The B School at Stanford is introducing non-designers to design, which is great.

Q: What can we do to simplify the rules?

TB: The unstated bit of my thesis is that you still have to stop and design something. We develop an idea, perhaps more through iteration. That process doesn’t change. For rebuilding a complex system, maybe big data will help us see patterns that let us understand the complex effects of what we’re designing…but I don’t think we’re there yet. We should be thinking about the hooks we’re building in. I’m big into APIs that allow other people to build with what you’ve built.

Q: Is it training or DNA that determines a good employee for you?

TB: Both. We hire people straight out of grad school because they’re moldable. We hire older people too, but it’s harder for them to adapt. I don’t have much control as CEO. The future of all businesses is to have cultures that are as self-governing as possible. That’s much more resilient and agile than cultures built on inflexible rule sets.

Q: I chair a land conservancy. We create parks in urban areas. Does Ideo have much experience in designing to create behaviors that will get people to use parks? What’s your view of the state of park design?

TB: We don’t have a lot of expertise in designing anything because we like designing everything. The High Line and the West Side park in NYC are remarkable examples. Projects like that show that parks can be remarkable assets to the city. We’re working with High Line on the third phase of that project. NYC’s life expectancy has gone up 3 yrs. Two explanations: People are closer to health facilities, and people walk more.

Q: What are the logistics of running a decentralized org? Mentoring? Sharing a vision?

TB: Purpose creates a sense of direction, so we talk about why the heck we’re doing what we’re doing. We think we should measure everything we do based on the impact it has on the world. We’ve done an occasionally decent job of mentoring; that can be a problem with a decentralized org. It’s a tension. Most of our employees probably want more mentoring, but we also want autonomy. We are not big believers in warehousing knowledge. Designers hate reusing other people’s ideas. It’s much better to have knowledge systems that inspire people to think in new ways. So we’re a storytelling culture. It’s a bit of an obsession of ours. If you do a piece of work, your job is to have some stories to tell about it. That’s more effective than big reports that live in a database somewhere.

(JF calls for all remaining questions)

Q: My group works with at-risk youth. Education is increasingly standards based, but your work is collaborative.

Q: How do you look at chaos? People in open markets are open and affectionate. In corporate controlled spaces, people shut down.

Q: Does form drive function or vice versa?

Q: Apple is a closed system. Google wants more control. Open vs. controlled systems?

TB: 1. University ed is not always the best way to teach entrepreneurship. Apprenticeships are interesting. 2. Great markets are vibrant, but not chaotic. I take clients to the Ferry Building to point out all the interrelated pieces that make it such a great experience. It’s not top down, but you can see the patterns and use them as inspiration. 3. Form follows function? Hard to kick that notion because I believe in beautiful engineering, but most things we’re designing today have hundreds of functions, so you can’t get a single form for them. 4. I love closed systems, but I think they’re inevitably part of an open system. iOS is part of an open system of everything else that I do with it. We need both. [At last! Something I disagree with! Sort of! :)]

[Fantastic. I’ve been a huge fan of Ideo’s work, and Ideo’s organizational ethos, and Tim Brown, for a long time. So I felt particularly narcissistic as I heard this talk through Cluetrain and Too Big to Know lenses. Substitute “knowledge” for “design” and you get a lot of the ideas in 2b2k. To hear them coming from Tim Brown, who is a personal idol of mine, was a self-centered thrill.]


Categories: business, cluetrain, culture, liveblog, too big to know Tagged with: 2b2k • aspenideas • cluetrain • design • innovation • liveblog Date: June 29th, 2012 dw

7 Comments »

June 24, 2012

[2b2k] How much info per minute, per an infographic

There’s a fun infographic — and aren’t all infographics fun, one way or another? — at Visual News about how much information is made every minute.

It’s poorly sourced (a list of sources at the bottom without references to which data came from which sources, and no links, but, heck infographics are fun!), but let’s assume/pretend that it’s accurate. Beyond the pure massiveness of the amount of data, a couple of “facts” leap out (and these are especially unreliable since they probably come from different sources so the comparisons are likely to be apples to orangutans, but it’s all about putting the “fun” into infungraphics!):

  • There are three times as many tweets as Facebook Likes, even though one is just a no-thought reaction (and the other requires pressing the Like button — heyo! I kid Twitter ’cause I love it. I’ll be @dweinberger all week.)

  • There are 80x more posts on Tumblr than on WordPress

  • There are 2,000x more emails sent than Tweets posted, and 100x more emails sent than search queries received by Google. This seems plausible if I look at my own usage, but I’m old and thus more attached to email than are today’s Digital Youngsters with their IMs and their hiphop ringtones and 4Gs. Nope, email remains the volume leader in terms of number of units (as opposed to the number of bytes, which I cannot figure out).

Info fun! With air quotes around each of those two words!


Categories: abundance, too big to know Tagged with: 2b2k • email • info • infographic Date: June 24th, 2012 dw

1 Comment »

June 13, 2012

[2b2k] PDF 2012 – In Defense of Echo Chambers

Here is the text of a short talk I gave at PDF yesterday. I did not use slides, and I actually read from pieces of paper because I wanted to make sure that I stayed on time (it took about 8 minutes, I think) and did not stray too far from what I wanted to say. So, yes, I read a freaking paper at PDF. And yes, I am ashamed. On the other hand, I’m humbled and amazed to have been in the line-up of speakers that morning.

Senator Daniel Patrick Moynihan is reputed to have said, “Everyone is entitled to his own opinion, but not to his own facts.” We like this saying in large part because it brings us the comfort of believing that facts provide a way of bringing us together. But perhaps the single incontestable conclusion to be drawn from even a quick involvement with the Internet is that we don’t agree about anything. Everything is contested on the Net, even things that really should not be. Perhaps it’s time to acknowledge that the facts are not going to bring us together. As for the old Enlightenment ideal of two people with deeply different ideas sitting together over a cup of coffee and working themselves down to their fundamental differences until the issue is resolved: the Internet has shown that that ideal just isn’t going to happen. We don’t agree, and now we can’t deny it.

I am not saying that we should give up on facts, or on fact-based argument. To the contrary. It remains our obligation to try to base our policies on facts, because facts are the parts of reality against which we bark our shins. Reality counts.

But I do want to argue against one version of despair that comes from looking at the seeming powerlessness of facts on the Internet: The echo chamber argument.

Cass Sunstein’s idea of echo chambers, and Eli Pariser’s excellent Filter Bubble variation, are well known to you. It’s the idea that when people are given lots of choices of voices to listen to, they — we — tend to listen to people with whom we already agree, and that this results in a confirming of what we believe, and can move us to move extreme versions of it, resulting in even greater polarization. If the Net is having this effect, the Net is not the great hope for a more open society, but a tragedy. Echo chambers are a real problem. We need to be vigilant, and educate ourselves and our children how to avoid their pernicious effects.

Please keep that in mind as I head toward what is actually my point today: Echo chambers are dangerous, but they are also a condition of thought and understanding.

So, I want to look at an example of an echo chamber. But not the usual ones. Instead, Reddit.com. Reddit has all the earmarks of an echo chamber. The Reddit community, although it is far from uniform, nevertheless generally shares some values. It is pro science, atheist, pro legalization of marijuana, pro cute cat, and generally progressive. It has shared heroes like Neil deGrasse Tyson. It has a set of in-jokes — memes that often require understanding a hidden context to get; you have to know that a photo of a particular woman flags the text as an example of a “first world problem.” Then it’s hilarious. Reddit has its own vocabulary: FTFY is “fixed that for you,” and AMA is “ask me anything.” And it has its own norms and ethos. Reddit is an echo chamber.

Yet, it’s also one of the best examples of how a community can successfully engage outside of its own bubble. IAMA at Reddit stands for I am a …someone putting her or himself forward as interesting, willing to answer questions. I am a Mariachi. I am Louis CK. I am Daryl Issa. I am a janitor at WalMart. I am a Rick Santorum supporter. I am a Muslim religious student — remember Reddit is strongly atheistic and even anti-religion.AMA. Ask me anything. At its best, which is frequent, what follows is a group interview in which answers are treated with respect so long as they are frank and honest. The community feels empowered to ask the questions that people really want answered, without a foolish regard for political correctness. (Of course not all political correctness is foolish.) IAMA’s are a new form of journalism, and can result in the best interviews I’ve read – the recent IAMA with Paul Krugman for example. More important at the moment, they are a way in which an echo chamber throws a window open.

The key point is that it’s because Reddit is an echo chamber that it can engage in something close to the Enlightenment ideal of open, honest, frank discussion among people with deep deep differences. This is totally not accidental, and points to the baby that we should be careful not to throw out with the echo chamber bathwater. The Reddit community can engage in IAMAs so frankly and well because it has a strong sense of who it is as a community. Communities are echo chambers – a set of people that share basic values and beliefs that are assumed and reinforced. This is not an accident or something we can avoid. It is baked into the very nature of the conversations that create community: To have a conversation of any sort, you have to have 99% agreement. (I made that number up.) You have to be speaking the same language, have the same basic norms of conversation — who gets to speak for how long, how interruptive you can be, and so forth — and you have to be interested in the same topic. Then you can find some small differences to talk about — you both like Johnny Depp but differ about whether he’s sold out, or you both want the poor to have access to health care but differ over how — and then you iterate on that 1% of difference. This need for a vast similarity is not a failing of conversation, but is its condition. And that’s because human understanding itself works this way. We understand the new by assimilating it to our existing context: our densely interrelated web of concepts, ideas and feelings. That’s why when some piece of news comes along, it makes sense to go to a site where people with whom you basically agree — your echo chamber — are discussing it. What did the Wisconsin recall results mean for Pres. Obama’s reelection? I’m going to go first to, say, DailyKos, because they’re going to help me understand it within my personal political context. I might then visit a Republican site to help me see how they’re taking it, but that’s at least in part a type of anthropological research. Communities are echo chambers. Conversation is an echo chamber. Understanding is an echo chamber. The political solidarity that leads to action requires an echo chamber.

And as Reddit shows, our way out of an echo chamber is through an echo chamber.

The problem is that Reddit is an all too rare example of an echo chamber that willingly throws open its windows. It takes rare delight in doing so. What distinguishes Reddit? It is an echo chamber with a commitment to the value of curiosity, and strong norms of empathy, acceptance and love. It can engage with other points of view without giving up its own values or its snarky silliness. And from this, as the SOPA protest showed, can come political action.

We cannot escape all our echo chambers. Our challenge is to bring to each of the echo chambers we inhabit the values that will turn them into arenas of engaged understanding rather than into dark chambers of willful stupidity.


Categories: echo chambers, liveblog, too big to know Tagged with: 2b2k • echo chambers • pdf12 • reddit Date: June 13th, 2012 dw

2 Comments »

June 7, 2012

[2b2k] The Internet, Science, and Transformations of Knowledge

[Note that this is cross posted at the new Digital Scholarship at Harvard blog.]

Ralph Schroeder and Eric Meyer of the Oxford Internet Institute are giving a talk sponsored by the Harvard Library on the Internet, Science, and Transformations of Knowledge.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Ralph begins by defining e-research as “Research using digital tools and digital data for the distributed and collaborative production of knowledge.” He points to knowledge as the contentious term. “But we’re going to take a crack at why computational methods are such an important part of knowledge.” They’re going to start with theory and then move to cases.

Over the past couple of decades, we’ve moved from talking about supercomputing to the grid to Web 2.0 to clouds and now Big Data, Ralph says. There is continuity, however: it’s all e-research, and to have a theory of how e-research works, you need a few components: 1. Computational manipulability (mathematization) and 2. The social-technical forces that drive that.

Computational manipulability. This is important because mathematics enables consensus and thus collaboration. “High consensus, rapid discovery.”

Research technologies and driving forces. The key to driving knowledge is research technologies, he says. I.e., machines. You also need an organizational component.

Then you need to look at how that plays out in history, physics, astronomy, etc. Not all fields are organized in the same way.

Eric now talks, beginning with a quote from a scholar who says he now has more information than he needs, all without rooting around in libraries. But others complain that we are not asking new enough questions.

He begins with the Large Hadron Collider. It takes lots of people to build it and then to deal with the data it generates. Physics is usually cited as the epitome of e-research. It is the exemplar of how to do big collaboration, he says.

Distributed computation is a way of engaging citizens in science, he says. E.g., Galaxy Zoo, which engages citizens in classifying galaxies. Citizens have also found new types of galaxies (“green peas”), etc. there. Another example: the Genetic Association Information Network is trying to find the causes of bipolar disorder. It has now grown into a worldwide collaboration. Another: Structure of Populations, Levels of Abundance, and Status of Humpbacks (SPLASH), a project that requires human brains to match humpback tails. By collaboratively working on data from 500 scientists around the Pacific Rim, patterns of migration have emerged, and it was possible to come up with a count of humpbacks (about 15-17K). We may even be able to find out how long humpbacks live. (It’s at least 120 years, because a harpoon head was found in one from a company that went out of business that long ago.)

Ralph looks at e-research in Sweden as an example. They have a major initiative under way trying to combine health data with population data. The Swedes have been doing this for a long time. Each Swede has a unique ID; this requires the trust of the population. The social component that engenders this trust is worth exploring, he says. He points to cases where IP rights have had to be negotiated. He also points to the Pynchon Wiki where experts and the crowd annotate Pynchon’s works. Also, Google Books is a source of research data.

Eric: Has Google taken over scholarly research? 70% of scholars use Google and 66% use Google Scholar. But in the humanities, 59% go to the library. 95% consult peers and experts — they ask people they trust. It’s true in the physical sciences too, he says, although the numbers vary some.

Eric says the digital is still considered a bit dirty as a research tool. If you have too many URLs in your footnotes it looks like you didn’t do any real work, or so people fear.

Ralph: Is e-research old wine in new bottles? Underlying all the different sorts of knowledge is mathematization: a shared symbolic language with which you can do things. You have a physical core that consists of computers around which lots of different scholars can gather. That core has changed over time, but all offer types of computational manipulability. The Pynchon Wiki just needs a server. The LHC needs to be distributed globally across sites with huge computing power. The machines at the core are constantly being refined. Different fields use this power differently, and focus their efforts on using those differences to drive their fields forward. This is true in literature and language as well. These research technologies have become so important since they enable researchers to work across domains. They are like passports across fields.

A scholar who uses this tech may gain social traction. But you also get resistance: “What are these guys doing with computing and Shakespeare?”

What can we do with this knowledge about how knowledge is changing? 1. We can inform funding decisions: what’s been happening in different fields, how they are affected by social organizations, etc. 2. We need a multidisciplinary way of understanding e-research as a whole. We need more than case studies, Ralph says. We need to be aiming at developing a shared platform for understanding what’s going on. 3. Every time you use these techniques, you are either disintermediating data (e.g., Galaxy Zoo) or intermediating it (biomedicine). 4. Given that it’s all digital, we as outsiders have tremendous opportunities to study it. We can analyze it. Which fields are moving where? Where are projects being funded and how are they being organized? You can map science better than ever. One project took a large chunk of academic journals and looked in real time at who is reading what, in what domain.

This lets us understand knowledge better, so we can work together better across departments and around the globe.

Q&A

Q: Sometimes you have to take a humanities approach to knowledge. Maybe you need to use some of the old systems investigations tools. Maybe link Twitter to systems thinking.

A: Good point. But caution: I haven’t seen much research on how the next generation is doing research and is learning. We don’t have the good sociology yet to see what difference that makes. Does it fragment their attention? Or is this a good thing?

Q: It’d be useful to know who borrows what books, etc., but there are restrictions in the US. How about in Great Britain?

A: If anything, it’s more restrictive in the UK. In the UK a library can’t even archive a web site without permission.
A: The example I gave of real time tracking was of articles, not books. Maybe someone will track usage at Google Books.

Q: Can you talk about what happens to the experience of interpreting a text when you have so much computer-generated data?

A: In the best cases, it’s both/and. E.g., you can’t read all the 19th century digitized newspapers, but you can compute against it. But you still need to approach it with a thought process about how to interpret it. You need both sets of skills.
A: If someone comes along and says it’s all statistics, the reply is that no one wants to read pure stats. They want to read stats put into words.

Q: There’s a science reader that lets you keep track of which papers are being read.

A: E.g., Mendeley. But it’s a self-selected group who use these tools.

Q: In the physical sciences, the more info that’s out there, the harder it is to tell what’s important.

A: One way to address it is to think about it as a cycle: as a field gets overwhelmed with info, you get tools to concentrate the information. But if you only look at a small piece of knowledge, what are you losing? In some areas, e.g., areas within physics, everyone knows everyone else and what everyone else is doing. Earth sciences is a much broader community.

[Interesting talk. It’s orthogonal to my own interests in how knowledge is becoming something that “lives” at the network level, and is thus being redefined. It’s interesting to me to see how this looks when sliced through at a different angle.]


Categories: reviews, too big to know Tagged with: 2b2k • knowledge • science Date: June 7th, 2012 dw

Be the first to comment »

May 28, 2012

[2b2k] Attribution isn’t just about credit. It’s about networking knowledge.

David Kay pointed out to me a piece by Arthur Brisbane, the NY Times Public Editor. In it Arthur deals with a criticism of a NYT article that failed to acknowledge the work of prior journalists and investigators (“uncredited foundational reporting”) that led to the NYT story. For example, Hella Winston at The Jewish Week told Arthur:

The lack of credit stings. “You get so much flak — these are difficult stories,” Ms. Winston told me. “People come down on you.” The Times couldn’t have found all its sources among victims and advocates by itself, she added: “You wouldn’t have known they existed, you wouldn’t have been able to talk to them, if we hadn’t written about them for years.”

But, as David Kay points out, this is not just about giving credit. As the piece says of the Poynter Institute’s Kelly McBride:

[McBride] struck another theme, echoed by other ethics experts: that providing such credit would have enabled readers to find other sources of information on the subject, especially through online links.

Right. It’s about making the place smarter by leaving traceable tracks of the ideas one’s post brings together. It’s about building networks of knowledge.


Categories: journalism, too big to know Tagged with: 2b2k • attribution • journalism • nyt Date: May 28th, 2012 dw

2 Comments »

May 26, 2012

[2b2k][mesh] Setting the record straight: Overall, the networking of knowledge is awesome

Christine Dobby posted at the Financial Post about my session at the Mesh conference on Thursday in Toronto. She accurately captured two ideas, but missed the bigger point I was trying to make, which — given how well she captured the portion of my comments she blogs about — was undoubtedly my fault. Worse, the post gives incredibly short shrift to two powerful and important sessions that morning by Rebecca MacKinnon and Michael Geist about the threats to Internet freedom…way more important (in my view, natch) than what the FP post leads with.

To judge for yourself, you might want to check the live blogging I did of the sessions by Rebecca and Michael. These were great sessions by leaders in their fields, people who are full-time working on keeping the Internet free and open. They are fighting for us and our Internet. (Likewise true of Andy Carvin, of course, who gave an awesome afternoon session.) What they said seems to me clearly to be so much more important than my recapitulation of a decade-old argument that I think is valid but is not even half the story.

On to moi moi moi.

Christine does a nice job summarizing my summary of the echo chamber argument, and I’m pleased that she followed that up with my use of Reddit as an example of how an echo chamber — a group that shares a set of beliefs, values, and norms — can enable a sympathetic yet critical encounter with those who hold radically different views. But, here are the first two paragraphs of Christine’s post about the morning at Mesh:

With the vast sprawl of the web — and in spite of its power to fact check information — stupidity abounds, says David Weinberger.

“One of the bad things we get from networked knowledge is it’s easier than ever to be stupid because you can find other people who can reinforce your beliefs,” the U.S. academic, Internet commentator and author of the recent Too Big to Know told a Toronto audience Thursday.

True, and I did indeed say that. But I don’t want to leave the impression that I’m going around to conferences bashing the Net as a stupidity enabler. In fact, I spent the first half hour at Mesh being interviewed by the inestimable Mathew Ingram about the rise of networked knowledge, about which I am overall quite enthusiastic. The networking of knowledge is enabling knowledge to scale far beyond the limits within which it’s operated since it was born 2,500 years ago. It’s enabling knowledge to shed some of the blinkered limitations that it had embraced as a virtue. Overall, it’s an awesomely good thing, although I did try to point out some of the risks and dangers.

So, it’s weird for me to read in the FP that the take-away is that the Net is creating echo chambers that are making us stupider. Indeed, as my remarks on Reddit were intended to indicate, the echo chamber argument can lead us to underestimate the positive importance of groups sharing views and values: conversation and understanding itself require a huge amount of agreement to be productive. As I wrote not too long ago, culture is an echo chamber.

So, put Christine’s post together with the post you’re currently reading and you’ll get a more accurate representation of what I intended to say and certainly what I believe. Sort of like how networked knowledge works, come to think of it :)

More important, go read what Rebecca, Michael, and Andy had to say. (And I also really liked Michael O’Connor Clarke’s session, but couldn’t live blog it.)


Categories: echo chambers, too big to know Tagged with: 2b2k • echo chamber • mesh Date: May 26th, 2012 dw

1 Comment »

May 16, 2012

[2b2k] Peter Galison on The Collective Author

Harvard professor Peter Galison (he’s actually one of only 24 University Professors, a special honor) is opening a conference on author attribution in the digital age.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He points to the vast increase in the number of physicists involved in an experiment, some of which have 3,000 people working on them. This transforms the role of experiments and how physicists relate to one another. “When CERN says in a couple of months that ‘We’ve found the Higgs particle,’ who is the we?”

He says that there has been a “pseudo-I”: a group that functions under the name of a single author. A generation or two ago this was common: the Alvarez Group, the Thorndike Group, etc. This is like when the works of a Rembrandt would in fact come from his studio. But there’s also “the collective group”: a group that functions without that name — often without even a single lead institution. This requires “complex internal regulation, governance, collective responsibility, and novel ways of attributing credit.” So, over the past decades physicists have been asked very fundamental questions about how they want to govern. Those 3,000 people have never all met one another; they’re not even in the same country. So, do they stop the accelerator because of the results from one group? Or, when CERN scientists found data suggesting faster-than-light neutrinos, the team was not unanimous about publishing those results. When the results were reversed, the entire team suffered some reputational damage. “So, the stakes are very high about how these governance, decision-making, and attribution questions get decided.”

He looks back to the 1960s. There were large bubble chambers kept above their boiling point but under pressure. You’d get beautiful images of particles, and these were the iconic images of physics. But these experiments were at a new, industrial scale for physics. After an explosion in 1965, the labs were put under industrial rules and processes. In 1967 Alan Thorndike at Brookhaven responded to these changes in the ethos of being an experimenter. Rarely is the experimenter a single individual, he said. He is a composite. “He might be 3, 5 or 8, possibly as many as 10, 20, or more.” He “may be spread around geographically…He may be ephemeral…He is a social phenomenon, varied in form and impossible to define precisely.” But he certainly is not (said Thorndike) a “cloistered scientist working in isolation at his laboratory bench.” The thing that is thinking is a “composite entity.” The tasks are not partitioned in simple ways, the way contractors working on a house partition their tasks. Thorndike is talking about tasks in which “the cognition itself does not occur in one skull.”

By 1983, physicists were colliding beams that moved particles out in all directions. Bigger equipment. More particles. More complexity. Now instead of a dozen or two participants, you have 150 or so. Questions arose about what an author is. In July 1988 one of the Stanford collaborators wrote an internal memo saying that all collaborators ought to be listed as authors alphabetically since “our first priority should be the coherence of the group and the de facto recognition that contributions to a piece of physics are made by all collaborators in different ways.” They decided on a rule that avoided the nightmare of trying to give primacy to some. The memo continues: “For physics papers, all physicist members of the collaboration are authors. In addition, the first published paper should also include the engineers.” [Wolowitz! :)]

In the 1990s, rules of authorship got more specific. He points to a particular list of seven very specific rules. “It was a big battle.”

In 1997, when you get to projects as large as ATLAS at CERN, the author count goes up to 2,500. This makes it “harder to evaluate the individual contribution when comparing with other fields in science,” according to a report at the time. With experiments of this size, says Peter, the experimenters are the best source of the review of the results.

Conundrums of Authorship: It’s a community and you’re trying to keep it coherent. “You have to keep things from falling apart” along institutional or disciplinary grounds. E.g., the weak neutral current experiment. The collaborators were divided about whether there were such things. They were mockingly accused of proposing “alternating weak neutral currents,” and this cost them reputationally. But trying to make these experiments speak in one voice can come at a cost. E.g., suppose 1,900 collaborators want to publish, but 600 don’t. If they speak in one voice, that suppresses dissent.

Then there’s also the question of the “identity of physicists while crediting mechanical, cryogenic, electrical engineers, and how to balance with builders and analysts.” E.g., analysts have sometimes claimed credit because they were the first ones to perceive the truth in the data, while others say that the analysts were just dealing with the “icing.”

Peter ends by saying: These questions go down to our understanding of the very nature of science.

Q: What’s the answer?
A: It’s different in different sciences, each of which has its own culture. Some of these cultures are still emerging. It will not be solved once and for all. We should use those cultures to see what part of evaluations are done inside the culture, and which depend on external review. As I said, in many cases the most serious review is done inside where you have access to all the data, the backups, etc. Figuring out how to leverage those sort of reviews could help to provide credit when it’s time to promote people. The question of credit between scientists and engineers/technicians has been debated for hundreds of years. I think we’ve begun to shed some our class anxiety, i.e., the assumption that hand work is not equivalent to head work, etc. A few years ago, some physicists would say that nanotech is engineering, not science; you don’t hear that so much any more. When a Nobel prize in 1983 went to an engineer, it was a harbinger.

Q: Have other scientists learned from the high energy physicists about this?
A: Yes. There are different models. Some big science gets assimilated to a culture that is more like a big engineering process. E.g., there’s no public awareness of the lead designers of the 747 we’ve been flying for 50 years, whereas we know the directors of Hollywood films. Authorship is something we decide. That the 747 has no author but The Hunger Games does was not decreed by Heaven. Big plasma physics is treated more like industry, in part because it’s conducted within a secure facility. The astronomers have done many admirable things. I was on a prize committee that gave the award to a group because it was a collective activity. Astronomers have been great about distributing data. There’s Galaxy Zoo, and some “zookeepers” have been credited as authors on some papers.

Q: The credits are getting longer on movies as the specializations grow. It’s a similar problem. They tell you who did what in each category. In high energy physics, scientists see becoming too specialized as a bad thing.
A: In the movies many different roles are recognized. And there are questions of distribution of profits, which is not so analogous to physics experiments. Physicists want to think of themselves as physicists, not as sub-specialists. If you are identified as, for example, the person who wrote the Monte Carlo, people may think that you’re “just a coder” and write you off. The first Ph.D. in physics submitted at Harvard was on the Bohr model; the student was told that it was fine but he had to do an experiment because theoretical physics might be great for Europe but not for the US. It’s naive to think that physicists are da Vincis who do everything; the idea of what counts as being a physicist is changing, and that’s a good thing.

[I wanted to ask whether (assuming what may not be true) the Internet’s leading to more of the internal work being done visibly in public might change some of the governance, since it will be clearer that there is diversity and disagreement within a healthy network of experimenters. Anyway, that was a great talk.]


Categories: science, too big to know Tagged with: 2b2k • collaboration • history of science • networked science • peter galison • physics • science Date: May 16th, 2012 dw

3 Comments »

May 13, 2012

[2b2k] The Net as paradigm

Edward Burman recently sent me a very interesting email in response to my article about the 50th anniversary of Thomas Kuhn’s The Structure of Scientific Revolutions. So I bought his 2003 book Shift!: The Unfolding Internet – Hype, Hope and History (hint: If you buy it from Amazon, check the non-Amazon sellers listed there) which arrived while I was away this week. The book is not very long — 50,000 words or so — but it’s dense with ideas. For example, Edward argues in passing that the Net exploits already-existing trends toward globalization, rather than leading the way to it; he even has a couple of pages on Heidegger’s thinking about the nature of communication. It’s a rich book.

Shift! applies The Structure of Scientific Revolutions to the Internet revolution, wondering what the Internet paradigm will be. The chapters that go through the history of failed attempts to understand the Net — the “pre-paradigms” — are fascinating. Much of Edward’s analysis of business’ inability to grasp the Net mirrors cluetrain‘s themes. (In fact, I had the authorial d-bag reaction of wishing he had referenced Cluetrain…until I realized that Edward probably had the same reaction to my later books which mirror ideas in Shift!) The book is strong in its presentation of Kuhn’s ideas, and has a deep sense of our cultural and philosophical history.

All that would be enough to bring me to recommend the book. But Edward admirably jumps in with a prediction about what the Internet paradigm will be:

This…brings us to the new paradigm, which will condition our private and business lives as the twenty-first century evolves. It is a simple paradigm, and may be expressed in synthetic form in three simple words: ubiquitous invisible connectivity. That is to say, when the technologies, software and devices which enable global connectivity in real time become so ubiquitous that we are completely unaware of their presence…We are simply connected.” [p. 170]

It’s unfair to leave it there since the book then elaborates on this idea in very useful ways. For example, he talks about the concept of “e-business” as being a pre-paradigm, and the actual paradigm being “The network itself becomes the company,” which includes an erosion of hierarchy by networks. But because I’ve just written about Kuhn, I found myself particularly interested in the book’s overall argument that Kuhn gives us a way to understand the Internet. Is there an Internet paradigm shift?

There are two ways to take this.

First, is there a paradigm by which we will come to understand the Internet? Edward argues yes, we are rapidly settling into the paradigmatic understanding of the Net. In fact, he guesses that “the present revolution [will] be completed and the new paradigm of being [will] be in force” in “roughly five to eight years” [p. 175]. He sagely points to three main areas where he thinks there will be sufficient development to enable the new paradigm to take root: the rise of the mobile Internet, the development of productivity tools that “facilitate improvements in the supply chain” and marketing, and “the increased deployment of what have been termed social applications, involving education and the political sphere of national and local government.” [pp. 175-176] Not bad for 2003!

But I’d point to two ways, important to his argument, in which things have not turned out as Edward thought. First, the 5-8 years after the book came out were marked by a continuing series of disruptive Internet developments, including general purpose social networks, Wikipedia, e-books, crowdsourcing, YouTube, open access, open courseware, Khan Academy, etc. etc. I hope it’s obvious that I’m not criticizing Edward for not being prescient enough. The book is pretty much as smart as you can get about these things. My point is that the disruptions just keep coming. The Net is not yet settling down. So we have to ask: Is the Net going to enable continuous disruption and self-transformation? If so will it be captured by a paradigm? (Or, as M. Knight Shyamalan might put it, is disruption the paradigm?)

Second, after listing the three areas of development over the next 5-8 years, the book makes a claim central to the basic formulation of the new paradigm Edward sees emerging: “And, vitally, for thorough implementation [of the paradigm] the three strands must be invisible to the user: ubiquitous and invisible connectivity.” [p. 176] If the invisibility of the paradigm is required for its acceptance, then we are no closer to that event, for the Internet remains perhaps the single most evident aspect of our culture. No other cultural object is mentioned as many times in a single day’s newspaper. The Internet, and the three components the book points to, are more evident to us than ever. (The exception might be innovations in logistics and supply chain management; I’d say Internet marketing remains highly conspicuous.) We’ve never had a technology that so enabled innovation and creativity, but there may well come a time when we stop focusing so much cultural attention on the Internet. We are not close yet.

Even then, we may not end up with a single paradigm of the Internet. It’s really not clear to me that the attendees at ROFLcon have the same Net paradigm as less Internet-besotted youths. Maybe over time we will all settle into a single Internet paradigm, but maybe we won’t. And we might not because the forces that bring about Kuhnian paradigms are not at play when it comes to the Internet. Kuhnian paradigms triumph because disciplines come to us through institutions that accept some practices and ideas as good science; through textbooks that codify those ideas and practices; and through communities of professionals who train and certify the new scientists. The Net lacks all of that. Our understanding of the Net may thus be as diverse as our cultures and sub-cultures, rather than being as uniform and enforced as, say, genetics’ understanding of DNA is.

Second, is the Internet affecting what we might call the general paradigm of our age? Personally, I think the answer is yes, but I wouldn’t use Kuhn to explain this. I think what’s happening — and Edward agrees — is that we are reinterpreting our world through the lens of the Internet. We did this when clocks were invented and the world started to look like a mechanical clockwork. We did this when steam engines made society and then human motivation look like the action of pressures, governors, and ventings. We did this when telegraphs and then telephones made communication look like the encoding of messages passed through a medium. We understand our world through our technologies. I find (for example) Lewis Mumford more helpful here than Kuhn.

Now, it is certainly the case that reinterpreting our world in light of the Net requires us to interpret the Net in the first place. But I’m not convinced we need a Kuhnian paradigm for this. We just need a set of properties we think are central, and I think Edward and I agree that these properties include the abundant and loose connections, the lack of centralized control, the global reach, the ability of everyone (just about) to contribute, the messiness, the scale. That’s why you don’t have to agree about what constitutes a Kuhnian paradigm to find Shift! fascinating, for it helps illuminate the key question: How are the properties of the Internet becoming the properties we see in — or notice as missing from — the world outside the Internet?

Good book.


Categories: infohistory, reviews, too big to know Tagged with: 2b2k • edward burman • kuhn • net • paradigms • reviews • science Date: May 13th, 2012 dw

3 Comments »



Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name (David Weinberger) and link to it), and don't use it commercially without my permission.
