Everyday Chaos
Too Big to Know
Cluetrain 10th Anniversary
Everything Is Miscellaneous
Small Pieces Loosely Joined
Cluetrain Manifesto

March 22, 2011

[doep] Daily (intermittent) Open-Ended Puzzle: Why do moths fly the way they do?

My understanding (possibly bogus) is that moths spiral into flames because evolution has designed them to fly in straight lines by holding a celestial light, such as the moon, at a fixed angle. When the light is nearby, keeping its position fixed in their visual space causes them to spiral inward toward it.

Fine. But why is it an evolutionary advantage for moths to fly in a straight line? Where are they trying to get to so quickly? And isn’t there a metaphor for MBAs somewhere in here?

Categories: puzzles, science Tagged with: evolution • moths • puzzle Date: March 22nd, 2011 dw

March 4, 2011

[2b2k] Tagging big data

According to an article in Science Insider by Dennis Normile, a group formed at a symposium sponsored by the Board on Global Science and Technology of the National Research Council, an arm of the U.S. National Academies [that's all they've got??], is proposing to make big scientific data sets easier to find by using a standard tag, along with a standard way of conveying basic info about the nature of each set and its terms of use. "The group hopes to come up with a protocol within a year that researchers creating large data sets will voluntarily adopt. The group may also seek the endorsement of the Internet Engineering Task Force…"

Categories: everythingIsMiscellaneous, science, too big to know Tagged with: 2b2k • big data • everythingIsMiscellaneous • science • standards Date: March 4th, 2011 dw

February 2, 2011

Open access welcomes Nature

A couple of weeks ago, when Nature magazine announced it was starting a peer-reviewed open access journal, PLoS One (a peer-reviewed open access journal) welcomed it the way Apple welcomed IBM into the personal computing market:

On January 6, 2011, Nature announced a new Open Access (OA) publication called Scientific Reports. Nature's news underscores the growing acceptance of OA, as reflected in recent OA journal launches from other traditional publishers such as the BMJ, Sage, AIP (American Institute of Physics) and APS (American Physical Society). Please spread the word either via this blog post or download this PDF.


Inspired by Apple.

The Nature entry into the open access field is a big deal. So is Nature's support of Creative Commons. I've had a chance to spend a little time with folks at Nature and know them to be passionate about making the work of science more accessible. So this is good news all around.

Categories: open access, science, too big to know Tagged with: 2b2k • nature • npg • open access • plos • science Date: February 2nd, 2011 dw

January 21, 2011

Open access continues to catch on

Science Magazine reports on a study sponsored by the EU that found that 89% of the 50,000 researchers surveyed think open access is good for their field. On the other hand, the reporter, Gretchen Vogel, points out that while 53% said they had published at least one open access article, only 10% of papers are published in open access journals. What’s holding them back from doing more open access publishing? About 40% said it was because there wasn’t enough funding to cover the publication fees, and 30% said there weren’t high-quality open access journals in their field.

The data and analysis is supposed to become available this week at The SOAP Project. Unfortunately, the Science Magazine article covering the report is only available to members of the AAAS or to those willing to pay $15 for 24 hours of access. (Hat tip to Andrew “Yes he is my brother” Weinberger.)

Categories: open access, science Tagged with: cheap irony • open access • science Date: January 21st, 2011 dw

January 5, 2011

[2b2k] Amateur astronomers, and science as a network

I met today with Aaron Price, who’s with the American Association of Variable Star Observers, a group celebrating a hundred years of gathering data from amateurs and professionals about variable stars. AAVSO has an archive of over 19 million variable star observations. Aaron is particularly interested in enabling and encouraging amateurs to become increasingly involved in the scientific process, ultimately collaboratively writing publishable articles. (I’m putting this my way, not his, so don’t blame him for my infelicities.)

We talked a bit about who should be called a scientist. My own view is that if you have this discussion without any context, then you look to paradigmatic scientists: someone who works in a lab (perhaps), designs and runs experiments, formulates hypotheses, has academic credentials, wears a lab coat. In such cases, when there is no actual need driving the question, arguments about edge cases can't be resolved. On the other hand, if something hangs on the question (does the person get funding, get invited to address a conference, get access to equipment, get to claim a particular standing in an argument, etc.), then the question is more likely to be settle-able. For that reason, most discussions about whether citizen scientists are scientists (or whether "citizen journalists" are journalists, etc.) should be addressed (in my opinion) first by asking, "Why do you ask?"

This seems to me to be an illustration of the way everything (well, almost) is becoming a network. In the old days, when science was a lot like publishing, the line between scientist and layperson was fairly well (but certainly imperfectly) drawn. In a networked world, it’s not simply a matter of redrawing lines, so that now citizen scientists are inside the Circle of Science. Rather, the nature of the lines is different. All members of a network are connected. The question is the nature of the connection, and that can change instantly based on interests, skills, credentials, and the project underway. The old lines disconnected; the new ones connect. And that makes it far more difficult to come up with persistent answers to questions like “Who is a scientist?” or “Who is a journalist?”

Often, in a networked world, it’s better not to insist on an answer. More important than deciding exactly who is inside the charmed circle is figuring out how to make the network smarter — which almost always means extending the network as far as it can possibly go.

Categories: science, too big to know Tagged with: 2b2k • amateurs • citizen journalists • citizen scientists • science Date: January 5th, 2011 dw

December 28, 2010

[2b2k] Citizen scientists

Alex Wright has an excellent article in the New York Times today about the great work being done by citizen scientists. (Alex follows up in his blog with some more worthy citizen science efforts.)

Alex, whom I met a few years ago at a conference because we had written books on similar topics (his excellent Glut and my Everything Is Miscellaneous), quotes me a couple of times in the article. The first time, I say that the people who are gathering data and classifying images "are not doing the work of scientists." Some in the comments have understandably taken issue with that characterization. It's something I deal with at some length in Too Big to Know. Because of the curtness of the comment, it could easily be taken as dismissive, which was not my intent; these volunteers are making a real contribution, as Alex's article documents. But in many of the projects Alex discusses (and that I discuss in my manuscript), the volunteers are doing work for which they need no scientific training. They are doing the work of science (gathering data certainly counts), but not the work of scientists. But that's what makes it such an exciting time: you don't need a degree or even training beyond the instructions on a Web page, and you can be part of a collective effort that advances science. (I think commenter kc makes a good argument against my position on this.)

FWIW, my participation in the article grew out of a discussion with Alex about why, in this age of the amateur, it's so hard to find serious leaps in scientific thinking coming from amateurs. Amateurs drove science more in the 19th century than they do now. Of course, that's not an apples-to-apples comparison, because science was professionalized in the 20th century. Also, so much of basic science now requires access to equipment far too expensive for amateurs. (Although that's scarily not the case for gene sequencers.)

Categories: science, too big to know Tagged with: 2b2k • citizen science • everythingIsMiscellaneous • science Date: December 28th, 2010 dw

December 17, 2010

Podcast with Kevin Kelly that doesn’t worry about whether technology can want anything

I really enjoyed interviewing Kevin Kelly for this Radio Berkman podcast. (Well, who wouldn’t!) Kevin’s book, What Technology Wants, is quite remarkable. Kevin is attempting to reframe our way of understanding life, the universe, and all its little details.

I was especially proud that we made it through without talking about whether technology can really be said to want anything. That’s not what really is at stake in the book.

Categories: culture, science, too big to know Tagged with: berkman • kevin kelly • podcast Date: December 17th, 2010 dw

November 30, 2010

[bigdata] Ensuring Future Access to History

Brewster Kahle, Victoria Stodden, and Richard Cox are on a panel chaired by the National Archives' Director of Litigation, Jason Baron. The conference is being put on by Princeton's Center for Information Technology Policy.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Brewster goes first. He's going to talk about "public policy in the age of digital reproduction." "We are in a jam," he says, because of how we have viewed our world as our tech has changed. Brewster founded the Internet Archive, a non-profit library. The aim is to make freely accessible everything ever published, from the Sumerian texts on. "Everyone everywhere ought to have access to it": that's a challenge worthy of our generation, he says.

He says the time is ripe for this. The Internet is becoming ubiquitous. If there aren't laptops, there are Internet cafes. And there are mobiles. Plus, storage is getting cheaper and smaller. You can record "100 channel years" of HD TV in a petabyte for about $200,000, and store it in a small cabinet. For about $1,200, you could store all of the text in the Library of Congress. Google's copy of the WWW is about a petabyte. The Wayback Machine uses 3 petabytes and has about 150 billion pages. It's used by 1.5 million people a day. A small organization like the Internet Archive can take this task on.
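As a back-of-envelope check on the "100 channel years in a petabyte" figure, here is a short Python sketch of the average bitrate it implies, plus the price per terabyte. The decimal petabyte (10^15 bytes) and the 365.25-day year are assumptions on my part, not numbers from the talk.

```python
# Sanity check of the "100 channel-years of HD TV in a petabyte for $200,000" figure.
# Assumptions (not from the talk): decimal units (1 PB = 10**15 bytes,
# 1 TB = 10**12 bytes) and 365.25-day years.

PETABYTE_BYTES = 10**15
TERABYTE_BYTES = 10**12
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
COST_USD = 200_000


def implied_bitrate_mbps(total_bytes: float, channel_years: float) -> float:
    """Average bitrate (megabits/second) that fits channel_years of video into total_bytes."""
    seconds = channel_years * SECONDS_PER_YEAR
    return total_bytes * 8 / seconds / 1e6


def cost_per_terabyte(cost_usd: float, total_bytes: float) -> float:
    """Dollars per (decimal) terabyte of storage."""
    return cost_usd / (total_bytes / TERABYTE_BYTES)


rate = implied_bitrate_mbps(PETABYTE_BYTES, 100)
price = cost_per_terabyte(COST_USD, PETABYTE_BYTES)
print(f"{rate:.2f} Mbit/s per channel")
print(f"${price:.0f} per terabyte")
```

Under those assumptions, the figure works out to roughly 2.5 Mbit/s per channel and about $200 per terabyte.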

This archive is dynamic, he says. The average Web page has 15 links. The average Web page changes every 100 days.

There are downsides to the archive. E.g., the Wayback Machine gets used to enable lawsuits. We don't want people to pull out of the public sphere. "Get archived, go to jail" is not a useful headline. Brewster says that they once got an FBI letter asking for info, which they successfully fought (via the EFF). The Archive gets lots of lawyer letters, including about 50 requests per week to have material taken out of the Archive. Rarely do people ask for other people's stuff to be taken down. Once, the Scientologists wanted some copyright-infringing material taken down from someone else's archived site; the Archive finally agreed. The Archive held a conference and came up with the Oakland Archive Policy for issues such as these.

Brewster points out that Jon Postel's taxonomy has stuck: .com, .org, .gov, .edu, .mil … Perhaps we need separate policies for each of these, he says. And how do we take policy ideas and make them effective? E.g., even if you put up a robots.txt exclusion, you will nevertheless get spidered by lots of people.
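Brewster's robots.txt example comes down to the fact that the exclusion file is a request, not a lock: it binds only crawlers that choose to consult it. Python's standard-library urllib.robotparser shows what that check looks like from a well-behaved crawler's side. The robots.txt content and both user-agent names below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one (made-up) archiving crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: ExampleArchiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/page.html"

# A compliant crawler asks before fetching, and this one is told no.
archiver_allowed = parser.can_fetch("ExampleArchiver", url)

# No rule names this agent and there is no "*" group, so the answer is yes.
# Nothing in the file stops a crawler that never asks at all.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(archiver_allowed, other_allowed)
```

The file only expresses the site owner's wishes; enforcement depends entirely on each crawler's good manners, which is why policy has to do work the protocol can't.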

“We can build the Library of Alexandria,” he concludes, “but it might be problematic.”

Q: I’ve heard people say they don’t need to archive their sites because you will.
A: Please archive your own. More copies make us safe.

Q: What do you think about the Right to Oblivion movement, which says that we want some types of content, e.g., on Facebook, to self-destruct on some schedule?
A: I have no idea. It's really tough. Personal info is so damn useful. I wish we could keep our computers from being used against us in court; if we defined the Fifth Amendment so that "we" included our computers…


Richard Cox says if you golf, you know about info overload. It used to be that you had one choice of golf ball, Top-Flite. Now they have twenty varieties.

Archives are full of stories waiting to be told, he says. "When I think about Big Data…most archivists would think we're talking about science, the corporate world, and government." Yet most archivists work in small cultural and public institutions. Richard is going to talk about the shifting role of archivists.

As early as the 1940s, archivists were talking about machine-readable records. The debates and experiments have been going on for many decades. One early approach was to declare that electronic records were not archives, because the archives couldn’t deal with them. (Archivists and records managers have always been at odds, he says, because RM is about retention schedules, i.e., deleting records.) Over time, archivists came up to speed. By 2000, some were dealing with electronic records. In 2010, many do, but many do not. There is a continuing debate. Archivists have spent too long debating among themselves when they need to be talking with others. But, “archivists tend not to be outgoing folks.” (Archivists have had issues with the National Archives because their methods don’t “scale down.”)

There are many projects these days. E.g., we now have citizen archivists who maintain their own archives and who may contribute to public archives. Who are today’s archivists? Archival educators are redefining the role. Richard believes archives will continue, but the profession may not. He recommends reading the Clair report [I couldn’t get the name or the spelling, and can’t find it on Google :( ] on audio-visual archives. “I read it and I wept.” It says that we need people who understand the analog systems so that they can be preserved, but there’s no funding.


Victoria Stodden's talk has a gloomy title: "The Coming Dark Ages in Scientific Knowledge."

She begins by pointing to the pervasive use of computers and computational methods in the sciences, and even in the humanities and law schools. E.g., Northwestern is looking at the word counts in Shakespearean works. It’s changing the type of scientific analysis we’re doing. We can do very complicated simulations that give us a new way of understanding our world. E.g., we do simulations of math proofs, quite different from the traditional deductive processes.

This means that what we're doing as scientists is being stored in scripts, code, data, etc. But science is science only when it's communicated. If the data and scripts are not shared, the results are not reproducible. We need to act as scientists to make sure that these data, etc., are shared. How do we communicate results based on enormous data sets? We have to give access to those data sets. And what happens when those data sets change (are corrected or updated)? What happens to results based on the earlier sets? We need to preserve the prior versions of the data. How do we version it? How do we share it? E.g., there's an experiment at NSF: all proposals have to include a data management plan. The funders and journals have a strong role to play here.
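One way to picture the versioning problem: give every published version of a data set a permanent identifier, so a result can cite the exact version it was computed from, and a correction adds a new version instead of overwriting the old one. The Python sketch below illustrates that idea with content hashes (SHA-256 over a canonical JSON serialization). The class, its methods, and the sample rows are hypothetical and illustrative only; nothing like this was proposed on the panel.

```python
import hashlib
import json


class DatasetHistory:
    """Hypothetical store that keeps every version of a data set, keyed by content hash."""

    def __init__(self) -> None:
        self._versions: dict[str, bytes] = {}  # version id -> serialized data
        self._order: list[str] = []            # version ids in publication order

    def publish(self, rows: list[dict]) -> str:
        """Store a version and return its id; identical data always gets the same id."""
        # sort_keys makes the serialization deterministic, so the hash is stable.
        blob = json.dumps(rows, sort_keys=True).encode("utf-8")
        version_id = hashlib.sha256(blob).hexdigest()
        if version_id not in self._versions:
            self._versions[version_id] = blob
            self._order.append(version_id)
        return version_id

    def get(self, version_id: str) -> list[dict]:
        """Retrieve exactly the data a result was computed from."""
        return json.loads(self._versions[version_id])

    def versions(self) -> list[str]:
        return list(self._order)


history = DatasetHistory()
v1 = history.publish([{"sample": "A", "value": 1.0}])  # original release
v2 = history.publish([{"sample": "A", "value": 1.2}])  # hypothetical correction
print(history.get(v1))  # the version an earlier paper cited is still retrievable
```

Hashing the content rather than numbering versions means anyone holding a copy can check that it matches the cited version, though real archives would also need provenance, metadata, and storage policies this sketch ignores.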

Sharing scientific knowledge is harder than it sounds, but it is vital. E.g., a recent study showed that a cancer therapy would be particularly effective based on individual genomes. But it was extremely hard to trace back the data and code used to get this answer. Victoria notes that peer reviewers do not check the data and algorithms.

Why a dark age? Because "without reproducibility, knowledge cannot be recreated or understood." We need ways and processes of sharing. Without them, we only have scientists making proclamations.

She gives some recommendations: (1) Assessment of the expense of data/code archiving. (2) Enforcement of funding agency guidelines. (3) Publication requirements. (4) Standards for scientific tools. (5) Versioning as a scientific principal. (6) Licensing to realign scientific intellectual property with longstanding scientific norms (Reproducible Research Standard). [verbatim from her slide] Victoria stresses the need to get past the hurdles copyright puts in the way.

Q: Are you a pessimist?
A: I’m an optimist. The scientific community is aware of these issues and is addressing them.

Q: Do we need an IRS for the peer review process?
A: Even just the possibility that someone could look at your code and data is enough to make scientists very aware of what they’re doing. I don’t advocate code checking as part of peer review because it takes too long. Instead, throw your paper out into the public while it’s still being reviewed and let other scientists have at it.

Q: [rick] Every age has lost more info than it has preserved. This is not a new problem. Every archivist from the beginning of time has had to cope with this.


Jason Baron of the National Archives (who is not speaking officially) points to the volume of data the National Archives (NARA) has to deal with. E.g., in 2001, 32 million emails were transferred to NARA; in 2009, 250+ million were. He predicts that NARA will hold a billion presidential emails by 2017. The first lawsuit over email was filed in 1989 (the email system was PROFS). Right now, the official policy of 300 govt agencies is to print email out for archiving. We can no longer deal with the info flow with manual processes. Processing of printed pages occurs when there's a lawsuit or a FOIA request. Jason is pushing on the value of search as a way of encouraging systematic intake of digital records. He dreams of search algorithms that retrieve all relevant materials. There are clustering algorithms emerging within law that hold hope. He also wants to retrieve docs other than via keywords. Visual analytics can help.

There are three languages we need: Legal, Records Management, and IT. How do we make the old ways work in the new? We need both new filtering techniques and traditional notions of appraisal. "The neutral archivist may serve as an unbiased resource for the filtering of information in an increasingly partisan (untrustworthy) world" [from the slide].

Categories: libraries, science, too big to know Tagged with: 2b2k • archives • bigdata • libraries • science Date: November 30th, 2010 dw


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name (David Weinberger) and link to it), and don't use it commercially without my permission.

Joho the Blog uses WordPress blogging software.
Thank you, WordPress!