March 24, 2018
Sixteen speeches
In case you missed any of today’s speeches at The March for Our Lives, here’s a page that has sixteen of them.
I, on the other hand, am speechless.
March 19, 2018
Kate Zwaard (Twitter: @kzwa), Chief of National Digital Strategies at the Library of Congress and leader of LC Labs, is opening MIT Libraries’ Grand Challenge Summit. The next 1.5 days will be about the grand challenges in enabling scholarly discovery.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
For context she tells us that the LC is the largest library in the world, with 164M items. It has the world’s largest collection of film, maps, comic books, telephone directories, and more. [Too many for me to keep this post up with.]
You can walk for two football fields just in the maps section. The world’s largest collection of recorded sound. The largest collection…
Personal papers from Ben Franklin, Rosa Parks, Groucho Marx, Claude Shannon, and so many more.
Last year they circulated almost a million physical items.
Every week 11,000 tangible items come in through the Copyright office.
Last year, they digitized 4.7M items, as well as 730M documents crawled from the Web, plus much more. File count: 243M and growing every day.
These serve just one of the LC’s goals: “Acquire, preserve, and provide access to a universal collection of knowledge and the record of America’s creativity.” Not to mention serving Congress, and much more. [I can only keep up with a little of this. Kate’s a fantastic presenter and is not speaking too quickly. The LC is just too big!]
Kate thinks of the LC’s work as an exothermic reaction that needs an activation energy or catalyst. She leads the LC Labs, which started a year ago as a place of experimentation. The LC is a delicate machine, which makes it hard for it to change. The Labs enable experimentation. “Trying things that are easy and cheap is the only way forward.”
When thinking about what to do next, she thinks about what’s feasible and what the impact would be. One way of having impact: demonstrating that the collection has unexplored potential for research. She’s especially interested in how the Labs can help deal with the problem of scale at the LC.
She talks about some of the Labs’ projects.
If you wanted to make stuff with LC data, there was no way of doing that. Now there’s LC for Robots, added documentation, and Jupyter Notebooks: an open source Web app that lets you create open docs that contain code, running text, etc. It lets people play with the API without doing all the work from scratch.
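As a hedged sketch of what that first contact with the API might look like: the `fo=json` query parameter follows the pattern loc.gov documents for its JSON views, but treat the exact endpoint and the response shape as assumptions to verify against LC for Robots.

```javascript
// Build a loc.gov search URL that asks for JSON instead of HTML.
// The q / fo / sp parameters follow the pattern loc.gov documents
// for its JSON API; verify against LC for Robots before relying on them.
function locSearchUrl(query, page) {
  const params = new URLSearchParams({
    q: query,              // the search phrase
    fo: "json",            // "format: json"
    sp: String(page || 1)  // result page number
  });
  return "https://www.loc.gov/search/?" + params.toString();
}

// Using it with fetch might look like this (not run here):
// fetch(locSearchUrl("baseball cards"))
//   .then(r => r.json())
//   .then(data => data.results.forEach(item => console.log(item.title)));

console.log(locSearchUrl("baseball cards"));
```

This is exactly the sort of thing the Jupyter Notebooks mentioned above let you try interactively, without building a scraper from scratch.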
But it’s not enough to throw some resources onto a Web page. The NEH data challenge asked people to create new things using the info about 12M newspapers in the collection. Now the Lab has the Congressional Data Challenge: do something with Congressional data.
Labs has an Innovator in Residence project. The initial applicants came from within the LC to give it a try. One of them created “Beyond Words,” a crowdsourcing project that asks visitors to add data to resources.
Kate likes helping people find collections they otherwise would have missed. For ten years LC has collaborated with the Flickr Commons. But they wanted to crowdsource a transcription project for any image of text. A repo will be going up on GitHub shortly for this.
In the second year of the Innovator in Residence, they got the artist Jer Thorp [Twitter: @blprnt] to come for 6 months. Kate talks about his work with the papers of Edward Lorenz, who coined the phrase “The Butterfly Effect.” Jer animated Lorenz’s attractor, which, he points out, looks a bit like a butterfly. Jer’s used the attractor on a collection of 3M words. It results in “something like a poem.” (Here’s Jer’s Artist in the Archive podcast about his residency.)
Jer wonders how we can put serendipity back into the LC and into the Web. “How do we enable our users to be carried off by curiosity, not by a particular destination?” The LC is a closed-stack library, but it can help guide digital wanderers.
Last year the LC released 25M catalog records. Jer did a project that randomly pulls the first names of 20 authors in any particular year. It demonstrates, among other things, the changing demographics of authors. Another project, “Birthy Deathy,” displays birthplace info. Another looks for polymaths.
In 2018 the Lab will have their first open call for an Innovator in Residence. They’ll be looking for data journalists.
Kate talks about Laura Wrubel’s work with the Lab. “Library of Congress Colors” displays a graphic of the dominant colors in a collection.
Or Laura’s Photo Roulette: you guess the date of a photo.
Kate says she likes to think that libraries are not just “book holes.” One project: find links among items in the archives. But the WARC format is not amenable to that.
The Lab is partnering with lots of great groups, including JSTOR and Wikidata.
They’re working on using machine learning to identify place names in their photos.
What does this have to do with scale, she asks, noting that the LC has done pretty well with scale. E.g., for the past seven years, the size of their digital collection has doubled every 32 months.
The Library also thinks about how to become a place of warmth and welcome. (She gives a shout-out to MIT Libraries’ Future of Libraries report.) Right now, visitors and scholars go to different parts of the building. Visitors to the building see a monument to knowledge, but not a living, breathing place. “The Library is for you. It is a place you own. It is a home.”
She reads from a story by Anne Lamott.
How does friendship relate to scale? “Everything good that has happened in my life has happened because of friendship.” The average tenure of a current employee is thirty years; that’s tenure, not the average retirement age. “It’s not just for the LC but for our field.” Good advice she got: “Pick your career by the kind of people you like to be around.” Librarians!
“We’ve got a tough road ahead of us. We’re still in the early days of the disruption that computation is going to bring to our profession.” “Friendship is what will get us through these hard times. We need to invite people into the tent.” “Everything we’ve accomplished has been through the generosity of our friends and colleagues.” This is 100% true of the Labs. It’s just 4 people, but everything they do is done in collaboration.
She concludes (paraphrasing badly): I don’t believe in geniuses, and I don’t believe in paradigm shifts. I believe in friendship and working together over the long term. [She put this far better.]
Q&A
Q: How does the Lab decide on projects?
A: Collaboratively
Q: I’m an archivist at MIT. The works are in closed stacks, which can mislead people about the scale. How do we explain the scale in an interesting way?
A: Funding is difficult because so much of the money that comes is to maintain and grow the collection and services. It can be a challenge to carve out funding for experimentation and innovation. We’ve been working hard on finding ways to help people wrap their heads around the place.
Q: Data science students are eager to engage, e.g., as interns. How can academic institutions help to make that happen?
A: We’re very interested in what sorts of partnerships we can create to bring students in. The data is so rich, and the place is so interesting.
Q: Moving from models that think about data as packages as opposed to unpacking and integrating. What do you think about the FAIR principles: making things Findable, Accessible, Interoperable, and Reusable? Also, we need to bring in professionals thinking about knowledge much more broadly.
A: I’m very interested in HathiTrust’s data capsules. Are there ways we can allow people to search through audio files that are not going to age into the commons until we’re gone? You’re right: the model of chunks coming in and out is not going to work for us.
Q: In academia, our focus has been to provide resources efficiently. How can we weave in serendipity without hurting efficiency?
A: That’s hard. Maybe we should just serve the person who has a specific purpose. You could give ancillary answers. And crowdsourcing could make a lot more available.
[Great talk.]
March 14, 2018
The Chartered Institute of Public Relations today gave the Cluetrain Manifesto its Presidents Medal. The announcement is here.
This is a huge honor and a big deal. CIPR is the largest professional association for PR folks in Europe.
Former recipients include — get ready for this —
Sir Tim Berners-Lee
Archbishop Desmond Tutu
Princess Anne
Prince Philip
Yup, those are now my peeps.
Background: The Cluetrain site went up in 1999 — yes, almost 20 yrs ago — and we turned it into a book in 2000. (The “we” is Doc Searls, Rick Levine, Christopher Locke, and me.) It was an attempt to explain to media and businesses why people like us were so enthusiastic about this new Web thing: it is a place where we get to talk about what mattered to us and to do so in our own voice. That is, it’s a social space, which, surprisingly, was news to much of the media and many businesses. The best-known line from it is Doc’s: “Markets are conversations.”
For the occasion, they asked me to video a talk, which is here and is 45 mins long. On the other hand, they wrote up an extensive summary, which should save you north of 42 mins. (Why me? Pretty random: I was the Cluetrain point person for this.)
Cluetrain got important things wrong, but it also got important things right. CIPR has honored Cluetrain, I believe, as a way of honoring what is right and good about the Web. Still.
February 18, 2018
despite having our hearts ripped out of our chests. Despite losing our friends and coaches. Despite living through a nightmare. As students of Douglas, we are the voice of this generation. And I’ll be damned if anyone thinks they can silence us.
— kyra (@longlivekcx) February 18, 2018
My generation was mobilized politically by the threat of being sent to kill and die in Vietnam.
The new generation is being mobilized by the threat of being killed in their classrooms.
It would of course be foolish to assume that the political path of the new generation will follow that of the 1960s generation. There are so many differences. Here are two that seem to me to matter:
First, the draft was an institutionalized, bureaucratic mechanism that every male faced, by law, on his eighteenth birthday. A choice was forced on each young man. But school shootings are random, unpredictable.
Second, because the draft and the war it served were caused by the government, we knew whom to protest against and what had to be done. The way to end mass murders in schools isn’t as conveniently obvious. Yet there are some steps that a high school movement can and will focus on, beginning with making it harder to get a gun than to hack your parents’ Netflix account.
But those differences will not matter if this movement is indeed an expression of the outrage the high school generation feels. They are facing so much that I can’t even begin to list the issues — not that I need to since they are the issues++ that my generation faced, addressed, and in some cases made worse. Our children’s fear of being murdered in their schools is, horrifyingly, simply the identifiable face of the unfair world we are leaving them.
Hearing these young people speak out even before they have buried their friends brings me the saddest hope imaginable. At such an age to stand so strong together…they are fierce and beautiful and I will laugh and cry with joy as they change the world.
Of course I stand with them. Or, more exactly, I stand a respectful and supportive distance behind them. And not just on March 24:
http://act.everytown.org/sign/march-for-our-lives/
Here’s the speech from Marjory Stoneman Douglas High School student Emma Gonzalez at an anti-gun rally happening today in Fort Lauderdale https://t.co/CyfMnPDAvW // https://t.co/hgewZy4Cxf https://t.co/gssAmGczuH
— Joshua Chavers (@JoshuaChavers) February 17, 2018
February 15, 2018
An earlier draft of Descartes’ Meditations has been discovered, which will inevitably lead to a new round of unfunny jokes under the rubric of “Descartes’ First Draft.” I can’t wait :(
The draft is a big discovery. Camilla Shumaker at Research Frontiers reports that Jeremy Hyman, a philosophy instructor at the University of Arkansas, came across a reference to the manuscript and hied off to a municipal library in Toulouse … a gamble, but he apparently felt he had nothing left Toulouse.
And so it begins…
February 11, 2018
Patrick Sharkey [twitter: patrick_sharkey] uses a Twitter thread to evaluate the evidence about a possible relationship between exposure to lead and crime. The thread is a bit hard to get unspooled correctly, but it’s worth it as an example of:
1. Thinking carefully about complex evidence and data.
2. How Twitter affects the reasoning and its expression.
3. The complexity of data, which will only get worse (= better) as machine learning scales up the size and complexity of the data it can handle.
Note: I lack the skills and knowledge to evaluate Patrick’s reasoning. And, hat tip to David Lazer for the retweet of the thread.
Robert Epstein argues in Aeon against the dominant assumption that the brain is a computer, that it processes information, stores and retrieves memories, etc. That we assume so comes from what I think of as the informationalizing of everything.
The strongest part of his argument is that computers operate on symbolic information, but brains do not. There is no evidence (that I know of, but I’m no expert. On anything) that the brain decomposes visual images into pixels and those pixels into on-offs in a code that represents colors.
In the second half, Epstein tries to prove that the brain isn’t a computer through some simple experiments, such as drawing a dollar bill from memory and while looking at it. Someone committed to the idea that the brain is a computer would probably just conclude that the brain just isn’t a very good computer. But judge for yourself. There’s more to it than I’m presenting here.
Back to Epstein’s first point…
It is of the essence of information that it is independent of its medium: you can encode it into voltage levels of transistors, magnetized dust on tape, or holes in punch cards, and it’s the same information. Therefore, a representation of a brain’s states in another medium should also be conscious. Epstein doesn’t make the following argument, but I will (and I believe I am cribbing it from someone else but I don’t remember who).
Because information is independent of its medium, we could encode it in dust particles swirling clockwise or counter-clockwise; clockwise is an on, and counter is an off. In fact, imagine there’s a dust cloud somewhere in the universe that has 86 billion motes, the number of neurons in the human brain. Imagine the direction of those motes exactly matches the on-offs of your neurons when you first spied the love of your life across the room. Imagine those spins shift but happen to match how your neural states shifted over the next ten seconds of your life. That dust cloud is thus perfectly representing the informational state of your brain as you fell in love. It is therefore experiencing your feelings and thinking your thoughts.
That by itself is absurd. But perhaps you say it is just hard to imagine. Ok, then let’s change it. Same dust cloud. Same spins. But this time we say that clockwise is an off, and the other is an on. Now that dust cloud no longer represents your brain states. It therefore is both experiencing your thoughts and feelings and not experiencing them at the same time. Aristotle would tell us that that is logically impossible: a thing cannot simultaneously be something and its opposite.
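The two readings of the cloud are easy to make concrete. In this toy illustration (the spins and both conventions are, of course, invented for the purpose), the same physical state decodes to opposite bit strings depending only on the convention we bring to it:

```javascript
// One physical state: a list of spin directions for five dust motes.
const spins = ["CW", "CCW", "CCW", "CW", "CW"];

// Convention A: clockwise counts as a 1.
const decodeA = s => s.map(d => (d === "CW" ? 1 : 0)).join("");
// Convention B: clockwise counts as a 0.
const decodeB = s => s.map(d => (d === "CW" ? 0 : 1)).join("");

console.log(decodeA(spins)); // "10011"
console.log(decodeB(spins)); // "01100"
// Nothing about the cloud changed; only our reading of it did.
```

The representation lives entirely in the decoding convention, which is the point of the argument: the cloud by itself represents nothing in particular.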
Anyway…
Toward the end of the article, Epstein gets to a crucial point that I was very glad to see him bring up: Thinking is not a brain activity, but the activity of a body engaged in the world. (He cites Anthony Chemero’s Radical Embodied Cognitive Science (2009) which I have not read. I’d trace it back further to Andy Clark, David Chalmers, Eleanor Rosch, Heidegger…). Reducing it to a brain function, and further stripping the brain of its materiality to focus on its “processing” of “information” is reductive without being clarifying.
I came into this debate many years ago already skeptical of the most recent claims about the causes of consciousness, thanks to some awareness of the series of failed metaphors we have used over the past couple of thousand years. Epstein puts this well, citing another book I have not read (and have consequently just ordered):
In his book In Our Own Image (2015), the artificial intelligence expert George Zarkadakis describes six different metaphors people have employed over the past 2,000 years to try to explain human intelligence.
In the earliest one, eventually preserved in the Bible, humans were formed from clay or dirt, which an intelligent god then infused with its spirit. That spirit ‘explained’ our intelligence – grammatically, at least.
The invention of hydraulic engineering in the 3rd century BCE led to the popularity of a hydraulic model of human intelligence, the idea that the flow of different fluids in the body – the ‘humours’ – accounted for both our physical and mental functioning. The hydraulic metaphor persisted for more than 1,600 years, handicapping medical practice all the while.
By the 1500s, automata powered by springs and gears had been devised, eventually inspiring leading thinkers such as René Descartes to assert that humans are complex machines. In the 1600s, the British philosopher Thomas Hobbes suggested that thinking arose from small mechanical motions in the brain. By the 1700s, discoveries about electricity and chemistry led to new theories of human intelligence – again, largely metaphorical in nature. In the mid-1800s, inspired by recent advances in communications, the German physicist Hermann von Helmholtz compared the brain to a telegraph.
Maybe this time our tech-based metaphor has happened to get it right. But history says we should assume not. We should be very alert to the disanalogies, which Epstein helps us with.
Getting this right, or at least not getting it wrong, matters. The most pressing problem with the informationalizing of thought is not that it applies a metaphor, or even that the metaphor is inapt. Rather it’s that this metaphor leads us to a seriously diminished understanding of what it means to be a living, caring creature.
I think.
Hat tip to @JenniferSertl for pointing out the Aeon article.
February 1, 2018
A new research paper, published Jan. 24 with 34 co-authors and not peer-reviewed, claims better accuracy than existing software at predicting outcomes like whether a patient will die in the hospital, be discharged and readmitted, and their final diagnosis. To conduct the study, Google obtained de-identified data of 216,221 adults, with more than 46 billion data points between them. The data span 11 combined years at two hospitals…
That’s from an article in Quartz by Dave Gershgorn (Jan. 27, 2018), based on the original article by Google researchers posted at Arxiv.org.
…Google claims vast improvements over traditional models used today for predicting medical outcomes. Its biggest claim is the ability to predict patient deaths 24-48 hours before current methods, which could allow time for doctors to administer life-saving procedures.
Dave points to one of the biggest obstacles to this sort of computing: the data are in such different formats, from hand-written notes to the various form-based data that’s collected. It’s all about the magic of interoperability … and the frustration when data (and services and ideas and language) can’t easily work together. Then there’s what Paul Edwards, in his great book A Vast Machine calls “data friction”: “…the costs in time, energy, and attention required simply to collect, check, store, move, receive, and access data.” (p. 84)
On the other hand, machine learning can sometimes get past the incompatible expression of data in a way that’s so brutal that it’s elegant. One of the earlier breakthroughs in machine learning came in the 1990s when IBM analyzed the English and French versions of Hansard, the bilingual transcripts of the Canadian Parliament. Without the machines knowing the first thing about either language, the system produced more accurate results than software that was fed rules of grammar, bilingual dictionaries, etc.
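A toy sketch of that Hansard idea: with nothing but sentence-aligned text and co-occurrence counts, plausible word pairings fall out. (The actual IBM models estimate translation probabilities with EM; this Jaccard-style overlap score is only the underlying intuition, and the four-sentence corpus is invented.)

```javascript
// An invented sentence-aligned English/French corpus.
const corpus = [
  ["the house",      "la maison"],
  ["the car",        "la voiture"],
  ["the blue house", "la maison bleue"],
  ["the red car",    "la voiture rouge"]
];

// Count how many sentence pairs each word appears in, and how many
// pairs each (English, French) word combination shares.
const eCount = {}, fCount = {}, co = {};
for (const [en, fr] of corpus) {
  const es = en.split(" "), fs = fr.split(" ");
  for (const e of es) eCount[e] = (eCount[e] || 0) + 1;
  for (const f of fs) fCount[f] = (fCount[f] || 0) + 1;
  for (const e of es) {
    co[e] = co[e] || {};
    for (const f of fs) co[e][f] = (co[e][f] || 0) + 1;
  }
}

// Best guess for e: the French word with the highest overlap score.
function bestMatch(e) {
  let best = null, bestScore = -1;
  for (const [f, c] of Object.entries(co[e] || {})) {
    const score = c / (eCount[e] + fCount[f] - c); // Jaccard overlap
    if (score > bestScore) { bestScore = score; best = f; }
  }
  return best;
}

console.log(bestMatch("house")); // "maison"
console.log(bestMatch("car"));   // "voiture"
```

No grammar, no dictionary; the alignment statistics alone separate "maison" from the ever-present "la", which is the brutal elegance in miniature.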
Indeed, the abstract of the Google paper says “Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format.” It continues: “We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.”
The paper also says that their approach affords clinicians “some transparency into the predictions.” Some transparency is definitely better than none. But, as I’ve argued elsewhere, in many instances there may be tools other than transparency that can give us some assurance that AI’s outcomes accord with our aims and our principles of fairness.
I found this article by clicking on Dave Gershgorn’s byline on a brief article about the Wired version of the paper of mine I referenced in the previous paragraph. He does a great job explaining it. And, believe me, it’s hard to get a writer — well, me, anyway — to acknowledge that without having to insert even one caveat. Thanks, Dave!
January 11, 2018
I’ve long wondered — like for a couple of decades — when software developers who write algorithms that produce beautiful animations of water will be treated with the respect accorded to painters who create beautiful paintings of water. Both require the creators to observe carefully, choose what they want to express, and apply their skills to realizing their vision. When it comes to artistic vision or merit, are there any serious differences?
In the January issue of PC Gamer, Philippa Warr [twitter: philippawarr] — recently snagged from Rock, Paper, Shotgun — points to v r 3, a museum of water animations put together by Pippin Barr. (It’s conceivable that Pippin Barr is Philippa’s hobbit name. I’m just putting that out there.) The museum is software you download (here) that displays 24 varieties of computer-generated water, from the complex and realistic, to simple textures, to purposefully stylized low-information versions.
Philippa also points to the Seascape page by Alexander Alekseev, where you can read the code that procedurally produces an astounding graphic of the open sea. You can directly fiddle with the algorithm to immediately see the results. (Thank you, Alexander, for putting this out under a Creative Commons license.) Here’s a video someone made of the result:
Philippa also points to David Li’s Waves where you can adjust wind, choppiness, and scale through sliders.
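The simplest version of what these demos do is a sum-of-sines height field; Seascape and Waves layer many refinements (noise, fractal detail, lighting) on top of something like this. The parameters below are illustrative stand-ins, not the demos’ actual controls:

```javascript
// Minimal procedural water: the surface height at position x, time t,
// is a sum of sine waves whose amplitude shrinks and whose frequency
// doubles with each octave. wind/choppiness/scale are invented knobs.
function waveHeight(x, t, { wind = 1.0, choppiness = 0.5, scale = 1.0 } = {}) {
  let h = 0, amp = 1.0, freq = 1.0;
  for (let i = 0; i < 4; i++) {        // four octaves of waves
    h += amp * Math.sin(freq * (x * scale) + t * wind * freq);
    amp *= choppiness;                  // each octave is smaller...
    freq *= 2;                          // ...and twice as frequent
  }
  return h;
}

// Sample the surface at one instant:
for (let x = 0; x <= 2; x += 0.5) {
  console.log(x.toFixed(1), waveHeight(x, 0).toFixed(3));
}
```

A shader evaluates something like this per pixel per frame, which is where the careful observation and artistic choices come in: which octaves, how choppy, how the light plays on the result.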
More than ten years ago we got to the point where bodies of water look stunning in video games. (Falling water is a different question.) In ten years, perhaps we’ll be there with hair. In the meantime, we should recognize software designers as artists when they produce art.
Good work, PC Gamer, in increasing the number of women reviewers, and especially as members of your editorial staff. As a long-time subscriber I can say that their voices have definitely improved the magazine. More please!
January 7, 2018
I’m more surprised than proud that I got this to work, but here’s some JavaScript that slides down a box when the user scrolls an alphabetized table and slides that box back up once the user stops. While the user continues to scroll up or down the page, the box displays the first letter of the row at the top. When the user stops scrolling for about a tenth of a second, the box goes away.
Note that when I say that “I got this to work,” what I really mean is that I successfully copied-and-pasted code from StackOverflow into the part of my script that runs when the script is first loaded. And when I say “JavaScript” I really mean JavaScript using the jQuery library, along with the Visible plugin that I think I actually don’t need, but I couldn’t get jQuery’s is(":visible") to work the way I thought it should.
So here’s an annotated walkthrough of the embarrassing code.
The first part notices the scrolling, shows the box, and fills it with the first letter of the relevant column of the table the page is displaying. (Thank you, Stackoverflow!)
The second part comes from another StackOverflow question. It notices when someone has stopped scrolling for 0.15 seconds and hides the block displaying the letter. And, yes, it could probably be combined with the first bit.
This is amateurish hackery. I understand that. But I’m an amateur. I’m not writing production code. I don’t have to worry about performance: this code works fine for scrolling 350 rows of a text-only table, but might crap out with 1,000 lines or 5,000 lines. At least it works fine so far. On the current versions of Chrome and Firefox. Under a waxing moon. I understand that I can get this far only because millions of real developers have posted their own code, and answered questions from fools like me. My hat is off to you.
For your copying-and-pasting convenience, here’s the code in copy-able form.
var mywindow = $(window); // get the window within which
                          // the page is being displayed
var mypos = mywindow.scrollTop();
var newscroll;

// add a function that's called whenever the window is scrolled
mywindow.scroll(function () {
    newscroll = mywindow.scrollTop(); // the scroll bar indicator's
                                      // vertical position
    // Go through the rows of the table to find the one currently at the top.
    // I am undoubtedly doing this embarrassingly inefficiently.
    var letter = "", done = false, i = 0;
    // loop until we find the row at the top or we've looked at all rows
    while (!done) {
        var title = $("#title" + i); // id of the cell with the phrase the
                                     // table is sorted on
        if ($(title).visible() == true) { // unnecessary use of the
                                          // Visible plugin
            var currentTopRow = i;
            done = true;
            // get the first letter of the relevant cell
            letter = $(title).text().substr(0, 1).toUpperCase();
            // put the letter into the box that will display it
            $("#lettercontent").text(letter);
        }
        i++;
        // if we've checked all the rows and none is visible
        // (the !done guard keeps this from clobbering a letter
        // found in the table's last row)
        if (!done && i >= gData.length) { // gData is the array the table is built from
            done = true;
            letter = "?";
            $("#lettercontent").text(letter);
        }
    }
    // display the box with the letter
    $("#bigletter").slideDown();
    mypos = newscroll;
});

// hide the letter block when scrolling stops
var scrollpausetimer = null; // a timer to note
                             // when scrolling has stopped
$(window).scroll(function () {
    if (scrollpausetimer !== null) {
        clearTimeout(scrollpausetimer);
    }
    scrollpausetimer = setTimeout(function () {
        // hide the letter block
        $("#bigletter").slideUp();
    }, 150); // 150 is the pause to be noticed, in 1/1000ths of a sec
});