November 25, 2018
Using the Perma.cc API to check links
My new book (Everyday Chaos, HBR Press, May 2019) has a few hundred footnotes with links to online sources. Because Web sites change and links rot, I decided to link to Perma.cc's pages instead. Perma.cc is a product of the Harvard Library Innovation Lab, which I used to co-direct with Kim Dulin, but Perma is a Jonathan Zittrain project from after I left.
When you give Perma.cc a link to a page on the Web, it comes back with a link to a page on the Perma.cc site. That page has an archive copy of the original page exactly as it was when you supplied the link, along with a screen capture of that original page. And of course it includes a link to the original. Perma.cc also promises to maintain the archived copy and screen capture in perpetuity, a promise backed by the Harvard Law Library and dozens of other libraries. So, when you give a reader a Perma link, they are taken to the Perma.cc page where they'll always find the archived copy and the screen capture, no matter what happens to the original site. Also, the service is free for everyone, for real. Plus, the site doesn't require users to supply any information about themselves. And there are no ads.
So that’s why my book’s references are to Perma.cc.
But over the course of the six years I spent writing this book, my references suffered some link rot on my side. Before I got around to creating the Perma links, I managed to make all the obvious errors and some not-so-obvious ones. As a result, now that I'm at the copyediting stage, I wanted to check all the Perma links.
I had already compiled a bibliography as a spreadsheet. (The book will point to the Perma.cc page for that spreadsheet.) So, I selected the Title and Perma Link columns, copied the content, and stuck it into a text document. Each line contains the page’s headline and then the Perma link.
Perma.cc has an API that made it simple to write a script that looks up each Perma link and prints the title Perma.cc has recorded for it next to the title of the page I intended to link to. If there's a problem with a Perma link, such as a double "https://https://" (a mistake I managed to introduce about a dozen times), or if the Perma link is private and not accessible to the public, it notes the problem. The human brain is good at scanning this sort of info for inconsistencies.
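A mistake like the doubled "https://https://" can also be caught by looking at the raw link text before any API call is made. The following is a minimal PHP sketch of that idea, not part of the original script: it splits one "Title https://perma.cc/CODE" line into its title and Perma code and flags a doubled scheme. The function name parsePermaLine, the problem labels, and the assumption that a Perma code is the final path segment after "perma.cc/" are illustrative choices, not anything defined by Perma.cc.

```php
<?php
// Hypothetical helper (not from the original script): parse one line of the
// title-and-link file and collect any problems found in the link text itself.
function parsePermaLine(string $line): array
{
    $problems = [];
    $line = trim($line); // drop the trailing newline that file() keeps by default

    // Assume, as the full script does, that the link starts at the first "https".
    $start = strpos($line, 'https');
    if ($start === false) {
        return ['title' => $line, 'code' => null, 'problems' => ['no link found']];
    }

    $title = rtrim(substr($line, 0, $start));
    $link = substr($line, $start);

    // Catch the doubled scheme, e.g. "https://https://perma.cc/..."
    if (substr_count($link, 'https://') > 1) {
        $problems[] = 'doubled https://';
    }

    // Assumption: the Perma code is the last path segment after "perma.cc/".
    $code = null;
    if (preg_match('~perma\.cc/([A-Za-z0-9-]+)/?$~', $link, $m)) {
        $code = $m[1];
    } else {
        $problems[] = 'not a perma.cc link';
    }

    return ['title' => $title, 'code' => $code, 'problems' => $problems];
}

$r = parsePermaLine('The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF');
echo $r['code'], PHP_EOL; // prints B5LR-88CF
```

Note that a doubled scheme still yields a usable code here, so the lookup could succeed even though the link printed in the book would be broken; that is why the sketch reports it as a separate problem rather than skipping the line.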
Here’s the script. I used PHP because I happen to know it better than a less embarrassing choice such as Python and because I have no shame.
<?php
// This is a basic program for checking a list of page titles and perma.cc links.
// It's done badly because I am a terrible hobbyist programmer.
// I offer it under whatever open source license is most permissive. I'm really not
// going to care about anything you do with it. Except please note I'm a
// terrible hobbyist programmer who makes no claims about how well this works.
//
// David Weinberger
// Nov. 23, 2018

// Perma.cc API documentation is here: https://perma.cc/docs/developer

// This program assumes there's a file with the page title and one perma link per line.
// E.g. The Rand Corporation: The Think Tank That Controls America https://perma.cc/B5LR-88CF

// Read that text file into an array, one line per element, skipping blank lines
$lines = file('links-and-titles.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

for ($i = 0; $i < count($lines); $i++) {
    $line = $lines[$i];
    // divide into title and perma link
    $p1 = strpos($line, "https"); // find the beginning of the perma link
    if ($p1 === false) {
        // no link on this line, so report it and move on
        echo "<p>- No link found: $line</p>";
        continue;
    }
    $fullperma = substr($line, $p1); // get the full perma link
    $origtitle = rtrim(substr($line, 0, $p1)); // get the title, minus trailing spaces

    // get the distinctive part of the perma link: the stuff after https://perma.cc/
    $permacode = strrchr($fullperma, "/"); // find the last forward slash
    $permacode = substr($permacode, 1); // get what's after that slash
    $permacode = rtrim($permacode); // trim any spaces from the end

    // create the url that will fetch this perma link
    $apiurl = "https://api.perma.cc/v1/public/archives/" . $permacode . "/";

    // fetch the data about this perma link (false if the request fails)
    $onelink = @file_get_contents($apiurl);
    // echo $onelink; // this would print the full json
    // decode the json; json_decode returns null on empty or invalid input
    $j = ($onelink === false) ? null : json_decode($onelink, true);
    // Did you get any json, or just null?
    if ($j == null) {
        // hmm. This might be a private perma link. Or some other error
        echo "<p>- $permacode failed. Private? $fullperma</p>";
    }
    // otherwise, you got something, so write some of the data into the page
    else {
        echo "<b>" . $j["guid"] . "</b><blockquote>" . $j["title"] . "<br>" . $origtitle . "<br>" . $j["url"] . "</blockquote>";
    }
}

// finish by noting how many lines have been read
echo "<h2>Read " . count($lines) . "</h2>";
?>
Run this script in a browser and it will create a page with the results. (The script is available on GitHub.)
Thanks, Perma.cc!
By the way, and mainly because I keep losing track of this info, the table of code was created by a little service cleverly called Convert JS to Table.
Date: November 25th, 2018 dw