Bulk harvesting of newspaper articles from Trove on MacOS 10.9 or 10.10 using Retailer – Instructions

The following is a cut-and-paste from my blog entry on this subject 🙂

Conal Tuohy (@conal_tuohy) presented a session at THATCamp Canberra 2014 on Retailer, an interface tool he’s developing to provide the National Library of Australia's Trove service with an Open Archives Initiative Protocol for Metadata Harvesting-compliant interface.

The aim of the session was to get attendees to install Retailer on their laptops and then perform some searches.
It turned out that installing Retailer on the Mac laptops present wasn’t quite as straightforward as might have been hoped (the Linux-heads present had no such problems).

During the session, we worked out a procedure that does work for users of MacOS 10.9 (Mavericks) and MacOS 10.10 (Yosemite). This procedure is explained, step-by-step, below. Please read through these instructions in their entirety before you try to install Retailer on your Mac, so that you don’t make incorrect assumptions about the following steps 🙂 Please note that I’m going to make the following assumptions:

  1. You haven’t changed your default Downloads location (ie it is still the Downloads folder in your home directory).
  2. You know how to open the Applications folder to see the complete list of your installed applications.
  3. You’ve applied for, and received, a Trove API key. You’re not going to get far without one.

The installation instructions

  1. Start by reading Con’s blog post introducing Retailer. You may not understand all of it, and it’s very Debian-centric, but read it anyway, so you understand what Retailer is and how it works, and why you need to download various pieces of software.
  2. Download the Java Development Kit (JDK) installer. Yes, you want the JDK (which installs a full Java compiler & tools), not the Java Runtime Environment (JRE, which is just a plugin for your web browsers). You also need to ensure that you download Java 8 update 25 or newer; earlier versions of the installer weren’t aware of MacOS 10.10 (the latest, greatest), treated it as 10.1 (ye olde ancient version from the early 2000s), and would refuse to install because they thought your OS was too old. You can download the installer from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html You need to click on the radio button that says “Accept Licence Agreement”; then you can download jdk-8u25-macosx-x64.dmg. Let it put it into your default Downloads location. Do not install it at this time.
  3. Download Apache Tomcat v. 8 from https://tomcat.apache.org/download-80.cgi Look in the section labelled “Binary Distributions”. The first sub-section is labelled “Core”. You should download the tar.gz version. Do not unpack the compressed file at this time.
  4. Download jOAI from http://www.dlese.org/dds/services/joai_software.jsp Click on the “Download from SourceForge” link, and let it put the download in your default downloads folder. Do not unpack the download at this time.
  5. Download Retailer from https://github.com/Conal-Tuohy/Retailer/releases You should click on the green button with the down arrow and “retailer.war” on it. Let it put it in your default download location. Do not do anything with this file at this time.
  6. OK, at this point your default downloads location should contain (please note the version numbers in the following were current at time of writing, your mileage may vary):
    1. apache-tomcat-8.0.14.tar.gz
    2. jdk-8u25-macosx-x64.dmg
    3. joai_v3.1.1.3.zip
    4. retailer.war
  7. Now it’s time to visit our friend the command-line. Open up a Terminal window (the Terminal is in the “Utilities” folder inside your “Applications” folder). Do not close this Terminal window until you’re told it’s safe to do so much later; you’re going to be making a great deal of use of it.
  8. You need to decide where you want to put the apache-tomcat installation. I recommend the /Users/Shared folder. Type
    cd /Users/Shared
    into the terminal window, and hit return.
  9. Now type the following three lines into the Terminal, hitting the return key after you’ve typed each line. The first line unpacks the tomcat server, the second line copies retailer.war to where it needs to be, and the third line extracts oai.war from the archive and puts it where it needs to be.
    tar -xvf ~/Downloads/apache-tomcat-8.0.14.tar.gz --gunzip
    cp ~/Downloads/retailer.war apache-tomcat-8.0.14/webapps/
    unzip -j ~/Downloads/joai_v3.1.1.3.zip joai_v3.1.1.3/oai.war -d apache-tomcat-8.0.14/webapps
    
  10. OK, now you should install the Java Development Kit. Double click on the jdk-8u25-macosx-x64.dmg file to open the disc image, then run the enclosed installer. Once the installation has completed, eject the disc image.
  11. Go back to the Terminal. Type
    java -version
    If all has gone well, you should see something like:

    java version "1.8.0_25"
    Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
    Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
  12. Now type
    ./apache-tomcat-8.0.14/bin/startup.sh
    If all goes well, you should see something like:

    Using CATALINA_BASE: /Users/Shared/apache-tomcat-8.0.14
    Using CATALINA_HOME: /Users/Shared/apache-tomcat-8.0.14
    Using CATALINA_TMPDIR: /Users/Shared/apache-tomcat-8.0.14/temp
    Using JRE_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home
    Using CLASSPATH: /Users/Shared/apache-tomcat-8.0.14/bin/bootstrap.jar:/Users/Shared/apache-tomcat-8.0.14/bin/tomcat-juli.jar
    Tomcat started.
  13. Start your web browser of choice, and point it at:
    http://localhost:8080
    If all goes well, you should see a web page for Apache Tomcat.
  14. When Tomcat started up, it should have unpacked the two .war files into separate directories for you. You need to edit Retailer’s configuration file. Go back to your Terminal window, and type

    open -a TextEdit apache-tomcat-8.0.14/webapps/retailer/WEB-INF/web.xml
    to open the file in TextEdit. Replace the text “INSERT TROVE API KEY HERE” with your Trove API key.
    Now you need to add an additional parameter, to tell Retailer that you’re going to use it to perform Trove searches. Add the following lines just before the <servlet> line:

    <context-param>
    <param-name>xslt</param-name>
    <param-value>trove.xsl</param-value>
    </context-param>

    Save the file and exit TextEdit.

  15. Go back to the Terminal window, and type
    cp apache-tomcat-8.0.14/webapps/retailer/WEB-INF/web.xml /Users/Shared/retailer-config-backup.xml
    This will make a backup of your configuration file outside the Retailer web app; I’ve had my web.xml “restored” to the default a couple of times through no action of my own, so having a backup on hand has been useful.
  16. Point your web browser at:
    http://localhost:8080/oai/admin/harvester.do
    and click on “Add new harvest”.
  17. Fill in the settings as per Con’s blog post. For your first harvest, I suggest you use “search: international cometary explorer”; this doesn’t match too many items (most are in The Canberra Times, post 1954). Note the section “Save files from this harvest:”.
    The default harvest location is
    /Users/Shared/apache-tomcat-8.0.14/webapps/oai/WEB-INF/harvested_records
    You’ll probably want to put these somewhere else, so select “at a location I specify…” and type in a folder path (eg /Users/Shared/harvested_records/ICE ). Click on the “save” button.
  18. Click on “All” under “Manual Harvest”. You’ll be asked if you want to replace the results of a previous harvest. Since you haven’t harvested before, your answer should be “OK” (in future, you’ll be better off clicking on the “New” button to add any new results to your pre-existing harvest).
  19. Wait. Depending upon your search parameters, your harvest may take some time. You can keep an eye on it by clicking on “View harvest history and progress” and then occasionally refreshing the page.
  20. Your harvested records will be stored in the location you specified.
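
Two points in the steps above are easy to get wrong: a download can be missing from the Downloads folder before step 9, and web.xml can still hold (or revert to) its default contents after step 14. The following is a minimal sketch of how both could be checked from the Terminal. The function names missing_downloads and retailer_config_ok are hypothetical helpers written for this post, not part of Retailer, jOAI or Tomcat, and the file names are simply the ones quoted in step 6, so adjust them if your versions differ.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers for this post.
# File names and version numbers are the ones quoted in step 6.

# Print the name of each expected download that is absent from directory $1.
missing_downloads() {
    dir=$1
    for f in apache-tomcat-8.0.14.tar.gz jdk-8u25-macosx-x64.dmg \
             joai_v3.1.1.3.zip retailer.war; do
        [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Succeed only if the web.xml file given as $1 no longer contains the
# API-key placeholder and does contain the trove.xsl value from step 14.
retailer_config_ok() {
    config=$1
    [ -f "$config" ] || return 1
    if grep -qF 'INSERT TROVE API KEY HERE' "$config"; then
        return 1
    fi
    grep -qF '<param-value>trove.xsl</param-value>' "$config"
}
```

For example, running missing_downloads ~/Downloads before step 9 prints nothing when all four files are present. After step 15, if retailer_config_ok apache-tomcat-8.0.14/webapps/retailer/WEB-INF/web.xml fails, copying /Users/Shared/retailer-config-backup.xml back over web.xml should restore your settings.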

Please note that unless you specifically turn it off, the Tomcat server will continue running until your computer is shut down or rebooted; even if you log out and log in as a different user, the Tomcat server will continue running. You can turn it off by typing
./apache-tomcat-8.0.14/bin/shutdown.sh
into the Terminal window (from the /Users/Shared folder, where you started it).
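
Tomcat’s bin folder contains both a startup.sh and a shutdown.sh script, and it is easy to mix them up. As a rough sketch, a small helper like the one below can map “start” and “stop” to the right script so that you don’t have to remember which folder you’re in. The name tomcat_script is hypothetical, and TOMCAT_HOME is assumed to be the install location used in this post unless you set it to something else.

```shell
#!/bin/sh
# Sketch only: tomcat_script is a hypothetical helper, not part of Tomcat.
# TOMCAT_HOME defaults to the install location assumed in this post.
TOMCAT_HOME=${TOMCAT_HOME:-/Users/Shared/apache-tomcat-8.0.14}

# Print the full path of the Tomcat script for the action "start" or "stop".
tomcat_script() {
    case $1 in
        start) printf '%s\n' "$TOMCAT_HOME/bin/startup.sh" ;;
        stop)  printf '%s\n' "$TOMCAT_HOME/bin/shutdown.sh" ;;
        *)     echo "usage: tomcat_script start|stop" >&2
               return 1 ;;
    esac
}
```

With this defined, sh "$(tomcat_script stop)" should stop the server from any folder, and sh "$(tomcat_script start)" should start it again.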

Bots of collections / Bots of conviction

I’m interested in the potential of bots, specifically Twitter bots, to mobilise cultural collections by moving them into spaces where people already are. My first bot, @TroveNewsBot, not only tweets random newspaper articles from Trove, it also responds to other users and interacts with the current news headlines. You can read more here and here.

In recent months @TroveNewsBot has been joined by a number of other collection bots, most tweeting random items. Steve Lubar has argued that these random selections help expose the constructed nature of collections:

The museumbot calls attention to the necessity of making choices. The vast difference between its random choice and what I see in the museum points out that the choices have been made. 

But can bots do more? Mark Sample, digital humanist and bot maker extraordinaire, recently wrote an essay that explored the possibilities of protest bots or ‘bots of conviction’:

protest bots take a stand. Society being what it is, this stance will likely be unpopular, perhaps even unnerving. Just as the most affecting protest songs made their audiences feel uncomfortable, bots of conviction challenge us to consider our own complicity in the wrongs of the world

My one venture into the realm of protest bots is the rather tame @OperationBot (and its companion webapp). But I’d like to do more.

At THATCamp I’d like to discuss the possibilities of bots, and to think about ways we might respond to Mark Sample’s call for bots of conviction.

Instant cooperative editing of a Wikipedia article – so what’s new?

A quick, last-minute proposal for a ‘play’ session, just to engage some expertise and find out how ‘quick’ cooperative work is (and how good the Wikipedia engine is).

Wikipedia is the pre-eminent example of a wiki – software providing a place for co-operative development of content on a given subject – writing it and changing it. In this session we would

1) select a Wikipedia article to edit

2) individuals or ad hoc groups edit that article (at this point separately i.e. save rather than publish)

3) compare our edits

It would be interesting to see how people combine their expertise in a (non-competitive) way to edit something quickly. The first challenge of this game would be in selecting a suitable article – presumably one for which at least one participant has expert knowledge. Given that ‘camp’ participants in general are self-selected for interest both in humanities and in technology, the available fields will be many. Even this natural assumption of a good starting point could be dumped if we feel like it. But contributions to the development of, and changes to, the article should be made by everyone, not just the main subject-matter expert(s) (if any have been identified).

NB. We should also discuss, if time permits, the fact that we would not in fact be replicating the collaborative paradigm of Wikipedia articles, as we will all be in the same room and talking.

(Susan Ford)

Bring your ideas!

The unconference part of THATCamp Canberra kicks off tomorrow morning. Hopefully the workshops today will have inspired some ideas, or raised some new problems you’d like to discuss. If so, propose a session! Either login to the site and add a post, or bring your idea along to the scheduling session.

Remember, you don’t have to be an expert in the topic you propose. Some of the best discussions start with a problem or a question.

And if you’ve got something you’d like to share but don’t think it’s enough for a whole session — remember we’ll also have a series of lightning talks or Speedos after lunch. Show off your latest projects or a favourite website — it’s up to you (as long as it only takes 3 minutes)!

Creating a dynamic community history project

I am responsible for the Australian Paralympic history project. This is a wide-ranging project which seeks to capture, manage and preserve the history of the Paralympic movement in Australia. The attached document gives you an overview of what we are trying to do.

With very limited resources, we rely on partnerships with experts (such as the NLA) and volunteers (such as a group of Wikipedia editors). We are also lucky to be working with the Uni of Qld, which has received an ARC Linkage grant for the project.

However, the big challenge is to create a vehicle which will draw together all the elements of the project and make them available to anyone who wants to access them or who wants to contribute. We have a general concept of an online platform, or “e-history”, but at present these seem to be more limited in scope and don’t necessarily offer the capacity to access the detail we are assembling through our project.

I am sure that we would not be the only community organisation in such a position and looking to tell its story and preserve its history effectively, at a reasonable (haha – minimal) cost.

At the same time as our history is important to the Australian Paralympic Committee, it intersects with the “bigger pictures” of Australian society and its history that are being painted by organisations such as the NLA.

The Paralympic History Project short version

Session proposal – spatially explore via mapping an Australian historical problem

***NOTE: I don’t want to facilitate this as I don’t have the mapping experience, I am just interested in this area***
I would like to see a session please on how you could use maps (with layers of information) to explore an Australian historical debate/event that is spatial/locational in nature (on Saturday not Sunday as I can’t attend on Sun). Perhaps Paul Hagon could lead this and Tim Sherratt, being a historian, could identify a suitable example?
Check out this link as an example – examining what happened at the Battle of Gettysburg spatially (via a 3D model of the landscape using contour plans) finally proved, after years of debate amongst historians, that General Lee lost the battle because he lacked crucial visibility of key areas of the battlefield.
http://www.smithsonianmag.com/ist/?next=/history/looking-at-the-battle-of-gettysburg-through-robert-e-lees-eyes-136851113/