More conceptual searching

I am interested in building a search capability on a large text corpus (such as Australian Newspapers)
to answer queries such as:

  • which prime ministers have visited the Tumut district of NSW?
  • who were the most prominent antagonists in the margarine quota discussions during the 1940s and 50s?
  • what poems by members of the Jindyworobak movement were published in newspapers?

Such an approach requires:

  1. fairly clean OCR
  2. entities (such as people, organisations and places) that can be identified and assigned useful attributes (such as “Gough Whitlam is a Prime Minister”)
  3. there is an easy way for normal people to express such queries, or iterate towards them
  4. ways to deal with ambiguity (For example, what are the boundaries of the “Tumut district” and have they changed? Is Harold Wilson a “Prime Minister” in this context? Does a poem written by a Jindyworobak member before they joined the movement count?)

I’m fairly confident about how the first two requirements can be met, but I am most interested in how campers think the third and fourth requirements could be addressed.
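To make the second and fourth requirements a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed design. The visit records, the article ids, the “Jane Citizen” entry and the choice of Adelong as a possible part of a wider “Tumut district” are all invented for the example and do not describe real articles or events. The idea is simply that each ambiguous term in a query gets more than one explicit reading, and the person asking chooses between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One extracted statement: an article says a person visited a place."""
    article_id: str
    person: str
    place: str


# Requirement 2: attributes assigned to entities (role -> people holding it).
ROLES = {
    "Prime Minister of Australia": {"Gough Whitlam"},
    "Prime Minister of the United Kingdom": {"Harold Wilson"},
}

# Requirement 4: ambiguity made explicit as alternative readings of a term.
# Both tables are assumptions made for this example.
ROLE_READINGS = {
    "strict": ["Prime Minister of Australia"],
    "broad": [
        "Prime Minister of Australia",
        "Prime Minister of the United Kingdom",
    ],
}
DISTRICT_READINGS = {
    "strict": {"Tumut"},
    "broad": {"Tumut", "Adelong"},  # hypothetical wider district boundary
}

# Invented records standing in for the output of OCR plus entity extraction.
VISITS = [
    Visit("example-1", "Gough Whitlam", "Tumut"),
    Visit("example-2", "Harold Wilson", "Adelong"),
    Visit("example-3", "Jane Citizen", "Tumut"),
]


def prime_ministers_who_visited(visits, role_reading, district_reading):
    """Return the sorted names that satisfy the chosen readings."""
    people = set()
    for role in ROLE_READINGS[role_reading]:
        people |= ROLES[role]
    places = DISTRICT_READINGS[district_reading]
    return sorted(
        {v.person for v in visits if v.person in people and v.place in places}
    )


if __name__ == "__main__":
    for role_reading in ("strict", "broad"):
        for district_reading in ("strict", "broad"):
            names = prime_ministers_who_visited(
                VISITS, role_reading, district_reading
            )
            print(role_reading, district_reading, names)
```

Running the script prints the answer under each combination of readings, which is one possible way of letting people iterate towards the query they actually mean (the third requirement).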

Playtesting Sembl, the game of analogy


Since 2012 I have been working on Sembl, a game of resemblance, where players make analogical connections between images of (openly-licensed) cultural heritage material and then rate other players’ connections on a sliding scale of interestingness. When you draw together things that are not normally associated, you can create beautiful insights into how the world works. It’s like conceptual parkour.

Sembl is being built by Icelab and the alpha version will be released as soon as I can test all the basics – and for some things I need a group to test… that’s where I hope you THATCampers will help.

The games are playable on a board for three, four, five, six or twelve players. I’ve played a lot with the smaller boards, but the bigger ones are tricky to test – especially the 12-player board, which is not for the fainthearted.

To be honest I’m *not at all* sure we can play a whole game in an hour, or whether this board will even function (it’s crowded; there might or might not be catastrophic visual and functional overlaps). For sure, we will have to be swift and happy to stumble.

If it feels too much for a short session, or if it fails, we can revert to a 6-player board and play in teams. (Of course anyone, once registered, can play around as they like.)

I will set up a Google doc for issue and bug reporting. It will be very, very good to have your feedback. And you will have early access to this fabulous new way to access and interpret cultural heritage material 🙂 🙂 🙂

And hopefully, Michael will join us.

Please comment to express interest!

PS I ticked the category called ‘Linked data’ because, even though it’s not about logical links, Sembl is an engine of handcrafted, analogical and dialogic (i.e. two-way, mutual, simultaneous) links. They are very *human* links, and foreign to computers, which I believe is what makes them important – as I said a while ago.



Sign up now for workshops!

As you know, THATCamp Canberra will run over three days — from Friday 31 October to Sunday 2 November. On the Friday there’ll be a series of introductory workshops to get you primed for the unconference that’ll run across Saturday and Sunday.

The schedule for the workshops is now online and you’ll see it’s an exciting and varied mix – everything from a crash course in regular expressions to a discussion of the poetics of online collections.

The morning sessions are shorter and will provide an overview of particular technologies, concepts or issues. In the afternoon we’ll get our hands dirty, trying out some tools and examples.

During the lunch break there’ll also be a collection visualisation showcase from the Digital Treasures Program at the University of Canberra.

Bring a laptop if you can, particularly for the afternoon sessions – although the ‘Wikipedia and Trove’ workshop will be held in a computer lab. Wifi will be available.

Places are limited and we’ll need to juggle the available rooms based on interest, so please sign up now! At the bottom of each session description you’ll see a ‘Sign up’ link – just click on this and submit your details to register your interest in that session. No need to sign up for the lunchtime showcase.


How about briefly clarifying where we stand with the elephant of copyright? I think the Copyright Act basically says that photos taken before 1955 are all out of copyright, and that for films or artistic works it’s 70 years after the creator died, or the date of publication for books.
It’s great on Trove when material is out of copyright, but various Australian galleries, universities and libraries seem to think differently. Some even consider that they still hold the copyright when they have copied the newspaper article off Trove! It really bogs down your research when you have to keep seeking permission even if the material is already in the ‘public domain’.

Session suggestion: GLAMs and Wikipedia – can we help each other?

I’d like to look at ways that GLAMs (Galleries, Libraries, Archives & Museums) can work more closely with Wikipedia. Of course, it’s not just GLAMs but any organisation with some collection of “knowledge” (so that might include universities, local history societies, or just any organisation that has knowledge they’d like to share with the world).

The benefits of working together are:

* for Wikipedia, better quality information

* for GLAMs, a wider audience: the general public is more likely to find information in a Wikipedia article (as these appear at the top of search engine results), so linking to GLAM content from the Wikipedia article can bring readers to authoritative content on the GLAM website, for a deeper understanding than Wikipedia can provide

Let’s work together to give people a better “knowledge experience”.

Kerry (Wikimedia Australia)

Downloading bulk newspaper articles from Trove

I have recently written some software called Retailer (a kind of proxy server), which can be used as a channel through which to download the full text of newspaper articles in bulk from Trove (and from the New Zealand equivalent, Papers Past, too). Is there any interest in a workshop on how to set up and use Retailer with Trove?

I would love to see a session where a bunch of people install it and run it on their notebooks.

I’d also really like to be able to document the installation and usage procedures or, better, automate them. So far I have only written instructions for installing and using it on Linux. But I have a friend who has harvested thousands of articles running it on Mac OS X, and I’m sure it will also run on Windows and other OSes.

For background see my blog post How to download bulk newspaper articles from Trove.


THATCamp and Trove

Trove turns five in November, and we intend to celebrate in style. No party hats and birthday cake for us — we want to bring people together to build, learn and share. We’ll be kicking off Trovember with THATCamp Canberra, a digital humanities unconference.

THATCamps (The Humanities And Technology) explore the possibilities and problems raised by the application of technology to the humanities. They’re unconferences, which means no prepared talks or Powerpoint — the program is developed on the spot based on the interests of participants. Do you want to know more about new tools or methods for digital research? Visualisation? Big data? APIs? Digitisation? THATCamps are a great place to start.

The Trove team is always on the lookout for innovative research using our data. We know there are some exciting projects out there, but we want more! We also want to help people make use of our API to build tools and interfaces, or simply streamline their research. Trove and THATCamp seem like a perfect match.

Don’t be intimidated — everyone from complete newbies to hardened coders will be welcome at THATCamp Canberra. And to help you find your feet we’ll be starting off with a series of workshops to introduce some of the tools, methods, technologies and standards used in the digital humanities.

It’s our birthday, let’s make stuff.

The return of THATCamp Canberra

In 2010 THATCamp Canberra was born — the first THATCamp in Australia, the first in the southern hemisphere. And we discovered there were other people like us, people interested in the intersection of technology and the humanities. We were a community.

In 2011 the legend continued. But with more workshops.

And then silence…

Until now.

This Trovember, THATCamp Canberra returns to your screens in

THATCamp Canberra 2014: The rise of the bots.



THATCamp Canberra 2014

31 October — 2 November Trovember

National Library of Australia

We’ll be kicking off Trovember by taking over the 4th floor of the National Library of Australia and turning it into a Digital Humanities discovery space.

There’ll be public workshops on 31 October, offering everyone an opportunity to pick up some new skills.

Then across the weekend 1–2 November we’ll be unscheduling, unpowerpointing, and unconferencing our way through all those questions about Digital Humanities that you always wanted to ask.

It’ll be fun, it’ll be exhausting, it’ll be THATCamp Canberra 2014.

Registration will open soon.