    Amidst reports of exploding heads, some great blog posts are appearing describing the THATCamp Canberra experience. Feel free to add details below:

  • Adventures with automated text analysis


    Data storage and retrieval technologies give us unprecedented access to volumes of data that are impractical to analyse manually. The quantity of data can be extreme: a collection of just one month of the world's online media and social media contains some 386 million posts, articles etc. – about 3TB of text data! Even more modest collections are often far too large to read in their entirety.

    This has led to many techniques and advances in automated text analysis. I would like to propose a session in which those who use, or wish to use, automated text analysis techniques come together to exchange notes, discuss effective approaches, and identify stumbling blocks and potential sources of error.

    I myself come from a machine learning background and have only very recently begun working in a humanities context. What I can offer is some more technical knowledge of what can be done, such as algorithms to detect sentiment or discussion topics running through a corpus. What I hope to gain is an understanding of how these techniques, or ones like them, are and can be used in the Humanities.

  • Electronic Editions on a Shoestring


    I’d be interested in any sessions on electronic editions, particularly sessions that discuss the tools and platforms that can support electronic editions for researchers with limited technical experience. Can Omeka be used for such projects? What other tools and resources are available? What are the limits of such platforms for complicated textual histories? How can print editions and electronic editions work together?

  • Visualising and Analysing Historical Cultural Networks


    At AustLit, we often talk about enhancing AustLit data in order to visualise and analyse the cultural networks that influence the composition, publication and reception of literary works. Such analysis inevitably extends beyond literary works to other cultural products, particularly when writers contribute to several forms (e.g. fiction in serial and book forms, drama, radio plays, screenplays etc.). It also extends to other art forms when visual artists and literary artists commune socially, artistically or professionally.

    We’ve been experimenting with LORE to see how relationships can be defined and visualised with the tools we have at hand. But discussion with anyone who is pursuing similar research questions or who has experience with software such as Cytoscape would be very valuable. I’d like to participate in such a session if anyone else is interested in joining me.

  • The tyranny of citation formats


    I’d like to have a session about citation formats and bibliographic processing. Not sure if this would be a hackathon or a general discussion, probably a bit of both.

    The thing is, citation formats evolved in the days of paper – they’re a form of text-based hypertext. In the old days when you referenced something you had to put enough bibliographic detail in your text so that people could find it. We still have to format articles, theses, essays etc. with redundant text-formatted references and bibliographies to submit them to publishers and markers, even though we’re using machines to manage all the references. And we’re still teaching students to do this, sometimes by hand.

    In many disciplines we have online resources so in many cases a citation could be a URI referencing a good quality stable data source. But URIs are not always going to be the way to go, in which case the bibliographic data could be embedded in text in a way that makes re-processing easy.

    This session could look at what can be done to rationalise citation practices so that an author can use existing bibliographic databases (via stuff like the Open Bibliography project, Zotero, Mendeley, CrossRef et al.) without having to maintain their own, unless they want to of course, and so that downstream consumers (publishers, readers, markers etc.) can choose how they would like to view, reuse or otherwise process the references.

    In the sciences there are many disciplines where citing by DOI would be sufficient to cover almost all use-cases, but this is certainly not the case in the humanities.

    We could talk about:

    • How to embed citation-by-reference and citation-with-bibliographic-data in HTML and how to choose which to do. (I have some ways of doing this using HTML5 Microdata I’d like feedback on)
    • How to produce said HTML using tools such as Word, Wiki formats, Pandoc, LaTeX, WordPress etc. (I have made some progress on a tool using Zotero + MS Word producing HTML5 that can be re-formatted automatically to suit the reader, with bib-data embedded in the HTML for machine processing as well).
    • What are the limits of this approach? There will be lots of areas of the humanities where trying to construct bibliographic entries for the stuff you are referencing will be hard.
    (I have been working on this for a project in the UK, funded by JISC)
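
    To make the first discussion point concrete, here is a minimal Python sketch of the two citation styles rendered as HTML. It is not the markup scheme I mentioned above, just one way it might look. Everything in it is an assumption made for illustration: the function names, the `citation` class, the choice of the schema.org `CreativeWork` vocabulary for the Microdata, and the `example.org` URI.

```python
from html import escape


def cite_by_reference(uri, label):
    """Citation-by-reference: the link target alone identifies the work."""
    return '<a class="citation" href="{}">{}</a>'.format(
        escape(uri, quote=True), escape(label)
    )


def cite_with_data(title, authors, year, label):
    """Citation-with-bibliographic-data: metadata is embedded as Microdata.

    The schema.org CreativeWork vocabulary is an assumption here; any
    agreed vocabulary could take its place.
    """
    author_tags = "".join(
        '<meta itemprop="author" content="{}">'.format(escape(name, quote=True))
        for name in authors
    )
    return (
        '<span class="citation" itemscope '
        'itemtype="https://schema.org/CreativeWork">'
        '<meta itemprop="name" content="{}">'
        "{}"
        '<meta itemprop="datePublished" content="{}">'
        "{}</span>"
    ).format(
        escape(title, quote=True),
        author_tags,
        escape(str(year), quote=True),
        escape(label),
    )


def cite(label, uri=None, title=None, authors=(), year=None):
    """Prefer a stable URI when there is one; otherwise embed the data."""
    if uri:
        return cite_by_reference(uri, label)
    return cite_with_data(title, authors, year, label)


if __name__ == "__main__":
    # Hypothetical works, used only to show the two output shapes.
    print(cite("(Smith 2010)", uri="https://example.org/works/123"))
    print(cite("(Doe 1999)", title="Cats & Dogs", authors=["A. Doe"], year=1999))
```

    Because the visible label is kept separate from the embedded data, a reader's or publisher's tool could in principle re-render the label in whatever citation style it prefers without touching the underlying reference.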