More conceptual searching
Kent Fitch – THATCamp Canberra 2014
Mon, 27 Oct 2014
http://2014.thatcampcanberra.org/2014/10/more-conceptual-searching/

I am interested in building a search capability over a large text corpus (for example, Australian Newspapers)
that can answer queries such as:

  • which prime ministers have visited the Tumut district of NSW?
  • who were the most prominent antagonists in the margarine quota discussions during the 1940s and 1950s?
  • what poems by members of the Jindyworobak movement were published in newspapers?

Such an approach requires:

  1. fairly clean OCR
  2. a way to identify entities (such as people, organisations, places) and assign them useful attributes (such as “Gough Whitlam is a Prime Minister”)
  3. an easy way for normal people to express such queries, or to iterate towards them
  4. ways to deal with ambiguity (For example, what are the boundaries of the “Tumut district” and have they changed? Is Harold Wilson a “Prime Minister” in this context? Does a poem written by a Jindyworobak member before they joined the movement count?)
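
To make the second requirement concrete, here is a minimal sketch of how entity attributes might be combined with article text to answer the first example query. Everything in it is an assumption for illustration: the `Entity` class, the `mentions` and `search` helpers, the second person and the article snippets are all invented, and plain substring matching stands in for real entity recognition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # "person", "place" or "organisation"
    attributes: frozenset = frozenset()


# Hypothetical entity records; only the Whitlam attribute comes from the post.
ENTITIES = [
    Entity("Gough Whitlam", "person", frozenset({"Prime Minister"})),
    Entity("Jane Citizen", "person", frozenset({"poet"})),
    Entity("Tumut", "place"),
]

# Invented article snippets, not real newspaper text.
ARTICLES = {
    "a1": "Gough Whitlam addressed a meeting in Tumut on Friday.",
    "a2": "Verses by Jane Citizen were recited at the Tumut show.",
    "a3": "Gough Whitlam spoke in Canberra about tariffs.",
}


def mentions(text):
    """Return the known entities whose name appears in the text (naive match)."""
    lowered = text.lower()
    return {e for e in ENTITIES if e.name.lower() in lowered}


def search(attribute, place):
    """Map each person with the given attribute to the articles that also mention the place."""
    hits = {}
    for article_id, text in ARTICLES.items():
        found = mentions(text)
        if not any(e.kind == "place" and e.name == place for e in found):
            continue
        for e in found:
            if e.kind == "person" and attribute in e.attributes:
                hits.setdefault(e.name, []).append(article_id)
    return hits


print(search("Prime Minister", "Tumut"))
```

Note that co-occurrence in an article only suggests a visit; it does not establish one. That gap is part of the ambiguity raised in the fourth requirement.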

I’m fairly confident about how the first two requirements can be met. What I am most interested in is how campers think the third and fourth requirements could be addressed.
