This week, we read Professor Cohen’s From Babel to Knowledge: Data Mining Large Digital Collections. The reading describes Jorge Luis Borges’s short story The Library of Babel, in which the narrator describes bizarre rooms filled with books piled on top of books, whose pages contain only incoherent letters and scrambled messages. Used as an attention getter, the story leads into the various ways people can sift and sort through digital information. The reading brings up a “syllabus finder” used to locate and organize the syllabi that professors put online. This tool highlights the idea of sorting information through things like keyword searches and date entries, which we have addressed numerous times in class.
The reading goes on to discuss a more complex search application, “question and answer” (QA). Typing a question into a search tool and instantly receiving an answer is a hard thing to make work. The reading states:
QA is a far greater challenge than document classification because it exercises almost all of the computational muscles. Not only do you have to find relevant documents in massive corpuses (involving methodologies of search and document classification), you also have to interpret users’ questions well to know what they are looking for (natural language processing) and analyze the text of retrieved documents using a variety of statistical and linguistic methods (information theory, regular expressions, and other text parsing techniques).
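To make those moving parts concrete for myself, here is a minimal sketch in Python of the three steps the quote names: searching for a relevant document, interpreting the question, and parsing the retrieved text with a regular expression. This is only a toy under my own assumptions and not anything the reading provides: the three-sentence corpus, the stopword list, and the function names (keywords, retrieve, answer) are all made up for illustration, and a real QA system would be far more sophisticated.

```python
import re

# Hypothetical three-document corpus, invented for illustration only.
DOCUMENTS = [
    "Jorge Luis Borges was born in 1899 in Buenos Aires.",
    "The Library of Babel is a short story by Borges.",
    "Charles Darwin was born in 1809 in Shrewsbury.",
]

# A tiny, assumed stopword list; real systems use much larger ones.
STOPWORDS = {"when", "was", "who", "what", "is", "the", "in", "a", "of", "by"}


def keywords(text):
    """Lowercase the text and keep the alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(question, documents):
    """Search step: return the document sharing the most keywords with the question."""
    q = keywords(question)
    return max(documents, key=lambda d: len(q & keywords(d)))


def answer(question, documents=DOCUMENTS):
    """Crudely interpret the question, then parse the best document with a regex."""
    best = retrieve(question, documents)
    if question.lower().startswith("when"):
        # A "when" question is assumed to want a four-digit year.
        match = re.search(r"\b\d{4}\b", best)
        return match.group(0) if match else None
    # For any other question, fall back to returning the whole document.
    return best


print(answer("When was Borges born?"))
```

Even this toy shows why the reading calls QA a greater challenge than classification: each step is a guess (which words matter, which document is best, what kind of answer a “when” question wants), and a wrong guess at any step gives a wrong answer.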
Immediately, this QA made me think of one of the most frequently used online question and answer services: Yahoo Answers.
Though the purpose is to answer your questions quickly and accurately, there is plenty of room for error. This particular QA service lets ANYONE with a computer post an answer, which allows for strange and humorous posts such as this one:
The internet is a vast, bottomless pit of information. But luckily, things like syllabus finders and QA services are helping us, in some ways, to sift through all that data.