Sunday 20 December 2009

DITA 08 - Information Retrieval

Information is data with context, which gives it a meaning that is subjective to the recipient.

Three views of Information Retrieval (IR) are:
  • User view - the user wants to fill a gap in their knowledge, either by querying for meaningful results or by browsing data in likely relevant topics
  • System view - IT systems that reliably store and retrieve data to support user needs
  • Sources view - capturing and presenting data for third parties, e.g. Jane's collating aircraft data relevant to the aerospace industry

To support IR from systems, content is structured to speed up searches using indexing techniques:
  • Identify data fields and describe them with metadata
  • Extract the words used in the database to form a word list
  • "Stop word" removal shrinks the word list by taking out the most commonly used words, e.g. the, and, to
  • Stemming reduces words to a common stem by removing suffixes, e.g. generator, generated and generates all reduce to the same stem as generate. Snowball is an example of a language for writing stemming algorithms
  • Synonym generation helps create an index that also finds terms related to the original search, e.g. table tennis and ping pong
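
The indexing steps above can be sketched as a toy inverted index in Python. This is a minimal illustration under stated assumptions: the stop-word list, the PHRASE_SYNONYMS table and the crude SUFFIXES stripper are all hypothetical stand-ins (the stemmer is deliberately simple and is not the Snowball or Porter algorithm), and search() treats a query as "every term must appear".

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption; real systems use longer lists).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}

# Hypothetical phrase-synonym table: each phrase is rewritten to a canonical form.
PHRASE_SYNONYMS = {"ping pong": "table tennis"}

# Crude suffix list for illustration only; this is NOT the Snowball/Porter algorithm.
SUFFIXES = ("ator", "ated", "ates", "ate", "ing", "ed", "es", "s")


def normalise(text):
    """Lower-case the text and rewrite known synonym phrases."""
    text = text.lower()
    for phrase, canonical in PHRASE_SYNONYMS.items():
        text = text.replace(phrase, canonical)
    return text


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Tokenise, drop stop words, then stem what remains."""
    words = re.findall(r"[a-z]+", normalise(text))
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return dict(index)


def search(index, query):
    """Return ids of documents containing every processed query term."""
    terms = index_terms(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


DOCS = {
    1: "The generator generated the report",
    2: "Ping pong is played on a table",
    3: "Table tennis generates fast rallies",
}
INDEX = build_index(DOCS)

print(search(INDEX, "generate"))   # {1, 3}
print(search(INDEX, "ping pong"))  # {2, 3}
```

The same index_terms() pipeline is applied to both documents and queries, so "generate" meets "generator" and "ping pong" meets "table tennis" in one normalised vocabulary, while a query made only of stop words returns nothing.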

My web search approach starts with a Google search for the most distinctive combination of words, to narrow the likely results and put useful information on page one. If too many of the returned items have poor relevance, I add further terms to select better information.
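
Why adding terms narrows the results can be sketched with a simple Boolean AND model, where a document is returned only if it contains every query term. This is an assumption for illustration: Google ranks results rather than applying a strict AND, and the documents and the and_search() helper below are hypothetical.

```python
import re

# Hypothetical listings for illustration.
DOCS = {
    1: "jazz guitar pickup review",
    2: "humbucker pickup wiring guide",
    3: "classic jazz tone with a humbucker pickup",
    4: "jazz festival listings",
}


def tokens(text):
    """Set of lower-case words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def and_search(docs, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = tokens(query)
    return {doc_id for doc_id, text in docs.items() if terms <= tokens(text)}


for query in ("jazz", "jazz pickup", "jazz pickup humbucker"):
    print(query, "->", sorted(and_search(DOCS, query)))
# jazz -> [1, 3, 4]
# jazz pickup -> [1, 3]
# jazz pickup humbucker -> [3]
```

Each extra term can only keep or remove documents, never add them, which is why refining a query with more specific words tends to push the closest matches to the top of a short list.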

When looking for information on interests such as guitars, I adopt "indiscriminate driftnetting": searching everything that mentions the esoteric interest and saving links to the ones that closely match the topic, e.g. "which replacement pickup best delivers a classic jazz tone". No single search will find the answer, so I run related searches and painstakingly review the listings; the relevant ones supply new search terms to find further information.
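
The driftnetting loop described above resembles iterative query expansion (a simple form of relevance feedback): cast a broad OR search, keep the relevant hits, and reuse their words as the next round's search terms. The sketch below is hypothetical throughout: the listings, the stop words, and the is_relevant callback, which stands in for the manual review.

```python
import re

STOP_WORDS = {"a", "and", "for", "of", "the", "to", "with"}

# Hypothetical listings for illustration.
DOCS = {
    1: "Humbucker pickup swap for jazz tone",
    2: "Archtop guitar with floating humbucker",
    3: "Archtop setup tips for warm tone",
    4: "Pickup truck for sale",
    5: "Drum kit for sale",
}


def terms(text):
    """Lower-case words with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def broad_search(docs, query_terms):
    """Driftnet-style OR search: any document mentioning any query term."""
    return {doc_id for doc_id, text in docs.items() if terms(text) & query_terms}


def driftnet(docs, seed_query, is_relevant, max_rounds=5):
    """Repeat broad searches, expanding the query from reviewed, relevant hits."""
    saved = set()                 # links kept after review
    query = terms(seed_query)
    seen_terms = set(query)
    for _ in range(max_rounds):
        hits = broad_search(docs, query) - saved
        relevant = {d for d in hits if is_relevant(docs[d])}
        if not relevant:
            break
        saved |= relevant
        # Words from relevant listings become the next round's search terms.
        new_terms = set().union(*(terms(docs[d]) for d in relevant)) - seen_terms
        if not new_terms:
            break
        seen_terms |= new_terms
        query = new_terms
    return saved


reviewer = lambda text: "truck" not in text.lower()  # stand-in for manual review
print(sorted(driftnet(DOCS, "jazz pickup", reviewer)))  # [1, 2, 3]
```

The review step is what keeps the net useful: "pickup truck" matches the seed query but is discarded, while the saved pickup listing introduces "humbucker" and "tone", which in turn find the archtop listings. The max_rounds cap stops the expansion drifting indefinitely.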
