Untitled TechShare 2021/03/18
finding material (usually documents)
of an unstructured nature (usually text)
that satisfies an information need
from within large collections (usually stored on computers)
binary term-document incidence matrix
110100 AND 110111 AND 101111 = 100100
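The bitwise AND over incidence vectors can be checked with a few lines of Python. This is a minimal sketch: the helper names and_vectors and matching_columns are made up for illustration, and each vector is assumed to be one row of the term-document matrix, with one bit per document.

```python
def and_vectors(*bitstrings):
    """Bitwise AND of equal-length incidence vectors given as '0'/'1' strings."""
    width = len(bitstrings[0])
    if any(len(bits) != width for bits in bitstrings):
        raise ValueError("all incidence vectors must have the same length")
    result = int(bitstrings[0], 2)
    for bits in bitstrings[1:]:
        result &= int(bits, 2)
    # Pad with leading zeros so every document keeps its column.
    return format(result, f"0{width}b")


def matching_columns(bitstring):
    """Zero-based column positions (documents) whose bit is set."""
    return [i for i, bit in enumerate(bitstring) if bit == "1"]


answer = and_vectors("110100", "110111", "101111")
print(answer)                    # 100100
print(matching_columns(answer))  # [0, 3]
```

Only the first and fourth documents survive all three conditions.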
a data structure that maps each term to the documents that contain it
Just chop on whitespace and throw away punctuation characters?
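A short sketch of what "chop on whitespace and throw away punctuation" does in practice, and why the question mark is deserved. The function name naive_tokenize is hypothetical, and "punctuation" is assumed to mean the ASCII set in Python's string.punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def naive_tokenize(text):
    """Split on whitespace, then delete punctuation inside each chunk."""
    tokens = []
    for chunk in text.split():
        cleaned = chunk.translate(_STRIP_PUNCT)
        if cleaned:  # drop chunks that were pure punctuation
            tokens.append(cleaned)
    return tokens


print(naive_tokenize("O'Neill aren't state-of-the-art."))
# ['ONeill', 'arent', 'stateoftheart']
```

Apostrophes and hyphens are glued away, so names, contractions, and compounds turn into tokens nobody would type as a query.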
language-specific ➜ Language identification
Dog ➜ dog
thought ➤ think
studies ➤ study
better ➤ good
catty ➤ cat
studies ➤ studi
the boy's cars are different colors
➤ the boy car be differ color
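The normalization steps above (case folding, lemmatization, stemming) can be imitated with a toy pipeline. This is only an illustration under loud assumptions: LEMMAS is a hand-written lookup table, toy_stem is a three-rule suffix stripper invented for this example, and neither is the Porter stemmer or any real lemmatizer.

```python
# Hand-written lookup table (assumption): irregular forms need a dictionary.
LEMMAS = {"thought": "think", "studies": "study", "better": "good", "are": "be"}

# Toy suffix rules (assumption): NOT the Porter algorithm, just three rewrites.
SUFFIX_RULES = (("ies", "i"), ("ent", ""), ("s", ""))


def toy_stem(token):
    """Apply the first matching suffix rule, keeping at least 3 stem letters."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + replacement
    return token


def normalize(text):
    """Case-fold, drop possessive 's, then lemmatize or fall back to stemming."""
    normalized = []
    for token in text.lower().split():
        if token.endswith("'s"):
            token = token[:-2]
        normalized.append(LEMMAS.get(token) or toy_stem(token))
    return normalized


print(toy_stem("studies"))  # studi
print(" ".join(normalize("The boy's cars are different colors")))
# the boy car be differ color
```

The table lookup yields real words (studies to study), while the suffix rules only chop characters (studies to studi), which is the difference the two lists above show.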
each document has a unique serial number
➜ docID
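With docIDs in place, a term-to-documents mapping can be built in a few lines. A minimal sketch, assuming docIDs are handed out sequentially starting at 1 and that tokenization is a plain lowercase whitespace split. The three sample sentences are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of docIDs (its postings list)."""
    postings = defaultdict(set)
    # Assumption: docIDs are assigned in order of arrival, starting at 1.
    for doc_id, text in enumerate(documents, start=1):
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sort the terms (the dictionary) and each postings list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = [
    "Brutus killed Caesar",      # docID 1
    "Caesar married Calpurnia",  # docID 2
    "Brutus and Calpurnia",      # docID 3
]
index = build_inverted_index(docs)
print(index["brutus"])     # [1, 3]
print(index["calpurnia"])  # [2, 3]
```

Keeping every postings list sorted by docID is what makes the linear-time merge for AND queries possible.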
Brutus AND Calpurnia
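An AND query over an inverted index reduces to intersecting two sorted postings lists. The sketch below uses the standard two-pointer merge. The docIDs are illustrative values, not data from these slides.

```python
def intersect(p1, p2):
    """Intersect two postings lists that are sorted by docID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # docID present in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer that is behind
        else:
            j += 1
    return answer


# Illustrative postings lists (assumed values).
brutus = [1, 2, 4, 11, 31, 45, 173, 174]
calpurnia = [2, 31, 54, 101]
print(intersect(brutus, calpurnia))  # [2, 31]
```

Each pointer only moves forward, so the cost is linear in the combined length of the two lists.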
(eyes AND trees) AND kaleidoscope
(kaleidoscope AND eyes) AND trees
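The choice between the two bracketings comes down to processing the shortest postings lists first, so that intermediate results stay small. A rough sketch of that heuristic follows. The postings lists are tiny made-up examples, and intersect_many is a hypothetical helper name.

```python
def intersect(p1, p2):
    """Intersect two postings lists (set-based here, to keep the sketch short)."""
    members = set(p2)
    return [doc_id for doc_id in p1 if doc_id in members]


def intersect_many(postings_lists):
    """AND several terms together, starting with the rarest ones."""
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)  # increasing document frequency
    result = ordered[0]
    for postings in ordered[1:]:
        if not result:
            break  # nothing left to match, stop early
        result = intersect(result, postings)
    return result


# Made-up postings lists: kaleidoscope is rare, trees is common.
postings = {
    "eyes": [1, 3, 5, 8, 9],
    "trees": [1, 2, 3, 4, 5, 6, 8, 9, 10],
    "kaleidoscope": [3, 8],
}
print(sorted(postings, key=lambda term: len(postings[term])))
# ['kaleidoscope', 'eyes', 'trees']
print(intersect_many(list(postings.values())))  # [3, 8]
```

Starting with kaleidoscope means the intermediate result can never be longer than the rarest term's list.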
difficult or impossible to find a satisfactory middle ground
General Problems with Boolean Search & Extended Boolean Model
Information Retrieval Problem
Inverted Index
Boolean Queries
Boolean Retrieval