AI and Search

Catherine Gracey, CS Librarian

  • I'm the Computer Science Librarian
  • It's my job to help you find and/or publish scholarly information 
  • Today we're going to focus on the 'finding' piece

1. Research Tools and Strategies

A) Searching

Where do you find your academic information?

Keyword Search

  • Literal search for only the tokens you put in
  • If you search 'wife', variations like 'wives' will not be searched
  • High level of specificity and control
  • Harder to use

Semantic Search

  • Accounts for the 'meaning' of terms and will search for variations
  • If you search 'wife', the search would include 'wives', 'spouse', 'partner'
  • Less control, easier to use

Traditional Databases

  • Search using keywords
  • Results are only returned if the keyword appears in the text
  • No 'judgement' from the system on what's relevant; it shows it all
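To make the contrast between literal keyword matching and meaning-aware matching concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the four-document corpus is invented, and the hand-written RELATED table only stands in for semantic search (real tools learn term relationships from data). The boolean_and helper shows how an AND of quoted phrases narrows results.

```python
import re

# Hypothetical mini-corpus, invented for illustration only.
DOCS = {
    1: "Artificial intelligence supports diagnosis in radiology",
    2: "Machine learning helps doctors detect disease earlier",
    3: "A survey of wives and their partners",
    4: "The wife of the author reviewed the manuscript",
}

# Hand-written related-terms table (an assumption). A real semantic search
# tool would learn these relationships instead of using a fixed list.
RELATED = {"wife": {"wife", "wives", "spouse", "partner", "partners"}}


def tokens(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_search(term):
    """Literal search: a document matches only if it contains the exact token."""
    return [doc_id for doc_id, text in DOCS.items() if term in tokens(text)]


def expanded_search(term):
    """Stand-in for semantic search: related terms also count as matches."""
    wanted = RELATED.get(term, {term})
    return [doc_id for doc_id, text in DOCS.items() if wanted & set(tokens(text))]


def boolean_and(*phrases):
    """Boolean AND of quoted phrases: every phrase must appear in the text."""
    return [
        doc_id
        for doc_id, text in DOCS.items()
        if all(phrase.lower() in text.lower() for phrase in phrases)
    ]


print(keyword_search("wife"))    # [4]
print(expanded_search("wife"))   # [3, 4]
print(boolean_and("artificial intelligence", "diagnosis"))  # [1]
```

The literal search misses document 3 because 'wives' is a different token from 'wife', while the expanded search picks it up at the cost of less control over what is matched.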

"artificial intelligence" AND "diagnosis"

Have you used any library databases? What was your experience like?

Bibliographic Databases

Curated, contain peer-reviewed content

Multi-disciplinary

  • Wide breadth
  • Excellent for very general research questions
  • Will not necessarily have all CS content

Discipline Specific

  • Curated to contain only works published by CS researchers or presented at CS conferences
  • Usually your go-to option (e.g., ACM, IEEE)
  • To search:
    • Navigate to CS guide for list of CS databases
    • Choose one!
    • Combine keywords using Boolean Operators
    • Use filters to narrow down results

Article Databases

Other fun syntax!

Name        Symbol                What it does                                               Example
Quotation   ""                    Searches for that phrase exactly                           "graph theory"
Truncation  *                     Searches for variations on this word                       educat*
Proximity   N/3 or NEAR/3 or W/3  Searches for keywords within 3 (or n) words of each other  climate NEAR/2 change
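The three operators in the table can be approximated with a few lines of Python. This is only a sketch of the matching idea, not how any particular database implements it. The function names are hypothetical, and the proximity rule here (word positions differ by at most n, in either order) is an assumption, since databases count distance and word order differently.

```python
import re


def phrase_match(phrase, text):
    """Quotation: the words must appear together and in order."""
    pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def truncation_match(stem, text):
    """Truncation: 'educat*' returns every word that starts with 'educat'."""
    pattern = r"\b" + re.escape(stem.rstrip("*")) + r"\w*"
    return re.findall(pattern, text, re.IGNORECASE)


def near_match(a, b, n, text):
    """Proximity: True if a and b sit at most n word positions apart."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


print(phrase_match("graph theory", "An introduction to graph theory"))  # True
print(phrase_match("graph theory", "A theory of graph colouring"))      # False
print(truncation_match("educat*", "Educators value education"))
print(near_match("climate", "change", 2, "climate change"))             # True
print(near_match("climate", "change", 2, "climate and policy change"))  # False
```

Check each database's help page for its exact proximity syntax before relying on it in a real search.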

Learning check:

Using the Advanced Search function in ACM:

  • The authors have listed 'gamification' as an author keyword, but gamification does not appear in the title

Using filters:

  • The research was funded by NSERC
  • The sponsoring SIG was SIGCHI
  • It is a full Research Article

Search Engines

  • Search using natural language
  • Results that contain similar words are returned due to Machine Learning
  • Results are pre-sorted by perceived relevance
  • Search engine optimization at play

* The word 'diagnosis' doesn't actually appear in the result, but ML is used to determine that it is about diagnosis
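Sorting by perceived relevance can be illustrated with a simple term-weighting score. The sketch below is a toy under stated assumptions: the three documents are invented, and the score (term count times a rarity weight) is a classic textbook approach, not the ranking used by any real search engine, which combines many more signals.

```python
import math
import re
from collections import Counter

# Hypothetical documents, invented for illustration only.
DOCS = {
    "a": "AI tools for medical diagnosis and diagnosis support",
    "b": "AI in education",
    "c": "Cooking with cast iron",
}


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, docs):
    """Return ids of matching documents, highest score first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Count how many documents contain each term; rarer terms weigh more.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = sum(
            counts[term] * math.log(1 + n_docs / doc_freq[term])
            for term in tokenize(query)
            if term in doc_freq
        )
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


print(rank("AI diagnosis", DOCS))  # ['a', 'b']
```

Unlike a traditional database, which shows every match, the ordering here already encodes a judgement about which result matters most.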

Google Scholar

  • An academic search engine
  • Pros: ease of use, wide breadth of content
  • Cons: less curated, some content is not reliable, less metadata

LLMs (from ~2023)

  • NOT designed for search
  • Generated output based on training data alone
  • Worked by predicting which word should come next
  • Riddled with hallucinations
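Next-word prediction can be shown at toy scale with a bigram counter. This is a deliberately tiny stand-in, assuming an invented three-sentence training text. Real LLMs use neural networks trained on vastly more data, but the core loop of "pick a likely next word, then repeat" is similar in spirit.

```python
from collections import Counter, defaultdict

# Hypothetical training text, invented for illustration only.
TRAINING_TEXT = (
    "the library has databases . the library has databases . "
    "the journal has articles ."
)


def train_bigrams(text):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of a word, or None if unseen."""
    candidates = follows.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(follows, start, length=2):
    """Extend a starting word by repeatedly predicting the next word."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


model = train_bigrams(TRAINING_TEXT)
print(predict_next(model, "the"))  # library
print(generate(model, "journal"))  # journal has databases
```

The output 'journal has databases' never appears in the training text. The model produced it because 'databases' is the most common word after 'has', which is a small-scale picture of how fluent but unsupported statements can arise.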

Types of Hallucinations

  1. Fake sources: the GenAI tool entirely makes up a citation that does not exist. It may look real, but if you go looking, it can't be found.
  2. Incorrect facts: the GenAI tool generates an answer based on its training data, but the answer is simply incorrect.
  3. Unfaithful citations: the GenAI tool pulls from a real article but misrepresents the information in the source.

Retrieval-Augmented Generation (RAG)

  • Supplements LLMs with an external search
  • Results that contain similar words are returned due to Machine Learning
  • Outputs can be traced to specific sources
  • Now being incorporated into tools that were previously just LLMs (like ChatGPT), but the sources they can search vary widely
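The retrieve-then-generate pattern can be sketched as follows. This is a minimal illustration under stated assumptions: the three-document corpus and the function names are hypothetical, retrieval is a crude word-overlap count rather than the learned similarity real tools use, and the final step of sending the prompt to an LLM is left out.

```python
import re

# Hypothetical corpus, invented for illustration only.
CORPUS = {
    "doc-1": "Keyword search returns only documents that contain the exact search terms.",
    "doc-2": "Semantic search also returns documents that use related terms.",
    "doc-3": "Truncation searches for variations on a word stem.",
}


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, corpus, k=2):
    """Return the ids of the k documents sharing the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def build_prompt(question, corpus, k=2):
    """Attach the retrieved sources to the question so answers can be traced."""
    source_ids = retrieve(question, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in source_ids)
    prompt = (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n"
        f"Question: {question}"
    )
    return prompt, source_ids


prompt, sources = build_prompt("What does truncation search for?", CORPUS)
print(sources)  # ['doc-3', 'doc-1']
print(prompt)
```

Because the answer can only draw on what retrieve returns, the quality of the corpus sets the ceiling on the quality of the answer.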

Not all RAG tools are created equal; it's essential to look at what corpus they are searching.

For instance, the basic version of Perplexity searches the internet to answer your questions, meaning its answers could be based on many kinds of sources (social media, etc.).

Perplexity Sources

Academic RAG tools

  • There are a number of tools designed specifically for academic research
  • The difference is that they search a more curated corpus that only contains scholarly or peer-reviewed works
  • Some generalized tools (e.g., Perplexity Academic) offer options that do this as well

Scopus AI (an Academic Example)

Let's give them a try!

Tool
Scopus AI*
Semantic Scholar
Consensus 
Perplexity Academic

Your task: you have 7 minutes to test one of these out in a small group, then we'll report back about:

  • Your general thoughts
  • How easy it was to use
  • How happy you were with the results
  • What you do/don't like about the tool

* Must be accessed via the library

AI Search Pros

  • Ease of use
  • Wide breadth
  • Ability to synthesize information easily

Considerations

  • Bias
  • Answering-as-service
  • Cognitive implications 
  • Long term feasibility 

Censorship:

Heavy Bias/political agendas:

1. Research Tools and Strategies

B) Mapping Tools

AI Driven Research Mapping Tools

Tool Name         Cost
Research Rabbit   Free
Connected Papers  Freemium (~$5)
Litmaps           Freemium (~$8)

Acknowledging AI Use/Citations

Thanks!

Connect with me via email if you have any thoughts/questions on this topic!

 

catherine.gracey@unb.ca