Name: Shunsuke Mori
Handle: Kokuyou (黒曜)
from Japan 🇯🇵
(First time in the US / at RubyConf)
Work: Leaner Technologies, Inc.
Hobby Project:
Developing an LLM-based type inference tool
@kokuyouwind
class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

make_sound(Duck.new)
make_sound(Goose.new)
What is the argument type of the make_sound method?
make_sound(Duck.new)
make_sound(Goose.new)
Called with Duck
Called with Goose
The argument type of make_sound
is (Duck | Goose)
class Bird; end

def make_sound(bird)
  bird.cry
end

The argument name is bird.
There is a Bird class.
The argument type of make_sound is Bird.
Algorithms are great at logic, but lack heuristic understanding.
I developed RBS Goose
as a tool to guess RBS types using LLMs.
Generate RBS type definitions from Ruby code using LLMs
(Presented at RubyKaigi 2024)
Duck
quacking like geese
Ruby → RBS: works in some small examples
How capable is RBS Goose?
We will need some metrics of RBS Goose's performance.
Ruby SimTyper: Research on type inference
Covers several libraries and Rails Applications
Not directly available due to different type formats
Python TypeEvalPy: Type Inference Micro-benchmark
Covers grammatical elements / typing context
A lot of papers exist... on Python🐍
Explain how RBS Goose works with LLMs and evaluate it
Better results than traditional methods in several cases
Share the idea of a type inference benchmark I planned
Referring to previous studies
Basics of Type System and Type Inference
RBS Goose Architecture and Evaluation
Evaluation Method in Previous Studies
The idea of TypeEvalRb
Conclusion
Basics of Type System and Type Inference
RBS Goose Architecture and Evaluation
Evaluation Method in Previous Studies
The idea of TypeEvalRb
Conclusion
A mechanism to classify the components of a program
Strings, numbers, etc.
To prevent invalid operations
Ruby is a dynamically typed language
1 + 'a'
: TypeError is raised at runtime
1 + 'a' if false
: TypeError is not raised
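A runnable illustration of both cases (the second expression is never evaluated, so Ruby raises nothing):

begin
  1 + 'a'
rescue TypeError => e
  puts e.message # prints: String can't be coerced into Integer
end

1 + 'a' if false # not executed, so no TypeError at runtime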
A mechanism to detect type errors before execution
Need to know the type of each part of the code
Ruby does not use type annotations in its code
Define types with RBS / Checking with Steep
(Other options include RBI / Sorbet, and RDL, but we will not cover them in this session)
For 1 + 'a'
, we can detect a type error if we know...
1
is an Integer
'a'
is a String
Integer#+
cannot accept a String
class Integer
  def +: (Integer) -> Integer
  # ...
end
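For reference, a minimal Steepfile sketch (assuming the lib/ and sig/ layout used in this talk); with it, steep check reports the 1 + 'a' error before execution:

# Steepfile
target :lib do
  signature 'sig' # RBS type definitions
  check 'lib'     # Ruby files to type-check
end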
Mechanism to infer types of code without explicit annotations
For performing static type checks
To generate types for Ruby code without type definitions
TypeProf: Ruby / RBS type inference tool
Tracking data flow in variable assignments and method calls
(Dataflow Analysis)
def foo: (Integer n) -> String
class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

make_sound(Duck.new)
make_sound(Goose.new)
class Bird
end

class Duck < Bird
  def cry: -> nil
end

class Goose < Bird
  def cry: -> nil
end

class Object
  def make_sound: (Duck | Goose) -> nil
end
lib/bird.rb
sig/bird.rbs
The argument type of make_sound
is inferred as a union of subtypes.
TypeProf
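For reference, this inference can be reproduced from the command line (assuming the typeprof gem is installed); TypeProf prints the inferred RBS to stdout:

$ typeprof lib/bird.rb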
class Dynamic
  ['foo', 'bar'].each do |x|
    define_method("print_#{x}") do
      puts x
    end
  end
end

d = Dynamic.new
d.print_foo # prints "foo"
d.print_bar # prints "bar"
class Dynamic
end
lib/dynamic.rb
sig/dynamic.rbs
[error] undefined method: Dynamic#print_foo
[error] undefined method: Dynamic#print_bar
TypeProf
Type System: prevent invalid operation of the program
Ruby has a dynamic type system
Static Type Checking: detect type errors before execution
Type description languages (e.g. RBS) are used
Type Inference: Infer types of code without type annotations
Traditional methods usually work well,
but are not good at generalization, dynamic definition, etc.
Basics of Type System and Type Inference
RBS Goose Architecture and Evaluation
Evaluation Method in Previous Studies
The idea of TypeEvalRb
Conclusion
class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

make_sound(Duck.new)
make_sound(Goose.new)
class Bird
end

class Duck < Bird
  def cry: () -> void
end

class Goose < Bird
  def cry: () -> void
end

class Object
  def make_sound: (Bird arg) -> void
end
lib/bird.rb
sig/bird.rbs
Generate RBS type definitions from Ruby code using LLMs
Prompt (Input Text)
Output Text
Few-shot Prompt
(provide some examples)
Zero-shot Prompt
(provide no examples)
(Prompt)
Answer color code.
Q: red
A: #FF0000
Q: blue
A:
(Output)
#0000FF
(Prompt)
Answer color code for blue.
(Output)
The color code for blue depends on the system you're using:
HEX: #0000FF
RGB: (0, 0, 255)
CMYK: (100%, 100%, 0%, 0%)
HSL: (240°, 100%, 50%)
Pantone: PMS 2935 C (approximation)
Would you like codes for a specific shade of blue?
Ruby
RBS
Refined RBS
rbs prototype
examples
Prompt
LLM
(e.g. ChatGPT)
Ruby
RBS
Refined RBS
rbs prototype
examples
Prompt
LLM
(e.g. ChatGPT)
class Bird
end

class Duck < Bird
  def cry: () -> untyped
end

class Goose < Bird
  def cry: () -> untyped
end

class Object
  def make_sound: (untyped bird) -> untyped
end
sig/bird.rbs
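For reference, a prototype like the above can be generated with the rbs CLI (one possible invocation):

$ rbs prototype rb lib/bird.rb > sig/bird.rbs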
Ruby
RBS
Refined RBS
rbs prototype
examples
Prompt
LLM
(e.g. ChatGPT)
class Example1
  attr_reader :quantity

  def initialize(quantity:)
    @quantity = quantity
  end

  def quantity=(quantity)
    @quantity = quantity
  end
end
lib/example1.rb
class Example1
  @quantity: untyped

  attr_reader quantity: untyped

  def initialize: (quantity: untyped) -> void
  def quantity=: (untyped quantity) -> void
end
sig/example1.rbs
class Example1
  @quantity: Integer

  attr_reader quantity: Integer

  def initialize: (quantity: Integer) -> void
  def quantity=: (Integer quantity) -> void
end
refined/sig/example1.rbs
Ruby
RBS
Refined RBS
examples
Prompt
LLM
(e.g. ChatGPT)
class Example1
  attr_reader :quantity

  def initialize(quantity:)
    @quantity = quantity
  end

  def quantity=(quantity)
    @quantity = quantity
  end
end
lib/example1.rb
class Example1
  @quantity: untyped

  attr_reader quantity: untyped

  def initialize: (quantity: untyped) -> void
  def quantity=: (untyped quantity) -> void
end
sig/example1.rbs
class Example1
  @quantity: Integer

  attr_reader quantity: Integer

  def initialize: (quantity: Integer) -> void
  def quantity=: (Integer quantity) -> void
end
refined/sig/example1.rbs
rbs prototype
(or other tools)
Ruby
RBS
Refined RBS
rbs prototype
examples
Prompt
LLM
(e.g. ChatGPT)
When ruby source codes and
RBS type signatures are given,
refine each RBS type signatures.
======== Input ========
```lib/example1.rb
...
```
```sig/example1.rbs
...
```
======== Output ========
```sig/example1.rbs
...
```
======== Input ========
```lib/bird.rb
...
```
```sig/bird.rbs
...
```
======== Output ========
Examples
Ruby Code
LLM Infer
RBS Prototype
Ruby
RBS
Refined RBS
rbs prototype
examples
Prompt
LLM
(e.g. ChatGPT)
```sig/bird.rbs
class Bird
end

class Duck < Bird
  def cry: () -> void
end

class Goose < Bird
  def cry: () -> void
end

class Object
  def make_sound: (Bird arg) -> void
end
```
LLMs are not inherently familiar with RBS grammar
Pre-generate RBS prototypes
Framing the task as a fill-in-the-blanks problem for untyped
Use Few-shot prompting
To format the output for easy parsing
Illustrate RBS-unique grammar (such as attr_reader)
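Putting these together, here is a hypothetical sketch of how such a few-shot prompt could be assembled; the method name and hash keys are illustrative, not RBS Goose's actual API:

def build_prompt(examples, target)
  instruction = <<~TEXT
    When ruby source codes and
    RBS type signatures are given,
    refine each RBS type signatures.
  TEXT
  # Render one Input/Output section; examples include the refined RBS,
  # the final target leaves the Output section open for the LLM.
  shot = lambda do |ex, with_output|
    s = "======== Input ========\n" \
        "```#{ex[:ruby_path]}\n#{ex[:ruby]}\n```\n" \
        "```#{ex[:rbs_path]}\n#{ex[:rbs]}\n```\n" \
        "======== Output ========\n"
    s += "```#{ex[:rbs_path]}\n#{ex[:refined]}\n```\n" if with_output
    s
  end
  examples.map { |ex| shot.call(ex, true) }
          .unshift(instruction)
          .push(shot.call(target, false))
          .join("\n")
end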
class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

# The following is not
# provided to RBS Goose
# make_sound(Duck.new)
# make_sound(Goose.new)
class Bird
end

class Duck < Bird
  def cry: () -> void
end

class Goose < Bird
  def cry: () -> void
end

class Object
  def make_sound: (Bird arg) -> void
end
lib/bird.rb
sig/bird.rbs
The argument of make_sound
is inferred to be Bird.
class Dynamic
  ['foo', 'bar'].each do |x|
    define_method("print_#{x}") do
      puts x
    end
  end
end

# The following is not
# provided to RBS Goose
# d = Dynamic.new
# d.print_foo # prints "foo"
# d.print_bar # prints "bar"
class Dynamic
  def print_foo: () -> void
  def print_bar: () -> void
end
lib/dynamic.rb
sig/dynamic.rbs
Correctly infer dynamic method definitions
def call(f)
  f.call
end

f = -> { 'hello' }
p call(f)
# Wrong Syntax
def call: (() -> String f) -> String
lib/call.rb
# Correct Syntax
def call: (^-> String f) -> String
TypeProf: Correct Syntax / Wrong Syntax (per category)
Categories evaluated:
ProcType, OptionalType, RecordType, TupleType, AttributeDefinition,
Generics, Mixin, Member Visibility,
Ruby on Rails (ActiveSupport / ActiveModel),
Refinement, Quine, method_missing, delegate
Works in small examples, but no metrics of performance
Unclear what RBS Goose can and cannot do
It's difficult to determine the improvement direction
How do I check if the change has made it better?
Survey previous studies on
how to evaluate type inference
We need better methods to evaluate type inference.
Let's look at how previous studies have evaluated this.
Basics of Type System and Type Inference
RBS Goose Architecture and Evaluation
Evaluation Method in Previous Studies
The idea of TypeEvalRb
Conclusion
This session will focus on the two studies below
Study 1: Evaluation of SimTyper (Ruby Type Inference Tool)
Study 2: TypeEvalPy (Python Type Inference Benchmark)
Ruby type inference tool
Constraint-based inference
Built on RDL, one of the Ruby type checkers
whose type format is incompatible with RBS
Kazerounian et al., SimTyper: Sound Type Inference for Ruby Using Type Equality Prediction, OOPSLA 2021
Compare expected and inferred types
for each argument, return value, and variable
expected: def foo: (Array[String], Array[Integer]) -> Array[String]
inferred: def foo: (Array[String], Array[String]) -> void
1st argument: Match / 2nd argument: Match up to Parameter / return: Different
The number of matches can be compared for each method.
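A simplified sketch of that classification (illustrative, not SimTyper's actual code): exact match, same base type ignoring type parameters, or different:

def classify(expected, inferred)
  return :match if expected == inferred
  base = ->(t) { t.sub(/\[.*\]\z/, '') } # "Array[String]" -> "Array"
  return :match_up_to_parameter if base.call(expected) == base.call(inferred)
  :different
end

classify('Array[String]', 'Array[String]')  #=> :match
classify('Array[Integer]', 'Array[String]') #=> :match_up_to_parameter
classify('Array[String]', 'void')           #=> :different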
The reproduction data is provided
... as a VM image 😢
Compare expected and inferred types and count matched
for each argument, return value, and variable
Ruby libraries and Rails applications are targeted
Practice-based results
Any repository with type declarations can be used
Even though it targets Ruby, its evaluation tools are hard to use directly
Micro-benchmarks for type inference in Python
Small test cases, categorized by grammatical elements, etc.
Evaluation method is almost the same as SimTyper's
compares each parameter, return value, and variable
only exact matches are counted
Venkatesh et al., TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools, ICSE 2024
def param_func():
    return "Hello from param_func"

def func(a):
    return a()

b = param_func
c = func(b)
main.py
[{"file": "main.py",
  "line_number": 4,
  "col_offset": 5,
  "function": "param_func",
  "type": ["str"]},
 {"file": "main.py",
  "line_number": 8,
  "col_offset": 10,
  "parameter": "a",
  "function": "func",
  "type": ["callable"]},
 // ...
main_gt.json
Categorised test cases by grammar element, etc.
Aggregated by category to reveal strengths and weaknesses
Test cases are small because it is a micro-benchmark
Possibility of deviation from practical performance
Based on these studies,
we will now consider how to evaluate RBS Goose's performance.
| Category | Total facts | Scalpel |
| --- | --- | --- |
| args | 43 | 15 |
| assignments | 82 | 23 |
| builtins | 68 | 0 |
| classes | 122 | 24 |
| decorators | 58 | 19 |
| ... | | |
Aggregate by category,
measure strengths and weaknesses.
Compare expected and inferred types
for each argument, return value, and variable
The number of matches can be used as metrics
Two types of test data
Real-world code: measures practical performance
Micro benchmark: clarify the strengths and weaknesses
Basics of Type System and Type Inference
RBS Goose Architecture and Evaluation
Evaluation Method in Previous Studies
The idea of TypeEvalRb
Conclusion
Ruby+RBS Type Data Sets
(like ManyTypes4Py)
Provide Training Data
(Embed as examples)
Provide Data for evaluation
Type Benchmark
(like TypeEvalPy)
Evaluate
Develop data sets and benchmarks
to enable performance evaluation
Generate type hints and embed them into prompts
RubyGems
Project Files
gem_rbs_collection
Collect Related Type Hints
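A hypothetical sketch of the hint-collection step (names are illustrative): gem_rbs_collection stores signatures under gems/&lt;name&gt;/&lt;version&gt;/, so related RBS files for a project's dependencies could be gathered like this:

require 'pathname'

def related_type_hints(gem_names, collection: Pathname('gem_rbs_collection/gems'))
  gem_names.flat_map do |name|
    Dir.glob(collection.join(name, '*', '*.rbs').to_s)
  end
end

related_type_hints(%w[activesupport rake])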
Comparator
Test data
Expected RBS Types
Ruby Code
Inferred RBS Types
Benchmark Result
Aggregate
Match / Unmatch
Construct Comparison Tree from two RBS::Environment
Currently working on
Traverses Comparison Tree and Calculates Match Count
Comparison is done per argument, return value, etc.
Classify as Match, Match up to parameters, or Different
Construct Comparison Tree from two RBS::Environment
# load expected/sig/bird.rbs to RBS::Environment
> loader = RBS::EnvironmentLoader.new
> loader.add(path: Pathname('expected/sig/bird.rbs'))
> env = RBS::Environment.from_loader(loader).resolve_type_names
=> #<RBS::Environment @declarations=(409 items)...>
# RBS::Environment contains ALL types, including stdlib, etc.
> env.class_decls.count
=> 330
# Extract Goose class
> goose = env.class_decls[RBS::Namespace.parse('::Goose').to_type_name]
=> #<RBS::Environment::ClassEntry:0x000000011e478d70 @decls=...>
Goose's ClassEntry is... so deeply nested 😅
> pp goose
#<RBS::Environment::ClassEntry:0x000000011f239a40
@decls=
[#<struct RBS::Environment::MultiEntry::D
decl=
#<RBS::AST::Declarations::Class:0x0000000128d7dd08
@annotations=[],
@comment=nil,
@location=
#<RBS::Location:371300 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=8:0, pos=61...105, children=keyword,name,end,?type_params,?lt source="class Goose < Bird">,
@members=
[#<RBS::AST::Members::MethodDefinition:0x0000000128d7dd58
@annotations=[],
@comment=nil,
@kind=:instance,
@location=
#<RBS::Location:371360 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:2, pos=82...101, children=keyword,name,?kind,?overloading,?visibility source="def cry: () -> void">,
@name=:cry,
@overloading=false,
@overloads=
[#<RBS::AST::Members::MethodDefinition::Overload:0x000000011f23a968
@annotations=[],
@method_type=
#<RBS::MethodType:0x0000000128d7dda8
@block=nil,
@location=
#<RBS::Location:371420 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:11, pos=91...101, children=type,?type_params source="() -> void">,
@type=
#<RBS::Types::Function:0x0000000128d7ddf8
@optional_keywords={},
@optional_positionals=[],
@required_keywords={},
@required_positionals=[],
@rest_keywords=nil,
@rest_positionals=nil,
@return_type=
#<RBS::Types::Bases::Void:0x0000000128892af0
@location=
#<RBS::Location:371440 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:17, pos=97...101, children= source="void">>,
@trailing_positionals=[]>,
@type_params=[]>>],
@visibility=nil>],
@name=#<RBS::TypeName:0x000000011f23abc0 @kind=:class, @name=:Goose, @namespace=#<RBS::Namespace:0x000000011f23abe8 @absolute=true, @path=[]>>,
@super_class=
#<RBS::AST::Declarations::Class::Super:0x000000011f23a9e0
@args=[],
@location=
#<RBS::Location:371540 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=8:14, pos=75...79, children=name,?args source="Bird">,
@name=
#<RBS::TypeName:0x000000011f23b160 @kind=:class, @name=:Bird, @namespace=#<RBS::Namespace:0x0000000100cdf6a8 @absolute=true, @path=[]>>>,
@type_params=[]>,
outer=[]>],
@name=#<RBS::TypeName:0x000000011f239a68 @kind=:class, @name=:Goose, @namespace=#<RBS::Namespace:0x000000011f23abe8 @absolute=true, @path=[]>>,
@primary=nil>
Take only defined classes and build a tree structure (still lacks many things)
> compare_bird
=>
ComparisonTree(
class_nodes=[
ClassNode(typename=::Bird, instance_variables=[ ], methods=[ ])
ClassNode(typename=::Duck,
instance_variables=[ ],
methods=[
MethodNode(name=cry, parameters=[ ],
return_type=TypeNode( expected="void", actual="untyped")
)])
ClassNode(typename=::Goose,
instance_variables=[ ],
methods=[
MethodNode(name=cry, parameters=[ ],
return_type=TypeNode( expected="void", actual="untyped")
)])
])
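A hypothetical walker over this tree (the node attributes are the ones shown above; the counting logic is a simplification of the planned Match / Match up to parameters / Different classification):

def count_matches(tree)
  counts = Hash.new(0)
  tree.class_nodes.each do |class_node|
    class_node.methods.each do |method_node|
      # Compare each parameter and the return value.
      (method_node.parameters + [method_node.return_type]).each do |type_node|
        counts[type_node.expected == type_node.actual ? :match : :different] += 1
      end
    end
  end
  counts
end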
Micro-benchmark data like TypeEvalPy
Small test data classified by grammatical elements, etc.
For detailed evaluation of strengths and weaknesses
Real-world data, similar to that used to evaluate SimTyper
Libraries and Rails applications with RBS type definitions
For evaluation of practical performance
Exploring the possibility of
using the GitHub Copilot Workspace for data preparation.
Basics of Type System and Type Inference
RBS Goose Architecture and Evaluation
Evaluation Method in Previous Studies
The idea of TypeEvalRb
Conclusion
Shared how RBS Goose works and evaluation results
Better results than traditional methods in some cases
Surveyed evaluation methods in previous studies
Count matches between expected and inferred types
Both Micro-Benchmark and real-world data are useful
Shared idea of TypeEvalRb, type inference benchmark
To reveal inference performance and for future improvement
A large language model (LLM) is
a model that assigns probabilities to sequences of words.
A model assigns probabilities to sequences of words.
["The", "weather", "is"] → Language Model →
(50%) "sunny"
(20%) "rainy"
(0%) "duck"
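Illustrative only: a language model viewed as a mapping from a context to a probability distribution over the next word:

next_word_probs = {
  %w[The weather is] => { 'sunny' => 0.5, 'rainy' => 0.2, 'duck' => 0.0 }
}
next_word_probs[%w[The weather is]].max_by { |_word, prob| prob } #=> ["sunny", 0.5]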
The pre-training data size and the model size are "large".
Pre-training
Pre-training
(non-large) LM
Large Language Model
Because LLMs are one type of language model,
LLMs generate text probabilistically.
Inference process is unclear
Sometimes they output "plausible nonsense" (hallucinations)
Found many previous papers
Today I pick up two papers
Measured the type inference capability of LLMs
Improvement with Chain of Thought Prompts
Venkatesh et al., The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks, FORGE 2024
Evaluate LLM type inference accuracy in Python
LLMs showed higher accuracy than traditional methods
TypeEvalPy is used as a micro-benchmark for type inference
Comparing the inferred types with the correct data
(Function Return Type, Function Parameter Type, Local Variable Type)
Simple few-shot Prompt
Input: Python code
Output: JSON
You will be provided with the following information:
1. Python code. The sample is delimited with triple backticks.
2. Sample JSON containing type inference information for the Python code in
a specific format.
3. Examples of Python code and their inferred types. The examples are delimited
with triple backticks. These examples are to be used as training data.
Perform the following tasks:
1. Infer the types of various Python elements like function parameters, local
variables, and function return types according to the given JSON format with
the highest probability.
2. Provide your response in a valid JSON array of objects according to the
training sample given. Do not provide any additional information except the JSON object.
Python code:
```
def id_func ( arg ):
    x = arg
    return x

result = id_func (" String ")
result = id_func (1)
```
inferred types in JSON:
[
  {
    "file": "simple_code.py",
    "function": "id_func",
    "line_number": 1,
    "type": [
      "int",
      "str"
    ]
  }, ...
Python Code
Specify Output Format (JSON)
Instruction
GPT-4: Score 775, Time 454.54
Traditional Method: Score 321, Time 18.25
Benchmarks for type inference exist, such as TypeEvalPy
Provide consistent metrics for type inference capability
Ruby also needs benchmarking
Even a simple prompt can achieve higher inference capability
Note that micro-benchmarks favor LLMs
Both time and computational costs are high
Yun Peng et al., Generative Type Inference for Python, ASE 2023
Chain of Thought (CoT) prompts can be used for type inference
Improved 27% to 84% compared to zero-shot prompts
ManyTypes4Py is used for training and evaluation
Dataset for Machine Learning-based Type Inference
80% to Training, 20% to Evaluation
Measure Exact Match and Match Parametric in Evaluation
CoT Prompt Construction
Training Data is used as examples
Generate Type Hints
Embedded type derivation process with CoT
Embedded type hints
Embedded examples
28-29% improvement over Zero-Shot with ChatGPT
What information is embedded in the prompt is important
Add Type Hints
Use chain of thought prompts
Type data sets like ManyTypes4Py are useful
Can be used as an example prompt as well as for evaluation
Currently, rbs-inline is under development [*]
it allows type descriptions within special inline comments
Similar to YARD documentation
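A sketch of rbs-inline's annotation style as I understand it (the syntax may change while the gem is under development):

# rbs_inline: enabled

class Bird
  # @rbs return: void
  def cry
    puts "Quack"
  end
end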
Might need to support
RBS Goose Input / Output
Benchmark Test Data / Comparator