Do LLMs dream of Type Inference?

Leaner Technologies, Inc.
Shunsuke "Kokuyou" Mori

@kokuyouwind

$ whoami

Name:   Shunsuke Mori
Handle: Kokuyou (黒曜)

from Japan 🇯🇵
(First time in the US / at RubyConf)

Work: Leaner Technologies, Inc.
Hobby Project:

  Developing an LLM-based type inference tool

@kokuyouwind

@kokuyouwind

Why Type Inference with LLMs?

class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

make_sound(Duck.new)
make_sound(Goose.new)

What is the argument type of make_sound method?

Traditional Approach: Algorithmic

make_sound(Duck.new)
make_sound(Goose.new)

Called with Duck

Called with Goose

The argument type of make_sound is (Duck | Goose)

Human Approach: Heuristics

class Bird; end

def make_sound(bird)
  bird.cry
end

The argument name is bird.

The argument type of make_sound is Bird

There is Bird class.

Algorithms are great at logic, but lack heuristic understanding.

LLMs offer the potential for human-like type inference.

I developed

as a tool to guess RBS types using LLMs.

RBS Goose

Generate RBS type definitions from Ruby code using LLMs
(Presented at RubyKaigi 2024)

Duck

quacking like geese

Duck

quacking like geese

Duck

Duck  Typing

quacking like geese

Goose

RBS Goose: Current State

Ruby

RBS

some small

:

How capable is

:

We will need some metrics of RBS Goose performance.

Previous Research

  • Ruby SimTyper: Research of type inference

    • Covers several libraries and Rails Applications

    • Not directly available due to different type formats

  • Python TypeEvalPy: Type Inference Micro-benchmark

    • ​Covers grammatical elements / typing context

Previous Research

A lot of papers exists... on Python​​🐍

  • Explain how RBS Goose works with LLM and evaluate

    • Better results than traditional methods in several cases

  • Share the idea of a type inference benchmark I planned

    • Referring to previous studies

Today's Focus

Outline

Basics of Type System and Type Inference

RBS Goose Architecture and Evaluation

Evaluation Method in Previous Studies

The idea of TypeEvalRb

Conclusion

Outline

Basics of Type System and Type Inference

RBS Goose Architecture and Evaluation

Evaluation Method in Previous Studies

The idea of TypeEvalRb

Conclusion

Type System

  • A mechanism to classify the components of a program

    • Strings, numbers, etc.

    • To prevent invalid operations

  • Ruby is a dynamically typed language

    • 1 + 'a' : TypeError is raised at runtime

    • 1 + 'a' if false : TypeError is not raised

Static Type Checking

  • A mechanism to detect type errors before execution

  • Need to know the type of each part of the code

    • Ruby does not use type annotations in its code

    • Define types with RBS / Checking with Steep

    • (Other options include RBI / Sorbet, and RDL, but we will not cover in this session)

Static Type Checking: Examples

  • For 1 + 'a', we can detect a type error if we know...

    • 1 is an Integer

    • 'a' is a String

    • Integer#+ cannot accept a String

class Integer
  def +: (Integer) -> Integer
  # ...
end

Type Inference

  • Mechanism to infer types of code without explicit annotations

    • For performing static type checks

    • To generate types for Ruby code without type definitions

  • TypeProf: Ruby / RBS type inference tool

    • Tracking data flow in variable assignments and method calls
      (Dataflow Analysis)

TypeProf: Mechanism

def foo: (Integer n) -> String

Tricky Case - Generalization

class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

make_sound(Duck.new)
make_sound(Goose.new)
class Bird
end

class Duck < Bird
  def cry: -> nil
end

class Goose < Bird
  def cry: -> nil
end

class Object
  def make_sound: (Duck | Goose) -> nil
end

lib/bird.rb

sig/bird.rbs

The argument type of make_sound is

infered as a union of subtypes.

TypeProf

Tricky Case - Dynamic definition

class Dynamic
  ['foo', 'bar'].each do |x|
    define_method("print_#{x}") do
      puts x
    end
  end
end

d = Dynamic.new
d.print_foo #=> 'foo'
d.print_bar #=> 'bar'
class Dynamic
end

lib/dynamic.rb

sig/dynamic.rbs

[error] undefined method: Dynamic#print_foo
[error] undefined method: Dynamic#print_bar

TypeProf

Summary - Basics of Type System and Type Inference

  • Type System: prevent invalid operation of the program

    • Ruby has a dynamic type system

  • Static Type Checking: detect type errors before execution

    • Type description language (e.g. RBS) are used

  • Type Inference: Infer types of codes without type annotations

    • Traditional methods usually work well,
      but are not good at generalization, dynamic definition, etc.

Since LLM imitates human thinking,

so it may work well in these cases.

Outline

Basics of Type System and Type Inference

RBS Goose Architecture and Evaluation

Evaluation Method in Previous Studies

The idea of TypeEvalRb

Conclusion

RBS Goose

class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

make_sound(Duck.new)
make_sound(Goose.new)
class Bird
end

class Duck < Bird
  def cry: () -> void
end

class Goose < Bird
  def cry: () -> void
end

class Object
  def make_sound: (Bird arg) -> void
end
  

lib/bird.rb

sig/bird.rbs

Generate RBS type definitions from Ruby code using LLMs

LLM execution example (ChatGPT)

Prompt (Input Text)

Output Text

LLM Technique: Few-shot Prompting

Few-shot Prompt
(provide some examples)

Zero-shot Prompt
(provide no examples)

(Prompt)
Answer color code.
Q: red
A: #FF0000
Q: blue
A:

(Output)
#0000FF
(Prompt)
Answer color code for blue.

(Output)
The color code for blue depends on the system you're using:

HEX: #0000FF
RGB: (0, 0, 255)
CMYK: (100%, 100%, 0%, 0%)
HSL: (240°, 100%, 50%)
Pantone: PMS 2935 C (approximation)
Would you like codes for a specific shade of blue?

RBS Goose Architecture

Ruby

RBS

Refined RBS

rbs prototype

examples

Prompt

LLM
(e.g. ChatGPT)

Step.1 Generate RBS prototype

Ruby

RBS

Refined RBS

rbs prototype

examples

Prompt

LLM
(e.g. ChatGPT)

class Bird
end

class Duck < Bird
  def cry: () -> untyped
end

class Goose < Bird
  def cry: () -> untyped
end

class Object
  def make_sound: (untyped bird) -> untyped
end

sig/bird.rbs

RBS Goose Architecture

Ruby

RBS

Refined RBS

rbs prototype

examples

Prompt

LLM
(e.g. ChatGPT)

class Example1
  attr_reader :quantity

  def initialize(quantity:)
    @quantity = quantity
  end

  def quantity=(quantity)
    @quantity = quantity
  end
end

lib/example1.rb

class Example1
  @quantity: untyped

  attr_reader quantity: untyped

  def initialize: (quantity: untyped) -> void

  def quantity=: (untyped quantity) -> void
end

sig/example1.rbs

class Example1
  @quantity: Integer

  attr_reader quantity: Integer

  def initialize: (quantity: Integer) -> void

  def quantity=: (Integer quantity) -> void
end

refined/sig/example1.rbs

Ruby

RBS

Refined RBS

examples

Prompt

LLM
(e.g. ChatGPT)

class Example1
  attr_reader :quantity

  def initialize(quantity:)
    @quantity = quantity
  end

  def quantity=(quantity)
    @quantity = quantity
  end
end

lib/example1.rb

class Example1
  @quantity: untyped

  attr_reader quantity: untyped

  def initialize: (quantity: untyped) -> void

  def quantity=: (untyped quantity) -> void
end

sig/example1.rbs

class Example1
  @quantity: Integer

  attr_reader quantity: Integer

  def initialize: (quantity: Integer) -> void

  def quantity=: (Integer quantity) -> void
end

refined/sig/example1.rbs

rbs prototype
(or other tools)

Step.2 Load Few-shot Examples

Ruby

RBS

Refined RBS

rbs prototype

examples

Prompt

LLM
(e.g. ChatGPT)

When ruby source codes and 
RBS type signatures are given, 
refine each RBS type signatures. 

======== Input ========

```lib/example1.rb
...
```

```sig/example1.rbs
...
```

======== Output ========

```sig/example1.rbs
...
```

======== Input ========

```lib/bird.rb
...
```

```sig/bird.rbs
...
```

======== Output ========

Examples

Ruby Code

LLM Infer

RBS Prototype

Step.3 Construct Prompt

Step.4 Parse response and output

Ruby

RBS

Refined RBS

rbs prototype

examples

Prompt

LLM
(e.g. ChatGPT)

```sig/bird.rbs
class Bird
end

class Duck < Bird
  def cry: () -> void
end

class Goose < Bird
  def cry: () -> void
end

class Object
  def make_sound: (Bird arg) -> void
end
```

Key Points

  • LLMs are not inherently familiar with RBS grammar

    • Pre-generate RBS prototypes

    • Framing the task as a fill-in-the-blanks problem for untyped

  • Use Few-shot prompting

    • To format the output for easy parsing

    • Illustrate RBS unique grammar (such as attr_reader)

RBS Goose Results - Generarization

class Bird; end

class Duck < Bird
  def cry; puts "Quack"; end
end

class Goose < Bird
  def cry; puts "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

# The following is not 
# provided to RBS Goose
# make_sound(Duck.new)
# make_sound(Goose.new)
class Bird
end

class Duck < Bird
  def cry: () -> void
end

class Goose < Bird
  def cry: () -> void
end

class Object
  def make_sound: (Bird arg) -> void
end
  

lib/bird.rb

sig/bird.rbs

The argument of make_sound is inferred to be Bird.

RBS Goose Results - Dynamic definition

class Dynamic
  ['foo', 'bar'].each do |x|
    define_method("print_#{x}") do
      puts x
    end
  end
end

# The following is not 
# provided to RBS Goose
# d = Dynamic.new
# d.print_foo #=> 'foo'
# d.print_bar #=> 'bar'
class Dynamic
  def print_foo: () -> void

  def print_bar: () -> void
end

lib/dynamic.rb

sig/dynamic.rbs

Correctly infer dynamic method definitions

RBS Goose Results - Proc Arguments

def call(f)
  f.call()
end

f = -> { 'hello' }
p call(f)
# Wrong Syntax
def call: (() -> String f) -> String

lib/call.rb

# Correct Syntax
def call: (^-> String f) -> String

TypeProf

Correct Syntax

Wrong Syntax

Manual evaluation has limitations

ProcType

OptionalType

RecordType

TuppleType

AttributeDefinition

Generics

Mixin

Member Visibility

Ruby on Rails

ActiveSupport

ActiveModel

Refinement

Quine

method_missing

delegete

Need better evaluation methods

  • Works in small examples, but no metrics of performance

    • ​Unclear what RBS Goose can and cannot do

  • It's difficult to determine the improvement direction

    • How do I check if the change has made it better?

  • Find out previous studies

    • How to evaluate type inference

We need better methods to evaluate type inference.

Let's look at how previous studies have evaluated this.

Outline

Basics of Type System and Type Inference

RBS Goose Architecture and Evaluation

Evaluation Method in Previous Studies

The idea of TypeEvalRb

Conclusion

Evaluation Method in Previous Studies

  • This session will focus on below two studies

    • Study 1: Evaluation of SimTyper(Ruby Type Inference Tool)

    • Study 2: TypeEvalPy​ (Python Type Inference Benchmark)

  • Ruby type inference tool

    • Constraint-based inference

  • Built on RDL, one of the Ruby type checker

    • incompatible with RBS

Kazerounian, SimTyper: sound type inference for Ruby using type equality prediction,  2021, OOPSLA 2024

Previous Study 1: SimTyper

Previous Study 1: SimTyper

Kazerounian, SimTyper: sound type inference for Ruby using type equality prediction,  2021, OOPSLA 2024

SimTyper - Evaluation Method

Compare expected and inferred types
for each argument, return value, and variable

def foo: (Array[String], Array[Integer]) -> Array[String]
def foo: (Array[String], Array[String])  ->     void

expected:

inferred:

Match

Match
up to Parameter

Different

SimTyper - Test Data

SimTyper - Evaluation Result

The number of matches can be compared for each method.

SimTyper - Artifacts

The reproducion data is provided
... as a VM image 😢

What we can learn from Study 1

  • Compare expected and inferred types and count matched

    • for each argument, return value, and variable

  • Ruby libraries and Rails applications are targeted

    • Practice-based results

    • Any repository with type declarations can be used

  • Even if it's Ruby, it's hard to use evaluation tools directly

TypeEvalPy - Abstract

  • Micro-benchmarks for type inference in Python

    • Small test cases, categorized by grammatical elements, etc.

  • Evaluation method is almost the same as SimTyper

    • compares for each parameter, return, and variable

    • only Exact matches counted

Venkatesh, TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools,  2023, ICSE 2024

Previous Study 2: TypeEvalPy

Venkatesh, TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools,  2023, ICSE 2024

TypeEvalPy: TestCase Categories

TypeEvalPy: TestCase

def param_func():
    return "Hello from param_func"


def func(a):
    return a()


b = param_func
c = func(b)

main.py

[{"file": "main.py",
  "line_number": 4,
  "col_offset": 5,
  "function": "param_func",
  "type": ["str"]},
 {"file": "main.py",
  "line_number": 8,
  "col_offset": 10,
  "parameter": "a",
  "function": "func",
  "type": [ "callable"]},
  // ...

main_gt.json

TypeEvalPy: Benchmark Results

What we can learn from Study 2

  • Categorised  test cases by grammar element, etc.

    • Aggregated by category to reveal strengths and weaknesses

  • Test case is small because it is a micro-benchmark

    • Possibility of deviation from practical performance

Based on these studies,

we will now consider how to evaluate RBS Goose's performance.

TypeEvalPy: Results

Category Total facts Scalpel
args 43 15
assignments 82 23
builtins 68 0
classes 122 24
decorators 58 19

...

Aggregate by category,
measure strengths and weaknesses.

What we can learn from Previous Studies

  • Compare expected and inferred types

    • for each argument, return value, and variable

    • The number of matches can be used as metrics

    Two types of test data

    • Real-world code: measures practical performance

    • Micro benchmark: clarify the strengths and weaknesses

Outline

Basics of Type System and Type Inference

RBS Goose Architecture and Evaluation

Evaluation Method in Previous Studies

The idea of TypeEvalRb

Conclusion

Future Prospects in Ruby

Ruby+RBS Type Data Sets
(like
ManyTypes4Py)

Provide Training Data

(Embed as examples)

Provide Data for evaluation

Type Benchmark

(like TypeEvalPy)

Evaluate

Develop data sets and benchmarks

to enable performance evaluation

Future Prospects in Ruby (2)

Generate type hints and Embed to prompts

RubyGems

Project Files

gem_rbs_collection

Collect Related Type Hints

TypeEvalRb - Architecture

Comparator

Test data

Expected RBS Types

Ruby Code

Inferred RBS Types

Benchmark Result

Aggregate
Match / Unmatch

TypeEvalRb - Comparation

  • Construct Comparison Tree from two RBS::Environment

    • Currently working on

  • Traverses Comparison Tree and Calcurate Match Count

    • Comparison is done per argument, return value, etc.

    • Classify as Match, Match up to parameters, or Different

TypeEvalRb - Comparation

Construct Comparison Tree from two RBS::Environment

# load expected/sig/bird.rbs to RBS::Environment
> loader = RBS::EnvironmentLoader.new
> loader.add(path: Pathname('expected/sig/bird.rbs'))
> env = RBS::Environment.from_loader(loader).resolve_type_names
=> #<RBS::Environment @declarations=(409 items)...>

# RBS::Environment contains ALL types includes stdlib, etc.
> env.class_decls.count
=> 330

# Extract Goose class
> goose = env.class_decls[RBS::Namespace.parse('::Goose').to_type_name]
=> #<RBS::Environment::ClassEntry:0x000000011e478d70 @decls=...>

TypeEvalRb - Comparation

Goose's ClassEntry is... so deeply nested 😅

> pp goose
#<RBS::Environment::ClassEntry:0x000000011f239a40
 @decls=
  [#<struct RBS::Environment::MultiEntry::D
    decl=
     #<RBS::AST::Declarations::Class:0x0000000128d7dd08
      @annotations=[],
      @comment=nil,
      @location=
       #<RBS::Location:371300 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=8:0, pos=61...105, children=keyword,name,end,?type_params,?lt source="class Goose < Bird">,
      @members=
       [#<RBS::AST::Members::MethodDefinition:0x0000000128d7dd58
         @annotations=[],
         @comment=nil,
         @kind=:instance,
         @location=
          #<RBS::Location:371360 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:2, pos=82...101, children=keyword,name,?kind,?overloading,?visibility source="def cry: () -> void">,
         @name=:cry,
         @overloading=false,
         @overloads=
          [#<RBS::AST::Members::MethodDefinition::Overload:0x000000011f23a968
            @annotations=[],
            @method_type=
             #<RBS::MethodType:0x0000000128d7dda8
              @block=nil,
              @location=
               #<RBS::Location:371420 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:11, pos=91...101, children=type,?type_params source="() -> void">,
              @type=
               #<RBS::Types::Function:0x0000000128d7ddf8
                @optional_keywords={},
                @optional_positionals=[],
                @required_keywords={},
                @required_positionals=[],
                @rest_keywords=nil,
                @rest_positionals=nil,
                @return_type=
                 #<RBS::Types::Bases::Void:0x0000000128892af0
                  @location=
                   #<RBS::Location:371440 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:17, pos=97...101, children= source="void">>,
                @trailing_positionals=[]>,
              @type_params=[]>>],
         @visibility=nil>],
      @name=#<RBS::TypeName:0x000000011f23abc0 @kind=:class, @name=:Goose, @namespace=#<RBS::Namespace:0x000000011f23abe8 @absolute=true, @path=[]>>,
      @super_class=
       #<RBS::AST::Declarations::Class::Super:0x000000011f23a9e0
        @args=[],
        @location=
         #<RBS::Location:371540 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=8:14, pos=75...79, children=name,?args source="Bird">,
        @name=
         #<RBS::TypeName:0x000000011f23b160 @kind=:class, @name=:Bird, @namespace=#<RBS::Namespace:0x0000000100cdf6a8 @absolute=true, @path=[]>>>,
      @type_params=[]>,
    outer=[]>],
 @name=#<RBS::TypeName:0x000000011f239a68 @kind=:class, @name=:Goose, @namespace=#<RBS::Namespace:0x000000011f23abe8 @absolute=true, @path=[]>>,
 @primary=nil>

TypeEvalRb - Comparation

Take only defined classes and build a tree structure (lack many things)

> compare_bird
=>
ComparisonTree(
  class_nodes=[
    ClassNode(typename=::Bird,  instance_variables=[  ], methods=[ ])
    ClassNode(typename=::Duck,
      instance_variables=[ ],
      methods=[
        MethodNode(name=cry,  parameters=[  ],
          return_type=TypeNode( expected="void", actual="untyped")
        )])
    ClassNode(typename=::Goose,
      instance_variables=[ ],
      methods=[
        MethodNode(name=cry,  parameters=[  ],
          return_type=TypeNode( expected="void", actual="untyped")
        )])
    ])

TypeEvalRb - Test Data

  • Micro-benchmark data like TypeEvalPy

    • Small test data classified by grammatical elements, etc.

    • For detailed evaluation of strengths and weaknesses

  • Real-world data, similar to that used to evaluate SimTyper

    • Libraries and Rails applications with RBS type definitions

    • For evaluation of practical performance

TypeEvalRb - Microbenchmark Test Data

Exploring the possibility of

using the GitHub Copilot Workspace for data preparation.

TypeEvalRb

Work in Progress

Outline

Basics of Type System and Type Inference

RBS Goose Architecture and Evaluation

Evaluation Method in Previous Studies

The idea of TypeEvalRb

Conclusion

Conclusion

  • Shared how RBS Goose works and evaluation results

    • Better results than traditional methods in some cases

  • Surveyed evaluation methods in previous studies

    • Count matches between expected and inferred types

    • Both Micro-Benchmark and real-world data are useful

  • Shared idea of  TypeEvalRb, type inference benchmark

    • To reveal inference performance and for future improvement 

Preliminary Slides

What is LLM?

Large language model (LLM) is

"Large" "Language Model".

Language Model (LM)

A model assigns probabilities to sequence of words.

["The", "weather", "is"]

A model assigns probabilities to sequence of words.

(50%) "sunny"

(20%) "rainy"

Language Model

(0%) "duck"

"Large" Language Model

The amount of pre-training data size and the model size are “large”.

Pre-training

Pre-training

(non-large) LM

Large Language Model

"Large" Language Model

Considerations for LLMs

  • Because LLMs are one type of language model,
    LLM generate text probabilistically.

    • Inference process is unclear

    • Sometimes they output "plausible nonsense" (hallucinations)

Papers Found

  • Found many previous papers

  • Today I pick up two papers

    • Measured the type inference capability of LLMs

    • Improvement with Chain of Thought Prompts

Paper 1. Measured the type inference capability

Ashwin Prasad Shivarpatna Venkatesh,
The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks, 2024, ICSE FORGE 2024

Paper 1. Abstruct

  • Evaluate LLM type inference accuracy in Python

    • LLMs showed higher accuracy than traditional methods

  • TypeEvalPy is used as a micro-benchmark for type inference

    • Comparing the inferred types with the correct data
      (Function Return Type, Function Parameter Type, Local Variable Type)

Paper 1. Prompt

Simple few-shot Prompt
Input: Python code

Output: JSON

You will be provided with the following information:
1. Python code. The sample is delimited with triple backticks.
2. Sample JSON containing type inference information for the Python code in
a specific format.
3. Examples of Python code and their inferred types. The examples are delimited
with triple backticks. These examples are to be used as training data.

Perform the following tasks:
1. Infer the types of various Python elements like function parameters, local
variables, and function return types according to the given JSON format with
the highest probability.
2. Provide your response in a valid JSON array of objects according to the
training sample given. Do not provide any additional information except the JSON object.



Python code:
```
def id_func ( arg ):
    x = arg
    return x

result = id_func (" String ")
result = id_func (1)
```

inferred types in JSON:
[
   {
      "file": "simple_code.py",
      "function": "id_func",
      "line_number": 1,
      "type": [
         "int",
         "str"
      ]
   }, ...

Python Code

Specify Output Format (JSON)

Instruction

Paper 1. Results

GPT-4
  Score: 775
  Time: 454.54

Traditional Method
  Score: 321
  Time: 18.25

Paper 1. What We Can Learn

  • ​Benchmarks for type inference exist, such as TypeEvalPy

    • Provide consistent metrics for type inference capability

    • Ruby also needs benchmarking

  • Even Simple prompt can have a higher inference capability

    • Note that micro-benchmarks favor LLMs

    • Both time and computational costs are high

Paper 2. Improvement with Chain of Thought

Yun Peng, Generative Type Inference for Python, 2023, ASE' 23

Paper 2. Abstruct

  • Chain of Thought (COT) prompts can be used for type inference

    • Improved 27% to 84% compared to the zero-shot prompts

  • ManyTypes4Py is used for training and evaluation

    • Dataset for Machine Learning-based Type Inference

    • 80% to Training, 20% to Evaluation

    • Measure Exact Match and Match Parametric in Evaluation

Paper 2. Architecture

COT Prompt Construction

Training Data is used

as an example

Generate Type Hints

Paper 2. Prompt

Embedded type derivation process with COT

Embedded type hints

Embedded examples

Paper 2. Results

28-29% improved

 than Zero-Shot

 in ChatGPT

Paper 2. What We Can Learn

  • What information is embedded in the prompt is important

    • Add Type Hints

    • Use chain of thought prompts

  • Type data sets like ManyTypes4Py are useful

    • Can be used as an example prompt as well as for evaluation

Prospective of Type Benchmark - Considerations

  • Currently, rbs-inline is under development [*]

    • it allows type descriptions within special inline comments

    • Similar to YARD documentation

  • Might need to support

    • RBS Goose Input / Output

    • Benchmark Test Data / Comparator

Do LLMs dream of Type Inference?

By 黒曜

Do LLMs dream of Type Inference?

Session of RubyConf 2024

  • 35