Study of the performance of code generated by Large Language Models
The rest of my thesis
Tristan COIGNION
Spirals Team Seminar 2023
AI uses a lot of resources for training
(e.g. BLOOM: $2-5M equivalent in cloud computing)
Is it really worth the cost?
For inference, Large Language Models need many GPUs and electricity
What is the environmental impact of an LLM when used for coding?
How does the time LLMs save compare to the energy they cost?
Could LLMs be used for creating greener code?
How does the performance of the generated code vary?
Is there a difference in code performance between different LLMs?
Problem example on Leetcode
The temperature of a model is a parameter regulating the "creativity" and the randomness of the model's generations.
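As a minimal sketch of what the temperature parameter does, the snippet below samples a token from a list of logits after scaling them by the temperature; the function name and the plain-list representation are illustrative, not the API of any particular model.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample a token index from logits softened by a temperature.

    temperature = 0 reduces to greedy (argmax) decoding;
    temperature = 1 samples from the unmodified softmax;
    higher temperatures flatten the distribution (more randomness).
    """
    if temperature == 0:
        # Greedy decoding: always pick the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide logits by the temperature before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting probabilities.
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

At temperature 0 the model is deterministic; as the temperature rises, low-probability (often incorrect or inefficient) completions become more likely, which is the effect studied in the following slides.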
For every model, we varied the temperature over 0, 0.2, 0.4, 0.6, 0.8, and 1.0.
With every model, we generated 10 solutions for each of 300 Leetcode problems, and we measured the run times of the valid ones.
LLMs trained on English text are generally better at generating correct code
The performance of the generated code does increase with model capability, but only slightly
Measure of performance (lower is better)
When comparing the code performance of Codex (success rate of 82%) and SantaCoder (success rate of 20%), it appears that:
The higher the temperature, the worse the overall performance of the code
Having a higher temperature actually increases the chance of generating an inefficient solution
Examples of distributions of the runtime in seconds of the generations for two problems
(one dot = one generation) (high = slow)
Any questions?