Christopher Pitt
Writer and coder, working at ringier.co.za
User asks question
Robot thinks
Robot answers question

User asks question
(using voice)
Robot thinks
(using ChatGPT?)
Robot answers question
(using audio)
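The rest of the post builds that voice pipeline as a queued job. As a rough sketch of the shape it ends up taking (the step method names mirror the states and exceptions used later in the post; the exact signatures are assumptions):

public function handle(): void
{
    // 1. Convert and transcribe the recorded question (Deepgram)
    $this->prepareQuestionAudio();
    $text = $this->convertQuestionAudioToText();

    // 2. Ask ChatGPT, passing along the conversation history (OpenAI)
    $answer = $this->askAi($text);

    // 3. Turn the answer back into audio for the browser to play (Watson TTS)
    $this->convertAnswerTextToAudio($answer);
}

First, transcribing the question audio with Deepgram: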
// Work out where the uploaded question audio lives and what type it is
$relativePath = fmt('%/%', messageAudioFolderPath(), $fileName);
$type = Storage::mimeType($relativePath);

// Send the raw audio to Deepgram's transcription endpoint
$response = Http::acceptJson()
    ->withHeaders([
        'Authorization' => 'Token '.config('services.deepgram.secret'),
    ])
    ->withBody(
        Storage::get($relativePath), $type
    )
    ->post('https://api.deepgram.com/v1/listen');

if ($alternatives = $response->json('results.channels.0.alternatives')) {
    return $alternatives[0]['transcript'];
}

throw new ConvertQuestionAudioToTextException(
    'Could not convert question audio to text'
);
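A couple of helpers used throughout aren't defined in the post. Assuming fmt() fills % placeholders in order, and messageAudioFolderPath() returns the folder the Blade view later reads from (/storage/messages/...), they might look something like this:

if (! function_exists('fmt')) {
    // Assumption: swaps each % placeholder for the next argument, in order.
    function fmt(string $template, ...$values): string
    {
        foreach ($values as $value) {
            $position = strpos($template, '%');

            if ($position === false) {
                break;
            }

            $template = substr_replace($template, (string) $value, $position, 1);
        }

        return $template;
    }
}

if (! function_exists('messageAudioFolderPath')) {
    // Assumption: matches the /storage/messages/... path used in the Blade view.
    function messageAudioFolderPath(): string
    {
        return 'messages';
    }
}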
<div
    class="container mx-auto"
    x-data="{
        isRecording: false,
        blob: @entangle('blob').defer,
        async startRecording() {
            this.isRecording = true;
            return window.AudioRecorder.start();
        },
        async stopRecording() {
            const blob = await window.AudioRecorder.stop();
            const base64 = await this.blobToBase64(blob);

            this.isRecording = false;
            this.blob = base64;
        },
        blobToBase64(blob) {
            return new Promise(resolve => {
                const reader = new FileReader();
                reader.onloadend = () => resolve(reader.result);
                reader.readAsDataURL(blob);
            });
        },
    }"
>
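The Livewire component behind this view isn't shown at this point. A minimal sketch of how the entangled blob property could be handled once the browser pushes a recording up (the class, property, and job names here are assumptions; writeAudioFile() is the helper defined next):

class Chat extends Component
{
    public Conversation $conversation;

    // Entangled with the Alpine `blob` property; holds the base64 data URL.
    public ?string $blob = null;

    public function updatedBlob(): void
    {
        // Store the recording and queue the message for processing.
        $fileName = writeAudioFile($this->blob);

        $message = $this->conversation
            ->messages()
            ->create([
                'role' => 'user',
                'audio_file_name' => $fileName,
            ]);

        // Hypothetical job that runs the pipeline sketched earlier.
        ProcessMessage::dispatch($message);
    }

    public function render(): View
    {
        return view('livewire.chat');
    }
}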
if (! function_exists('writeAudioFile')) {
    function writeAudioFile(string $blob): string
    {
        // The blob arrives as a base64 data URL, so write it to a temp file first
        $tempName = tempnam(sys_get_temp_dir(), 'question');

        file_put_contents($tempName, file_get_contents($blob));

        $path = Storage::putFile(
            messageAudioFolderPath(),
            new File($tempName),
        );

        return str($path)->afterLast('/');
    }
}
$tempName = tempnam(sys_get_temp_dir(), 'answer').'.mp3';

// Stream Watson's synthesized speech straight into the temp file
Http::sink($tempName)
    ->withOptions([
        'curl' => [
            CURLOPT_USERPWD => 'apikey:'.config('services.watson-tts.secret'),
        ],
    ])
    ->accept('audio/mpeg')
    ->post(config('services.watson-tts.url').'/v1/synthesize', [
        'text' => $text,
    ])
    ->onError(function (Response $response) {
        throw new ConvertAnswerTextToAudioException($response->body());
    });

if ($path = Storage::putFile(messageAudioFolderPath(), new File($tempName))) {
    return str($path)->afterLast('/');
}

throw new ConvertAnswerTextToAudioException('Could not write answer audio file');
@if ($message->audio_file_name)
    <audio
        controls
        autoplay
    >
        <source
            src="/storage/messages/{{ $message->audio_file_name }}"
            type="audio/mpeg"
        >
        Your browser does not support the audio element.
    </audio>
@endif
$client = OpenAI::client(config('services.openai.secret'));

$messages = $conversation
    ->messages()
    ->get()
    ->map(fn (Message $message) => [
        'role' => $message->role,
        'content' => $message->text,
    ]);

$response = $client->chat()->create([
    'model' => 'gpt-3.5-turbo',
    'messages' => $messages,
    'max_tokens' => $maxTokens,
]);

if (isset($response['choices'][0]['message']['content'])) {
    return trim($response['choices'][0]['message']['content']);
}

throw new AskAiException($response->toArray());
class Message extends Model
{
    use HasFactory;
    use HasStates;

    protected $guarded = [];

    protected $casts = [
        'state' => MessageState::class,
    ];

    public function conversation(): BelongsTo
    {
        return $this->belongsTo(Conversation::class);
    }
}
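The messages migration isn't shown in the post; going by the columns referenced throughout (role, text, audio_file_name, reason_failed, and the state cast above), it presumably looks roughly like this:

Schema::create('messages', function (Blueprint $table) {
    $table->id();
    $table->foreignId('conversation_id')->constrained();
    $table->string('role');
    $table->text('text')->nullable();
    $table->string('audio_file_name')->nullable();
    $table->string('state');
    $table->text('reason_failed')->nullable();
    $table->timestamps();
});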
abstract class MessageState extends State
{
    public static function config(): StateConfig
    {
        return parent::config()
            ->registerState([
                AskingAiMessageState::class,
                ConvertingAnswerTextToAudioMessageState::class,
                ConvertingQuestionAudioToTextMessageState::class,
                DoneMessageState::class,
                FailedMessageState::class,
                PrepareQuestionAudioMessageState::class,
                QueuedMessageState::class,
            ])
            ->default(
                QueuedMessageState::class
            )
            // ...
            // system setting the prompt
            ->allowTransition(
                QueuedMessageState::class,
                DoneMessageState::class,
            )
            ->allowTransition(
                QueuedMessageState::class,
                FailedMessageState::class,
            )
            // user asking a question
            ->allowTransition(
                QueuedMessageState::class,
                PrepareQuestionAudioMessageState::class,
            )
            ->allowTransition(
                QueuedMessageState::class,
                AskingAiMessageState::class,
            )
try {
    $this->message->state->transitionTo(
        PrepareQuestionAudioMessageState::class
    );

    $this->prepareQuestionAudio();

    $this->message->state->transitionTo(
        ConvertingQuestionAudioToTextMessageState::class
    );

    $this->convertQuestionAudioToText();

    // ...
} catch (ConvertQuestionAudioToTextException $e) {
    $this->message->state->transitionTo(
        FailedMessageState::class
    );

    $this->message->reason_failed = $e->getMessage();
    $this->message->save();
}
try {
    FFMpeg::open(fmt('%/%', messageAudioFolderPath(), $fileName))
        ->export()
        ->inFormat(new Mp3())
        ->save(fmt(
            '%/%.%',
            messageAudioFolderPath(),
            str($fileName)->beforeLast('.'),
            'mp3',
        ));
} catch (Throwable $e) {
    throw new ConvertQuestionVideoToAudioException(
        'Could not convert question video to audio'
    );
}
public function mount(): void
{
    if ($id = session()->get('conversation')) {
        $this->conversation = Conversation::findOrFail($id);

        return;
    }

    $conversation = Conversation::create([
        'is_manager' => true,
    ]);

    $message = $conversation
        ->messages()
        ->create([
            'role' => 'system',
            'text' => file_get_contents(config('prompts.manager')),
        ]);

    $message->state->transitionTo(DoneMessageState::class);

    session()->put('conversation', $conversation->id);

    $this->conversation = $conversation;
}
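config('prompts.manager') isn't shown in the post. Since its value is passed to file_get_contents(), it's presumably a path to a text file containing the prompt; a config/prompts.php along these lines would do (the exact location of the prompt file is an assumption):

// config/prompts.php
return [
    'manager' => resource_path('prompts/manager.txt'),
];

The manager prompt itself includes instructions along these lines: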
Here is the list of commands:
- [list conversations]
- [create conversation]
- [forward message 'x' to conversation ID y] (where 'x' is the message you want to forward and 'y' is the ID of the conversation you are busy remembering)
- [reset]
These are the only things I want you to say. I do not want you to answer my questions directly because that is not your job. You are a conversation manager, acting as an LLM API between my database and nested conversations.
When I send you a message, and you do not already have an appropriate conversation ID in your memory, you look for an appropriate conversation using the '[list conversations]' command. I will give you back a list of conversations. You pick based on their summary, and you remember the ID of the conversation.
If I tell you there are no conversations, you issue the '[create conversation]' command so that I can create a new conversation. I will tell you an ID that you can remember.
If you have a conversation ID in memory then you can forward my original message to it with the '[forward message ...]' command.
For example:
I say: "I want to paint my house"
You say: "[list conversations]" (because you don't have a conversation ID in memory yet)
I say: "there are no conversations"
You say: "[create conversation]"
I say: "created conversation with ID 1"
You say: "[forward message 'I want to paint my house' to conversation 1]"
Another example:
I say: "I want to paint my house"
You say: "[list conversations]" (because you don't have a conversation ID in memory yet)
I say: "conversations: conversation about 'riding a bicycle' ID of 2"
You say: "[create conversation]" (because none of the conversations are relevant)
I say: "created conversation with ID 3"
You say: "[forward message 'I want to paint my house' to conversation 3]"
Another example:
I say: "I want to paint my house"
You say: "[list conversations]" (because you don't have a conversation ID in memory yet)
I say: "conversations: conversation about 'painting houses' ID of 3"
You say: "[forward message 'I want to paint my house' to conversation 3]" (because that is the most relevant conversation)
Another example:
I say: "I want to reset" or "reset" or "let's start over" or something similar
You say: "[reset]"
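The post doesn't show how those bracketed replies get turned back into actions. A minimal sketch of one way to parse them, matching on the command strings defined in the prompt (the function name is hypothetical):

function parseManagerCommand(string $reply): array
{
    if (str_contains($reply, '[list conversations]')) {
        return ['command' => 'list'];
    }

    if (str_contains($reply, '[create conversation]')) {
        return ['command' => 'create'];
    }

    if (str_contains($reply, '[reset]')) {
        return ['command' => 'reset'];
    }

    // Accepts both "to conversation ID 3" and "to conversation 3", as seen in the prompt examples.
    if (preg_match("/\[forward message '(.+)' to conversation (?:ID )?(\d+)\]/", $reply, $matches)) {
        return [
            'command' => 'forward',
            'message' => $matches[1],
            'conversation_id' => (int) $matches[2],
        ];
    }

    return ['command' => 'unknown', 'raw' => $reply];
}

Some ideas for taking this further: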
- splitting conversations when they get too big
- merging conversations
- updating conversation summaries
- saving/loading context to enable branching
- forgetting conversations
- define global (per user) preferences
  - allow dynamic values
  - prompt nested conversations with these
  - allow changing them mid-conversation