Text Generation with Kalosm in Rust

Introduction

Kalosm supports two different model types: completion and chat. Completion models are trained to complete free-form text. Chat models are trained with a chat format and can be used to generate responses to user prompts.

Building a Local Model

You can use Llama::builder() to create a local completion or chat model. You can set the model's source with with_source, pointing at either a local or a remote file:

use kalosm::language::*;
use std::path::PathBuf;

// Create a builder for a chat model
let phi = Llama::builder()
    // Set the source of the model
    .with_source(LlamaSource::phi_3_5_mini_4k_instruct())
    // Build the model. This will fetch the model from the source if it is not cached.
    .build()
    .await
    .unwrap();

// You can also create a model from a local file
let model = Llama::builder()
    // To use a custom model, you can create a new LlamaSource
    .with_source(LlamaSource::new(
        // Llama source takes a gguf file to load the model, tokenizer, and chat template from
        FileSource::Local(PathBuf::from("path/to/model.gguf")),
    ))
    .build()
    .await
    .unwrap();
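
The builder also accepts remote files. The sketch below assumes a FileSource::huggingface constructor that takes a repository id, a revision, and a file name; the repository and file names are placeholders rather than a real model:

// Hypothetical repository, revision, and file names for illustration only
let remote_model = Llama::builder()
    .with_source(LlamaSource::new(FileSource::huggingface(
        "some-org/some-model-gguf",
        "main",
        "model.gguf",
    )))
    .build()
    .await
    .unwrap();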

Bonus: Download progress

If you want to report progress while the model downloads, you can build the Llama model with the build_with_loading_handler method.

// Build the model, reporting progress as it downloads and loads
let model = Llama::builder()
    .with_source(LlamaSource::phi_3_5_mini_4k_instruct())
    // You can call build_with_loading_handler to get progress updates
    // as the model is being downloaded and loaded
    .build_with_loading_handler(|progress| match progress {
        ModelLoadingProgress::Downloading { source, progress } => {
            let progress_percent = (progress.progress * 100.) as u32;
            let elapsed = progress.start_time.elapsed().as_secs_f32();
            println!("Downloading file {source} {progress_percent}% ({elapsed}s)");
        }
        ModelLoadingProgress::Loading { progress } => {
            let progress = (progress * 100.0) as u32;
            println!("Loading model {progress}%");
        }
    })
    .await
    .unwrap();

Building a Remote Model

For remote chat models, you can use OpenAICompatibleChatModel::builder() to create a connection to a remote, OpenAI-compatible model. You can pick a specific model with one of the presets, or set a custom model id with the with_model method:

let llm = OpenAICompatibleChatModel::builder()
    .with_gpt_4o_mini()
    .build();
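
The presets cover common models; to target anything else, the with_model method mentioned above takes a model id. A minimal sketch with a placeholder id:

// The model id below is a placeholder; substitute the id your endpoint serves
let llm = OpenAICompatibleChatModel::builder()
    .with_model("my-custom-model-id")
    .build();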

Text Completion

Once you have a text completion model, you can call it like a function with the text you want to complete. Before awaiting the response, you can modify the builder with additional settings, such as the maximum length of the generated text. When you are done, await the response to get the full generated text:

let text = model("The capital of France is")
    // Set any options you need
    .with_sampler(GenerationParameters::default().with_max_length(1000))
    // Once you are done, call `await` to get the generated text
    .await
    .unwrap();
println!("{}", text);
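
GenerationParameters controls more than the maximum length. As a rough sketch, assuming a with_temperature setter exists alongside with_max_length, you could lower the temperature for more deterministic output:

let text = model("The capital of France is")
    // Lower temperatures make sampling more deterministic (with_temperature is assumed here)
    .with_sampler(
        GenerationParameters::default()
            .with_max_length(1000)
            .with_temperature(0.3),
    )
    .await
    .unwrap();
println!("{}", text);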

Instead of awaiting the full result, you can treat the response builder like a stream and read each token as it is generated:

let mut text_stream = model("The capital of France is")
    // Set any options you need
    .with_sampler(GenerationParameters::default().with_max_length(1000));

// Pipe the generated text to stdout, or read individual tokens with the `next` method from the StreamExt trait
text_stream.to_std_out().await.unwrap();
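
If you would rather handle each token yourself, you can read the same builder as a stream. This is a minimal sketch, assuming the builder yields String items and that a StreamExt trait providing next is in scope (for example, futures_util::StreamExt):

use futures_util::StreamExt;

let mut text_stream = model("The capital of France is")
    .with_sampler(GenerationParameters::default().with_max_length(1000));

// Read each token as it is generated
while let Some(token) = text_stream.next().await {
    print!("{token}");
}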

Chat Models

If you have a chat model like OpenAICompatibleChatModel or Llama with a chat source, you can use the chat method to start a chat session with the model. Calling the session with a new message creates a response builder; you can configure it with settings, then await or stream the response:

// Create a chat session
let mut chat = phi.chat();

loop {
    let message = prompt_input("\n> ").unwrap();
    // Add a message to the chat session
    let mut response = chat(&message);

    // Stream the response to stdout
    response.to_std_out().await.unwrap();
}
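
You can also configure the session before the first message. For example, this brief sketch assumes the chat builder exposes a with_system_prompt method to steer the assistant:

// with_system_prompt is assumed here; the prompt text is only an example
let mut chat = phi.chat()
    .with_system_prompt("You are a concise assistant that answers in one sentence.");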

Conclusion

This chapter demonstrated both completion and chat models. Experiment with different prompts, configurations, and models to customize the generated text to your needs. If you need more control over the text generation process, check out the next chapter on structured generation.