Text Generation with Kalosm in Rust
Introduction
Kalosm supports two different model types: completion and chat. Completion models are trained to complete free-form text. Chat models are trained with a chat format and can be used to generate responses to user prompts.
Building a Local Model
You can use Llama::builder() to create a local completion or chat model, and set the source of the model to a local or remote file with with_source:
```rust
use kalosm::language::*;
use std::path::PathBuf;

// Create a builder for a chat model
let phi = Llama::builder()
    // Set the source of the model
    .with_source(LlamaSource::phi_3_5_mini_4k_instruct())
    // Build the model. This will fetch the model from the source if it is not cached.
    .build()
    .await
    .unwrap();

// You can also create a model from a local file
let model = Llama::builder()
    // To use a custom model, you can create a new LlamaSource
    .with_source(LlamaSource::new(
        // LlamaSource takes a gguf file to load the model, tokenizer, and chat template from
        FileSource::Local(PathBuf::from("path/to/model.gguf")),
    ))
    .build()
    .await
    .unwrap();
```
Bonus: Download progress
If you need to report progress while the model is downloading, you can build the Llama model with the build_with_loading_handler method instead of build:

```rust
// Build the model with a loading handler to get progress updates
// as the model is being downloaded and loaded
let model = Llama::builder()
    .with_source(LlamaSource::phi_3_5_mini_4k_instruct())
    .build_with_loading_handler(|progress| match progress {
        ModelLoadingProgress::Downloading { source, progress } => {
            let progress_percent = (progress.progress * 100.0) as u32;
            let elapsed = progress.start_time.elapsed().as_secs_f32();
            println!("Downloading file {source} {progress_percent}% ({elapsed}s)");
        }
        ModelLoadingProgress::Loading { progress } => {
            let progress = (progress * 100.0) as u32;
            println!("Loading model {progress}%");
        }
    })
    .await
    .unwrap();
```
Building a Remote Model
For remote chat models, you can use OpenAICompatibleChatModel::builder() to create a connection to a remote model. You can select a specific model with one of the presets, or set a custom model id with the with_model method:
```rust
let llm = OpenAICompatibleChatModel::builder()
    .with_gpt_4o_mini()
    .build();
```
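If none of the presets match the model you want, you can pass a model id directly. A minimal sketch, assuming with_model accepts the model id as a string and that your OpenAI-compatible provider exposes a model under that id:

```rust
// Connect to a remote model by id instead of a preset.
// "gpt-4o" is just an example id; use whatever id your provider exposes.
let llm = OpenAICompatibleChatModel::builder()
    .with_model("gpt-4o")
    .build();
```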
Text Completion
Once you have a text completion model, you can call the model like a function with the text you need to complete. Before you await the response, you can modify the builder with additional settings like the maximum length of the generated text. Once you are done, you can await the response to get the full generated text:
```rust
let text = model("The capital of France is")
    // Set any options you need
    .with_sampler(GenerationParameters::default().with_max_length(1000))
    // Once you are done, call `await` to get the generated text
    .await
    .unwrap();

println!("{}", text);
```
Instead of awaiting the full result, you can treat the response builder like a stream and read each token as it is generated:
```rust
let mut text_stream = model("The capital of France is")
    // Set any options you need
    .with_sampler(GenerationParameters::default().with_max_length(1000));

// Pipe the generated text to stdout, or read individual tokens
// with the `next` method from the StreamExt trait
text_stream.to_std_out().await.unwrap();
```
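If you want to process each token yourself instead of piping the stream to stdout, you can consume the stream manually. A minimal sketch, assuming the StreamExt trait is in scope (the kalosm prelude re-exports it in recent versions; otherwise import futures_util::StreamExt):

```rust
let mut text_stream = model("The capital of France is")
    .with_sampler(GenerationParameters::default().with_max_length(1000));

// Read tokens one at a time as they are generated
while let Some(token) = text_stream.next().await {
    // Handle each token however you need, e.g. forward it to a UI
    print!("{token}");
}
```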
Chat Models
If you have a chat model like OpenAICompatibleChatModel
or Llama
with a chat source, you can use the chat
method to start a chat session with the model. You can call the chat session with a new message to create a response builder. You can modify the builder with settings and then await the response or stream the response:
```rust
// Create a chat session
let mut chat = phi.chat();

loop {
    let message = prompt_input("\n> ").unwrap();
    // Add a message to the chat session
    let mut response = chat(&message);
    // Stream the response to stdout
    response.to_std_out().await.unwrap();
}
```
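You can also steer the whole session by setting a system prompt when you create it. A minimal sketch, assuming a with_system_prompt method on the chat session (present in recent Kalosm releases):

```rust
// Create a chat session whose responses follow a system prompt
let mut chat = phi
    .chat()
    .with_system_prompt("You are a helpful assistant that answers concisely.");

// Every message in the session is answered under that system prompt
chat("What is the capital of France?")
    .to_std_out()
    .await
    .unwrap();
```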
Conclusion
This chapter demonstrated both completion and chat models. Experiment with different prompts, configurations, and models to customize the generated text to your needs. If you need more control over the text generation process, check out the next chapter on structured generation.