Text Recognition
Kalosm allows developers to extract text information from images using optical character recognition (OCR). This guide demonstrates how to perform single-line OCR using Kalosm's vision module.
Adding dependencies
Before we get started, we need to add an additional crate for image loading. Add the following line to your Cargo.toml file:
[dependencies]
# Your Kalosm dependency added in the start of the documentation...
image = "0.24.7"
Creating an OCR Model
Kalosm's vision module provides functionality for text recognition in images. In this example, the Ocr::builder() method is used to create an OCR model that can transcribe single lines of text.
use kalosm::vision::*;

let mut model = Ocr::builder().build().await.unwrap();
Loading an Image
Next, we need to load an image that contains text. The image crate provides the open method to load an image from a file path, or the Reader for more advanced loading options.
let image = image::open("examples/ocr.png").unwrap();
Replace the file path with the location of your image. This loaded image will be processed for text recognition.
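Because the snippet above calls unwrap, a wrong path will panic at load time. As a minimal sketch of one way to check a path before handing it to image::open, the following uses only the Rust standard library. The helper names (has_image_extension, check_image_path) and the extension list are hypothetical and chosen for illustration; they are not part of Kalosm or the image crate, and the list does not reflect what the image crate can actually decode.

```rust
use std::path::Path;

// Hypothetical list of extensions we expect to see; this is an assumption
// for illustration, not the set of formats the image crate supports.
const LIKELY_IMAGE_EXTENSIONS: [&str; 4] = ["png", "jpg", "jpeg", "bmp"];

/// Returns true if the path ends in one of the extensions listed above
/// (compared case-insensitively).
fn has_image_extension(path: &Path) -> bool {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some(ext) => LIKELY_IMAGE_EXTENSIONS
            .iter()
            .any(|known| ext.eq_ignore_ascii_case(known)),
        None => false,
    }
}

/// Checks that the path points at an existing file with a likely image
/// extension, returning a readable error message otherwise.
fn check_image_path(path: &Path) -> Result<(), String> {
    if !path.is_file() {
        return Err(format!("{} is not an existing file", path.display()));
    }
    if !has_image_extension(path) {
        return Err(format!("{} does not look like an image", path.display()));
    }
    Ok(())
}
```

A caller could run check_image_path(Path::new("examples/ocr.png")) first and only call image::open when it returns Ok, reporting the error message instead of panicking.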
Recognizing Text
Finally, we can use the recognize_text method to extract text information from the image. The recognize_text method takes an OcrInferenceSettings struct as input. This struct contains the image to be processed, as well as other settings that can be used to customize the OCR process.
let text = model
    .recognize_text(OcrInferenceSettings::new(image))
    .unwrap();
println!("{}", text);
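If the recognized string turns out to contain stray leading, trailing, or repeated whitespace, it may be worth tidying it before printing it or passing it on. The sketch below is one possible cleanup step using only the standard library. The function name normalize_ocr_line is hypothetical, and whether Kalosm's output needs this at all is an assumption rather than something this guide states.

```rust
/// Hypothetical helper: trims the text and collapses every run of
/// whitespace (spaces, tabs, newlines) into a single space.
fn normalize_ocr_line(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

It could be applied to the result above, for example println!("{}", normalize_ocr_line(&text)), assuming text is a String.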
Conclusion
This example provides a basic structure for performing single-line OCR using Kalosm's vision module. You can combine text recognition with an LLM to analyze complex documents or photos.