xr-ai-accelerator

IXrAiImageToText

The IXrAiImageToText interface defines the contract for AI models that generate text descriptions from images. Providers such as Groq, Google, and Nvidia implement it.

Interface Declaration

public interface IXrAiImageToText

Methods

Execute

Processes an image and generates a text description asynchronously.

public Task<XrAiResult<string>> Execute(byte[] imageBytes, string imageFormat, Dictionary<string, string> options = null)

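Parameters:

imageBytes (byte[]): The encoded image data.
imageFormat (string): The format of the image. The usage example below passes a MIME type string ("image/jpeg").
options (Dictionary<string, string>, optional): Provider-specific options. The usage example below passes "model" and "prompt". Defaults to null.

Returns:

Task<XrAiResult<string>>: A task that completes with the result of the call. In the usage example below, the result exposes IsSuccess, Data (the generated text), and ErrorMessage.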

Usage Example

// Load the model
IXrAiImageToText imageToText = XrAiFactory.LoadImageToText("Groq", new Dictionary<string, string>
{
    { "apiKey", "your-groq-api-key" }
});

// Convert texture to bytes
byte[] imageBytes = texture.EncodeToJPG();

// Execute the model with options
var result = await imageToText.Execute(imageBytes, "image/jpeg", new Dictionary<string, string>
{
    { "model", "llama-vision-free" },
    { "prompt", "Describe what you see in this image in detail." }
});

// Handle the result
if (result.IsSuccess)
{
    Debug.Log($"Image description: {result.Data}");
}
else
{
    Debug.LogError($"Error: {result.ErrorMessage}");
}
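
For tests or offline development, it can help to have an implementation of the interface that never calls a remote provider. The following is a minimal sketch under stated assumptions, not the library's code. The Execute signature is taken from this page. The XrAiResult<T> shape (IsSuccess, Data, ErrorMessage) is inferred from the usage example above. The Success and Failure factory methods, the StubImageToText class, and the XrAiSketch namespace are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

namespace XrAiSketch
{
    // Assumed shape of the result type, inferred from the usage example.
    // Success/Failure are hypothetical helpers, not confirmed library API.
    public class XrAiResult<T>
    {
        public bool IsSuccess { get; private set; }
        public T Data { get; private set; }
        public string ErrorMessage { get; private set; }

        public static XrAiResult<T> Success(T data)
        {
            return new XrAiResult<T> { IsSuccess = true, Data = data };
        }

        public static XrAiResult<T> Failure(string message)
        {
            return new XrAiResult<T> { IsSuccess = false, ErrorMessage = message };
        }
    }

    // Placeholder copy of the interface, using the signature given on this page.
    public interface IXrAiImageToText
    {
        Task<XrAiResult<string>> Execute(byte[] imageBytes, string imageFormat,
            Dictionary<string, string> options = null);
    }

    // Hypothetical stub: validates its inputs and returns a canned description.
    public class StubImageToText : IXrAiImageToText
    {
        public Task<XrAiResult<string>> Execute(byte[] imageBytes, string imageFormat,
            Dictionary<string, string> options = null)
        {
            if (imageBytes == null || imageBytes.Length == 0)
            {
                return Task.FromResult(XrAiResult<string>.Failure("Image data is empty."));
            }
            if (string.IsNullOrEmpty(imageFormat))
            {
                return Task.FromResult(XrAiResult<string>.Failure("Image format is required."));
            }

            // options defaults to null, so read the prompt defensively.
            string prompt = null;
            if (options != null)
            {
                options.TryGetValue("prompt", out prompt);
            }

            string description = $"Stub description for {imageBytes.Length} bytes of {imageFormat}";
            if (prompt != null)
            {
                description += $" (prompt: {prompt})";
            }
            return Task.FromResult(XrAiResult<string>.Success(description));
        }
    }
}
```

In a real project, delete the placeholder XrAiResult<T> and IXrAiImageToText definitions and reference the library's own types instead. Only the stub class would remain.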

Model-Specific Options

Different providers support different options:

Groq

Google

Nvidia

Image Format Support

The interface supports common image formats:

Use the XrAiImageHelper.EncodeTexture() method to convert Unity Texture2D objects to the appropriate byte array format.
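
Because Execute receives the image bytes and the format string separately, the two can drift apart, for example PNG bytes labelled "image/jpeg". As a rough sketch, assuming imageFormat is a MIME type as in the usage example, the leading bytes of the buffer can be checked before the call. ImageMimeSniffer and GuessMimeType are hypothetical names and not part of xr-ai-accelerator. Only the PNG and JPEG signatures are covered here, which says nothing about which formats the library itself accepts.

```csharp
namespace XrAiSketch
{
    // Hypothetical helper, not part of xr-ai-accelerator.
    public static class ImageMimeSniffer
    {
        // Standard file signatures ("magic bytes") for PNG and JPEG.
        private static readonly byte[] PngSignature =
            { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
        private static readonly byte[] JpegSignature = { 0xFF, 0xD8, 0xFF };

        // Returns "image/png", "image/jpeg", or null when the header is not recognized.
        public static string GuessMimeType(byte[] imageBytes)
        {
            if (StartsWith(imageBytes, PngSignature)) return "image/png";
            if (StartsWith(imageBytes, JpegSignature)) return "image/jpeg";
            return null;
        }

        // True when data is non-null and begins with every byte of prefix.
        private static bool StartsWith(byte[] data, byte[] prefix)
        {
            if (data == null || data.Length < prefix.Length) return false;
            for (int i = 0; i < prefix.Length; i++)
            {
                if (data[i] != prefix[i]) return false;
            }
            return true;
        }
    }
}
```

A caller could compare the guessed value with the imageFormat it is about to pass and log a warning on mismatch, rather than sending inconsistent data to a provider.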

Implementation Notes