The IXrAiImageToText interface defines the contract for AI models that generate text descriptions from images. It is implemented by several providers, including Groq, Google, and Nvidia.
public interface IXrAiImageToText
Processes an image and generates a text description asynchronously.
public Task<XrAiResult<string>> Execute(byte[] imageBytes, string imageFormat, Dictionary<string, string> options = null)
Parameters:
imageBytes (byte[]): The image data as a byte array
imageFormat (string): The format of the image (e.g., "image/jpeg", "image/png")
options (Dictionary<string, string>, optional): Model-specific options and parameters
Returns:
Task<XrAiResult<string>>: A task that resolves to a result containing the generated text description

// Load the model
IXrAiImageToText imageToText = XrAiFactory.LoadImageToText("Groq", new Dictionary<string, string>
{
    { "apiKey", "your-groq-api-key" }
});

// Convert texture to bytes
byte[] imageBytes = texture.EncodeToJPG();

// Execute the model with options
var result = await imageToText.Execute(imageBytes, "image/jpeg", new Dictionary<string, string>
{
    { "model", "llama-vision-free" },
    { "prompt", "Describe what you see in this image in detail." }
});

// Handle the result
if (result.IsSuccess)
{
    Debug.Log($"Image description: {result.Data}");
}
else
{
    Debug.LogError($"Error: {result.ErrorMessage}");
}

Different providers support different options:

model: The model to use (e.g., "llama-vision-free")
prompt: The prompt to guide the image description

prompt: The prompt for image analysis
url: API endpoint URL

prompt: The prompt for image description
model: The model identifier
url: API endpoint URL

The interface supports common image formats:
"image/jpeg" - JPEG images
"image/png" - PNG images

Use the XrAiImageHelper.EncodeTexture() method to convert Unity Texture2D objects to the appropriate byte array format.
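
Because Execute takes the image format as a separate string, the value passed should match the bytes actually being sent. The following is a minimal sketch of one way to check that, assuming the bytes may come from sources other than the library's own helper. ImageFormatSniffer and Detect are hypothetical names introduced for illustration and are not part of the library; the check relies only on the standard JPEG and PNG file signatures.

```csharp
// Hypothetical helper (not part of the library): guesses the MIME type
// of an encoded image by inspecting its leading bytes.
public static class ImageFormatSniffer
{
    // PNG files begin with this fixed 8-byte signature.
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns "image/jpeg", "image/png", or null when the header matches neither.
    public static string Detect(byte[] imageBytes)
    {
        if (imageBytes == null) return null;

        // JPEG streams start with the SOI marker FF D8, followed by the FF of the next marker.
        if (imageBytes.Length >= 3 &&
            imageBytes[0] == 0xFF && imageBytes[1] == 0xD8 && imageBytes[2] == 0xFF)
        {
            return "image/jpeg";
        }

        if (imageBytes.Length >= PngSignature.Length)
        {
            bool isPng = true;
            for (int i = 0; i < PngSignature.Length; i++)
            {
                if (imageBytes[i] != PngSignature[i]) { isPng = false; break; }
            }
            if (isPng) return "image/png";
        }

        return null;
    }
}
```

The returned string can be passed as the imageFormat argument of Execute; a null result suggests the data is in neither of the two formats listed above and should probably be re-encoded first.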
Execute is asynchronous: it returns a Task, and the result is wrapped in XrAiResult<string> for consistent error handling.
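
To illustrate the contract without calling a real provider, here is a rough sketch of a stub implementation that could stand in for a provider in tests. It is self-contained, so it declares its own stand-ins inside a ContractSketch namespace: the XrAiResult<T> class below models only the IsSuccess, Data, and ErrorMessage members used in this document, and the way the real type is constructed is an assumption. FakeImageToText is a hypothetical name, not a class the library ships.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ContractSketch
{
    // Stand-in for the library's XrAiResult<T>. Only the three members used in
    // this document are modeled; the real type may be constructed differently.
    public class XrAiResult<T>
    {
        public bool IsSuccess { get; set; }
        public T Data { get; set; }
        public string ErrorMessage { get; set; }
    }

    // Copy of the interface signature so the sketch compiles on its own.
    public interface IXrAiImageToText
    {
        Task<XrAiResult<string>> Execute(byte[] imageBytes, string imageFormat,
            Dictionary<string, string> options = null);
    }

    // Hypothetical stub: returns a canned description instead of calling a provider.
    public class FakeImageToText : IXrAiImageToText
    {
        public Task<XrAiResult<string>> Execute(byte[] imageBytes, string imageFormat,
            Dictionary<string, string> options = null)
        {
            // Report failure through the result object rather than by throwing.
            if (imageBytes == null || imageBytes.Length == 0)
            {
                return Task.FromResult(new XrAiResult<string>
                {
                    IsSuccess = false,
                    ErrorMessage = "No image data provided."
                });
            }

            // Echo the prompt, if one was supplied, so callers can verify option passing.
            string prompt = null;
            if (options != null) options.TryGetValue("prompt", out prompt);

            return Task.FromResult(new XrAiResult<string>
            {
                IsSuccess = true,
                Data = $"[{imageFormat}, {imageBytes.Length} bytes] {prompt ?? "no prompt"}"
            });
        }
    }
}
```

Calling code can treat the stub like any other implementation: await Execute, then branch on IsSuccess and read either Data or ErrorMessage.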