This documentation covers the core API classes and interfaces in the XR AI Library Runtime/Core module. These components provide the foundational interfaces and utilities for AI model integration in Unity XR applications.
The following interfaces define the contracts for different AI model pipelines:
- Image-to-text (`IXrAiImageToText`): generates text descriptions from images. Supports providers such as Groq, Google, and Nvidia. (A sketch of this contract appears after this list.)
- Image-to-3D: generates 3D models from 2D images. Currently supports StabilityAI.
- Object detection: detects and locates objects within images. Supports Google, YOLO, and Roboflow providers.
- Text-to-speech: converts text into spoken audio, producing Unity `AudioClip` objects from text input.
- Speech-to-text: converts spoken audio into text, processing audio data and returning a transcription.
- Text-to-image: generates images from textual prompts.
- Image-to-image: transforms or modifies images based on text prompts, enabling image-to-image translation and style transfer.
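Only the image-to-text contract is exercised in the usage example below, so its rough shape can be inferred from there. The following is a minimal sketch, not the verbatim API: the parameter names and the optional `options` argument are assumptions.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch of the image-to-text contract, inferred from the usage example below.
// Parameter names and the optional options dictionary are assumptions.
public interface IXrAiImageToText
{
    Task<XrAiResult<string>> Execute(byte[] imageData, string mimeType, Dictionary<string, string> options = null);
}
```

The other pipeline interfaces presumably follow the same asynchronous pattern, with input and output types appropriate to their task.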
Alongside the pipeline interfaces, the module provides the following classes and utilities:

- `XrAiFactory`: central factory class for creating instances of the various AI model pipelines. Provides static methods to load the different types of AI models by name.
- `XrAiResult`: unified result type for all AI operations. Encapsulates both success and error states using a result pattern for consistent error handling. (A sketch of this type appears after this list.)
- `XrAiModelManager`: MonoBehaviour component that manages AI model configurations, API keys, and workflow-specific properties, providing centralized configuration management.
- A MonoBehaviour component that manages the model assets required by local inference models, serving as a container for model files and configuration data.
- A struct representing the location and classification of a detected object in object detection results, providing the spatial boundaries and identification of each detection.
- A utility class for visualizing object detection results in Unity, drawing bounding boxes and labels for detected objects on screen.
- A MonoBehaviour component that simplifies audio recording and conversion for speech-to-text operations, handling microphone input and audio encoding.
- `XrAiImageHelper`: utility class for encoding Unity textures into standard image formats, simplifying the conversion of `Texture2D` objects into byte arrays for AI model processing.
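Three members of `XrAiResult` are confirmed by the usage example below: `IsSuccess`, `Data`, and `ErrorMessage`. A minimal sketch of that shape; the constructor and factory helpers are assumptions:

```csharp
// Minimal sketch of the result pattern. IsSuccess, Data, and ErrorMessage are
// confirmed by the usage example; the constructor and factory helpers are assumed.
public class XrAiResult<T>
{
    public bool IsSuccess { get; }
    public T Data { get; }
    public string ErrorMessage { get; }

    private XrAiResult(bool isSuccess, T data, string errorMessage)
    {
        IsSuccess = isSuccess;
        Data = data;
        ErrorMessage = errorMessage;
    }

    // Hypothetical factory helpers for the two states of the result.
    public static XrAiResult<T> Success(T data) => new XrAiResult<T>(true, data, null);
    public static XrAiResult<T> Failure(string errorMessage) => new XrAiResult<T>(false, default, errorMessage);
}
```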
The typical usage pattern is:

1. Use `XrAiFactory` to load an AI model by provider name
2. Call the `Execute` method with input data and options
3. Check `XrAiResult.IsSuccess` and process the data or error

```csharp
// Load an image-to-text model
IXrAiImageToText imageToText = XrAiFactory.LoadImageToText("Groq", new Dictionary<string, string>
{
    { "apiKey", "your-api-key" }
});

// Convert texture to bytes
byte[] imageData = XrAiImageHelper.EncodeTexture(texture, "image/jpeg");

// Execute the model (inside an async method, since Execute is awaited)
var result = await imageToText.Execute(imageData, "image/jpeg", new Dictionary<string, string>
{
    { "model", "llama-vision-free" },
    { "prompt", "Describe this image" }
});

// Handle the result
if (result.IsSuccess)
{
    Debug.Log($"Description: {result.Data}");
}
else
{
    Debug.LogError($"Error: {result.ErrorMessage}");
}
```
The library supports centralized configuration through `XrAiModelManager`.
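For illustration only, configuration might be consumed along these lines. `FindObjectOfType` is standard Unity; `GetApiKey` is a hypothetical accessor, and the actual `XrAiModelManager` members may differ:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class ConfiguredLoadExample : MonoBehaviour
{
    void Start()
    {
        // Locate the scene's configuration component.
        XrAiModelManager modelManager = FindObjectOfType<XrAiModelManager>();

        // GetApiKey is a hypothetical accessor; the real API may expose keys differently.
        string apiKey = modelManager.GetApiKey("Groq");

        // Pass the centrally managed key to the factory instead of hard-coding it.
        IXrAiImageToText imageToText = XrAiFactory.LoadImageToText("Groq", new Dictionary<string, string>
        {
            { "apiKey", apiKey }
        });
    }
}
```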
The core architecture follows these principles:

- Asynchronous execution: every pipeline operation returns a `Task<XrAiResult<T>>`.
- Result-based error handling: expected failures are communicated through `XrAiResult` rather than exceptions (detailed below).
- Modular providers: each pipeline can be backed by multiple interchangeable providers.
The library uses a result pattern rather than exceptions for expected failure cases:

- Check `XrAiResult.IsSuccess` before accessing `Data`.
- `ErrorMessage` provides descriptive error information when an operation fails.

The modular design allows for easy extension.
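For example, a new provider could be added by implementing one of the pipeline interfaces. A minimal sketch, assuming the interface and result shapes outlined earlier (the class name and internals are illustrative only):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical custom provider; assumes the IXrAiImageToText and XrAiResult<T>
// shapes sketched earlier in this document.
public class MyImageToTextProvider : IXrAiImageToText
{
    public async Task<XrAiResult<string>> Execute(byte[] imageData, string mimeType, Dictionary<string, string> options = null)
    {
        // A real provider would call its backend here; this placeholder
        // just returns a canned description.
        await Task.Yield();
        return XrAiResult<string>.Success("A description of the image");
    }
}
```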