Image Generation with DALL-E 3

Semantic Kernel is Microsoft’s SDK for integrating AI services into applications. In this tutorial, we’ll build an interactive web application that combines several of them: chat completion to write an image description, image generation with DALL-E 3, and text embeddings for semantic similarity comparison. You can clone the sample code repository by following this link.

 
What We’re Building
 

We’ll create an ASP.NET Core MVC application that:

  1. Takes a user’s text prompt and generates an AI-enhanced image description
  2. Uses DALL-E 3 to create an image based on that description
  3. Asks the user to guess what the image represents
  4. Compares the original description with the user’s guess using embedding vectors and cosine similarity
  5. Displays a similarity score showing how close the guess was

Prerequisites
 

Before we begin, ensure you have:

  • .NET 8 or later installed
  • An OpenAI API key
  • Visual Studio 2022 or Visual Studio Code
 
Required NuGet Packages
 

Add the following packages to your project:

    <PackageReference Include="Microsoft.Extensions.AI" />
    <PackageReference Include="Microsoft.SemanticKernel" />
    <PackageReference Include="Microsoft.SemanticKernel.Connectors.OpenAI" />
    <PackageReference Include="System.Numerics.Tensors" />
 
Step 1: Configure Your Application

First, set up your OpenAI configuration in appsettings.json or user secrets:

    {
      "OpenAIConfig": {
        "ModelId": "gpt-4",
        "ApiKey": "your-api-key-here",
        "OrgId": "your-org-id"
      },
      "OpenAIEmbeddingsConfig": {
        "ModelId": "text-embedding-3-small",
        "ApiKey": "your-api-key-here",
        "OrgId": "your-org-id"
      }
    }

Pro Tip: Use .NET User Secrets during development to keep your API keys out of source control:

    dotnet user-secrets set "OpenAIConfig:ApiKey" "your-api-key"
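
The registration code in the next step reads its settings from an appConfig object that the article does not show. A minimal sketch of what such a class could look like, assuming it simply mirrors the two JSON sections above; the names OpenAISettings, AppConfig, and Validate are hypothetical and not taken from the sample repository:

```csharp
using System;

// Hypothetical settings class: one instance per JSON section.
public sealed class OpenAISettings
{
    public string ModelId { get; set; } = string.Empty;
    public string ApiKey { get; set; } = string.Empty;
    public string? OrgId { get; set; } // optional for most accounts
}

// Hypothetical root object; property names match the JSON section names.
public sealed class AppConfig
{
    public OpenAISettings OpenAIConfig { get; set; } = new();
    public OpenAISettings OpenAIEmbeddingsConfig { get; set; } = new();

    // Fails fast at startup instead of on the first API call.
    public void Validate()
    {
        Require(OpenAIConfig.ModelId, "OpenAIConfig:ModelId");
        Require(OpenAIConfig.ApiKey, "OpenAIConfig:ApiKey");
        Require(OpenAIEmbeddingsConfig.ModelId, "OpenAIEmbeddingsConfig:ModelId");
        Require(OpenAIEmbeddingsConfig.ApiKey, "OpenAIEmbeddingsConfig:ApiKey");
    }

    private static void Require(string value, string key)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Missing configuration value '{key}'.");
    }
}

// In Program.cs of an ASP.NET Core app, binding could look like this:
//   var appConfig = builder.Configuration.Get<AppConfig>()
//       ?? throw new InvalidOperationException("Configuration is missing.");
//   appConfig.Validate();
```

Because user secrets are layered over appsettings.json in the Development environment, the same binding picks up a key stored with dotnet user-secrets without any code change.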

 
Step 2: Register Semantic Kernel Services

In your Program.cs, configure Semantic Kernel with multiple AI services:

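    var kernelBuilder = builder.Services.AddKernel();

    // Chat completion: turns the user's prompt into an image description.
    kernelBuilder.AddOpenAIChatCompletion(
        appConfig.OpenAIConfig.ModelId,
        appConfig.OpenAIConfig.ApiKey,
        appConfig.OpenAIConfig.OrgId);

    // Embeddings: used later to compare the description with the user's guess.
    kernelBuilder.AddOpenAIEmbeddingGenerator(
        appConfig.OpenAIEmbeddingsConfig.ModelId,
        appConfig.OpenAIEmbeddingsConfig.ApiKey,
        appConfig.OpenAIEmbeddingsConfig.OrgId);

    // Text-to-image: no model ID is passed here, so the default image model is used.
    kernelBuilder.AddOpenAITextToImage(
        appConfig.OpenAIConfig.ApiKey,
        appConfig.OpenAIConfig.OrgId);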

This registration pattern makes all AI services available through dependency injection, allowing them to be injected into your controllers.

 
Step 3: Create Your Data Model

Define a model to hold the application state:

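    public class DALModel
    {
        // Initialized to empty strings to avoid nullable-reference warnings.
        public string Prompt { get; set; } = string.Empty;
        public string ImageUrl { get; set; } = string.Empty;
        public string OriginalImageDescription { get; set; } = string.Empty;
        public string UserGuess { get; set; } = string.Empty;
        public float SimilarityScore { get; set; }
        public string GuessDescription { get; set; } = string.Empty;
    }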
 
Step 4: Build the Controller Logic

The controller orchestrates the entire workflow. Here’s how each piece works:

Injecting AI Services

    private readonly ILogger<HomeController> _logger;
    private readonly Kernel _kernel;
    private readonly ITextToImageService _dalE;
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embeddingGenerator;

    public HomeController(Kernel kernel, ILogger<HomeController> logger)
    {
        _logger = logger;
        _kernel = kernel;

        // Resolve the services registered on the kernel in Program.cs.
        _dalE = kernel.GetRequiredService<ITextToImageService>();
        _embeddingGenerator = kernel.GetRequiredService<IEmbeddingGenerator<string, Embedding<float>>>();
    }
 
Generating Creative Image Descriptions

Instead of using the user’s prompt directly, we enhance it through an AI conversation:

    var promptContext = "You're chatting with a user. Instead of replying directly to the user,"
        + " provide a description of an image that expresses what you want to say."
        + " The user will see your message and the image."
        + " Describe the image with details in one sentence.";

    // In an interpolated string, {{ and }} collapse to single braces, so four braces
    // are needed to emit the Semantic Kernel template variable {{$input}}.
    var prompt = $@"{promptContext} usermessage: {model.Prompt} {{{{$input}}}}.";
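
Both C# interpolated strings and Semantic Kernel prompt templates give curly braces a special meaning, so it is worth checking what an interpolated prompt string actually contains before it reaches the kernel. The sketch below uses only standard C# string rules; the class and method names are hypothetical:

```csharp
public static class PromptBraces
{
    // Two braces in an interpolated string are an escape for ONE literal brace,
    // so this yields "{$input}", which is not a template variable.
    public static string DoubleBraces() => $"{{$input}}";

    // Four braces yield two literal braces: "{{$input}}",
    // the form a Semantic Kernel template variable uses.
    public static string QuadrupleBraces() => $"{{{{$input}}}}";

    // Builds the full prompt text with the template variable left intact.
    public static string Build(string context, string userMessage) =>
        $"{context} usermessage: {userMessage} {{{{$input}}}}.";
}
```

An alternative design is to keep the prompt a plain (non-interpolated) string and pass the user's message as a second template variable, which also keeps braces typed by the user from being read as template syntax.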

Creating a Semantic Function

Semantic Kernel allows you to create functions from prompts:

    var executionSettings = new OpenAIPromptExecutionSettings
    {
        MaxTokens = 256,
        Temperature = 1
    };
    var genImgDescription = _kernel.CreateFunctionFromPrompt(prompt, executionSettings);

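A Temperature of 1 makes sampling more random, which favors creative, varied outputs (it encourages variety rather than guaranteeing it). The sample also adds a random number to the prompt to increase variety across multiple generations.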
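
Creating the function does not call the model; it still has to be invoked through the kernel. A minimal sketch, assuming the random number is supplied through the template's input variable; the helper name DescribeAsync and the 0 to 199 range are hypothetical choices, not taken from the sample:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public static class DescriptionGenerator
{
    // Invokes a kernel function with a random "input" argument
    // and returns the trimmed text result.
    public static async Task<string> DescribeAsync(Kernel kernel, KernelFunction function)
    {
        var arguments = new KernelArguments
        {
            // Hypothetical range; any varying value serves the same purpose.
            ["input"] = Random.Shared.Next(0, 200)
        };

        FunctionResult result = await kernel.InvokeAsync(function, arguments);

        // An empty string is returned if the function produced no value.
        return result.GetValue<string>()?.Trim() ?? string.Empty;
    }
}
```

In the controller this could be called as DescriptionGenerator.DescribeAsync(_kernel, genImgDescription), with the result stored in model.OriginalImageDescription.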

 
Generating Images with DALL-E 3

Once we have the description, generating an image is straightforward:

    model.ImageUrl = await _dalE.GenerateImageAsync(
        imageDescription.Trim(),
        1024,
        1024);

DALL-E 3 returns a URL to the generated image, which we can display directly in our view.
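
The width and height passed above are not arbitrary: according to OpenAI's documentation at the time of writing, DALL-E 3 accepts 1024x1024, 1792x1024, and 1024x1792. A small guard like the following sketch can reject other sizes before a request is sent; the class name is hypothetical, and the list should be checked against the current API documentation:

```csharp
using System;

public static class DallE3Sizes
{
    // Sizes assumed from OpenAI's DALL-E 3 documentation; verify before relying on them.
    private static readonly (int Width, int Height)[] Supported =
    {
        (1024, 1024),
        (1792, 1024),
        (1024, 1792),
    };

    // Returns true when the requested dimensions match one of the listed sizes.
    public static bool IsSupported(int width, int height) =>
        Array.Exists(Supported, s => s.Width == width && s.Height == height);
}
```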

 
Measuring Semantic Similarity

Here’s where it gets mathematically interesting. We use embeddings and cosine similarity to compare texts:

    var origEmbedding = await _embeddingGenerator.GenerateAsync(
        new List<string> { imageDescription });
    var guessEmbedding = await _embeddingGenerator.GenerateAsync(
        new List<string> { guess });

    var origVector = origEmbedding.First().Vector;
    var guessVector = guessEmbedding.First().Vector;

    var similarity = TensorPrimitives.CosineSimilarity(
        origVector.Span,
        guessVector.Span);

 

How Embeddings Work:

  • Embeddings convert text into high-dimensional vectors (arrays of numbers)
  • Semantically similar texts produce similar vectors
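  • Cosine similarity measures the angle between vectors; it ranges from -1 to 1, where 1 means the vectors point in the same direction (text embeddings in practice tend to score between 0 and 1)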

This means even if the user doesn’t use the exact words, semantically similar guesses will score highly!
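
To make the measure concrete: cosine similarity is the dot product of two vectors divided by the product of their lengths, cos(θ) = (a · b) / (|a| |b|). The sketch below is a hand-written illustration of that formula, not a replacement for TensorPrimitives.CosineSimilarity; the class name and the choice to return 0 for a zero-length vector are assumptions made for this example:

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Assumed convention for this sketch: a zero vector has no direction, so return 0.
        if (normA == 0 || normB == 0)
            return 0f;

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

With toy two-dimensional vectors, (1, 0) compared with itself scores 1, compared with (0, 1) scores 0, and compared with (-1, 0) scores -1. Real embeddings have hundreds or thousands of dimensions, but the arithmetic is the same.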

 
Step 5: Create the View

The view provides a simple interface for interaction:

    <div class="col-6">
        <form method="post">
            <div class="form-group">
                <span class="label label-primary">Prompt to generate image</span>
                <input asp-for="Prompt" value="flying robin over water"
                       maxlength="90" class="form-control" />
            </div>
            <div>
                <span class="label label-primary">User random guess about image</span>
                <input asp-for="UserGuess" value="Ice cream"
                       maxlength="90" class="form-control" />
            </div>
            <button type="submit" class="btn btn-primary">Generate Image</button>
        </form>
    </div>

    <div class="col-6">
        <p>Original prompt: @Model.Prompt</p>
        <p>Original image description: @Model.OriginalImageDescription</p>
        <p>User guess: @Model.GuessDescription</p>
        <p class="alert alert-warning">Similarity score: @Model.SimilarityScore</p>
        <img src="@Model.ImageUrl" class="img-fluid" />
    </div>
 
Understanding the Workflow
 
  1. User Input: User enters a prompt (e.g., “flying robin over water”)
  2. AI Enhancement: Semantic Kernel transforms this into a creative image description
  3. Image Generation: DALL-E 3 creates an image from the description
  4. User Interaction: User guesses what the image represents
  5. Semantic Comparison: Embeddings convert both descriptions to vectors
  6. Similarity Scoring: Cosine similarity calculates how close the guess is
  7. Results Display: User sees the image, descriptions, and similarity score
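
The steps above can be sketched as a single method. To keep the example runnable without API keys, the three AI calls are passed in as delegates; in the real controller they would wrap the prompt function, the text-to-image service, and the embedding generator. All names here (GuessRound, RoundResult, PlayAsync) are hypothetical and not part of the sample code:

```csharp
using System;
using System.Threading.Tasks;

public sealed record RoundResult(string Description, string ImageUrl, float Score);

public static class GuessRound
{
    public static async Task<RoundResult> PlayAsync(
        string prompt,
        string guess,
        Func<string, Task<string>> describe,       // steps 1-2: prompt -> image description
        Func<string, Task<string>> generateImage,  // step 3: description -> image URL
        Func<string, Task<float[]>> embed)         // step 5: text -> embedding vector
    {
        var description = (await describe(prompt)).Trim();
        var imageUrl = await generateImage(description);

        var original = await embed(description);
        var guessed = await embed(guess);

        // Step 6: cosine similarity between the two embeddings.
        return new RoundResult(description, imageUrl, Cosine(original, guessed));
    }

    private static float Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Assumed convention: a zero vector yields a score of 0.
        return normA == 0 || normB == 0
            ? 0f
            : (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Passing the services in as delegates also makes the flow easy to unit test with fake implementations, since no network call is needed to check the wiring.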
 
Key Takeaways
 

Semantic Kernel Benefits:

  • Unified Interface: One SDK for multiple AI services
  • Dependency Injection: Clean, testable code architecture
  • Prompt Engineering: Easy creation of semantic functions
  • Service Orchestration: Seamlessly combine chat, embeddings, and image generation

 

Advanced AI Techniques Demonstrated:

  • Prompt engineering for creative outputs
  • Vector embeddings for semantic understanding
  • Cosine similarity for text comparison
  • Multi-modal AI integration (text and images)

 

Performance Considerations

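The sample code includes the #pragma warning disable SKEXP0001 directive, which suppresses the compiler diagnostics raised for Semantic Kernel APIs marked as experimental. These APIs are still evolving and may change between releases, so review what you are suppressing before using them in production.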
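
Rather than disabling the diagnostic for a whole file, the suppression can be scoped to the lines that need it. The sketch below shows the mechanism with a stand-in API so that it compiles without Semantic Kernel: PreviewApi and the diagnostic ID DEMO001 are made up for illustration, and the [Experimental] attribute requires .NET 8 or later:

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

public static class PreviewApi
{
    // Hypothetical API flagged as experimental under the made-up ID DEMO001.
    [Experimental("DEMO001")]
    public static string Describe() => "experimental result";
}

public static class Caller
{
    public static string UsePreviewApi()
    {
        // Without the pragma, this call fails to compile with diagnostic DEMO001.
#pragma warning disable DEMO001
        var value = PreviewApi.Describe();
#pragma warning restore DEMO001
        // From here on, the diagnostic is active again.
        return value;
    }
}
```

Keeping the disable/restore pair tight means any new use of an experimental API elsewhere in the file still surfaces as a diagnostic.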

Extending This Application

Consider these enhancements:

  • Add support for Azure OpenAI Service
  • Implement conversation history for context-aware descriptions
  • Store generated images and scores in a database
  • Create a leaderboard for best guesses
  • Add different game modes with varying difficulty
  • Implement real-time streaming for faster feedback

 

Conclusion

This application demonstrates the power of combining multiple AI services through Semantic Kernel. By orchestrating chat completion, embeddings, and image generation, we’ve created an engaging experience that showcases semantic understanding and creative AI capabilities.

The beauty of Semantic Kernel lies in its abstraction: you can swap between OpenAI and Azure OpenAI, add plugins, and extend functionality without major architectural changes. This makes it an excellent choice for building production-ready AI applications on the Microsoft stack.

Ready to build your own AI-powered applications? Start experimenting with Semantic Kernel today and unlock the potential of orchestrated AI services!