Extracting Structured Data From Images Using Spring AI

1. Overview

In this tutorial, we'll explore how to use Spring AI to extract structured data from images with an OpenAI chat model.

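OpenAI chat models can analyze uploaded images and return relevant information. They can also return structured output that can easily be passed to other applications for further processing.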

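To illustrate, we'll create a web service that accepts an image from a client and sends it to OpenAI to count the cars of each color in the image. The web service returns the color counts in JSON format.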

2. Spring Boot Configuration

We need to add the following Spring Boot Starter Web and Spring AI OpenAI starter dependencies to our Maven pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.4.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

In our Spring Boot application.yml file, we must provide our API key (spring.ai.openai.api-key) to authenticate with the OpenAI API, as well as a chat model capable of image analysis (spring.ai.openai.chat.options.model).

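Various models support image analysis, such as gpt-4o-mini, gpt-4o, and gpt-4.5-preview. Larger models like gpt-4o have broader knowledge but cost more, whereas smaller models like gpt-4o-mini are cheaper and have lower latency. We can choose a model based on our needs.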

Let's choose the gpt-4o chat model for our illustration:

spring:
  ai:
    openai:
      api-key: "<YOUR-API-KEY>"
      chat:
        options:
          model: "gpt-4o"

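With this configuration in place, Spring Boot loads OpenAiAutoConfiguration at application startup to register the OpenAI chat model beans. Spring AI also auto-configures a ChatClient.Builder bean, which we'll use later to create our ChatClient.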

3. Sample Web Service

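With everything configured, we'll create a web service that lets users upload an image and passes it to OpenAI, which then counts the cars of the requested colors in the image.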

3.1. REST Controller

This REST controller simply accepts an image file and the colors to count in the image as request parameters:

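import java.io.IOException;
import java.io.InputStream;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/image")
public class ImageController {

    @Autowired
    private CarCountService carCountService;

    @PostMapping("/car-count")
    public ResponseEntity<?> getCarCounts(@RequestParam("colors") String colors,
                                          @RequestParam("file") MultipartFile file) {
        // The stream is closed automatically once the service has returned
        try (InputStream inputStream = file.getInputStream()) {
            var carCount = carCountService.getCarCount(inputStream, file.getContentType(), colors);
            return ResponseEntity.ok(carCount);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Error uploading image");
        }
    }
}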

On success, we expect the service to respond with a ResponseEntity containing a CarCount.

3.2. POJO

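If we want the chat model to return structured output, we define the output format as a JSON schema in our HTTP request to OpenAI. In Spring AI, this is greatly simplified: we only need to define POJO classes.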

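Let's define two POJO classes to store the colors and their corresponding counts. CarCount stores a list of car counts per color, along with the total count, which is the sum of the counts in the list: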

import java.util.List;

public class CarCount {

    private List<CarColorCount> carColorCounts;
    private int totalCount;

    // constructors, getters and setters
}

CarColorCount stores the color name and its corresponding count:

public class CarColorCount {

    private String color;
    private int count;

    // constructors, getters and setters
}
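Because the model computes both the per-color counts and totalCount, nothing forces the two values to agree. The following standalone sketch mirrors the two POJOs as Java records (Java 16+; Spring AI's documentation also uses records as entity() targets) and adds a hypothetical isConsistent() check that we might run on a result before trusting totalCount. CarCountCheck and its methods are illustrative names, not part of the article's code:

```java
import java.util.List;

// Hypothetical standalone sketch: mirrors CarCount and CarColorCount as records
// and checks that totalCount equals the sum of the per-color counts.
public class CarCountCheck {

    record CarColorCount(String color, int count) {}

    record CarCount(List<CarColorCount> carColorCounts, int totalCount) {

        // Sum of the per-color counts returned by the model
        int sumOfColorCounts() {
            return carColorCounts.stream().mapToInt(CarColorCount::count).sum();
        }

        // True when the model's totalCount agrees with the per-color counts
        boolean isConsistent() {
            return sumOfColorCounts() == totalCount;
        }
    }

    public static void main(String[] args) {
        var result = new CarCount(List.of(
                new CarColorCount("blue", 2),
                new CarColorCount("yellow", 1),
                new CarColorCount("green", 0)), 3);
        System.out.println(result.sumOfColorCounts() + " " + result.isConsistent()); // prints "3 true"
    }
}
```

With the values from the sample response in the test run, the check sums 2 + 1 + 0 = 3, which matches totalCount.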

3.3. Service

Now, let's create the core Spring service that sends the image to OpenAI's API for analysis. In this CarCountService, we inject a ChatClient.Builder, which builds a ChatClient for communicating with OpenAI:

import java.io.InputStream;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.InputStreamResource;
import org.springframework.stereotype.Service;
import org.springframework.util.MimeTypeUtils;

@Service
public class CarCountService {

    private final ChatClient chatClient;

    public CarCountService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public CarCount getCarCount(InputStream imageInputStream, String contentType, String colors) {
        return chatClient.prompt()
                // PromptSystemSpec.text() replaces any previously set text,
                // so all instructions are passed as a single text block
                .system(systemMessage -> systemMessage.text("""
                        Count the number of cars in different colors from the image.
                        User will provide the image and specify which colors to count in the user prompt.
                        Count colors that are specified in the user prompt only.
                        Ignore anything in the user prompt that is not a color.
                        If there is no color specified in the user prompt, simply return zero in the total count.
                        """))
                // The colors to count as text, plus the uploaded image as media
                .user(userMessage -> userMessage
                        .text(colors)
                        .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageInputStream)))
                .call()
                // Spring AI's BeanOutputConverter maps the JSON response to CarCount
                .entity(CarCount.class);
    }
}

In this service, we submit a system prompt and a user prompt to OpenAI.

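The system prompt guides the chat model's behavior. It contains a set of instructions that prevent unexpected behavior, such as counting colors the user didn't specify. This helps the chat model return more predictable responses.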
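In Spring AI, PromptSystemSpec.text() holds a single string, so calling it several times keeps only the last value. One way to maintain the instructions as a list and still send them as one system message is to join them first. A minimal sketch, where SystemPrompts, CAR_COUNT_INSTRUCTIONS, and join() are hypothetical names and the numbering is only a stylistic choice:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: keeps system-prompt instructions in a list and joins them
// into one string, because PromptSystemSpec.text() keeps only the last value it receives.
public class SystemPrompts {

    // The instructions used by CarCountService
    static final List<String> CAR_COUNT_INSTRUCTIONS = List.of(
            "Count the number of cars in different colors from the image",
            "User will provide the image and specify which colors to count in the user prompt",
            "Count colors that are specified in the user prompt only",
            "Ignore anything in the user prompt that is not a color",
            "If there is no color specified in the user prompt, simply return zero in the total count");

    // Numbers each instruction and puts it on its own line: "1. ...\n2. ..."
    static String join(List<String> instructions) {
        return IntStream.range(0, instructions.size())
                .mapToObj(i -> (i + 1) + ". " + instructions.get(i))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(join(CAR_COUNT_INSTRUCTIONS));
    }
}
```

In the service, this would become .system(systemMessage -> systemMessage.text(SystemPrompts.join(SystemPrompts.CAR_COUNT_INSTRUCTIONS))).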

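The user prompt provides the chat model with the data to process. In our example, we pass two inputs to it. The first is the colors we want to count, as text input. The second is the uploaded image, as media input. For this, we need the uploaded file's InputStream and the media's MIME type, which we can derive from the file's content type.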
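The content type comes from the client through MultipartFile.getContentType() and can be null, in which case Spring's MimeTypeUtils.parseMimeType() throws an InvalidMimeTypeException. The following defensive sketch uses a hypothetical ImageMimeTypes.resolve() helper that checks the declared type and falls back to the file extension. The accepted list (PNG, JPEG, GIF, WebP) is an assumption based on the image formats OpenAI documents for vision input:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: resolves an image MIME type from the client-supplied
// content type, falling back to the file extension when needed.
public class ImageMimeTypes {

    // Image types we choose to accept (assumption: PNG, JPEG, GIF, WebP)
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "gif", "image/gif",
            "webp", "image/webp");

    static Optional<String> resolve(String contentType, String fileName) {
        // Prefer the declared type if we accept it, ignoring parameters after ';'
        if (contentType != null) {
            String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (BY_EXTENSION.containsValue(base)) {
                return Optional.of(base);
            }
        }
        // Otherwise fall back to the file name's extension
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                return Optional.ofNullable(BY_EXTENSION.get(ext));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "cars.PNG"));             // Optional[image/png]
        System.out.println(resolve("image/jpeg", "x.bin"));        // Optional[image/jpeg]
        System.out.println(resolve("application/pdf", "doc.pdf")); // Optional.empty
    }
}
```

In the controller, we could call ImageMimeTypes.resolve(file.getContentType(), file.getOriginalFilename()) and return 400 Bad Request when the result is empty, instead of forwarding an unusable type to the service.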

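The key point to note is that we must pass the POJO class we created earlier to entity(). This triggers Spring AI's BeanOutputConverter, which converts the OpenAI JSON response into an instance of our CarCount POJO.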
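Besides parsing the response, BeanOutputConverter also works in the other direction: it generates a JSON schema from the target class and adds it to the request as format instructions, which is why defining a POJO is enough. The sketch below is only a simplified, hypothetical illustration of that idea using reflection; it is not Spring AI's implementation, which relies on a full JSON schema generator and produces a much more complete schema. SchemaSketch, jsonType(), and describe() are illustrative names:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical, simplified sketch: derives a JSON-schema-like description
// from a class's instance fields, mapping only a few Java types.
public class SchemaSketch {

    // Field-only mirrors of the article's POJOs
    static class CarColorCount {
        private String color;
        private int count;
    }

    static class CarCount {
        private List<CarColorCount> carColorCounts;
        private int totalCount;
    }

    // Maps a Java field type to a JSON Schema "type" keyword
    static String jsonType(Class<?> type) {
        if (type == int.class || type == Integer.class || type == long.class || type == Long.class) {
            return "integer";
        }
        if (type == String.class) {
            return "string";
        }
        if (List.class.isAssignableFrom(type)) {
            return "array";
        }
        return "object";
    }

    // Builds {"type":"object","properties":{...}}; field order is not guaranteed by reflection
    static String describe(Class<?> clazz) {
        StringJoiner props = new StringJoiner(",");
        for (Field field : clazz.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            props.add("\"" + field.getName() + "\":{\"type\":\"" + jsonType(field.getType()) + "\"}");
        }
        return "{\"type\":\"object\",\"properties\":{" + props + "}}";
    }

    public static void main(String[] args) {
        System.out.println(describe(CarCount.class));
        System.out.println(describe(CarColorCount.class));
    }
}
```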

4. Test Run

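Now that everything is ready, let's do a test run to see how it performs. We'll use Postman to send a request to this web service, specifying three different colors (blue, yellow, and green) for the chat model to count in our image: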

For our example, we'll use the following photo for testing:

Upon sending the request, we receive a JSON response from the web service:

{
  "carColorCounts": [
    {
      "color": "blue",
      "count": 2
    },
    {
      "color": "yellow",
      "count": 1
    },
    {
      "color": "green",
      "count": 0
    }
  ],
  "totalCount": 3
}

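The response shows the number of cars for each color we specified in the request. It also provides the total number of cars in those colors. The JSON structure matches the POJO class definitions of CarCount and CarColorCount.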
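Postman is convenient for a quick check, but the same multipart/form-data request can also be sent from plain Java with java.net.http.HttpClient (Java 11+). A minimal sketch, assuming the service runs on localhost:8080, the image is a PNG whose path is passed as the first argument, and the colors are sent as a comma-separated string (the service forwards this text to the model as-is). CarCountClient and multipartBody() are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical client sending the same request as the Postman test:
// a "colors" text part and a "file" image part, encoded as multipart/form-data.
public class CarCountClient {

    private static final String CRLF = "\r\n";

    // Builds the multipart body; part names match the controller's @RequestParam names
    static byte[] multipartBody(String boundary, String colors, String fileName,
                                String contentType, byte[] fileBytes) {
        var out = new ByteArrayOutputStream();
        out.writeBytes(("--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"colors\"" + CRLF + CRLF
                + colors + CRLF).getBytes(StandardCharsets.UTF_8));
        out.writeBytes(("--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF + CRLF).getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileBytes);
        // Closing delimiter: the boundary followed by two hyphens
        out.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path image = Path.of(args[0]); // assumption: path to a PNG image
        String boundary = "boundary-" + UUID.randomUUID();
        byte[] body = multipartBody(boundary, "blue, yellow, green",
                image.getFileName().toString(), "image/png", Files.readAllBytes(image));

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/image/car-count"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Building the body by hand keeps the sketch dependency-free; in a real client, a library that handles multipart encoding would be less error-prone.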

5. Conclusion

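In this article, we learned how to extract structured output from an OpenAI chat model. We also built a web service that accepts an uploaded image, passes it to the OpenAI chat model for image analysis, and returns structured output containing the relevant information.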

As always, all the code is available over on GitHub.

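This work is original or translated and is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.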