아이티브에이아이 Tech Blog | Running Detection and Segmentation Together with YOLO

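Today's plan

Today I'd like to walk through how to run Segmentation and Detection inference together using YOLO11. I had long wondered why the two couldn't be done at once, and then it hit me: "Why not just build two models?" The idea was to train a Segmentation model and a Detection model and draw both models' inference results on a single image. I tried it, the results turned out pretty decent, and so here is the write-up.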

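What you need

The environment, language, and libraries I used are listed below.

Setup
- OS : Ubuntu 24.04 LTS
- GPU : RTX-4070Ti-Super
- NVIDIA driver : 555.42.06
  - Use whichever driver version suits your GPU. The libraries we use later depend on it, though, so make sure a driver is installed!
Language and libraries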
- python 3.12.8
- ultralytics
- cv2
- numpy
- PIL
- os
- glob

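Collecting the data

The data I'm using this time is bicycle-related, something I've been interested in lately. After moving house I started commuting by bike, and the state of the bike lanes these days is really something!! (Why does the lane exist one moment and vanish the next? Is it a bug?)

So while wondering which data to try, I found a bicycle road dataset on AI hub and decided to go with it.

AI hub provides public data that registered members who are Korean nationals (isolationism lives on here too..) can use for non-commercial purposes. Usage instructions are clearly posted on AI hub, so it should be easy to get started!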


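632GB.. more data than you'd expect, right? It covers bicycle roads across the whole country, so it is quite large. This time I decided to use only the Seoul bicycle road data.

Data preprocessing

Now that we have the data, we should check whether it is actually usable, right? We need to build two models, a detection model and a segmentation model, so I preprocessed the data into the format each one needs.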
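
For reference, the two YOLO label formats differ only in what follows the class index: a detection line holds a normalized box center and size, and a segmentation line holds normalized polygon points. Here is a minimal sketch of both, assuming pixel-space boxes given as (x_min, y_min, width, height) and polygons given as lists of (x, y) points. The helper names are made up for illustration, and the AI hub files may store coordinates differently.

```python
def bbox_to_yolo_line(class_id, box_xywh, img_w, img_h):
    """Detection label: 'class cx cy w h', all normalized to 0..1."""
    x, y, w, h = box_xywh  # assumption: top-left corner plus width/height in pixels
    cx = (x + w / 2) / img_w
    cy = (y + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def polygon_to_yolo_line(class_id, points, img_w, img_h):
    """Segmentation label: 'class x1 y1 x2 y2 ...', all normalized to 0..1."""
    coords = []
    for px, py in points:  # assumption: points are (x, y) pixel pairs
        coords.append(f"{px / img_w:.6f}")
        coords.append(f"{py / img_h:.6f}")
    return f"{class_id} " + " ".join(coords)
```

Each image gets one .txt file with one such line per object.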

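Removing missing values(?)

I checked for problematic label and image files and removed them.
Since OpenCV ends up being used internally anyway, I deleted every image that OpenCV couldn't read, along with its corresponding label file.
I also deleted every label file that was empty, along with its corresponding image.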
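
The cleanup could look roughly like the sketch below. It is not the exact script I ran: the function names and the .json label extension are assumptions, and "empty" is taken here to mean a missing or zero-byte label file (a label file that holds an empty annotation list would need its own check).

```python
import os


def _cv2_can_read(path):
    # Imported lazily so the rest of the helper works without OpenCV installed.
    import cv2
    # cv2.imread returns None (it does not raise) when it cannot decode a file.
    return cv2.imread(path) is not None


def find_bad_pairs(image_dir, label_dir, can_read=_cv2_can_read, label_ext=".json"):
    """Return (image_path, label_path) pairs that should be deleted."""
    bad = []
    for name in sorted(os.listdir(image_dir)):
        stem, _ = os.path.splitext(name)
        image_path = os.path.join(image_dir, name)
        label_path = os.path.join(label_dir, stem + label_ext)  # assumed naming scheme
        unreadable = not can_read(image_path)
        label_missing_or_empty = (
            not os.path.isfile(label_path) or os.path.getsize(label_path) == 0
        )
        if unreadable or label_missing_or_empty:
            bad.append((image_path, label_path))
    return bad


def delete_pairs(pairs):
    """Delete both files of every pair, skipping files that are already gone."""
    for image_path, label_path in pairs:
        for path in (image_path, label_path):
            if os.path.exists(path):
                os.remove(path)
```

Listing the bad pairs first and deleting in a second step makes it easy to print and inspect the list before anything is removed.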

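Building the training data

To get the data into the form YOLO11 trains on, I remapped the keys as follows.
1. Remove unnecessary keys
  - bbox coordinates, in the case of segmentation data
  - metadata about the capture device and location
2. Rename keys and edit values
  - merge "category_name" and "sub_category_name" into one name, such as 도로결함_크랙 (road defect_crack)
  - "drawing" → "shape_type", "segmentation" → "points", and so on
3. Split into Segmentation / Detection data
  - split the objects by "shape_type" and generate labels for each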
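
The three steps above can be sketched as follows. The key names "category_name", "sub_category_name", "drawing", "segmentation", and "bbox" come from the list; everything else (the metadata key names, the "polygon" shape value, the output "label" key, and the function names) is a placeholder I picked for illustration, so adjust it to the real files.

```python
KEY_RENAMES = {"drawing": "shape_type", "segmentation": "points"}
DROP_KEYS = {"camera", "location"}   # hypothetical metadata key names
POLYGON_SHAPES = {"polygon"}         # hypothetical value of "drawing" for masks


def convert_annotation(ann):
    """Rename keys, merge the two category names, and drop unused keys."""
    out = {}
    for key, value in ann.items():
        if key in DROP_KEYS or key in ("category_name", "sub_category_name"):
            continue
        out[KEY_RENAMES.get(key, key)] = value
    # Step 2: merge the names, e.g. 도로결함 + 크랙 -> 도로결함_크랙
    out["label"] = f"{ann['category_name']}_{ann['sub_category_name']}"
    # Step 1: segmentation objects do not need the bbox coordinates.
    if out.get("shape_type") in POLYGON_SHAPES:
        out.pop("bbox", None)
    return out


def split_by_shape(annotations):
    """Step 3: split converted annotations into (segmentation, detection) lists."""
    seg, det = [], []
    for ann in map(convert_annotation, annotations):
        (seg if ann.get("shape_type") in POLYGON_SHAPES else det).append(ann)
    return seg, det
```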


Model training

Training code

from ultralytics import YOLO
import torch

import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

# The class names are written in Korean, so the plots use a Nanum font.
print(plt.rcParams['font.family'])
plt.rcParams['axes.unicode_minus'] = False

font_path = "/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf"
font_name = fm.FontProperties(fname=font_path).get_name()
plt.rc('font', family=font_name)
print(plt.rcParams['font.family'])

print(torch.cuda.is_available())
# For the detection model, change yolo11m-seg.pt → yolo11m.pt
model = YOLO("yolo11m-seg.pt")

model.train(
    # For the detection model, point this at the detection dataset.yaml instead
    data='/home/itivai-1/Projects/demoProject/bicycle/segmentation/orgData/YOLODataset/dataset.yaml',
    epochs=1000,
    project='../bicycle/trainResults/',
    resume=False,
    batch = 16,
    pretrained=True
)
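
As the comments say, the detection run differs only in the weights file and the dataset path. A small sketch that makes the switch explicit is below. The function names and the two dataset paths are placeholders, not what I actually used; project and name are regular Ultralytics training arguments that control where the run is saved.

```python
def train_settings(task):
    """Return the pretrained weights and dataset yaml for one of the two models."""
    settings = {
        "segment": {"weights": "yolo11m-seg.pt",
                    "data": "segmentation/dataset.yaml"},  # placeholder path
        "detect": {"weights": "yolo11m.pt",
                   "data": "detection/dataset.yaml"},      # placeholder path
    }
    if task not in settings:
        raise ValueError(f"unknown task: {task!r}")
    return settings[task]


def train(task, epochs=1000, batch=16):
    # Imported here so train_settings() can be used without ultralytics installed.
    from ultralytics import YOLO

    cfg = train_settings(task)
    model = YOLO(cfg["weights"])
    # name=task keeps the two runs in separate folders under the same project.
    return model.train(data=cfg["data"], epochs=epochs, batch=batch,
                       project="../bicycle/trainResults/", name=task)
```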

I doubt anyone will actually follow along, but if you do, brace yourself.. because it took close to two weeks... If you try it, I recommend not running it in the foreground; use the command below to run it in the background instead.

$ nohup python ./train_model.py &

Training results

The long wait is over and training has finally finished. Shall we take a look at the results?

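1. Labels.png files

segmentation labels / detection labels
Check the label distribution to confirm the data is composed the way we intended.
Make sure the classes from the original data were merged and that the Segmentation and Detection classes were split properly.

2. PR Curve.png files

detection pr curve / segmentation pr curve
As expected, the classes that had little data clearly perform poorly.
Class imbalance matters in pretty much any model training.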
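
To put a number on the imbalance instead of eyeballing the plots, you can count instances per class straight from the label files. This is a minimal sketch that assumes YOLO-format .txt labels (class index first on every line); the function names are made up for illustration.

```python
import os
from collections import Counter


def count_instances(label_dir):
    """Count objects per class index across YOLO-format .txt label files."""
    counts = Counter()
    for name in os.listdir(label_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name), encoding="utf-8") as f:
            for line in f:
                parts = line.split()
                if parts:  # skip blank lines
                    counts[int(parts[0])] += 1
    return counts


def imbalance_ratio(counts):
    """Ratio between the most and least frequent class (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())
```

Classes that never appear are simply absent from the counter, so compare its keys against the model's class list as well.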

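3. Results.png files

segmentation results.png / detection results.png
Segmentation clearly performs worse than detection.
Still, looking at the trends in the loss and the performance metrics, we can see that training progressed as intended.

Results

The overall metrics seem to come out this way because of the class imbalance in the original data; the per-class performance itself doesn't look bad.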

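Merging the Segmentation and Detection inference results

Now, finally, the main topic.

There are a few things to consider when combining the inference results of the two models.

Separating each model's class numbers
- A YOLO model's classes are mapped through results.boxes.cls (the class index) and results.names (index → name).
- Class indices from two different models can collide, though, so after inference I shifted the Detection model's results to start after all the class numbers that come from segmentation. (The script below applies the same idea in the other direction when it picks colors: segmentation class i uses palette index i + the number of detection classes.)
Colors per class
- As with class numbers, YOLO by default maps colors by cycling through a palette by class index.
- These colors can collide as well, so I mapped them to the shifted numbers.
- On top of that, classes that indicate a defect are set to be drawn in red.

Now, shall we look at the code?

EnsemblePrediction.py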
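
Taken on their own, the class-number and color rules described above can be sketched like this. It follows the description (detection indices shifted past the segmentation ones); the full script further down shifts the segmentation side instead, and either direction works as long as the two ranges don't overlap. The function names are illustrative, and the sketch assumes each model's indices run 0..n-1, as they do in a YOLO names map.

```python
DEFECT_KEYWORDS = ("결함", "크랙", "불량")  # defect, crack, faulty
RED = (255, 0, 0)  # RGB order


def merge_class_names(seg_names, det_names):
    """Combine two {index: name} maps; detection indices are shifted past segmentation."""
    offset = len(seg_names)
    merged = dict(seg_names)
    merged.update({idx + offset: name for idx, name in det_names.items()})
    return merged, offset


def assign_colors(merged_names, palette):
    """Pick one palette color per merged index, forcing red for defect classes."""
    colors = {}
    for idx, name in merged_names.items():
        if any(keyword in name for keyword in DEFECT_KEYWORDS):
            colors[idx] = RED
        else:
            colors[idx] = palette[idx % len(palette)]
    return colors
```

At drawing time a detection box with class c then looks up colors[c + offset], while a segmentation mask uses colors[c] directly. If the palette is shorter than the merged class list the colors wrap around, so use a palette at least as long as the list.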

import cv2
import numpy as np
from ultralytics import YOLO
from PIL import Image, ImageDraw, ImageFont
import os
import glob
# -----------------------------
# 1. Load the models (Detection, Segmentation)
# -----------------------------
detection_model = YOLO("../model/detection_best.pt")
segmentation_model = YOLO("../model/segmentation_best.pt")

# -----------------------------
# 2. Visualization options (font, colors, etc.)
# -----------------------------
FONT_PATH = "./Fonts/D2Coding-Ver1.3.2-20180524-all.ttc"

predefined_colors = [
    (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 165, 0), (255, 255, 0),
    (0, 255, 255), (255, 0, 255), (128, 0, 0), (0, 128, 0), (0, 0, 128),
    (128, 128, 0), (0, 128, 128), (128, 0, 128), (255, 105, 180), (75, 0, 130),
    (255, 20, 147), (255, 69, 0), (124, 252, 0), (30, 144, 255), (220, 20, 60),
    (186, 85, 211), (154, 205, 50), (70, 130, 180), (240, 128, 128), (46, 139, 87)
]

num_detection_classes = len(detection_model.names)
num_segmentation_classes = len(segmentation_model.names)

detection_colors = {i: predefined_colors[i % len(predefined_colors)]
                    for i in range(num_detection_classes)}
segmentation_colors = {i: predefined_colors[(i + num_detection_classes) % len(predefined_colors)]
                       for i in range(num_segmentation_classes)}

# -----------------------------
# 3. Utility functions
# -----------------------------
def draw_text_with_background(image, text, position, font_path=FONT_PATH, font_size=24,
                              text_color=(255, 255, 255), bg_color=(0, 0, 0)):
    """
    Draws a background rectangle on an OpenCV image and writes text on top of it.
    The text is drawn with PIL, and because this is called after blending, the text stays opaque in the final result.
    """
    pil_img = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(pil_img)
    try:
        font = ImageFont.truetype(font_path, font_size)
    except IOError:
        print("font is not found. use default font.")
        font = ImageFont.load_default()

    # Reorder bg_color to match the channel swap done above (OpenCV BGR → PIL RGB)
    bg_color_rgb = (bg_color[2], bg_color[1], bg_color[0])
    text_bbox = draw.textbbox((0, 0), text, font=font)
    text_w = text_bbox[2] - text_bbox[0]
    text_h = text_bbox[3] - text_bbox[1]
    x, y = position
    draw.rectangle(((x, y), (x + text_w + 10, y + text_h + 6)), fill=bg_color_rgb)
    draw.text((x + 5, y + 3), text, font=font, fill=text_color)

    return cv2.cvtColor(np.array(pil_img), cv2.COLOR_RGB2BGR)

def draw_segmentation_mask(image, segmentation_results, debug_show=False):
    """
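    For every object in segmentation_results.boxes, connects the points of the matching
    segmentation_results.masks.xy entry into a polygon and fills it with color.
    The class name is shown at the top-left corner of the mask's bounding box.
    If the class name contains "결함", "크랙", or "불량", the color is set to red,
    (255, 0, 0), since run_inference passes this function an RGB image.
    Class names are drawn separately at the end so they are not made transparent.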
    """
    if segmentation_results.masks is None:
        print("[WARNING] Segmentation results is None")
        return image

    overlay = image.copy()
    text_items = []  # info for drawing the text later (class_name, position, color)
    num_boxes = len(segmentation_results.boxes)
    num_masks = len(segmentation_results.masks.xy) if segmentation_results.masks.xy is not None else 0

    for idx in range(num_boxes):
        class_id = int(segmentation_results.boxes.cls[idx])
        # Default color
        color = segmentation_colors.get(class_id, (0, 0, 255))
        class_name = segmentation_results.names[class_id]
        # Override the color when the name contains a defect keyword
        if any(keyword in class_name for keyword in ["결함", "크랙", "불량"]):
            color = (255, 0, 0)

        if idx < num_masks:
            poly = segmentation_results.masks.xy[idx]
            poly_arr = np.array(poly, dtype=np.int32)
            if debug_show:
                print(f"[DEBUG] index [{idx}] polygon[x,y]:", poly_arr.tolist())
            poly_arr_reshaped = poly_arr.reshape((-1, 1, 2))
            cv2.fillPoly(overlay, [poly_arr_reshaped], color)
            # Compute the polygon's bounding box and store the text position
            x, y, w, h = cv2.boundingRect(poly_arr_reshaped)
            text_items.append((class_name, (x, y - 10), color))
        else:
            if debug_show:
                print(f"[DEBUG] index [{idx}]: no mask points.")

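    # Blend overlay and image; only the filled mask areas change, because
    # everywhere else the two images are identical
    blended = cv2.addWeighted(overlay, 0.5, image, 0.5, 0)

    # After blending, draw the text last so it stays opaque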
    for class_name, position, color in text_items:
        blended = draw_text_with_background(blended, class_name, position,
                                            font_size=20, text_color=(255, 255, 255), bg_color=color)
    return blended

def draw_detection_boxes(image, detection_results):
    """
    Draws the bounding boxes and labels from the Detection results.
    """
    for box in detection_results.boxes:
        class_id = int(box.cls)
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        color = detection_colors.get(class_id, (255, 0, 0))
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        label = f"{detection_results.names[class_id]}"
        image = draw_text_with_background(image, label, (x1, y1 - 10),
                                          font_size=20, text_color=(255, 255, 255), bg_color=color)
    return image

def run_inference(image_path, save_path="output.png", debug_show=False):
    """
    Runs the Detection and Segmentation models on one image, then draws and saves the result.
    """
    original_bgr = cv2.imread(image_path)
    if original_bgr is None:
        # cv2.imread returns None instead of raising when a file cannot be read.
        print(f"[WARNING] Could not read image: {image_path}")
        return
    original_rgb = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2RGB)

    results_detection = detection_model(original_bgr, imgsz=1920)[0]
    results_segmentation = segmentation_model(original_bgr, imgsz=1920, save=True)[0]

    annotated_rgb = draw_segmentation_mask(original_rgb.copy(), results_segmentation, debug_show)
    annotated_rgb = draw_detection_boxes(annotated_rgb, results_detection)

    annotated_bgr = cv2.cvtColor(annotated_rgb, cv2.COLOR_RGB2BGR)
    cv2.imwrite(save_path, annotated_bgr)
    print(f"[INFO] Image saved successfully: {save_path}")

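def process_directory(input_dir, output_dir, debug_show=False):
    """
    Runs run_inference on every image in input_dir and saves the results to output_dir.
    NOTE: the listing called this function without defining it; this is a minimal
    reconstruction that assumes .jpg / .jpeg / .png inputs.
    """
    os.makedirs(output_dir, exist_ok=True)
    image_paths = []
    for ext in ("*.jpg", "*.jpeg", "*.png"):
        image_paths.extend(glob.glob(os.path.join(input_dir, ext)))
    for image_path in sorted(image_paths):
        stem = os.path.splitext(os.path.basename(image_path))[0]
        run_inference(image_path, os.path.join(output_dir, f"{stem}.png"), debug_show)

if __name__ == "__main__":
    input_directory = "../copiedimg_train"
    output_directory = "output_images_0217"
    process_directory(input_directory, output_directory)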

For readers of this post I think the comments explain the code well enough, so I won't go through it separately.

If you want to dig deeper, the API documentation for the Results object returned by YOLO11's predict is a good place to look.
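
As a quick orientation, the script only touches a handful of fields on each result: boxes.cls, boxes.xyxy, masks.xy, and names. The sketch below collects them into plain Python values, plus boxes.conf, which the script doesn't use but which sits next to the others. The function name is made up for illustration; it only assumes those attributes exist, so it works on a detection result (where masks is None) as well as a segmentation result.

```python
def summarize_result(result):
    """Collect class name, confidence, box, and polygon size for each object."""
    boxes = result.boxes
    # masks is None for detection models and when nothing was segmented.
    polygons = result.masks.xy if result.masks is not None else []
    rows = []
    for i in range(len(boxes.cls)):
        class_id = int(boxes.cls[i])
        rows.append({
            "name": result.names[class_id],
            "conf": float(boxes.conf[i]),
            "xyxy": [float(v) for v in boxes.xyxy[i]],  # x1, y1, x2, y2 in pixels
            "polygon_points": len(polygons[i]) if i < len(polygons) else 0,
        })
    return rows
```

Usage would be along the lines of summarize_result(segmentation_model(image)[0]).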

Inference results


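Wow.. it came out beautifully, didn't it? Looking at the results, the models seem to pick up more kinds of classes than I expected. Depending on how you use and optimize them, they could be put to many different uses. Building an AI matters, but I think how you apply it is what opens up endless possibilities.

For example, you could keep only the road and road-defect classes and retrain to build a model that recognizes the condition of bicycle roads, or recognize bicycle roads together with things like kick scooters to detect parking and stopping on the road!

AI-based bicycle road crack detection

Today we looked at how to handle a dataset where Segmentation and Detection labels come mixed together: split them through preprocessing, train two models, and merge the two models' inference results into one output.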