Knowledge Graph-Based Semantic Scene Understanding
for Autonomous Driving in Dilemma Situations

딜레마 상황 자율주행을 위한 지식그래프 기반 의미 장면 이해

Woongje Cho¹, Hyeonseo Oh¹, Junseok Lee², Shiho Kim*³
¹School of Mechanical Engineering, Yonsei University  ·  ²School of Civil and Environmental Engineering, Yonsei University  ·  ³School of Integrated Technology, Yonsei University
KSAE 2026 Spring Conference (Korean Society of Automotive Engineers Spring Conference) · Accepted
Framework Architecture

The proposed KG-enhanced semantic scene understanding framework. Given a scene, the KG represents objects, traffic rules, class hierarchies, and scenario-specific semantic descriptions, enabling richer dilemma reasoning than 3D Scene Graphs.

Key Contributions

  • KG-enhanced scene representation — a Knowledge Graph encoding not only spatial structure but also traffic rules, class hierarchies, commonsense context, and scenario-specific rdfs:comment descriptions, enabling structured dilemma reasoning beyond what 3DSGs provide.
  • Controlled LLM-as-judge evaluation — 30 scene-understanding queries across 6 cognitive categories, with N=300 scored evaluations in total (GPT-4o-mini answering / GPT-4o judging), compared using the Wilcoxon signed-rank test with effect sizes reported.
  • Large performance gain on dilemma reasoning — the KG significantly outperforms the 3DSG baseline (mean 4.41 vs. 3.56, p<0.001, Cohen's d=0.84), with the largest gain in dilemma-specific queries (+1.71).
  • 5-condition ablation revealing the key driver — systematic ablation (LLM-only → KG-structure → 3DSG → 3DSG+NL → KG full) isolates natural-language rdfs:comment descriptions as the dominant contributor, not ontological structure alone.

Abstract

For autonomous vehicles to operate safely in real-world environments, they must go beyond object detection and understand semantic relations and situational context within a scene. In particular, dilemma situations involving conflicting constraints — such as accident avoidance, pedestrian priority, and traffic rule compliance — are difficult to resolve using conventional 3D Scene Graphs (3DSGs), which mainly represent spatial structure.

To address this limitation, this paper proposes a Knowledge Graph (KG)-enhanced semantic scene understanding framework tailored to autonomous driving dilemma scenarios. The proposed KG represents not only objects, attributes, and relations, but also traffic rules, class hierarchies, commonsense context, and scenario-specific semantic descriptions in a structured form.

We evaluate the framework using 30 scene-understanding queries across six cognitive categories under a controlled LLM-as-judge setting (GPT-4o-mini for answering, GPT-4o for judging; N=300). Results show that the KG-based method significantly outperforms the 3DSG baseline in reasoning quality (mean 4.41 vs. 3.56, Wilcoxon p<0.001, Cohen's d=0.84), with the largest gains in dilemma reasoning (+1.71) while spatial queries confirm design fairness (−0.07). A five-condition ablation study reveals that natural-language semantic descriptions (rdfs:comment) are the dominant contributor to performance.

Method

Experimental Setup

We construct a controlled dilemma scenario at an urban intersection and compare two retrieval strategies on identical perceptual input — isolating the contribution of structured domain knowledge. Both modes receive the same objects, spatial relations, and observational states; only the knowledge representation differs.

  • KG mode: SPARQL queries over GraphDB DrivingKG (1,498 triples; OWL RDFplus-optimized). Context includes triples, rdfs:comment natural-language descriptions, and ontology class hierarchy.
  • 3DSG baseline: Graph traversal over scene_graph.json (48 nodes, 23 edges). An enhanced 3DSG with spatial relations, object attributes, and observable states — matching the informationally richer definition from Armeni et al. and Rosinol et al.
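The difference between the two context sources can be illustrated with a small, self-contained sketch. It does not use GraphDB, SPARQL, or the actual scene_graph.json; the triples, the `ex:` identifiers, the scene graph schema, and the helper names `kg_context` and `sg_context` are all hypothetical stand-ins. The point is only that a KG lookup can return class hierarchy and rdfs:comment text alongside relations, while a scene graph traversal returns attributes and spatial edges.

```python
# Toy data only: every identifier and description below is hypothetical.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"
RDFS_COMMENT = "rdfs:comment"

TRIPLES = [
    ("ex:ambulance_1", RDF_TYPE, "ex:Ambulance"),
    ("ex:ambulance_1", "ex:behind", "ex:ego"),
    ("ex:Ambulance", RDFS_SUBCLASS, "ex:EmergencyVehicle"),
    ("ex:EmergencyVehicle", RDFS_SUBCLASS, "ex:Vehicle"),
    ("ex:EmergencyVehicle", RDFS_COMMENT,
     "Emergency vehicles on duty are generally given right of way."),
]

SCENE_GRAPH = {
    "nodes": {
        "ambulance_1": {"class": "ambulance", "state": "siren_on"},
        "ego": {"class": "car"},
    },
    "edges": [{"source": "ambulance_1", "relation": "behind", "target": "ego"}],
}


def class_chain(triples, cls):
    """Follow rdfs:subClassOf upward from cls (assumes single, acyclic inheritance)."""
    parents = {s: o for s, p, o in triples if p == RDFS_SUBCLASS}
    chain = [cls]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def kg_context(triples, entity):
    """Collect relations, class hierarchy, and rdfs:comment text for one entity."""
    lines = [f"{s} {p} {o}" for s, p, o in triples if s == entity and p != RDF_TYPE]
    for s, p, o in triples:
        if s == entity and p == RDF_TYPE:
            chain = class_chain(triples, o)
            lines.append(f"{entity} is a " + " < ".join(chain))
            for cls in chain:
                lines += [f"{cls}: {c}" for s2, p2, c in triples
                          if s2 == cls and p2 == RDFS_COMMENT]
    return lines


def sg_context(scene_graph, node_id):
    """Collect node attributes and outgoing spatial edges for one node."""
    node = scene_graph["nodes"][node_id]
    attrs = ", ".join(f"{k}={v}" for k, v in sorted(node.items()))
    lines = [f"{node_id}: {attrs}"]
    lines += [f"{node_id} {e['relation']} {e['target']}"
              for e in scene_graph["edges"] if e["source"] == node_id]
    return lines


if __name__ == "__main__":
    print("\n".join(kg_context(TRIPLES, "ex:ambulance_1")))
    print("\n".join(sg_context(SCENE_GRAPH, "ambulance_1")))
```

Both functions describe the same ambulance and the same "behind" relation; only the KG variant carries the class chain and the natural-language description into the prompt context.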

Answers are generated by GPT-4o-mini (temperature=0.3) and scored by a GPT-4o judge (temperature=0.0) on a 1–5 rubric across 6 cognitive categories. Each query is repeated 5 times for variance estimation.
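The answer-then-judge protocol can be sketched as a plain loop. This is a minimal sketch, not the authors' evaluation code: `run_eval`, `stub_answer`, and `stub_judge` are hypothetical names, and the two stubs stand in for the GPT-4o-mini and GPT-4o calls so that the example runs without network access. With 30 queries, two conditions, and five repetitions the loop produces 300 scored answers, which is consistent with the reported N=300, although the source does not spell out that breakdown.

```python
from statistics import mean


def run_eval(queries, contexts, answer_fn, judge_fn, repeats=5):
    """Score every (query, condition) pair `repeats` times.

    contexts maps a condition name (e.g. "kg", "3dsg") to its context string.
    answer_fn(query, context) -> str and judge_fn(query, answer) -> int are
    injected so real model calls can be swapped in for the stubs below.
    """
    scores = {condition: [] for condition in contexts}
    for query in queries:
        for condition, context in contexts.items():
            for _ in range(repeats):
                answer = answer_fn(query, context)
                score = judge_fn(query, answer)
                if not 1 <= score <= 5:
                    raise ValueError(f"score {score} is outside the 1-5 rubric")
                scores[condition].append(score)
    return scores


def stub_answer(query, context):
    """Hypothetical stand-in for the answering model: echoes its context."""
    return f"Q: {query} | based on: {context}"


def stub_judge(query, answer):
    """Hypothetical stand-in for the judge: rewards answers citing a rule."""
    return 5 if "rule" in answer else 3


if __name__ == "__main__":
    queries = [f"query_{i}" for i in range(30)]
    contexts = {"kg": "spatial facts plus traffic rule text", "3dsg": "spatial facts"}
    results = run_eval(queries, contexts, stub_answer, stub_judge)
    for condition, values in results.items():
        print(condition, len(values), mean(values))
```

Injecting the two callables keeps the loop independent of any particular model client, so the same structure applies whether the scores come from stubs or from live model calls.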

6 Cognitive Query Categories

Category                 | Example Query                                | KG Advantage
Object Recognition       | What vehicles are in the intersection zone?  | Moderate
Spatial Relations        | What is behind the ego vehicle?              | Neutral (−0.07)
Traffic Rule Compliance  | Is it legal to proceed on yellow?            | Large
Safety & Risk Assessment | What is the highest collision risk?          | Large
Dilemma Reasoning        | What should the ego vehicle do?              | +1.71 (largest)
Commonsense Inference    | Why might the ambulance be prioritized?      | Large

Category-wise Comparison

[Figure: category-wise KG vs. 3DSG comparison]

Category-wise score comparison. Spatial queries remain comparable (−0.07), validating benchmark fairness. Dilemma reasoning shows the largest gain (+1.71).

[Figure: per-query score differences]

Per-query score difference (KG − 3DSG). Negative values on spatial queries confirm the 3DSG is a fair, competitive baseline.
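The category-wise gaps quoted above are mean differences of paired scores. A minimal sketch of that aggregation follows; the function name `category_deltas` and the toy records are hypothetical, and the numbers are illustrative rather than taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def category_deltas(records):
    """Mean (KG - 3DSG) score difference per category.

    records is an iterable of (category, kg_score, sg_score) tuples, one per
    paired evaluation of the same query under both conditions.
    """
    diffs = defaultdict(list)
    for category, kg_score, sg_score in records:
        diffs[category].append(kg_score - sg_score)
    return {category: mean(values) for category, values in diffs.items()}


if __name__ == "__main__":
    # Illustrative scores only, not the paper's data.
    toy_records = [
        ("dilemma", 5, 3),
        ("dilemma", 4, 3),
        ("spatial", 4, 4),
        ("spatial", 3, 4),
    ]
    print(category_deltas(toy_records))
```

A positive value means the KG condition scored higher on average in that category; a value near zero, as reported for spatial queries, means the two conditions are comparable.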

Results

Overall Performance

Method        | Mean Score (1–5) | Δ Dilemma | p-value | Cohen's d
3DSG Baseline | 3.56             |           |         |
KG (Ours)     | 4.41             | +1.71     | <0.001  | 0.84
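For readers who want to see how the two reported statistics are defined, here is a standard-library sketch. It makes several assumptions that the source does not confirm: Cohen's d is computed with a pooled standard deviation, the Wilcoxon signed-rank test drops zero differences and uses the normal approximation with a tie correction and no continuity correction, and the function names are hypothetical. A statistics package would normally be used instead, and the toy scores are not the paper's data.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (an assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Returns (w_plus, z, p). Zero differences are dropped; tied absolute
    differences receive average ranks, with the usual variance correction.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, z, p


if __name__ == "__main__":
    # Illustrative paired scores only.
    kg = [5, 4, 5, 3, 4, 5]
    sg = [3, 3, 4, 3, 2, 4]
    print(cohens_d(kg, sg))
    print(wilcoxon_signed_rank(kg, sg))
```

The signed-rank test suits this setting because each query is scored under both conditions, so the comparison is paired, and the 1–5 rubric is ordinal rather than normally distributed.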

Score Distribution

[Figure: score distribution]

Score distribution across N=300 evaluations. KG concentrates at scores 4–5; 3DSG shows broader spread.

Per-Query Heatmap

[Figure: per-query performance heatmap]

Per-query score heatmap across all 30 queries and 2 conditions. Warm colors indicate high scores.

5-Condition Ablation Study

# | Condition              | Mean
1 | LLM-only (no context)  | 1.25
2 | KG structure-only      | 3.29
3 | 3DSG baseline          | 3.56
4 | 3DSG + NL descriptions | 4.14
5 | KG full (ours)         | 4.41

[Figure: ablation study]

The ablation reveals that natural-language semantic descriptions (rdfs:comment) are the dominant performance contributor — the jump from condition 2 (KG structure-only: 3.29) to condition 5 (KG full: 4.41) is driven primarily by NL annotations, not ontological structure alone. Comparing conditions 3 and 4 (3DSG → 3DSG+NL: +0.58) confirms this: adding NL descriptions to the 3DSG already closes much of the gap.

BibTeX

@inproceedings{cho2026kg,
  title     = {Knowledge Graph-Based Semantic Scene Understanding
               for Autonomous Driving in Dilemma Situations},
  author    = {Cho, Woongje and Oh, Hyeonseo and Lee, Junseok and Kim, Shiho},
  booktitle = {Proceedings of the KSAE Spring Conference},
  year      = {2026},
  address   = {Seoul, Republic of Korea}
}