KG-AD: Knowledge Graph-Based Scene Understanding

The proposed KG-enhanced semantic scene understanding framework. Given a scene, the KG represents objects, traffic rules, class hierarchies, and scenario-specific semantic descriptions, enabling richer dilemma reasoning than 3D Scene Graphs.

Key Contributions

KG-enhanced scene representation: a Knowledge Graph that encodes not only spatial structure but also traffic rules, class hierarchies, commonsense context, and scenario-specific rdfs:comment descriptions, enabling structured dilemma reasoning beyond what 3DSGs provide.
Controlled LLM-as-judge evaluation: 30 scene-understanding queries across 6 cognitive categories, with N=300 evaluations in total (GPT-4o-mini answering, GPT-4o judging), compared using the Wilcoxon signed-rank test and reported with effect sizes (Cohen's d).
Large performance gain on dilemma reasoning: the KG significantly outperforms the 3DSG baseline (mean 4.41 vs. 3.56, p<0.001, Cohen's d=0.84), with the largest gain on dilemma-reasoning queries (+1.71).
5-condition ablation revealing the key driver: a systematic ablation (LLM-only → KG structure-only → 3DSG → 3DSG+NL → KG full) identifies the natural-language rdfs:comment descriptions, not ontological structure alone, as the dominant contributor.
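To make the role of class hierarchies concrete, here is a minimal sketch that assumes a toy in-memory triple list with hypothetical `ex:` entity names (not the actual DrivingKG vocabulary). Following rdfs:subClassOf transitively lets a query for vehicles also return an ambulance, and rdfs:comment text sits alongside the structure as plain natural language.

```python
from collections import defaultdict

# Hypothetical KG fragment as (subject, predicate, object) triples.
# Entity and class names are illustrative only, not taken from DrivingKG.
TRIPLES = [
    ("ex:Ambulance", "rdfs:subClassOf", "ex:EmergencyVehicle"),
    ("ex:EmergencyVehicle", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:ambulance_1", "rdf:type", "ex:Ambulance"),
    ("ex:car_2", "rdf:type", "ex:Car"),
    ("ex:pedestrian_3", "rdf:type", "ex:Pedestrian"),
    ("ex:ambulance_1", "rdfs:comment",
     "Ambulance with active siren approaching the intersection."),
]

def superclasses(cls, triples):
    """Return cls plus all of its transitive rdfs:subClassOf ancestors."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents[s].add(o)
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def instances_of(cls, triples):
    """Instances whose asserted rdf:type is cls or any subclass of cls."""
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and cls in superclasses(o, triples))

def comments(entity, triples):
    """Natural-language rdfs:comment descriptions attached to an entity."""
    return [o for s, p, o in triples if s == entity and p == "rdfs:comment"]

print(instances_of("ex:Vehicle", TRIPLES))
# ['ex:ambulance_1', 'ex:car_2']
```

A reasoning store would typically materialize such inferences itself; the explicit closure here only illustrates what the hierarchy contributes.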

Abstract

For autonomous vehicles to operate safely in real-world environments, they must go beyond object detection and understand semantic relations and situational context within a scene. In particular, dilemma situations involving conflicting constraints — such as accident avoidance, pedestrian priority, and traffic rule compliance — are difficult to resolve using conventional 3D Scene Graphs (3DSGs), which mainly represent spatial structure.

To address this limitation, this paper proposes a Knowledge Graph (KG)-enhanced semantic scene understanding framework tailored to autonomous driving dilemma scenarios. The proposed KG represents not only objects, attributes, and relations, but also traffic rules, class hierarchies, commonsense context, and scenario-specific semantic descriptions in a structured form.

We evaluate the framework using 30 scene-understanding queries across six cognitive categories under a controlled LLM-as-judge setting (GPT-4o-mini for answering, GPT-4o for judging; N=300). Results show that the KG-based method significantly outperforms the 3DSG baseline in reasoning quality (mean 4.41 vs. 3.56, Wilcoxon p<0.001, Cohen's d=0.84). The largest gain is in dilemma reasoning (+1.71), while scores on spatial queries remain comparable (−0.07), indicating that the baseline is not disadvantaged by the design. A five-condition ablation study reveals that natural-language semantic descriptions (rdfs:comment) are the dominant contributor to performance.

Method

Experimental Setup

We construct a controlled dilemma scenario at an urban intersection and compare two retrieval strategies on identical perceptual input — isolating the contribution of structured domain knowledge. Both modes receive the same objects, spatial relations, and observational states; only the knowledge representation differs.

KG mode: SPARQL queries over GraphDB DrivingKG (1,498 triples; OWL RDFplus-optimized). Context includes triples, rdfs:comment natural-language descriptions, and ontology class hierarchy.
3DSG baseline: graph traversal over scene_graph.json (48 nodes, 23 edges). This is an enhanced 3DSG with spatial relations, object attributes, and observable states, matching the informationally richer definitions of Armeni et al. and Rosinol et al.
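The two retrieval modes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the endpoint URL and repository id, the SPARQL query, and the scene_graph.json schema (`nodes` with `id`/`label`/`attributes`, `edges` with `source`/`target`/`relation`) are all hypothetical. The KG side only builds a standard SPARQL 1.1 Protocol GET request (it is not sent here); the 3DSG side does a bounded breadth-first traversal from the ego vehicle.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

# --- KG mode (hypothetical endpoint and query) ---------------------------
# Assumption: the repository is served over the SPARQL 1.1 Protocol at
# /repositories/<id>; host, port and repository id are illustrative.
ENDPOINT = "http://localhost:7200/repositories/DrivingKG"

CONTEXT_QUERY = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?type ?superclass ?comment WHERE {
  ?entity rdf:type ?type .
  OPTIONAL { ?type rdfs:subClassOf ?superclass . }
  OPTIONAL { ?entity rdfs:comment ?comment . }
}
"""

def build_sparql_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

def bindings_to_lines(results):
    """Flatten SPARQL JSON results into 'var=value' lines for the LLM prompt."""
    return [", ".join(f"{var}={cell['value']}" for var, cell in row.items())
            for row in results["results"]["bindings"]]

# --- 3DSG mode (hypothetical scene_graph.json schema) --------------------
def traverse_scene_graph(graph, start, max_hops=2):
    """Breadth-first traversal from `start`, serialising nodes and edges."""
    nodes = {n["id"]: n for n in graph["nodes"]}
    adjacency = {}
    for e in graph["edges"]:
        # Reachability ignores direction; the text keeps the stated direction.
        adjacency.setdefault(e["source"], []).append(e)
        adjacency.setdefault(e["target"], []).append(e)
    lines, seen_edges, visited = [], set(), {start}
    queue = deque([(start, 0)])
    while queue:
        node_id, depth = queue.popleft()
        attrs = nodes[node_id].get("attributes", {})
        suffix = "".join(f"; {k}={v}" for k, v in attrs.items())
        lines.append(f"{node_id} ({nodes[node_id]['label']}{suffix})")
        if depth == max_hops:
            continue
        for e in adjacency.get(node_id, []):
            key = (e["source"], e["relation"], e["target"])
            if key not in seen_edges:
                seen_edges.add(key)
                lines.append(f"{e['source']} --{e['relation']}--> {e['target']}")
            other = e["target"] if e["source"] == node_id else e["source"]
            if other not in visited:
                visited.add(other)
                queue.append((other, depth + 1))
    return lines

SCENE = json.loads("""{
  "nodes": [
    {"id": "ego", "label": "EgoVehicle", "attributes": {"speed_kmh": 30}},
    {"id": "truck_1", "label": "Truck"},
    {"id": "ped_1", "label": "Pedestrian", "attributes": {"state": "crossing"}}
  ],
  "edges": [
    {"source": "truck_1", "target": "ego", "relation": "behind"},
    {"source": "ped_1", "target": "truck_1", "relation": "left_of"}
  ]
}""")

print("\n".join(traverse_scene_graph(SCENE, "ego", max_hops=1)))
# ego (EgoVehicle; speed_kmh=30)
# truck_1 --behind--> ego
# truck_1 (Truck)
```

To run the KG side against a live server, one would pass the request to `urllib.request.urlopen` and feed `json.load(response)` to `bindings_to_lines`.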

Answers are generated by GPT-4o-mini (temperature=0.3) and scored by a GPT-4o judge (temperature=0.0) on a 1–5 rubric across the 6 cognitive categories; each query is repeated 5 times to estimate variance.
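The answer-then-judge protocol can be sketched as a small loop. This is a minimal sketch assuming that `answer_fn` and `judge_fn` are hypothetical wrappers around the GPT-4o-mini and GPT-4o calls, and that the judge replies with text containing a standalone 1–5 score; the stand-ins below replace the real model calls so the loop runs offline.

```python
import re
import statistics

ANSWER_TEMPERATURE = 0.3  # GPT-4o-mini answerer, per the setup above
JUDGE_TEMPERATURE = 0.0   # GPT-4o judge
REPEATS = 5               # repetitions per query

def parse_score(judge_output):
    """Return the first standalone digit 1-5 in the judge's reply (assumed format)."""
    match = re.search(r"\b([1-5])\b", judge_output)
    if match is None:
        raise ValueError(f"no 1-5 score in judge output: {judge_output!r}")
    return int(match.group(1))

def evaluate_query(query, context, answer_fn, judge_fn, repeats=REPEATS):
    """Answer and judge one query `repeats` times; report mean and spread."""
    scores = []
    for _ in range(repeats):
        answer = answer_fn(query, context, temperature=ANSWER_TEMPERATURE)
        verdict = judge_fn(query, answer, temperature=JUDGE_TEMPERATURE)
        scores.append(parse_score(verdict))
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"scores": scores, "mean": statistics.mean(scores), "stdev": spread}

# Hypothetical stand-ins for the real model calls.
def fake_answer(query, context, temperature):
    return f"Slow down and yield to the ambulance. [context: {context}]"

def fake_judge(query, answer, temperature):
    return "Score: 4. The answer identifies the priority vehicle."

result = evaluate_query("What should the ego vehicle do?", "ctx",
                        fake_answer, fake_judge)
```

Keeping the model calls behind plain callables makes it easy to hold the judge fixed while swapping the context (KG or 3DSG) passed to the answerer.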

6 Cognitive Query Categories

Category	Example Query	KG Advantage
Object Recognition	What vehicles are in the intersection zone?	Moderate
Spatial Relations	What is behind the ego vehicle?	Neutral (−0.07)
Traffic Rule Compliance	Is it legal to proceed on yellow?	Large
Safety & Risk Assessment	What is the highest collision risk?	Large
Dilemma Reasoning	What should the ego vehicle do?	+1.71 (largest)
Commonsense Inference	Why might the ambulance be prioritized?	Large

Category-wise Comparison

Category-wise score comparison. Spatial queries remain comparable (−0.07), supporting benchmark fairness. Dilemma reasoning shows the largest gain (+1.71).

Per-query score difference (KG − 3DSG). Negative values on spatial queries indicate that the 3DSG is a fair, competitive baseline.
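The category-wise and per-query differences (KG − 3DSG) amount to grouping scores and subtracting means. A minimal sketch, assuming each evaluation is stored as a hypothetical `(category, query_id, condition, score)` tuple; the records below are toy values for illustration, not the paper's data.

```python
from collections import defaultdict
from statistics import mean

def mean_deltas(records, group_of):
    """Mean KG score minus mean 3DSG score for each group.

    records: (category, query_id, condition, score) tuples, where condition
    is "KG" or "3DSG". group_of picks the grouping key from a record.
    Every group must contain at least one score for each condition.
    """
    groups = defaultdict(lambda: {"KG": [], "3DSG": []})
    for record in records:
        _category, _query_id, condition, score = record
        groups[group_of(record)][condition].append(score)
    return {g: round(mean(s["KG"]) - mean(s["3DSG"]), 2)
            for g, s in groups.items()}

# Toy records for illustration only; these are not the paper's data.
RECORDS = [
    ("Spatial Relations", "q07", "KG", 4), ("Spatial Relations", "q07", "3DSG", 4),
    ("Spatial Relations", "q08", "KG", 4), ("Spatial Relations", "q08", "3DSG", 5),
    ("Dilemma Reasoning", "q25", "KG", 5), ("Dilemma Reasoning", "q25", "3DSG", 3),
]

by_category = mean_deltas(RECORDS, group_of=lambda r: r[0])
by_query = mean_deltas(RECORDS, group_of=lambda r: r[1])
```

With these toy records, `by_category` is `{'Spatial Relations': -0.5, 'Dilemma Reasoning': 2}`; a negative group mean is exactly the pattern the per-query plot shows for spatial queries.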

Results

Overall Performance

Method	Mean Score (1–5)	Δ Dilemma	p-value	Cohen's d
3DSG Baseline	3.56	—	—	—
KG (Ours)	4.41	+1.71	<0.001	0.84
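The statistics in this table can be reproduced in outline with the standard library. This is a sketch under assumptions: scores are paired (for example by query and repetition), the Wilcoxon test uses the normal approximation with zero differences dropped and tie-corrected variance, and the page does not say which Cohen's d variant was used, so both the pooled-SD form and the paired d_z form are shown. Library routines such as `scipy.stats.wilcoxon` may give slightly different p-values depending on their defaults. The input scores are toy values.

```python
import math
from statistics import mean, variance

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes, i = [0.0] * len(values), [], 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test: normal approximation, zero
    differences dropped, tie-corrected variance, no continuity correction.
    Requires at least one non-zero difference."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks, tie_sizes = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    z = (w_plus - mu) / math.sqrt(var)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))

def cohens_d_pooled(x, y):
    """Cohen's d using the pooled SD of two equal-sized groups."""
    return (mean(x) - mean(y)) / math.sqrt((variance(x) + variance(y)) / 2)

def cohens_d_paired(x, y):
    """Alternative d_z: mean paired difference over the SD of the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / math.sqrt(variance(diffs))

kg, sg = [5, 4, 5, 3, 4], [3, 4, 4, 2, 2]  # toy paired scores, not paper data
w_plus, p = wilcoxon_signed_rank(kg, sg)
d = cohens_d_pooled(kg, sg)
```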

Score Distribution

Score distribution across N=300 evaluations. KG concentrates at scores 4–5; 3DSG shows broader spread.

Per-Query Heatmap

Per-query score heatmap across all 30 queries and 2 conditions. Warm colors indicate high scores.

5-Condition Ablation Study

#	Condition	Mean
1	LLM-only (no context)	1.25
2	KG structure-only	3.29
3	3DSG baseline	3.56
4	3DSG + NL descriptions	4.14
5	KG full (ours)	4.41
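One way to read the five conditions in this table is as switches over which context components reach the answering model. The mapping below is a hypothetical sketch inferred from the condition names and the KG-mode description (triples, class hierarchy, rdfs:comment text); in particular, it assumes that the 3DSG+NL condition reuses the same natural-language descriptions as the full KG.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContextConfig:
    """Which context components are included in the prompt (assumed split)."""
    kg_triples: bool = False
    kg_hierarchy: bool = False
    nl_comments: bool = False
    scene_graph: bool = False

# Hypothetical mapping of the five ablation conditions to components.
CONDITIONS = {
    "LLM-only": ContextConfig(),
    "KG structure-only": ContextConfig(kg_triples=True, kg_hierarchy=True),
    "3DSG baseline": ContextConfig(scene_graph=True),
    "3DSG + NL": ContextConfig(scene_graph=True, nl_comments=True),
    "KG full": ContextConfig(kg_triples=True, kg_hierarchy=True, nl_comments=True),
}

COMPONENT_ORDER = ("scene_graph", "kg_triples", "kg_hierarchy", "nl_comments")

def build_context(cfg, sources):
    """Concatenate the enabled components; sources maps name -> text lines."""
    parts = []
    for name in COMPONENT_ORDER:
        if getattr(cfg, name):
            parts.extend(sources.get(name, []))
    return "\n".join(parts)
```

Framing the ablation this way keeps the perceptual input fixed and varies only the representation, which matches the controlled setup described in the Method section.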

The ablation indicates that natural-language semantic descriptions (rdfs:comment) are the dominant performance contributor. Ontological structure alone does not explain the gain: KG structure-only (condition 2: 3.29) scores below the 3DSG baseline (condition 3: 3.56), and the jump from condition 2 to KG full (condition 5: 4.41) is driven primarily by the NL annotations. Conditions 3 and 4 corroborate this: adding NL descriptions to the 3DSG (3DSG → 3DSG+NL: +0.58) already closes much of the 0.85-point gap to KG full.
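The arithmetic behind this reading can be checked directly from the ablation means. The snippet below uses only the five values in the table; "share closed" is simply the NL gain on the 3DSG divided by the 3DSG-to-KG-full gap.

```python
# Mean scores from the ablation table (conditions 1-5).
ABLATION = {
    1: ("LLM-only", 1.25),
    2: ("KG structure-only", 3.29),
    3: ("3DSG baseline", 3.56),
    4: ("3DSG + NL", 4.14),
    5: ("KG full", 4.41),
}

def gain(frm, to):
    """Score change from condition `frm` to condition `to`, rounded to 2 dp."""
    return round(ABLATION[to][1] - ABLATION[frm][1], 2)

nl_on_3dsg = gain(3, 4)         # adding NL descriptions to the 3DSG
gap_to_full = gain(3, 5)        # 3DSG baseline vs. KG full
structure_vs_3dsg = gain(3, 2)  # KG structure-only relative to the 3DSG
share_closed = round(nl_on_3dsg / gap_to_full, 2)
print(nl_on_3dsg, gap_to_full, structure_vs_3dsg, share_closed)
# 0.58 0.85 -0.27 0.68
```

Roughly two thirds of the gap between the 3DSG baseline and the full KG is recovered by NL descriptions alone.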

BibTeX

@inproceedings{cho2026kg,
  title     = {Knowledge Graph-Based Semantic Scene Understanding
               for Autonomous Driving in Dilemma Situations},
  author    = {Cho, Woongje and Oh, Hyeonseo and Lee, Junseok and Kim, Shiho},
  booktitle = {Proceedings of the KSAE Spring Conference},
  year      = {2026},
  address   = {Seoul, Republic of Korea}
}
