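Does 2.5-D Decomposition Change How LLMs Reason About Space?

A neuro-symbolic pipeline built on 2.5-D decomposition improved LLM spatial reasoning on a block-building benchmark.

Original article title: LLM-Based Spatial Construction via 2.5-D Decomposition

arXiv cs.AI 2026-06-24

This work may not have completed peer review. Read it as an early share with the research community, not as a finished, peer-reviewed paper.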

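RESEARCH Research paper / Preprint

Field Note: Check before reading

Three-line summary

Eliminates an entire class of the 3D placement errors that LLMs make when building structures from natural-language instructions
GPT-4o-mini reaches 94.6% mean structural accuracy on the Build What I Mean benchmark
Nemotron-3 120B running locally on an edge device (NVIDIA Jetson Thor AGX) matches the cloud result at 94.5%

Who this is relevant to

AI researchers, automation system developers, construction industry engineers

Confidence note

Preprint (may not yet be peer reviewed)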

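Reading the article

Using the original article as material, this section arranges the key points, the editorial view, and the strengths and concerns in an easy-to-read order.

Autonomous systems that build structures from natural-language instructions need reliable spatial reasoning. However, large language models (LLMs) make systematic coordinate errors when generating three-dimensional block placements. The study proposes a neuro-symbolic pipeline based on 2.5-D decomposition: the LLM plans in the horizontal plane while a deterministic executor computes all vertical placement from column occupancy, which eliminates an entire class of errors. With this approach, GPT-4o-mini achieves 94.6% mean structural accuracy on the Build What I Mean benchmark (160 rounds, averaged over 12 independent runs), within 3.0 percentage points of the 97.6% ceiling imposed by architect-agent errors.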
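The division of labour described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `PlannedBlock`, `PlacedBlock`, and `execute_plan` are hypothetical, as are the choice of `y` as the vertical axis and the convention that the ground level is height 0. The sketch only shows the general idea that the planner supplies horizontal cells and a deterministic step derives each height from how many blocks the column already holds.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class PlannedBlock(NamedTuple):
    """One planned placement (hypothetical format): horizontal cell only."""
    x: int
    z: int
    color: str


class PlacedBlock(NamedTuple):
    """A resolved placement with the height filled in by the executor."""
    x: int
    y: int  # vertical coordinate (assumed axis convention)
    z: int
    color: str


def execute_plan(plan: List[PlannedBlock]) -> List[PlacedBlock]:
    """Stack each planned block on top of whatever already occupies its column."""
    # Number of blocks already placed in each (x, z) column; starts at 0.
    column_height: Dict[Tuple[int, int], int] = defaultdict(int)
    placed: List[PlacedBlock] = []
    for block in plan:
        cell = (block.x, block.z)
        y = column_height[cell]  # next free level in this column
        placed.append(PlacedBlock(block.x, y, block.z, block.color))
        column_height[cell] = y + 1
    return placed


plan = [
    PlannedBlock(0, 0, "red"),
    PlannedBlock(0, 0, "blue"),   # same column, so it lands on the red block
    PlannedBlock(1, 0, "green"),  # new column, so it sits on the ground
]
for block in execute_plan(plan):
    print(block)
```

Because the planner never emits a height in this sketch, it cannot place a block in mid-air or inside another block; those outcomes are ruled out by construction rather than by prompting.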

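Editorial comment

This study proposes a method that improves LLM spatial reasoning and substantially raises accuracy on autonomous construction tasks. The 2.5-D decomposition approach stands out, but two points still need examination: how well it works when no physical constraint fixes a degree of freedom, and how much architect-agent errors limit the results.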

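Assessment

Strengths

2.5-D decomposition removes a deterministic dimension from the LLM's output space, which improves the correctness of vertical placement
With this approach, GPT-4o-mini (94.6%) outperforms both GPT-4o (90.3%) and the best competing system (76.3%), pointing to practical value for building tasks
Nemotron-3 120B runs locally on an edge device and matches the cloud result (94.5%)

Concerns

Architect-agent errors cap achievable accuracy at 97.6%, a limit that no builder-side improvement can address
Effectiveness is unclear when no specific physical constraint applies or when factors other than gravity come into play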

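Impact on industry and society

This study could substantially improve LLM spatial reasoning in autonomous construction and assembly tasks, supporting more accurate and reliable systems in construction and manufacturing. Running directly on edge devices also reduces dependence on the cloud, which may help real-time responsiveness and security.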

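Sources

The original article and the sources consulted for deeper analysis. For community posts and preprints, you can check the evidence from here.

LLM-Based Spatial Construction via 2.5-D Decomposition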

arXiv cs.AI

https://arxiv.org/abs/2605.07066

Keywords

2.5-D decomposition, LLM-based spatial construction, autonomous systems, Nemotron-3, GPT-4o-mini

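About the AI summary

AI was used to summarize, classify, and interpret this article. We work to verify the content, but it may contain mistranslations, misinterpretations, or missed updates to the original article. Always check the original article before making important decisions.

About breaking items: Breaking items may be updated following further research or full-text extraction. Initial summaries may contain errors or omissions.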

Article data

Source	Preprint
Category	Research paper
Status	Breaking
Outlet	arXiv cs.AI
Published	2026-06-24

Original article description

arXiv:2605.07066v3 (announce type: replace). Abstract: Autonomous systems that build structures from natural-language instructions need reliable spatial reasoning, yet large language models (LLMs) make systematic coordinate errors when generating three-dimensional block placements. We present a neuro-symbolic pipeline based on 2.5-D decomposition: the LLM plans in the two-dimensional horizontal plane while a deterministic executor computes all vertical placement from column occupancy, eliminating an entire class of errors. On the Build What I Mean benchmark (160 rounds), GPT-4o-mini with this pipeline achieves 94.6% mean structural accuracy across 12 independent runs, within 3.0 percentage points of the 97.6% ceiling imposed by architect-agent errors that no builder-side improvement can address. This outperforms both GPT-4o at 90.3% and the best competing system at 76.3%. A controlled ablation confirms that 2.5-D decomposition is the dominant contributor, accounting for 50.7 percentage points of accuracy. The pipeline transfers directly to edge hardware: Nemotron-3 120B running locally on an NVIDIA Jetson Thor AGX matches the cloud result at 94.5% with no prompt modifications. The underlying principle, removing deterministic dimensions from the LLM's output space, applies to any autonomous construction or assembly task where gravity or other physical constraints fix one or more degrees of freedom. A transfer experiment on 500 IGLU collaborative building tasks confirms the effect generalizes beyond the primary benchmark.
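The percentage-point comparisons in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the accuracy figures quoted above; the variable names and the `gap` helper are illustrative, and nothing here is recomputed from the benchmark itself.

```python
# Mean structural accuracy figures as quoted in the abstract, in percent.
ceiling = 97.6           # upper bound imposed by architect-agent errors
gpt_4o_mini = 94.6       # GPT-4o-mini with the 2.5-D pipeline
gpt_4o = 90.3            # GPT-4o, as reported
best_competitor = 76.3   # best competing system, as reported
nemotron_edge = 94.5     # Nemotron-3 120B on an NVIDIA Jetson Thor AGX


def gap(a: float, b: float) -> float:
    """Return a - b in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


gaps = {
    "ceiling vs GPT-4o-mini pipeline": gap(ceiling, gpt_4o_mini),
    "GPT-4o-mini pipeline vs GPT-4o": gap(gpt_4o_mini, gpt_4o),
    "GPT-4o-mini pipeline vs best competitor": gap(gpt_4o_mini, best_competitor),
    "cloud vs edge (Nemotron-3 120B)": gap(gpt_4o_mini, nemotron_edge),
}
for label, points in gaps.items():
    print(f"{label}: {points} percentage points")
```

The first entry reproduces the 3.0-point distance to the ceiling stated in the abstract; the last shows that the edge result is 0.1 points below the cloud result, which the abstract describes as a match.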