Paper Feed

Must-read papers on LLM-driven automated planning specification.

New Papers!

“Make Planning Research Rigorous Again!” Katz et al. (2025) [paper]

“Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM” Attolino et al. (2025) [paper] [code]

“LODGE: Joint Hierarchical Task Planning and Learning of Domain Models with Grounded Execution” Kienle et al. (2025) [paper] [code]

“Large Language Models for Planning: A Comprehensive and Systematic Survey” Cao et al. (2025) [paper] [code]

“Text2World: Benchmarking Large Language Models for Symbolic World Model Generation” Hu et al. (2025) [paper] [code]

Paper List

This section presents a taxonomy of research on Model Construction, organized into three broad categories: Model Generation, Model Editing, and Model Benchmarks. Within each category, papers are ordered by year, most recent first, and then alphabetically by first author.

Model Generation

Task Modeling

“Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation” Wang et al. (2025)	[paper] [code]
“TIC: Translate-Infer-Compile for accurate ‘text to plan’ using LLMs and logical intermediate representations” Agarwal and Sreepathy (2024)	[paper]
“AutoGPT+P: Affordance-based Task Planning with Large Language Models” Birr et al. (2024)	[paper] [code]
“Leveraging LLMs for Generating Document-Informed Hierarchical Planning Models: A Proposal” Fine-Morris et al. (2024)	[paper]
“A Demonstration of Natural Language Understanding in Embodied Planning Agents” Grover and Mohan (2024)	[paper]
“CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning” Guo et al. (2024)	[paper]
“PlanCollabNL: Leveraging Large Language Models for Adaptive Plan Generation in Human-Robot Collaboration” Izquierdo-Badiola et al. (2024)	[paper]
“Enabling Semantic Reasoning in Robots through Natural Language Processing” Kalland (2024)	[paper]
“Thought of Search: Planning with Language Models Through The Lens of Efficiency” Katz et al. (2024)	[paper]
“Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition” Kwon et al. (2024)	[paper] [code]
“Planning AI Assistant for Emergency Decision-Making (PlanAID): Framing Planning Problems and Assessing Plans with Large Language Models” Lee et al. (2024)	[paper]
“Safe Planner: Empowering Safety Awareness in Large Pre-Trained Models for Robot Task Planning” Li et al. (2024)	[paper]
“Towards Human Awareness in Robot Task Planning with Large Language Models” Liu et al. (2024)	[paper]
“LLM Reasoner and Automated Planner: A New NPC Approach” Merino and Sabater-Mir (2024)	[paper]
“Bootstrapping Object-level Planning with Large Language Models” Paulius et al. (2024)	[paper]
“TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners” Rosa et al. (2024)	[paper]
“TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models” Singh et al. (2024)	[paper] [code]
“Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration” Singh et al. (2024)	[paper] [code]
“PDDLEGO: Iterative Planning in Textual Environments” Zhang et al. (2024)	[paper] [code]
“LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner” Zhang et al. (2024)	[paper] [code]
“LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement” Chang et al. (2023)	[paper] [code]
“AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers” Chen et al. (2023)	[paper] [code]
“Dynamic Planning with a LLM” Dagan et al. (2023)	[paper] [code]
“Task and Motion Planning with Large Language Models for Object Rearrangement” Ding et al. (2023)	[paper] [code]
“LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” Liu et al. (2023)	[paper] [code]
“Faithful Chain-of-Thought Reasoning” Lyu et al. (2023)	[paper] [code]
“Vision-Language Interpreter for Robot Task Planning” Shirai et al. (2023)	[paper] [code]
“Translating natural language to planning goals with large-language models” Xie et al. (2023)	[paper] [code]
“Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behaviour in out-of-distribution reasoning tasks” Collins et al. (2022)	[paper] [code]

Domain Modeling

“Predicate Invention from Pixels via Pretrained Vision-Language Models” Athalye et al. (2024)	[paper]
“Language-Augmented Symbolic Planner for Open-World Task Planning” Chen et al. (2024)	[paper]
“Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts” Huang et al. (2024)	[paper] [code]
“Learning Compositional Behaviors from Demonstration and Language” Liu et al. (2024)	[paper]
“Using Large Language Models to Extract Planning Knowledge from Common Vulnerabilities and Exposures” Oates et al. (2024)	[paper] [code]
“Large Language Models as Planning Domain Generators” Oswald et al. (2024)	[paper] [code]
“Autonomously Learning World-Model Representations For Efficient Robot Planning” Shah (2024)	[paper]
“Creating PDDL Models from Javascript using LLMs: Preliminary Results” Sikes et al. (2024)	[paper]
“Leveraging LLMs for HTN domain model generation via prompt engineering” Sinha (2024)	[paper]
“Making Large Language Models into World Models with Precondition and Effect Knowledge” Xie et al. (2024)	[paper]
“PROC2PDDL: Open-Domain Planning Representations from Texts” Zhang et al. (2024)	[paper] [code]
“Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds” Ding et al. (2023)	[paper] [code]
“Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning” Guan et al. (2023)	[paper] [code]
“Learning adaptive planning representations with natural language guidance” Wong et al. (2023)	[paper]

Hybrid Modeling

“NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions” Gestrin et al. (2024)	[paper] [code]
“InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning” Han et al. (2024)	[paper] [code]
“Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming” Hao et al. (2024)	[paper] [code]
“AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation” Hu et al. (2024)	[paper] [code]
“DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models” Liu et al. (2024)	[paper]
“Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models” Mahdavi et al. (2024)	[paper]
“Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability” Sakib and Sun (2024)	[paper]
“Toward a Method to Generate Capability Ontologies from Natural Language Descriptions” Silva et al. (2024)	[paper]
“Generating consistent PDDL domains with Large Language Models” Smirnov et al. (2024)	[paper]
“MORPHeus: a Multimodal One-armed Robot-assisted Peeling System with Human Users In-the-loop” Ye et al. (2024)	[paper] [code]
“There and Back Again: Extracting Formal Domains for Controllable Neurosymbolic Story Authoring” Kelly et al. (2023)	[paper] [code]
“The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs” Ying et al. (2023)	[paper]
“ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning” Zhou et al. (2023)	[paper] [code]

Model Editing

“Can LLMs Fix Issues with Reasoning Models? Towards More Likely Models for AI Planning” Caglar et al. (2024)	[paper]
“LLMs for AI Planning: A Study on Error Detection and Correction in PDDL Domain Models” Patil (2024)	[paper]
“Traversing the Linguistic Divide: Aligning Semantically Equivalent Fluents Through Model Refinement” Sikes et al. (2024)	[paper]
“Exploring the limitations of using large language models to fix planning tasks” Gragera and Pozanco (2023)	[paper]

Model Benchmarks

LLMs-as-Planners

“A Roadmap to Guide the Integration of LLMs in Hierarchical Planning” Puerta-Merino et al. (2025)	[paper] [code]
“Exploring and Benchmarking the Planning Capabilities of Large Language Models” Bohnet et al. (2024)	[paper]
“ACPBench: Reasoning about Action, Change, and Planning” Kokel et al. (2024)	[paper] [code]
“TravelPlanner: A Benchmark for Real-World Planning with Language Agents” Xie et al. (2024)	[paper] [code]
“NATURAL PLAN: Benchmarking LLMs on Natural Language Planning” Zheng et al. (2024)	[paper] [code]
“Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning” (Household) Guan et al. (2023)	[paper] [code]
“Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning” Stein et al. (2023)	[paper] [code]
“On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)” Valmeekam et al. (2023)	[paper] [code]
“PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change” Valmeekam et al. (2023)	[paper] [code]
“ALFWorld: Aligning Text and Embodied Environments for Interactive Learning” Shridhar et al. (2021)	[paper] [code]

LLMs-as-Formalizers PDDL Benchmarks

“Text2World: Benchmarking Large Language Models for Symbolic World Model Generation” Hu et al. (2025)	[paper] [code]
“Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages” Zuo et al. (2024)	[paper] [code]

The following is the core summary of model generation frameworks in “LLMs as Planning Formalizers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models”: