Paper Feed
Must-read papers on LLM-driven automated planning specification.
New Papers!
“Make Planning Research Rigorous Again!” Katz et al. (2025) [paper]
“Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM” Attolino et al. (2025) [paper] [code]
“LODGE: Joint Hierarchical Task Planning and Learning of Domain Models with Grounded Execution” Kienle et al. (2025) [paper] [code]
“Large Language Models for Planning: A Comprehensive and Systematic Survey” Cao et al. (2025) [paper] [code]
“Text2World: Benchmarking Large Language Models for Symbolic World Model Generation” Hu et al. (2025) [paper] [code]
Paper List
This section presents a taxonomy of research within Model Construction, organized into three broad categories: Model Generation, Model Editing, and Model Benchmarks. Within each category, papers are listed by year, most recent first, and alphabetically by first author within each year.
Model Generation
Task Modeling
“Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation” Wang et al. (2025)
“TIC: Translate-Infer-Compile for accurate ‘text to plan’ using LLMs and logical intermediate representations” Agarwal and Sreepathy (2024)
“AutoGPT+P: Affordance-based Task Planning with Large Language Models” Birr et al. (2024)
“Leveraging LLMs for Generating Document-Informed Hierarchical Planning Models: A Proposal” Fine-Morris et al. (2024)
“A Demonstration of Natural Language Understanding in Embodied Planning Agents” Grover and Mohan (2024)
“CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning” Guo et al. (2024)
“PlanCollabNL: Leveraging Large Language Models for Adaptive Plan Generation in Human-Robot Collaboration” Izquierdo-Badiola et al. (2024)
“Enabling Semantic Reasoning in Robots through Natural Language Processing” Kalland (2024)
“Thought of Search: Planning with Language Models Through The Lens of Efficiency” Katz et al. (2024)
“Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition” Kwon et al. (2024)
“Planning AI Assistant for Emergency Decision-Making (PlanAID): Framing Planning Problems and Assessing Plans with Large Language Models” Lee et al. (2024)
“Safe Planner: Empowering Safety Awareness in Large Pre-Trained Models for Robot Task Planning” Li et al. (2024)
“Towards Human Awareness in Robot Task Planning with Large Language Models” Liu et al. (2024)
“LLM Reasoner and Automated Planner: A New NPC Approach” Merino and Sabater-Mir (2024)
“Bootstrapping Object-level Planning with Large Language Models” Paulius et al. (2024)
“TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners” Rosa et al. (2024)
“TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models” Singh et al. (2024)
“Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration” Singh et al. (2024)
“PDDLEGO: Iterative Planning in Textual Environments” Zhang et al. (2024)
“LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner” Zhang et al. (2024)
“LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement” Chang et al. (2023)
“AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers” Chen et al. (2023)
“Dynamic Planning with a LLM” Dagan et al. (2023)
“Task and Motion Planning with Large Language Models for Object Rearrangement” Ding et al. (2023)
“LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” Liu et al. (2023)
“Faithful Chain-of-Thought Reasoning” Lyu et al. (2023)
“Vision-Language Interpreter for Robot Task Planning” Shirai et al. (2023)
“Translating natural language to planning goals with large-language models” Xie et al. (2023)
“Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behaviour in out-of-distribution reasoning tasks” Collins et al. (2022)
Domain Modeling
“Predicate Invention from Pixels via Pretrained Vision-Language Models” Athalye et al. (2024)
“Language-Augmented Symbolic Planner for Open-World Task Planning” Chen et al. (2024)
“Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts” Huang et al. (2024)
“Learning Compositional Behaviors from Demonstration and Language” Liu et al. (2024)
“Using Large Language Models to Extract Planning Knowledge from Common Vulnerabilities and Exposures” Oates et al. (2024)
“Large Language Models as Planning Domain Generators” Oswald et al. (2024)
“Autonomously Learning World-Model Representations For Efficient Robot Planning” Shah (2024)
“Creating PDDL Models from Javascript using LLMs: Preliminary Results” Sikes et al. (2024)
“Leveraging LLMs for HTN domain model generation via prompt engineering” Sinha (2024)
“Making Large Language Models into World Models with Precondition and Effect Knowledge” Xie et al. (2024)
“PROC2PDDL: Open-Domain Planning Representations from Texts” Zhang et al. (2024)
“Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds” Ding et al. (2023)
“Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning” Guan et al. (2023)
“Learning adaptive planning representations with natural language guidance” Wong et al. (2023)
Hybrid Modeling
“NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions” Gestrin et al. (2024)
“InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning” Han et al. (2024)
“Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming” Hao et al. (2024)
“AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation” Hu et al. (2024)
“DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models” Liu et al. (2024)
“Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models” Mahdavi et al. (2024)
“Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability” Sakib and Sun (2024)
“Toward a Method to Generate Capability Ontologies from Natural Language Descriptions” Silva et al. (2024)
“Generating consistent PDDL domains with Large Language Models” Smirnov et al. (2024)
“MORPHeus: a Multimodal One-armed Robot-assisted Peeling System with Human Users In-the-loop” Ye et al. (2024)
“There and Back Again: Extracting Formal Domains for Controllable Neurosymbolic Story Authoring” Kelly et al. (2023)
“The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs” Ying et al. (2023)
“ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning” Zhou et al. (2023)
Model Editing
“Can LLMs Fix Issues with Reasoning Models? Towards More Likely Models for AI Planning” Caglar et al. (2024)
“LLMs for AI Planning: A Study on Error Detection and Correction in PDDL Domain Models” Patil (2024)
“Traversing the Linguistic Divide: Aligning Semantically Equivalent Fluents Through Model Refinement” Sikes et al. (2024)
“Exploring the limitations of using large language models to fix planning tasks” Gragera and Pozanco (2023)
Model Benchmarks
LLMs-as-Planners
“A Roadmap to Guide the Integration of LLMs in Hierarchical Planning” Puerta-Merino et al. (2025)
“Exploring and Benchmarking the Planning Capabilities of Large Language Models” Bohnet et al. (2024)
“ACPBench: Reasoning about Action, Change, and Planning” Kokel et al. (2024)
“TravelPlanner: A Benchmark for Real-World Planning with Language Agents” Xie et al. (2024)
“NATURAL PLAN: Benchmarking LLMs on Natural Language Planning” Zheng et al. (2024)
“Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning” (Household) Guan et al. (2023)
“Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning” Stein et al. (2023)
“On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)” Valmeekam et al. (2023)
“PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change” Valmeekam et al. (2023)
“ALFWorld: Aligning Text and Embodied Environments for Interactive Learning” Shridhar et al. (2021)
LLMs-as-Formalizers PDDL Benchmarks
The following is the core summary of model generation frameworks in “LLMs as Planning Formalizers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models”:
