
PointLLM-V2: Empowering Large Language Models to Better Understand Point Clouds
Author(s) - Runsen Xu, Shuai Yang, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
Publication year - 2025
Publication title - IEEE Transactions on Pattern Analysis and Machine Intelligence
Language(s) - English
Resource type - Journal article
SCImago Journal Rank - 3.811
H-Index - 372
eISSN - 1939-3539
pISSN - 0162-8828
DOI - 10.1109/TPAMI.2025.3590784
Subject(s) - Computing and processing, Bioengineering
The unprecedented advancements in Large Language Models (LLMs) have had a profound impact on natural language processing but have yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, empowering LLMs to understand point clouds and offering a new avenue beyond 2D data. PointLLM understands colored object point clouds with human instructions, including coordinate-based part specifications, and generates contextually appropriate responses, illustrating its grasp of point clouds and common sense. Specifically, it couples a point cloud encoder with a powerful LLM to effectively fuse geometric, appearance, and linguistic information. To overcome the scarcity of point-text instruction-following data, we developed an automated data generation pipeline and collected a large-scale dataset of about 1.8M samples covering 1M distinct 3D objects, which facilitates the adoption of the two-stage training strategy prevalent in MLLM development. Additionally, we address the absence of appropriate benchmarks and the limitations of current evaluation metrics by proposing two novel benchmarks, Generative 3D Object Classification and 3D Object Captioning, supported by new, comprehensive evaluation metrics derived from human and GPT analyses. Through exploring various training strategies, we develop PointLLM, which significantly outperforms 2D and 3D baselines and achieves SOTA performance; notably, in object captioning it surpasses human annotators on over 50% of the samples. Code, datasets, and benchmarks will be available at https://github.com/OpenRobotLab/PointLLM.