
PointLLM-V2: Empowering Large Language Models to Better Understand Point Clouds
Author(s) - Runsen Xu, Shuai Yang, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
Publication year - 2025
Publication title - IEEE Transactions on Pattern Analysis and Machine Intelligence
Language(s) - English
Resource type - Journal article
SCImago Journal Rank - 3.811
H-Index - 372
eISSN - 1939-3539
pISSN - 0162-8828
DOI - 10.1109/TPAMI.2025.3590784
Subject(s) - Computing and processing, Bioengineering
The unprecedented advancements in Large Language Models (LLMs) have had a profound impact on natural language processing but have yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, empowering LLMs to understand point clouds and offering a new avenue beyond 2D data. PointLLM understands colored object point clouds with human instructions, including coordinate-based part specifications, and generates contextually appropriate responses, illustrating its grasp of point clouds and common sense. Specifically, it couples a point cloud encoder with a powerful LLM to effectively fuse geometric, appearance, and linguistic information. To overcome the scarcity of point-text instruction-following data, we developed an automated data generation pipeline and collected a large-scale dataset of about 1.8M samples covering 1M distinct 3D objects, which facilitates the adoption of the two-stage training strategy prevalent in MLLM development. Additionally, we address the absence of appropriate benchmarks and the limitations of current evaluation metrics by proposing two novel benchmarks, Generative 3D Object Classification and 3D Object Captioning, supported by new, comprehensive evaluation metrics derived from human and GPT analyses. Through exploring various training strategies, we develop PointLLM, which significantly outperforms 2D and 3D baselines and achieves SOTA performance; notably, in object captioning it surpasses human annotators on over 50% of the samples. Code, datasets, and benchmarks will be available at https://github.com/OpenRobotLab/PointLLM.