Publications | Shilong Liu Homepage

A full list of papers is available on my Google Scholar page.

2023

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Shilong Liu, Hao Cheng, Haotian Liu, and 10 more authors

arXiv:2311.05437, 2023

Equips multimodal large language models with tools to create multimodal agents.

@article{liu2023grounding,
  title = {LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents},
  author = {Liu, Shilong and Cheng, Hao and Liu, Haotian and Zhang, Hao and Li, Feng and Ren, Tianhe and Zou, Xueyan and Yang, Jianwei and Su, Hang and Zhu, Jun and Zhang, Lei and Gao, Jianfeng and Li, Chunyuan},
  journal = {arXiv:2311.05437},
  year = {2023},
  codebadge = {https://img.shields.io/github/stars/LLaVA-VL/LLaVA-Plus-Codebase},
}

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, and 8 more authors

arXiv preprint arXiv:2303.05499, 2023

A state-of-the-art open-set object detector, reaching 52.5 AP on COCO without any COCO training data.

@article{liu2023groundinh,
  title = {Grounding {DINO}: Marrying {DINO} with grounded pre-training for open-set object detection},
  author = {Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal = {arXiv preprint arXiv:2303.05499},
  year = {2023},
  codebadge = {https://img.shields.io/github/stars/IDEA-Research/GroundingDINO,https://img.shields.io/github/stars/IDEA-Research/Grounded-Segment-Anything}
}

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

Feng Li, Hao Zhang, Huaizhe Xu, and 4 more authors

2023

A state-of-the-art unified model for object detection and segmentation.

@inproceedings{FengLi2023MaskDT,
  title = {{Mask DINO}: Towards A Unified Transformer-based Framework for Object Detection and Segmentation},
  author = {Li, Feng and Zhang, Hao and Xu, Huaizhe and Liu, Shilong and Zhang, Lei and Ni, Lionel M and Shum, Heung-Yeung},
  year = {2023},
}

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

Shilong Liu, Yaoyuan Liang, Shijia Huang, and 5 more authors

In Proceedings of the AAAI Conference on Artificial Intelligence, 2023

A comparison of object detection, referring expression comprehension (REC), and phrase grounding tasks.

@inproceedings{dqdetr,
  title = {{DQ-DETR}: Dual Query Detection Transformer for Phrase Extraction and Grounding},
  author = {Liu, Shilong and Liang, Yaoyuan and Huang, Shijia and Li, Feng and Zhang, Hao and Su, Hang and Zhu, Jun and Zhang, Lei},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year = {2023},
}

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Hao Zhang*, Feng Li*, Shilong Liu*, and 5 more authors

In International Conference on Learning Representations, 2023

The first DETR-based object detector to rank first on the COCO object detection leaderboard.

@inproceedings{zhang2022dino,
  title = {{DINO}: {DETR} with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
  author = {Zhang*, Hao and Li*, Feng and Liu*, Shilong and Zhang, Lei and Su, Hang and Zhu, Jun and Ni, Lionel M. and Shum, Heung-Yeung},
  booktitle = {International Conference on Learning Representations},
  year = {2023},
  codebadge = {https://img.shields.io/github/stars/IDEA-Research/DINO}
}

2022

DN-DETR: Accelerate DETR training by introducing query denoising

Feng Li*, Hao Zhang*, Shilong Liu, and 3 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A novel denoising training strategy for DETR, achieving faster convergence and better performance.

@inproceedings{li2022dn,
  title = {{DN-DETR}: Accelerate {DETR} training by introducing query denoising},
  author = {Li*, Feng and Zhang*, Hao and Liu, Shilong and Guo, Jian and Ni, Lionel M and Zhang, Lei},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages = {13619--13627},
  year = {2022},
}

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Shilong Liu, Feng Li, Hao Zhang, and 5 more authors

In International Conference on Learning Representations, 2022

An in-depth analysis of the role of queries in DETR, formulating them as dynamic anchor boxes.

@inproceedings{liu2022dabdetr,
  title = {{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},
  author = {Liu, Shilong and Li, Feng and Zhang, Hao and Yang, Xiao and Qi, Xianbiao and Su, Hang and Zhu, Jun and Zhang, Lei},
  booktitle = {International Conference on Learning Representations},
  year = {2022},
  url = {https://openreview.net/forum?id=oMI9PjOb9Jl},
  codebadge = {https://img.shields.io/github/stars/IDEA-Research/DAB-DETR}
}

2021

Query2Label: A Simple Transformer Way to Multi-Label Classification

Shilong Liu, Lei Zhang, Xiao Yang, and 2 more authors

arXiv:2107.10834, 2021

A novel transformer-based multi-label classification model, achieving state-of-the-art results on four benchmarks.

@misc{liu2021query2label,
  title = {Query2Label: A Simple Transformer Way to Multi-Label Classification},
  author = {Liu, Shilong and Zhang, Lei and Yang, Xiao and Su, Hang and Zhu, Jun},
  year = {2021},
  eprint = {2107.10834},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV},
  codebadge = {https://img.shields.io/github/stars/SlongLiu/query2labels}
}