The Language-Guided Navigation module leverages an LLM (like ChatGPT) and the open-set O3D-SIM.

VLN: LLM and CLIP for Instance-Specific Navigation on 3D Maps

Abstract and 1 Introduction

  2. Related Works

    2.1. Vision-and-Language Navigation

    2.2. Semantic Scene Understanding and Instance Segmentation

    2.3. 3D Scene Reconstruction

  3. Methodology

    3.1. Data Collection

    3.2. Open-set Semantic Information from Images

    3.3. Creating the Open-set 3D Representation

    3.4. Language-Guided Navigation

  4. Experiments

    4.1. Quantitative Evaluation

    4.2. Qualitative Results

  5. Conclusion and Future Work, Disclosure statement, and References

3.4. Language-Guided Navigation

In this section, we build on the LLM-based approach from [1], which uses ChatGPT [35] to interpret language commands and map them to pre-defined function primitives that the robot can execute. Our approach differs from [1] in two respects: the role the LLM plays and the implementation of the function primitives. In [1], the LLM supplied the open-set understanding by mapping general queries to the known closed-set class labels obtained via Mask2Former [7].

With our new open-set representation, O3D-SIM, the LLM no longer needs to perform this mapping. Figure 4 shows how the code output differs between the two approaches. The function primitives work much as they did in the older approach, taking the desired object type and its instance as input. The desired object, however, is no longer drawn from a pre-defined set of classes; it is a short query describing the object, so the procedure for finding the target location changes. We exploit the text-image alignment of CLIP embeddings: the input description is passed to the model, and the resulting embedding is used to find the object in O3D-SIM.
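
To make the interface change concrete, the following is a minimal sketch of how a call emitted by the LLM could be parsed into a free-form object query and an instance index without executing arbitrary code. The primitive name `go_to_object`, its argument order, the 1-based instance index, and the helper `parse_primitive_call` are all hypothetical and chosen only for illustration; they are not taken from the paper or from [1].

```python
import ast
from typing import NamedTuple


class PrimitiveCall(NamedTuple):
    name: str      # primitive requested by the LLM
    query: str     # free-form object description (open-set, not a class label)
    instance: int  # 1-based index of the desired instance (assumed convention)


# Hypothetical whitelist of primitives the robot exposes.
ALLOWED_PRIMITIVES = {"go_to_object"}


def parse_primitive_call(llm_output: str) -> PrimitiveCall:
    """Parse a single call such as go_to_object("red armchair", 2).

    The text is parsed into a syntax tree and inspected; it is never executed.
    """
    tree = ast.parse(llm_output.strip(), mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a single function call")
    if call.func.id not in ALLOWED_PRIMITIVES:
        raise ValueError(f"unknown primitive: {call.func.id}")
    if call.keywords or len(call.args) != 2:
        raise ValueError("expected exactly two positional arguments")

    # literal_eval only accepts literals, so nested calls are rejected.
    query, instance = (ast.literal_eval(arg) for arg in call.args)
    if not isinstance(query, str):
        raise ValueError("first argument must be a string description")
    if isinstance(instance, bool) or not isinstance(instance, int) or instance < 1:
        raise ValueError("second argument must be a positive integer")
    return PrimitiveCall(call.func.id, query, instance)


if __name__ == "__main__":
    call = parse_primitive_call('go_to_object("wooden chair near the window", 2)')
    print(call.query, call.instance)
```

In a closed-set setting the first argument would have to be one of the known class labels; here it can be any short description, which is why the lookup behind the primitive has to change.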

The cosine similarity between the embedding of the description and every embedding in our representation is computed. The results are ranked in decreasing order of similarity, and the desired instance is selected. Once the instance is finalized, a goal corresponding to it is generated and passed to the navigation stack, which navigates the robot autonomously, thereby achieving Language-Guided Navigation.
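
The ranking step described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the `MapInstance` structure, the use of an instance centroid as the navigation goal, the reading of "instance" as the k-th ranked match, and the toy three-dimensional embeddings standing in for real CLIP embeddings are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class MapInstance:
    instance_id: int
    embedding: Sequence[float]            # stand-in for a stored CLIP embedding
    centroid: Tuple[float, float, float]  # assumed goal position in the map frame


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def select_goal(
    query: str,
    instance: int,
    instances: List[MapInstance],
    embed_text: Callable[[str], Sequence[float]],
) -> Tuple[float, float, float]:
    """Rank all map instances by similarity to the query and return the
    centroid of the requested one (1-based, assumed convention)."""
    query_embedding = embed_text(query)
    ranked = sorted(
        instances,
        key=lambda m: cosine_similarity(query_embedding, m.embedding),
        reverse=True,  # decreasing order of similarity
    )
    if not 1 <= instance <= len(ranked):
        raise IndexError(f"instance {instance} not available")
    return ranked[instance - 1].centroid


# Toy data: a lookup table replaces the real text encoder.
TOY_TEXT_EMBEDDINGS = {"chair": [1.0, 0.0, 0.1]}
SCENE = [
    MapInstance(0, [0.9, 0.1, 0.0], (1.0, 2.0, 0.0)),
    MapInstance(1, [0.0, 1.0, 0.0], (4.0, 0.5, 0.0)),
    MapInstance(2, [0.7, 0.6, 0.0], (2.5, 3.0, 0.0)),
]

if __name__ == "__main__":
    goal = select_goal("chair", 1, SCENE, TOY_TEXT_EMBEDDINGS.__getitem__)
    print("goal for the navigation stack:", goal)
```

In a real system `embed_text` would wrap a CLIP text encoder, and the returned position would be converted into whatever goal message the navigation stack expects.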


:::info Authors:

(1) Laksh Nanwani, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(2) Kumaraditya Gupta, International Institute of Information Technology, Hyderabad, India;

(3) Aditya Mathur, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(4) Swayam Agrawal, International Institute of Information Technology, Hyderabad, India;

(5) A.H. Abdul Hafez, Hasan Kalyoncu University, Sahinbey, Gaziantep, Turkey;

(6) K. Madhava Krishna, International Institute of Information Technology, Hyderabad, India.

:::


:::info This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::
