This section details the MIVPG experiments across single- and multi-image scenarios. The model uses a frozen LLM and a frozen visual encoder, updating only the MIVPG for efficiency.

Evaluating Visual Adapters: MIVPG Performance on Single and Multi-Image Inputs

2025/11/15 11:12

Abstract and 1. Introduction

  2. Related Work

    2.1. Multimodal Learning

    2.2. Multiple Instance Learning

  3. Methodology

    3.1. Preliminaries and Notations

    3.2. Relations between Attention-based VPG and MIL

    3.3. MIVPG for Multiple Visual Inputs

    3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

  4. Experiments and 4.1. General Setup

    4.2. Scenario 1: Samples with Single Image

    4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding

    4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

  5. Conclusion and References

Supplementary Material

A. Detailed Architecture of QFormer

B. Proof of Proposition

C. More Experiments

4. Experiments

To assess the effectiveness of our proposed approach, we evaluate it in three scenarios:

  1. where each sample comprises a single image, and patches are naturally considered as instances;

  2. where each sample includes multiple images, but we use a general embedding for each image;

  3. where each sample contains multiple images, with each image containing multiple patches.
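One way to read the three scenarios is by how many instances end up in a sample's bag. The sketch below is illustrative only: the helper `bag_size` and the example counts `P` and `N` are hypothetical and not taken from the paper, and it counts instances without modelling how MIVPG aggregates them.

```python
def bag_size(num_images: int, patches_per_image: int, patch_level: bool) -> int:
    """Count the instances in one sample's bag (illustrative helper, not from the paper)."""
    if num_images < 1 or patches_per_image < 1:
        raise ValueError("counts must be positive")
    # Patch-level: every patch of every image is an instance.
    # Image-level: each image contributes a single general embedding.
    return num_images * patches_per_image if patch_level else num_images


# Hypothetical sizes, chosen only for illustration.
P = 256  # patches per image (assumed)
N = 5    # images per sample (assumed)

scenario_1 = bag_size(1, P, patch_level=True)   # single image, patches as instances
scenario_2 = bag_size(N, P, patch_level=False)  # multiple images, one embedding each
scenario_3 = bag_size(N, P, patch_level=True)   # multiple images, multiple patches each
print(scenario_1, scenario_2, scenario_3)
```

Under these assumed sizes, the third scenario has by far the largest number of instances, which suggests why it is treated separately from the first two.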

4.1. General Setup

We initialize our model from BLIP2 [22] with FLAN-T5-XL, and initialize MIVPG with the QFormer weights. The language model and the visual encoder are both frozen; during training, we update only the MIVPG. The visual encoder, ViT-G, encodes each image into patch embeddings, and images are resized to 224 × 224. In our experiments, we observed that unfreezing the visual encoder yields no additional improvement on small datasets. Further details can be found in Supplementary C.1.
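As a rough worked example of what the 224 × 224 input resolution implies for the number of patch embeddings per image, the sketch below counts non-overlapping square patches. The patch size of 14 pixels is an assumption (the text above does not state it), and the helper `num_patches` is hypothetical.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


IMAGE_SIZE = 224  # from the setup described above
PATCH_SIZE = 14   # assumption: not stated in the text

print(num_patches(IMAGE_SIZE, PATCH_SIZE))  # 16 * 16 = 256
```

Any extra tokens the encoder may prepend, such as a class token, are not counted here.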


:::info Authors:

(1) Wenliang Zhong, The University of Texas at Arlington;

(2) Wenyi Wu, Amazon;

(3) Qi Li, Amazon;

(4) Rob Barton, Amazon;

(5) Boxin Du, Amazon;

(6) Shioulin Sam, Amazon;

(7) Karim Bouyarmane, Amazon;

(8) Ismail Tutar, Amazon;

(9) Junzhou Huang, The University of Texas at Arlington.

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
