The Tribe Research Collection

The Tribe Research Collection is an application intended to help users learn about the research publications that William & Mary professors have been working on. Project data is web-scraped from William & Mary websites using Firecrawl and summarized with GPT-4o. The app is built with Flask and deployed on Vercel. Clicking a project card opens the publication for the corresponding project.
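The description above boils down to a scrape, summarize, and serve pipeline. Below is a minimal sketch of that flow, assuming the public Firecrawl and OpenAI Python SDKs; the source URL, prompt, card fields, and index.html template are hypothetical and not taken from the project's code.

```python
# Minimal sketch of the scrape -> summarize -> serve pipeline described above.
# The source URL, prompt, card fields, and template name are illustrative assumptions.
from firecrawl import FirecrawlApp    # Firecrawl Python SDK
from flask import Flask, render_template
from openai import OpenAI             # OpenAI Python SDK, used here for GPT-4o

fc = FirecrawlApp(api_key="FIRECRAWL_API_KEY")  # placeholder key
llm = OpenAI()                                  # reads OPENAI_API_KEY from the environment
app = Flask(__name__)

# Hypothetical faculty/publication pages to scrape.
SOURCE_URLS = [
    "https://www.wm.edu/as/computerscience/",
]

def summarize(page_text: str) -> str:
    """Ask GPT-4o for a one-sentence summary of a scraped publication page."""
    resp = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize this research publication in one sentence."},
            {"role": "user", "content": page_text[:8000]},  # truncate to keep the prompt small
        ],
    )
    return resp.choices[0].message.content

def build_cards() -> list[dict]:
    """Scrape each source page and turn it into a project card."""
    cards = []
    for url in SOURCE_URLS:
        page = fc.scrape_url(url)  # return shape varies across Firecrawl SDK versions
        text = page.get("markdown", "") if isinstance(page, dict) else getattr(page, "markdown", "")
        cards.append({"url": url, "summary": summarize(text)})
    return cards

@app.route("/")
def index():
    # index.html (assumed) renders each card as a link back to its publication.
    return render_template("index.html", cards=build_cards())
```

On Vercel, the scraping and summarization step would typically run ahead of time or be cached rather than on every request, since serverless request time is limited.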

Categories: Computer Science, Mathematics, Biology, Data Science
Plant and animal endemism in the eastern Andean slope: challenges to conservation
Jennifer Swenson
The Andes-Amazon basin in Peru and Bolivia is a biologically rich but understudied area with high endemism and unknown species distributions.
Measuring News Similarity Across Ten U.S. News Sites
Alexander Nwala
The paper presents a method to identify and measure the similarity of top emphasized news stories across U.S.-based news websites, highlighting the widespread but poorly quantified phenomenon of editorial decision-making and story selection.
The Many Shapes of Archive-It
Alexander Nwala
Web archives are crucial for digital preservation, enabling journalists, social scientists, historians, and government organizations to curate and manage their own collections by selecting original resources.
Bootstrapping Web Archive Collections from Social Media
Alexander Nwala
Automatically and semi-automatically generated web archive collections bootstrapped from social media offer a cost-effective alternative to human-curated collections; the paper evaluates how closely they resemble Archive-It collections.
Scraping SERPs for Archival Seeds: It Matters When You Start
Alexander Nwala
The paper investigates how the retrievability of URIs of news stories found on Google changes over time, impacting event-based collection building.
Query-Driven Multimodal GraphRAG: Dynamic Local Knowledge Graph Construction for Online Reasoning
Yi He
The proposed Query-Driven Multimodal GraphRAG framework enhances the interpretability and reliability of LLMs in complex reasoning tasks by dynamically constructing query-specific local knowledge graphs, excelling in cross-modal understanding and achieving state-of-the-art performance on the MultimodalQA and WebQA datasets.
HGDL: Heterogeneous Graph Label Distribution Learning
Heng Lian, Yi He
This paper introduces a novel framework for heterogeneous graph label distribution learning (HGDL) that addresses challenges of node type, attribute, and neighborhood structure heterogeneity using proactive graph topology homogenization and a consistency-aware graph transformer, demonstrating its effectiveness through theoretical and empirical validation.
Learning Gradual Typing Performance
Yi He
Gradual typing seeks to merge the benefits of static and dynamic typing but suffers from unpredictable performance; despite efforts to optimize it, understanding and managing the performance landscape during program migration remains underdeveloped.
Towards Utilitarian Online Learning – A Review of Online Algorithms in Open Feature Space
Yi He
This paper reviews recent advancements in Utilitarian Online Learning (UOL) within open feature spaces, categorizes existing models, assesses their strengths and weaknesses, examines application scenarios, benchmarks model performance, and explores challenges and future research directions.
Generating Virtual Reality Stroke Gesture Data from Out-of-Distribution Desktop Stroke Gesture Data
Jindong Wang
The paper utilizes desktop interaction data to generate VR interaction data, focusing on time-varying stroke gestures to aid user behavior analysis and experience enhancement.
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning
Jindong Wang
The paper introduces Robustness Critical Fine-Tuning (RiFT), a method aimed at improving the generalization of deep neural networks while maintaining adversarial robustness, addressing limitations of traditional Adversarial Training.
MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare
Jindong Wang
MetaFed is a novel framework for federated learning that enhances model personalization and performance across federations without a central server, using Cyclic Knowledge Distillation to overcome data heterogeneity, and improves accuracy and communication efficiency in healthcare applications.
Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction
Jindong Wang
The paper proposes a Multi-Grained Consistency Regularization (MGCR) method to leverage unlabeled data for improving Target-oriented Opinion Words Extraction (TOWE), addressing training data scarcity and distribution shifts, and demonstrates its effectiveness on benchmark datasets.
Generalizing to Unseen Domains: A Survey on Domain Generalization
Jindong Wang
This paper provides a comprehensive review of recent advances in domain generalization, a field focused on developing models that can generalize to unseen test domains, by defining the concept, categorizing related algorithms, discussing theories, and suggesting future research topics.
AutoRuleSQL: Hybrid Text-to-SQL via Rule-Driven Fast Paths and LLM Bootstrapping
Haipeng Chen
AutoRuleSQL, a hybrid NL2SQL system, enhances real-time query efficiency by combining template-based methods with LLM fallback, reducing latency by over 12.6% and improving accuracy by up to 4.0%.
Sequential Stochastic Combinatorial Optimization Using Hierarchical Reinforcement Learning
Haipeng Chen
This paper introduces a novel hierarchical reinforcement learning framework, the wake-sleep option (WS-option), to address sequential stochastic combinatorial optimization problems, demonstrating improved effectiveness, generalizability, and computational efficiency over traditional methods.
Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games?
Haipeng Chen
The paper introduces and analyzes two-player Asymmetric Combinatorial-Continuous zEro-Sum (ACCES) games, proves Nash equilibrium existence, develops the Combinatorial Continuous DO (CCDO) algorithm to solve them, and presents the CCDORL algorithm based on reinforcement learning, with experiments validating their effectiveness.
Population Aware Diffusion for Time Series Generation
Haipeng Chen
PaD-TS is a new time series generation model designed to preserve population-level properties such as value distributions and cross-correlations; it reduces distribution shift while maintaining individual-level data authenticity and shows substantial improvements over existing models.
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Qingyun Wang
The study shows that Vision-Language Models can self-correct and improve through a Self-Correction Learning approach applied during fine-tuning, achieving better performance without external feedback, in contrast to self-correction attempted during iterative inference.
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering
Qingyun Wang
This paper investigates the ability of Language Models to align multilingual knowledge, improving cross-lingual question answering and performing effectively in zero-shot and retrieval-augmented contexts.
SCIMON: Scientific Inspiration Machines Optimized for Novelty
Qingyun Wang
The study aims to improve neural language models' capacity to generate innovative scientific ideas from literature by using background contexts instead of traditional binary link prediction, thus enhancing expressivity and novelty.