Computer Vision in Language

Matthias Scharf's Leadership in Transforming AI, Cloud, and Computer Vision into Business Value

Matthias Scharf is a multi-entrepreneur driving AI, cloud, and computer-vision solutions across telecom, energy, fintech and industrial ...

Frontiers

Foundation Models for Healthcare: Innovations in Generative AI, Computer Vision, Language Models, and Multimodal Systems

Artificial Intelligence (AI) has undergone remarkable advancements, revolutionizing fields such as general computer vision ...

Carnegie Mellon University

Blending Humanistic Inquiry and Technology, Carnegie Mellon Leads a New Era of Cultural Study and Research

Carnegie Mellon University will introduce new academic programs and resources for students and researchers to blend traditional humanistic inquiry with computational methods like computer vision, ...

Microsoft

Fara-7B: An Efficient Agentic Model for Computer Use

In 2024, Microsoft introduced small language models (SLMs) to customers, starting with the release of Phi models on Microsoft Foundry, as well as deploying Phi ...

Dark Reading

Vision Language Models Keep an Eye on Physical Security

Vision language models (VLMs) have made impressive strides over the past year, but can they handle real-world enterprise challenges? All signs point to yes, with one caveat: They still need maturing ...

GitHub

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

MobileVLA-R1 enables robust real-world quadruped control by unifying language reasoning and continuous action through structured CoT alignment and GRPO training. Grounding natural-language ...

GitHub

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

DeepThinkVLA rethinks Vision-Language-Action (VLA) policies with explicit deliberation. Starting from the public pi0-FAST checkpoint, we refactor the policy into a 2.9B parameter hybrid decoder that ...

Microsoft

OPA-DPO: Efficiently minimizing hallucinations in large vision-language models

Large vision-language models are improving at describing images, yet hallucinations still erode trust by introducing contradictions and fabricated details that limit practical applications. In ...
