AGI Advance: Weekly AI & AGI Insights (Mar 25, 2025)

Turing Staff
26 Mar 2025 · 3 mins read

Welcome to AGI Advance, Turing’s weekly briefing on AI breakthroughs, AGI research, and industry trends.

This week, we’re exploring how LLMs are stepping into the product design loop, how multi-agent systems are enabling more refined reasoning, and why GPT-4o continues to lead in process-level intelligence.

What we're thinking

At Turing, we’re applying AGI capabilities to reshape how product managers build, test, and evaluate workflows—without writing a single line of code.

This week, we’re spotlighting AI’s role in technical vetting and product flow automation:

  • From manual to AI-driven scoring: We’ve automated the evaluation of thousands of business and technical writing assessments—using GPT-based logic to score grammar, clarity, and analytical depth. All of this runs at scale without engineering support.
  • Prompting as product specification: Instead of shipping PRDs to ML teams, PMs now define and refine scoring logic by writing prompts—essentially building micro-models on their own. Using ChatGPT, they also generate custom Apps Scripts to automate workflows directly within Google Sheets.
  • Ternary evaluation > binary outcomes: By categorizing model outputs as "obviously good," "unsure," or "obviously bad," we’re building more explainable and transparent evaluations that rival expert reviews.

The result: AI-powered PMs can now launch and refine vetting systems at scale—turning Google Sheets into an AI stack and saving thousands of engineering hours.
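The ternary triage described above can be sketched as a simple thresholding step over a model-assigned score. The thresholds, field names, and scoring scale below are illustrative assumptions for the sketch, not Turing's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical cutoffs -- the real pipeline's thresholds are not published.
GOOD_THRESHOLD = 0.8
BAD_THRESHOLD = 0.3

@dataclass
class Assessment:
    candidate_id: str
    score: float  # model-assigned quality score in [0, 1]

def ternary_label(score: float) -> str:
    """Map a continuous model score to one of three buckets."""
    if score >= GOOD_THRESHOLD:
        return "obviously good"
    if score <= BAD_THRESHOLD:
        return "obviously bad"
    return "unsure"  # only this band is routed to a human reviewer

def triage(assessments: list[Assessment]) -> dict[str, list[str]]:
    """Group a batch so expert review effort concentrates on the 'unsure' band."""
    buckets: dict[str, list[str]] = {
        "obviously good": [], "unsure": [], "obviously bad": []
    }
    for a in assessments:
        buckets[ternary_label(a.score)].append(a.candidate_id)
    return buckets

batch = [Assessment("c1", 0.92), Assessment("c2", 0.55), Assessment("c3", 0.10)]
print(triage(batch))
# → {'obviously good': ['c1'], 'unsure': ['c2'], 'obviously bad': ['c3']}
```

The design choice is that the middle band is deliberately wide: disagreement between the model and a reviewer is cheap in the "unsure" bucket and expensive at the extremes, so explainability comes from only making confident calls where the model is clearly right.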

What we're saying

Insights from Turing’s leadership on the latest AGI developments and industry shifts:

Research: Knowledge-Aware Iterative Retrieval for Multi-Agent Systems proposes a framework where LLM agents refine their own queries, update internal memory, and coordinate for better reasoning outcomes.

Sam Ho, Product Leader:
"AI isn’t just getting smarter—it’s learning to work like a team. Early models predicted words. Then came Chain of Thought and Deep Research, mimicking solo reasoning. Now, we’re entering a new phase: multi-agent AI that collaborates, refines its own searches, and forgets what doesn’t matter.

Instead of static lookups, these systems challenge assumptions, iterate, and reduce noise—leading to sharper insights at lower cost. It’s not just faster AI, it’s AI that thinks better."

What we're reading

  • Smoldocling: An Ultra-Compact Vision-Language Model For End-to-End Multi-Modal Document Conversion
    IBM and HuggingFace researchers introduced SmolDocling, a compact (256M) vision-language model that converts entire document pages into structured markup (DocTags) with layout, content, and spatial annotations. Unlike larger LVLMs or pipeline-based systems, SmolDocling runs end-to-end—accurately parsing tables, code, equations, charts, and footnotes across business, academic, and legal documents.
    Despite being up to 27× smaller than some competitors, it outperforms or matches larger models in tasks like text recognition, code listing extraction, and formula conversion, setting a new bar for lightweight, multi-modal document understanding.
  • Towards Effective Extraction and Evaluation of Factual Claims
    Microsoft’s Claimify introduces a more robust way to extract and evaluate factual claims from long-form LLM outputs—handling ambiguity, underspecification, and decontextualization.
    Backed by a new evaluation framework (entailment, coverage, decontextualization), Claimify outperformed five other methods—achieving 99% entailment, the highest coverage accuracy, and the most reliable context handling.
  • MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
    This paper introduces MPBench, the first large-scale multimodal benchmark to evaluate Process Reward Models (PRMs) across three reasoning tasks: step correctness, answer aggregation, and reasoning process search.
    With over 9,700 labeled instances across science, math, and commonsense, MPBench enables structured assessment of PRMs in both training and inference contexts. GPT-4o emerged as the top-performing model, especially in tree-structured reasoning search, but the study highlights ongoing challenges—especially in mathematical reasoning, where even top models struggle.

Where we’ll be

Turing will be at two major AI conferences in the coming months—join us to discuss the future of AGI:

  • ICLR 2025 [Singapore | Apr 24 – 28]
    A top-tier deep learning conference covering representation learning, AI optimization, and theoretical advancements.
  • MLSys 2025 [Santa Clara, CA | May 12 – 15]
    A major event focused on the intersection of machine learning and systems, discussing efficient AI model training, distributed learning, and AI hardware innovations.

If you’re attending, reach out—we’d love to connect and exchange insights!

Stay ahead with AGI Advance

Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.

[Subscribe & Read More]

Want to accelerate your business with AI?

Talk to one of our solutions architects and start innovating with AI-powered talent.

Get Started