Miguel Otero Pedrido

ML Engineer | Founder

The Neural Maze

Miguel Otero Pedrido is the founder of The Neural Maze, a hub for machine learning (ML) projects where concepts are explained step-by-step with code, articles, and video tutorials. He is a seasoned AI professional with extensive experience in developing and implementing AI solutions across various industries. Miguel has a strong background in machine learning, natural language processing, and computer vision, and has contributed to numerous projects that leverage AI to solve complex problems. Passionate about sharing his knowledge, he has mentored and taught others, helping them understand and apply AI technologies effectively.

In this workshop, we’ll build a fully functional multimodal Telegram agent, putting into practice a wide range of concepts from the world of Agentic AI. This isn’t just another PoC — it's designed for those who are ready to level up and build complex, production-ready agentic applications. 

Throughout the session, you’ll learn how to build a Telegram agent you can chat with directly from your phone, master the creation and management of workflows with LangGraph, and set up a long-term memory system using Qdrant as a vector database. We’ll also leverage the fast LLMs served by Groq to power the agent’s responses, implement Speech-to-Text capabilities with Whisper, and integrate Text-to-Speech using ElevenLabs. Beyond language, you’ll learn to generate high-quality images using diffusion models, and process visual inputs with Vision-Language Models such as Llama 3.2 Vision. 
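To give a flavour of what the LangGraph side of this looks like, here is a minimal sketch of a single-node graph backed by a SQLite checkpointer for short-term memory. The state schema, node name, database file, and Groq model id are illustrative assumptions, not the workshop's actual code.

```python
# Minimal LangGraph sketch: one LLM node plus a SQLite checkpointer for
# short-term (per-conversation) memory. Names and model id are assumptions.
import sqlite3
from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langchain_groq import ChatGroq  # needs langchain-groq and GROQ_API_KEY
from langgraph.checkpoint.sqlite import SqliteSaver  # needs langgraph-checkpoint-sqlite
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # add_messages appends new messages instead of overwriting the list.
    messages: Annotated[list[AnyMessage], add_messages]


llm = ChatGroq(model="llama-3.3-70b-versatile")  # model id is an assumption


def respond(state: AgentState) -> dict:
    """Single node: answer with the LLM given the running conversation."""
    return {"messages": [llm.invoke(state["messages"])]}


builder = StateGraph(AgentState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
builder.add_edge("respond", END)

# SQLite-backed checkpointer: each thread_id keeps its own conversation state.
checkpointer = SqliteSaver(sqlite3.connect("short_term_memory.db", check_same_thread=False))
graph = builder.compile(checkpointer=checkpointer)

if __name__ == "__main__":
    config = {"configurable": {"thread_id": "demo-chat"}}
    out = graph.invoke({"messages": [("user", "Hi there!")]}, config=config)
    print(out["messages"][-1].content)
```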

Finally, we’ll bring it all together by connecting the complete agentic application directly to Telegram, enabling a rich, multimodal user experience. Throughout the day, you will focus on the following key areas:

  • Understand the full architecture and stack for building production-grade multimodal agents.
  • Learn to build and debug agent workflows using LangGraph and LangGraph Studio.
  • Implement short-term (SQLite) and long-term (Qdrant) memory systems for your agent (see the Qdrant sketch after this list).
  • Enable speech interactions using Whisper (STT) and ElevenLabs (TTS).
  • Integrate vision-language understanding with Llama 3.2 Vision and generate images via diffusion models.
  • Connect your agent to Telegram for real-time, mobile-accessible interactions.
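To make the long-term memory piece concrete, here is a rough sketch of how an agent might store and recall memories with Qdrant. The collection name, embedding model, and payload schema are assumptions for illustration, not the workshop's implementation.

```python
# Hedged sketch of a long-term memory layer on top of Qdrant.
# Collection name, embedding model, and payload schema are illustrative assumptions.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

COLLECTION = "agent_memories"  # hypothetical collection name

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedding model works
client = QdrantClient(":memory:")  # swap for the URL of a real Qdrant instance

# Size the collection to the embedding dimension and use cosine similarity.
client.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(
        size=encoder.get_sentence_embedding_dimension(), distance=Distance.COSINE
    ),
)


def remember(memory_id: int, text: str) -> None:
    """Store a memory as an embedded point, keeping the raw text in the payload."""
    point = PointStruct(id=memory_id, vector=encoder.encode(text).tolist(), payload={"text": text})
    client.upsert(collection_name=COLLECTION, points=[point])


def recall(query: str, top_k: int = 3) -> list[str]:
    """Return the texts of the memories most similar to the query."""
    hits = client.search(
        collection_name=COLLECTION, query_vector=encoder.encode(query).tolist(), limit=top_k
    )
    return [hit.payload["text"] for hit in hits]


remember(1, "The user's favourite movie is Blade Runner.")
print(recall("Which movies does the user like?"))
```

Short-term memory, in contrast, is handled by the SQLite checkpointer shown earlier and is scoped to a single conversation thread.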

In this workshop, participants will work hands-on with a cutting-edge stack of tools and technologies tailored for building multimodal, production-ready agentic applications. LangGraph serves as the backbone for orchestrating agent workflows, with LangGraph Studio enabling easy debugging and visualization. SQLite powers short-term memory within the agent, while Qdrant, a high-performance vector database, handles long-term memory for contextual awareness. Fast and efficient responses are delivered using Groq LLMs, complemented by natural voice interactions through Whisper for speech-to-text and ElevenLabs for text-to-speech synthesis. For visual intelligence, Llama 3.2 Vision interprets image inputs, and diffusion models are used to generate high-quality visuals. Finally, the complete system is integrated with the Telegram Bot API, allowing users to interact with the agent in real time via chat, voice, or image directly from their mobile devices.
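As a final illustration, the sketch below wires a compiled LangGraph agent to Telegram using python-telegram-bot. The agent_graph module, handler name, and environment variable are hypothetical, and the graph is assumed to be the one compiled with a checkpointer in the earlier sketch.

```python
# Hedged sketch: connecting a compiled LangGraph agent to Telegram with python-telegram-bot (v20+).
import asyncio
import os

from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

from agent_graph import graph  # hypothetical import; see the LangGraph sketch above


async def handle_text(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Run the user's message through the agent and reply with its answer."""
    # One thread_id per Telegram chat keeps short-term memory separate per conversation.
    config = {"configurable": {"thread_id": str(update.effective_chat.id)}}
    # Run the synchronous graph in a worker thread so the bot's event loop is not blocked.
    state = await asyncio.to_thread(
        graph.invoke, {"messages": [("user", update.message.text)]}, config
    )
    await update.message.reply_text(state["messages"][-1].content)


def main() -> None:
    app = ApplicationBuilder().token(os.environ["TELEGRAM_BOT_TOKEN"]).build()
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_text))
    app.run_polling()  # long-polls Telegram for new updates


if __name__ == "__main__":
    main()
```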

Prerequisites:

  • Basic Python programming skills
  • Familiarity with LangChain or LangGraph
  • Basic understanding of multimodal AI concepts

*Note: These are tentative details and are subject to change.

In this hands-on session, we'll move beyond demos and PoCs to dive into how to build complex agentic systems that work in real-world scenarios. We’ll start by covering the fundamentals of agents (short-term memory, long-term memory, tool use, reasoning techniques, etc.), then introduce Agentic RAG and how it differs from traditional RAG, and show how to bring these concepts into production using LLMOps practices like agent monitoring, prompt versioning, dataset management, and RAG evaluation. We'll wrap up with a real-time simulation of agents operating inside a video game, where all of these concepts come to life.
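As a rough illustration of the Agentic RAG idea, the sketch below exposes retrieval as a tool that the agent can decide to call, instead of always prepending retrieved context the way traditional RAG does. The tool body, its name, and the model id are placeholders, not the session's actual code.

```python
# Hedged sketch of Agentic RAG: retrieval is a tool the agent may choose to call,
# rather than context that is always injected into the prompt (traditional RAG).
from langchain_core.tools import tool
from langchain_groq import ChatGroq
from langgraph.prebuilt import create_react_agent


@tool
def search_knowledge_base(query: str) -> str:
    """Look up passages relevant to the query in the knowledge base."""
    # Placeholder: a real implementation would embed the query and hit a vector store.
    return f"(retrieved passages for: {query})"


llm = ChatGroq(model="llama-3.3-70b-versatile")  # model id is an assumption
agent = create_react_agent(llm, [search_knowledge_base])

result = agent.invoke({"messages": [("user", "Summarise what the docs say about agent memory.")]})
print(result["messages"][-1].content)
```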

Managing and scaling ML workloads has never been more challenging. Data scientists are looking to collaborate while building, training, and iterating on thousands of AI experiments. On the flip side, ML engineers are looking for distributed training, artifact management, and automated deployment for high performance.
