Patrick Ellis lives on the cutting edge of AI. As CTO and co-founder of Snapbar, he's spent the past three years running AI-native revenue pipelines, starting back when most companies were still figuring out what ChatGPT was. His company builds AI-powered experiential marketing tools for brands like FIFA, the NFL, Google, and Paramount, creating immersive experiences at live events through image and video generation.
Patrick wears another hat, too: resident expert at Seattle's AI2 incubator, where he advises startups on how to leverage AI effectively. When we sat down to talk, I knew I wasn't getting surface-level takes. Patrick's been in the trenches dealing with the messy reality of getting AI from demo to production, navigating rapidly changing model capabilities, and figuring out which tools actually work for which problems.
Here are three takeaways from our conversation that shifted how I think about building with AI.
Patrick described where we are with AI as a "jagged frontier of innovation." We've had only three years not just to explore the technology itself but also to find new use cases for it. Whether you're a startup or a large corporation, you're figuring this out in real time. The challenging part? There's no roadmap. The exciting part? There's no roadmap.
OpenAI mentioned recently that the capabilities of current models far exceed the applied uses that most people are actually leveraging them for. We're sitting on this iceberg of potential, but we're only seeing what's above the water. Most of us don't fully understand what's below yet.
For startups especially, this creates opportunity. You can carve out space and stay ahead just by pressing into new innovations and exploring workflows no one has documented yet. But it also means you're constantly operating in unprecedented territory, learning from peers who are also exploring, because traditional experts don't really exist in this space yet.
Here's the problem many AI startups are running into: you can create a shiny demo, generate a cool image in a specific case, or craft a compelling prompt. But when you go to apply that across the full breadth of user applications and ideas, you start running into serious issues around reliability.
Patrick's team at Snapbar has spent significant time and resources on what they call "evals" — testing frameworks to ensure AI outputs are consistently delivering the results they need. For big brands, this means being really creative within confines. The AI can come up with all kinds of cool outputs that users influence, but it still needs to stay within safeguards those brands require.
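To make the idea concrete, here's a minimal sketch of what an eval harness can look like. The generation function, test cases, and brand-safety checks below are hypothetical placeholders for illustration, not Snapbar's actual framework.

```python
# Minimal eval harness sketch (all names are hypothetical placeholders).
# Each case pairs an input prompt with checks the output must pass,
# e.g. brand-safety rules a client requires.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    checks: list[Callable[[str], bool]]  # each check returns True if the output is acceptable

def run_evals(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the model and return the pass rate."""
    passed = 0
    for case in cases:
        output = generate(case.prompt)
        if all(check(output) for check in case.checks):
            passed += 1
    return passed / len(cases)

# Example: a brand might forbid competitor mentions and cap caption length.
cases = [
    EvalCase(
        prompt="Write a one-line caption for a fan photo at the stadium.",
        checks=[
            lambda out: "competitor brand" not in out.lower(),
            lambda out: len(out) < 200,
        ],
    ),
]

# pass_rate = run_evals(my_generate_fn, cases)  # my_generate_fn wraps whatever model you're testing
```

The point isn't the specific checks; it's that reliability gets measured across the full breadth of inputs, not judged from one shiny demo.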
The good news? In the last six to nine months, models have gotten smart enough that companies can start to let go of some safeguards, relying more on the innate model capabilities to steer direction. The infrastructure is finally getting to the point where it's becoming easier to build without needing deep expertise in LLM evaluation and all these new disciplines that have emerged around the models.
But make no mistake: getting from demo to production still requires a good chunk of R&D, testing, and figuring out exactly how to craft experiences that are both predictable and creative.
This might be Patrick's hottest take: his favorite model is Google's Gemini. But there’s nuance. He uses different models for different types of work, and that's the key insight.
For coding and engineering, Anthropic's Claude Sonnet 4.5 is his go-to. He described it as a "scrappy entrepreneur" that searches your codebase, uses web search when needed, and does all kinds of work. It handles financial modeling in Excel, organizes files in Google Drive, and helps create to-dos for the day based on goals and projects. For real-time information, he uses Grok because it has access to Twitter, where a lot of breaking news and AI industry updates happen first. For general questions replacing Google searches, ChatGPT works great. For coding architecture and bigger engineering thinking, he likes Codex.
But his favorite overall is Gemini, specifically for its Deep Research feature (which runs 250+ Google searches and compiles an executive summary) and Deep Think (which has different models analyze a problem from different perspectives and aggregates the results into one recommendation). These save him hours or even days of research when tackling meaty business strategy, life decisions, or architectural challenges.
The lesson? Stop being loyal to a single AI tool. Model capabilities change week by week. What's ranking best today might be different next month. You need to have your finger on the pulse of what models are currently performing best for your specific use case and be willing to switch.
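One lightweight way to act on that is to keep model choice behind a simple routing layer, so swapping models per use case is a one-line change. This is a toy sketch; the model names are illustrative of the kinds of assignments Patrick described, not a definitive ranking.

```python
# Toy task-to-model routing table. Model names are illustrative; swap in
# whatever is currently performing best for each use case.
MODEL_FOR_TASK = {
    "coding": "claude-sonnet-4.5",
    "deep_research": "gemini-deep-research",
    "realtime_news": "grok",
    "general_search": "chatgpt",
}

def pick_model(task: str) -> str:
    """Return the preferred model for a task, with a general-purpose fallback."""
    return MODEL_FOR_TASK.get(task, "chatgpt")

print(pick_model("coding"))  # -> claude-sonnet-4.5
```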
What stood out to me from this conversation was Patrick's emphasis on orchestration around the models. AI systems are incredibly capable, in some ways operating at PhD level, but they're only as good as the information they have access to, the tools they can use, and the clarity around what good output looks like.
It's like onboarding a new employee. You need to give them context about your project, your company, and the specific task. You need to provide them with the right tools: access to Google, custom internal tools, the ability to make changes in your system. And you need to give them examples of what good looks like from a quality perspective.
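As a rough illustration of that onboarding analogy, here's what a single request might bundle together: context, tools, and examples of good output. The payload shape and names (Acme Events, search_crm, and so on) are generic placeholders, not any particular vendor's API.

```python
# Sketch of "onboarding" a model the way you'd onboard a new employee:
# give it context, tools it can call, and examples of what good looks like.
# The payload shape below is a generic illustration, not a specific vendor's API.

request = {
    "system": (
        "You are helping Acme Events draft event-recap emails. "  # context: company + task
        "Keep the tone upbeat and under 150 words."
    ),
    "tools": [  # the tools it's allowed to use
        {"name": "search_crm", "description": "Look up an attendee's registration details"},
        {"name": "create_task", "description": "Add a follow-up to-do for the events team"},
    ],
    "examples": [  # what 'good' looks like, so quality expectations are explicit
        {
            "input": "Recap for the Seattle launch event",
            "output": "Thanks for joining us in Seattle! Here are three highlights...",
        },
    ],
    "user": "Draft a recap email for yesterday's booth demo.",
}
```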
The exciting news? A lot of the research that cutting-edge startups are doing is getting folded into the core functionality of ChatGPT, Claude, and other tools. Features like Claude Skills (where you can specify the processes you do day-to-day and teach the model your exact workflows) are making it easier for everyone to leverage AI without being an expert.
We're still on that jagged frontier. The infrastructure is getting better, the models are getting smarter, but the gap between experimentation and production value? That's going to take time. The good news is that pioneers like Patrick are figuring out what actually works, so the rest of us don't have to start from scratch.
Listen to the full episode of Actually Intelligent to hear more from Patrick Ellis about AI benchmarks, why the model evaluation system is totally broken right now, and which AI podcast he listens to daily (no, it's not this one … yet).