Breaking Down Palantir MMDP: A Data Platform That Moves Compute, Not Data

(Illustration: In the process of exploring the world and becoming yourself (being), ensure freedom, ensure the underlying safety net mechanisms, ensure common interface interoperability, while tuning parameters (metacognition) and running into the wind with open arms. That said, running still requires eating, and eating can never be let go. Taken in 2016 in Madrid, allegedly at the world’s oldest restaurant. Made sure to bring the magic card, made sure the communication interface was interoperable, left the rest to the kitchen, and comfortably walked out five minutes before the restaurant closed at eleven. Photo by Ernest.)

MMDP = Multimodal Data Plane

At Palantir DevCon 5, Data Plane Group co-lead Ted introduced MMDP. I was curious enough to practice breaking it down.

  • First, the feature list is long, and I wanted to compare it against my own product roadmap.
  • Second, their design trade-offs reflect a core question: when enterprise data is scattered everywhere, should you move the data or move the compute? (No right or wrong answer here, but I wanted to compare the reasoning and parameters.)

⌬ A Long Feature List

  • On the surface, Palantir MMDP is a data platform supporting structured, unstructured, media, time series, and streaming data.
  • It has a SQL Console for querying datasets and ontology.
  • It has Pipeline Builder for building workflows.
  • It has Spark for large-scale computation, plus a governance and security model.
  • Ted calls it “the foundation of our AI platform.”
  • SQL Console supports saving analyses as worksheets, with two states: draft (private, auto-saved) and published (shareable).
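The draft/published split above is a small but telling design choice. A minimal sketch of that lifecycle, with class and field names invented for illustration (this is not Palantir's API):

```python
from dataclasses import dataclass, field

@dataclass
class Worksheet:
    """Toy model of a SQL Console worksheet with two states."""
    name: str
    queries: list = field(default_factory=list)
    state: str = "draft"                      # drafts are private and auto-saved
    shared_with: list = field(default_factory=list)

    def autosave(self, sql: str) -> None:
        # Drafts accumulate edits without an explicit save step.
        self.queries.append(sql)

    def publish(self) -> None:
        # Publishing promotes the draft to a fully fledged, shareable file.
        self.state = "published"

    def share(self, user: str) -> None:
        if self.state != "published":
            raise ValueError("only published worksheets can be shared")
        self.shared_with.append(user)

ws = Worksheet("churn-analysis")
ws.autosave("SELECT * FROM customers")
ws.publish()
ws.share("colleague@example.com")
```

The point of the two states is iteration without ceremony: analysis stays private until you deliberately promote it.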

If you stop here, the differentiation from other data platforms isn’t particularly obvious. The interesting stuff always requires digging deeper.

⌬ Data Stays Home, Compute Goes to It

Continuing to peel back the layers, what I wanted to pay attention to is how choices are made — design choices. (System design requires subtraction, not addition.)

Virtual Tables don’t copy data; they reference it:

  • Create a reference in Foundry that points to data in Snowflake, BigQuery, or Databricks.
  • All analytics pipelines and ontology ingestion can use it directly, as if it were a native dataset.
  • Combined with native Apache Iceberg tables as an open standard, this doesn’t just unlock cross-platform interoperability — it also brings performance improvements.
  • Data doesn’t need to move.
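The mechanics above boil down to storing metadata instead of rows. A minimal sketch of the "reference, don't copy" idea; the class, fields, and catalog here are assumptions for illustration, not Foundry's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualTable:
    """A pointer to a table that lives in an external system."""
    name: str            # how the table appears inside the platform
    source_system: str   # e.g. "snowflake", "bigquery", "databricks"
    source_path: str     # fully qualified name in the source system

CATALOG: dict = {}

def register(vt: VirtualTable) -> None:
    # Registering stores only metadata -- no rows are moved or copied.
    CATALOG[vt.name] = vt

def read(name: str) -> str:
    # Downstream pipelines address the table by its platform name; the
    # reference is resolved to the external location only at read time.
    vt = CATALOG[name]
    return f"scan {vt.source_path} in {vt.source_system}"

register(VirtualTable("orders", "snowflake", "PROD.SALES.ORDERS"))
```

Because consumers only ever see the platform-side name, the external location can change without touching any pipeline that reads `orders`.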

The Furnace query engine decouples SQL syntax from the underlying execution engine:

  • Two benefits: the system can use heuristics to route queries to the most appropriate compute resource, and upgrading the underlying engine won’t break existing queries.
  • The Create Table feature in SQL Console can materialize query results into tables, with data lineage automatically inherited.
  • From query to materialization to building ontology object types, the entire path runs through a single interface.
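The decoupling above can be sketched as a router sitting between the SQL text and the engines. The thresholds and engine names below are invented for illustration; Furnace's actual heuristics are not public:

```python
def route(sql: str, estimated_input_bytes: int) -> str:
    """Pick an execution engine without ever changing the user's SQL."""
    GIB = 1024 ** 3
    if estimated_input_bytes < 10 * GIB:
        return "single-node"   # DuckDB/Polars-class engines for small inputs
    return "spark"             # distributed compute for genuinely big data

# The same query string works regardless of which engine is chosen,
# so swapping or upgrading an engine behind route() breaks nothing.
engine = route("SELECT count(*) FROM t", 2 * 1024**3)  # small input -> "single-node"
```

This is the essence of the two benefits: routing is a pure function of metadata, and the user-facing syntax is a stable contract.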

(This reminds me that when designing workflow systems, the key usually isn’t the surface-level features, but the interoperability interfaces. When interfaces are stable, surface implementations can be swapped out as times and market demands change. Furnace’s decoupled design follows the same line of thinking.)

(Decoupled design usually creates comfort — like how I appreciated Manny using the lens of “comfort” to interpret AK LLM Wiki PKM the other day.) (Sneaking in another confession XDD)

The introduction of single-node engines (Polars, DuckDB, DataFusion):

  • Ted said Spark is “a jack of all trades, master of none.”
  • In cloud-native environments, advances in GPU and CPU architectures have dramatically improved single-machine parallel processing — you no longer need distributed computing for everything.
  • Ted mentioned that adoption of these engines across their customer base has been surprisingly strong, particularly in enabling low-latency workflows that were previously impossible with batch compute.
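The single-node idea in miniature, using stdlib sqlite3 as a stand-in for engines like DuckDB or Polars: when the data fits on one machine, an in-process engine answers in milliseconds, with no cluster to spin up. Table and column names are invented for the example:

```python
import sqlite3

# An in-process, single-node engine: no job submission, no executors,
# just a library call on local data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 2.0)],
)
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
# rows == [("a", 2.0), ("b", 2.0)]
```

The latency win is structural: the entire query lifecycle is a function call inside one process, which is why batch workflows that were too slow on a cluster can become interactive.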

⌬ Governance Never Gets Outsourced

MMDP’s deepest design logic: governance and orchestration can never be let go, but the location of compute and storage can remain flexible.

  • Federated compute is the ultimate expression of this logic.
  • Define transformation logic in Foundry, but push the compute to Snowflake.
  • Pipeline Builder’s strong typing and preview features work as normal, data lineage is tracked as normal, governance is uninterrupted.
  • Ted specifically showed a detail in the demo: after computation completes, the Foundry interface has a button that takes you directly to Snowflake to monitor execution status.
  • This detail represents: even though compute runs externally, the user experience stays consistent.
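The list above can be compressed into one pattern: define and govern locally, execute remotely. A toy sketch, with stdlib sqlite3 standing in for Snowflake and all names invented for illustration:

```python
import sqlite3

LINEAGE = []  # governance metadata stays local, wherever compute runs

def federated_transform(external_conn, source: str, target: str, predicate: str):
    """Compile a locally defined transform to SQL and push it down."""
    pushed_sql = f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE {predicate}"
    external_conn.execute(pushed_sql)   # compute runs in the external system...
    LINEAGE.append((source, target))    # ...but lineage is tracked at home

# sqlite3 in-memory DB plays the role of the external warehouse here.
snowflake = sqlite3.connect(":memory:")
snowflake.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
snowflake.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 120.0)])

federated_transform(snowflake, "orders", "big_orders", "amount > 100")
n = snowflake.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
```

Note what never leaves home: the transform definition and the lineage record. Only the compiled query travels, which is exactly the "governance never gets outsourced" stance.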

(Comparing this to what we learned when we participated in defining the Bluetooth FTMS specification years ago: true interoperability means ensuring consistency of the common interface while preserving each party’s freedom of implementation. Federated compute follows the same thinking — which compute engines to adopt can vary, but the governance interface is shared, negotiated, and defined.)

Assembling MMDP as a whole:

  • Let AI applications quickly access the data they need, regardless of where it lives.
  • In the AIP architecture, the bottom layer is MMDP handling data, the middle layer is governance and orchestration for scaling, and the top layer is ontology driving decisions and agentic applications.
  • Ted concluded that the success of all AI/agentic workflows depends on data availability, the ability to scale, and the ability to build.

MMDP doesn’t need to be the only data platform — instead, it becomes the governance and orchestration layer between various platforms. This arrangement is perhaps a design decision worth watching.

MMDP’s deepest design logic: governance and orchestration can never be let go. Just like hitting that “like” or “heart” button every day can never be let go — it’s the same underlying logic. A day without pressing it, and running into the wind feels like it’s missing the breeze. One quick press, and then comfortably walk away.


✳️ Knowledge Graph


graph LR
    subgraph Storage_Layer [Storage Layer]
        Iceberg["Apache Iceberg"]
        VirtualTables["Virtual Tables"]
        MediaSets["Media Sets"]
    end

    subgraph Compute_Layer [Compute Layer]
        Furnace["Furnace Engine"]
        Spark["Apache Spark"]
        SingleNode["Single Node Engines: Polars, DuckDB"]
        ExternalCompute["External Compute: Snowflake, BigQuery"]
    end

    subgraph Management [Management and Governance]
        Governance["Governance and Security"]
        Lineage["Data Lineage"]
        Orchestration["Orchestration"]
    end

    subgraph Application_Layer [Application and AI Layer]
        Ontology["Ontology: Semantic Layer"]
        AIP_Agents["AIP Agents"]
        Workshop["Workshop Apps"]
    end

    VirtualTables -- "references data in" --> ExternalCompute
    Iceberg -- "enables interoperability for" --> Storage_Layer
    Furnace -- "routes queries to" --> SingleNode
    Furnace -- "routes queries to" --> Spark
    PipelineBuilder["Pipeline Builder"] -- "defines logic for" --> ExternalCompute
    Storage_Layer -- "feeds data into" --> Ontology
    Management -- "enforces rules on" --> Compute_Layer
    Management -- "enforces rules on" --> Storage_Layer
    Ontology -- "provides context for" --> AIP_Agents
    SQL_Console["SQL Console"] -- "interacts with" --> Furnace

    style Iceberg fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style VirtualTables fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style MediaSets fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style Furnace fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style Spark fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style SingleNode fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style ExternalCompute fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style AIP_Agents fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style Workshop fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style SQL_Console fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style PipelineBuilder fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff

    style Storage_Layer fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Compute_Layer fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Management fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Application_Layer fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Ontology fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Governance fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Lineage fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Orchestration fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff

sequenceDiagram
    participant User as Developer
    participant Console as SQL Console
    participant Furnace as Furnace Engine
    participant Snowflake as Snowflake (External)
    participant Governance as Governance and Lineage
    participant Storage as Foundry Storage (Iceberg)

    User->>Console: Write SQL query (Create Table)
    Console->>Furnace: Submit Query
    Furnace->>Governance: Check Permissions and Lineage
    Governance-->>Furnace: Access Granted
    Furnace->>Snowflake: Execute Federated Compute
    Snowflake-->>Furnace: Return Results/Status
    Furnace->>Storage: Materialize Results as Iceberg
    Storage-->>User: Table Ready in Compass

✳️ Transcripts

Introduction and Data Plane Overview

  • Hello everyone.
  • Uh thank you very much for coming today.
  • I’m going to talk to you about the latest and greatest in the multimodal data plane.
  • Uh but before I explain what that is, I’ll quickly introduce myself.
  • So I’m Ted.
  • Uh I co-lead our data plane group.
  • Broadly speaking, we look after all of the core data platform features within Foundry.
  • So, uh, a lot of you are probably going to be thinking off the bat, what is a multimodal data plane?
  • It's our open data and compute architecture that allows you to integrate, manage, and ultimately extract value from your data.
  • It supports structured data, unstructured data, media, time series, and streams.
  • And at the crux of it all, uh, we want to provide this without requiring you to do lengthy migrations or expensive replatforming efforts.
  • Everything should just work seamlessly.
  • In a single phrase, it’s the foundation of our AI platform.
  • So, in more concrete terms, uh, let me give you an overview of some of the products that fall into MMDP.
  • You can broadly break the data platform down into three pillars.
  • In the core, we have features that run across the entirety of Foundry and AIP.
  • These are based around orchestration and governance.
  • They provide functionalities like data lineage, our enterprise pipeline management tooling, and of course, our best-in-class security model.
  • They’re absolutely core to the development of MMDP.
  • And whether we’re building interoperability with external platforms or extending the feature set of our internal data platform, these are always critical and we never compromise.
  • On the left, you can see we have storage and the theme here is interoperability.
  • Whether data is stored externally or whether it’s stored within the platform, all of our products should work seamlessly and there should be no gaps.
  • The most popular product which really encompasses this to date is virtual tables which allow you to store references to externally stored data sets.
  • These could be in platforms like Databricks, BigQuery or Snowflake.
  • All of our analytics pipeline and ingestion into ontology features work seamlessly with these as if they were natively hosted data sets.
  • A more recent project is our undertaking of providing native Iceberg tables.
  • The key motivator for this was to unlock the interoperability features that Iceberg brings as an open standard.
  • But on top of this, we get a bunch of extra really cool features in the platform and some pretty nice performance upgrades too.
  • So these two interoperability plays are on top of our core multimodal data architecture which we get from media sets, streams and data sets all of which can power multimodal models in the platform.
  • Finally on the right here um I have our compute pillar and the theme here is flexibility in the platform.
  • We allow you to choose the most appropriate compute for the job.
  • So for real-time processing you can use Flink on top of streams.
  • For low latency hyperefficient batch transformations you can use our single node compute engine offerings.
  • And then for really big data you can utilize Spark.
  • On top of these in-platform offerings, we allow you to push down compute to external systems through our federated compute.
  • A more recent area of investment has been SQL.
  • We undertook this as a way of exposing Foundry compute in a way that was more familiar to more users, helping make the whole platform more accessible.
  • Today, I’m going to do a real deep dive on two of these, federated compute and SQL.
  • We’re going to start with SQL.

SQL Console Introduction

  • Our investments in this space run really deep in the platform, but the primary thing that you’re going to notice will probably be the new SQL console.
  • It looks like a pretty standard SQL editor, and it’s very lightweight and minimal.
  • It allows you to run queries on both data sets and ontologies.
  • It allows you to create and update tables.
  • And it allows you to save these analyses in worksheets in your compass file tree.
  • So the question that you might ask off the bat is why do we care so much about investing in SQL?

Benefits of SQL Investment

  • Well, SQL truly is the lingua franca of data.
  • By allowing users to work with ontologies and data sets with SQL, we fully democratize access.
  • There’s no Palantir syntax, no special jargon, no custom tool chains required.
  • I can’t count the number of times where I’ve been on site with a customer or I’ve spoken to a guardian at previous DevCons and they've told me that an org of data scientists and data engineers have an entirely SQL-based tool chain and when they have to migrate to Foundry, this causes friction.
  • These are the kind of barriers that we’re looking to remove and allow you to extract as much value as possible from AIP.
  • Another major benefit of SQL is that it’s fast by design.
  • Its simplicity allows editors to stay lightweight and snappy and it allows engines to perform very very well.
  • To fully leverage this, we've actually built out two new query engines, one for data sets and one for ontologies.
  • So in the data set case, we have Furnace and this decouples the SQL query syntax from the underlying execution engine.
  • This has two main benefits.
  • The first is that we can use heuristics to route to the most appropriate compute possible.
  • But it also means that we can pick up on the latest and greatest in query engine technology without breaking user queries.
  • The final and perhaps the most exciting benefit of adopting SQL is that it's perfectly suited to an agentic world.
  • LLMs have long proved that they're really good at writing SQL queries.
  • This will come as no surprise to anyone who’s used AIF or the Foundry MCP server.
  • As the SQL tool gets used incredibly extensively in both, as we invest in the actual behaviors that we support through SQL, we’re unlocking more functionality in a syntax that is very well supported by these models.

Beyond the Console: Integrations

  • So, a SQL editor, it’s not particularly revolutionary, right?
  • Well, that’s right.
  • And this is why I said our SQL investments run deep.
  • The SQL console is just an interface, but it’s not the reason you should be getting excited about SQL in Foundry and AIP.
  • This is a reflection on what we’ve known here at Palantir for a long time, which is that a SQL editor on its own is a useful tool, but it doesn't provide value in production workflows.
  • What you should instead be excited by is the integrations that we're building out.
  • So yes, you can analyze ontologies and you can analyze data sets, but more importantly, you can power agents.
  • As I already said, it’s used extensively by AFDE and in Foundry MCP today.
  • And as we add more and more functionalities, it’s going to allow agents to interact with the platform in an idiomatic way.
  • We allow you to transform data in place for the first time in Foundry, which unlocks new rapid prototyping flows.
  • And finally, we shipped SQL functions, which allow you to define functions on top of your ontology in SQL, providing a new and incredibly easy way to build with ontology.

Demo: SQL Features in Action

  • I have a short demo that runs you through this suite of features.
  • Um, so as you would expect, we’re able to run some queries here which look at data sets in our platform.
  • We can load up a worksheet that has some saved analysis and run through these queries.
  • As I said, these worksheets are Compass resources, so they can be shared between you and your colleagues.
  • We can then take advantage of an all new feature, the ability to run a create table statement and materialize the results of a query.
  • This was never possible in an interactive interface like this before, but you’ll see that under the hood, it still just runs a build, meaning it’s fully integrated with our orchestration and governance, and things like data lineage work exactly as you would expect.
  • In the video, I go on to build an object type off of the table that I just created.
  • The reason I’m doing this is so I can show off some of the ontology SQL functionalities, but I’ve sped it up here for convenience.
  • I wanted to keep the whole video without any breaks, though.
  • Just so I could show you how easy it is to go from analyzing data sets, transforming your data into the shape to ingest into ontology, bringing it through to the ontology and then going on into the app building, AI, and ultimately value creation layer of the platform.
  • So you can see from ontology manager we have the same SQL console interface only this time we have ontology resources in our explorer.
  • I can once again run interactive queries on top of this.
  • You can see here that I do a basic filter just to get a single row from the object type.
  • And then when I’m happy with my analysis, I can save it in a worksheet.
  • There’s actually two states of the worksheets.
  • There’s a draft state which is private and autosaves and allows you to iterate on some analysis.
  • And then when you’re happy with it, you can save it as a fully fledged file.
  • Once saved, we can publish a function.
  • And it literally takes one click, a few fields, and you’re ready to go.
  • This function can then be used from workshops, actions, whatever you want.
  • I think that this is particularly exciting as it’s one of the fastest ways to get through to that app building layer of the platform.
  • So hopefully based on what you’ve seen, you’re excited.

Future SQL Integrations

  • But I want to tell you that we’re far from done.
  • As I said, we want to make SQL a fundamental building block in the platform that allows you to leverage the full AIP ecosystem.
  • As such, we have many more integrations to build.
  • Firstly, we want to build a workshop integration to allow you to leverage SQL queries and SQL analysis from AIP applications.
  • We want to add support for SQL object sets so that you can maintain the semantic layer of the ontology on top of the output of your queries.
  • We want to allow you to embed these analyses in reports so that you can create artifacts that are more meaningful than the raw data alone.
  • We want to allow you to store procedures so that you can modularize pieces of logic and reuse them in multiple places.
  • We want SQL enterprise pipelining capabilities to allow you to take these analyses and scale them into something for production workflows.
  • And finally, we want integrations in our data catalog to give you a new way of exploring your data.

Deep Dive: Federated Compute

  • The next thing I wanted to talk about was federated compute.

AIP Architecture and Compute

  • I really like this diagram as I think it's a good representation of how a lot of Palantirians think about AIP.
  • At the top layer, we have the ontology which forms the foundation of our value and decision-making layer.
  • That's what's powering agents, automations, and applications.
  • Below that, in the core, we have governance and orchestration.
  • And this is what allows you to deploy and to run these applications at scale.
  • And then in the foundation we have the multimodal data plane.
  • This layer exists to get your data into the ontology as quickly and as easily as possible.
  • We recognize that to do that it may mean taking advantage of existing resources in your organization, fitting seamlessly into what data architecture you already have set up, or just working around existing barriers that exist in messy real world situations.
  • So to power the multimodal data plane we have both in-platform compute and federated compute.

Evolution of Compute Options

  • A few years ago, our data platforming capabilities looked something like this.
  • You had a Foundry data set input.
  • You could define a transformation on that which would run in Spark and you would output to another Foundry data set.
  • Now, Spark was a pretty reasonable choice here.
  • It can run arbitrary data scale.
  • It will work for a 1 megabyte table and it will work for a one petabyte table.
  • However, a jack of all trades is a master of none.
  • And the compute landscape has shifted significantly since Spark was originally developed.
  • In recognition of this fact, we invested heavily in compute optionality in the platform.
  • We added support for single node compute engines like Polars, DuckDB, and DataFusion.
  • This was in response to the fact that we now run in a cloud-native way instead of running on commodity hardware where we need to share the load of compute between a bunch of machines together.
  • On top of that, advancements in GPU and CPU architectures have meant that the parallel processing that we can leverage on a single machine has greatly improved.
  • As such, these single node engines are able to deliver incredible performance and efficiency characteristics.
  • Seeing the adoption across the fleet since we initially rolled this out has been amazing and it’s been particularly cool to see the latency critical workflows that were previously not possible with batch compute but can now be delivered with these single node engines.
  • If you would like to hear more detail about these offerings in particular, we actually did a deep dive on this last DevCon which you can find on YouTube.

Federated Compute and Snowflake Integration

  • We built on this flexibility that we established with the options to choose single node compute engines and we extended the virtual table primitive that existed in the platform.
  • So as I already mentioned, virtual tables allow you to essentially store references to externally stored data and in the platform they look just like a regular data set.
  • Well, what we then did is we allowed you to define a transform on that data just as you would be able to do with any other data set.
  • But what was special is that the compute would run in the external source system.
  • This shows our commitment to two things.
  • Firstly, it shows that you’re able to leverage whatever compute makes most sense for your organization and use case.
  • And secondly, it shows the commitment to fit with existing data architectures.
  • There is one key problem with this architecture though, and it’s that the data and the compute must be co-located in the federated system.
  • While in some cases this makes perfect sense, it does have the downside that it means you’re tightly coupled to whatever external system you choose.
  • A more recent development is our deep integration with Snowflake allowing you to leverage Snowflake compute on top of Foundry defined transforms in the platform.
  • So as you can see here with a Foundry Iceberg input table and a Foundry Iceberg output table you now have the choice of single node compute, Spark, or Snowflake.
  • What’s particularly special about this is that you can continue to leverage the orchestration, governance, and pipeline authoring features of Foundry while leveraging Snowflake’s compute.

Demo: Snowflake Federated Compute

  • For me, this is an incredibly energizing workflow to deliver on as it really hits at the crux of what MMDP is trying to achieve, that is a truly interoperable data plane where there are no caveats, it’s low friction, and it’s a real first-class offering.
  • I have a short video that shows you just how first class this is.
  • So from a table you can go to create a Pipeline Builder pipeline as you normally would.
  • And at the bottom where you would normally select between Spark or single node you can also select to choose external. Here we set it up with an existing Snowflake connection.
  • And this connection just tells the pipeline where the compute should run.
  • From there you get a completely normal pipeline authoring experience in Pipeline Builder.
  • And this to me is what is so special about this offering.
  • Pipeline Builder in my opinion is one of our most impressive applications.
  • It’s so strongly typed.
  • It’s so feature-rich and there’s really nothing else like it on the market.
  • To be able to take advantage of this in your broader data architecture is a humongous win.
  • In this case, we’re just defining a really simple pipeline that filters down to a single row.
  • And as you can see, all of the normal pipeline capabilities like preview work exactly as they would with any other Pipeline Builder pipeline.
  • When we go ahead and run this, you’ll be able to see that the compute is running in Snowflake.
  • And once that underlying query gets kicked off, we actually get a nice button that can take us to the external platform and actually show us where that compute ran.

Conclusion: Data Plane’s Role in AIP

  • So just to round out the discussion, I wanted to frame how you should think about data plane in the broader context of AIP today.
  • Over the next two days, you’re going to see a ton of really cool demos and products that make use of AI and agentic flows.
  • But the success of all of these depends on your data, your ability to scale, and your ability to build.
  • That is what we are here to deliver.