拆解 Palantir MMDP:如何設計一個不搬資料的資料平台

(圖說:探索世界、成為自己(being)的過程中,確保自由,確保底層安全網機制,確保共同介面互通,一邊調整參數(metacognition),一邊放手迎風奔跑。說是這樣說啦,但跑跑還是需要吃吃,吃吃永遠不能放掉。2016 攝於馬德里,據說是世界最早的餐廳,確保有帶魔法小卡,確保溝通介面互通,剩下交給廚房,然後在晚上十一點餐廳關門前五分鐘,舒服地離開餐廳。圖片來源:Ernest。)

MMDP = Multimodal Data Plane

Palantir DevCon 5 會議上,Data Plane Group co-lead Ted 介紹 MMDP,我充滿好奇地想練習拆解看看。

  • 一來是功能清單很長,我滿想對比自己現有產品規劃,
  • 二來因為他們的設計取捨反映了一個核心問題:當企業資料散在各處,該搬資料還是搬計算?(無關對錯,但想對照思路與參數。)

⌬ 多功能清單

  • 表面上 Palantir MMDP 是一個資料平台,支援結構化、非結構化、媒體、時序和串流資料。
  • SQL Console 可以查詢資料集和 ontology,
  • Pipeline Builder 可以建工作流,
  • 有 Spark 跑大型運算,加上一套治理和安全模型。
  • Ted 稱它是「我們 AI 平台的基礎」。
  • SQL Console 支援將分析存成工作表,有草稿(私人自動儲存)和正式發布(可分享)兩種狀態。
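工作表的「草稿/正式發布」兩種狀態,概念上就是一個極簡的狀態機。以下是假想的 Python 草圖(`Worksheet` 類別與方法名稱皆為虛構,非 Foundry 實際 API),只用來示意「草稿私有、自動儲存;發布後才能分享」這個規則:

```python
# 假想示意(非 Foundry 實際 API):工作表的「草稿 → 發布」兩種狀態。
class Worksheet:
    def __init__(self, sql: str):
        self.sql = sql
        self.state = "draft"            # 草稿:私人、自動儲存
        self.shared_with: list[str] = []

    def autosave(self, sql: str) -> None:
        assert self.state == "draft"    # 只有草稿狀態會自動儲存
        self.sql = sql

    def publish(self) -> None:
        self.state = "published"        # 正式發布,成為可分享的檔案

    def share(self, user: str) -> None:
        if self.state != "published":
            raise PermissionError("draft worksheets are private")
        self.shared_with.append(user)

ws = Worksheet("SELECT 1")
ws.autosave("SELECT 2")   # 草稿期間可反覆迭代
ws.publish()
ws.share("alice")
print(ws.state, ws.shared_with)   # published ['alice']
```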

如果只看到這裡,跟其他資料平台的差異不算明顯。有趣的事情總需要深挖。

⌬ 資料不搬家,計算去找它

繼續抽絲剝繭,想要留意的是如何做出選擇,設計選擇。(設計系統需要減法,而不是加法。)

Virtual Tables 不走資料複製,而是引用:

  • 在 Foundry 裡建立一個指向 Snowflake、BigQuery 或 Databricks 資料的參照,
  • 所有 analytics pipeline 和 ontology ingestion 都能直接使用,就像原生資料集一樣。
  • 搭配原生 Apache Iceberg 表格作為開放標準,不只解鎖了跨平台互通,還帶來效能提升。
  • 資料不需要搬家。
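「引用而非複製」這件事,可以用一個很小的草圖示意:Virtual Table 本身只儲存來源系統與表格識別子,建立成本與資料大小無關,查詢時才把 SQL 下推到來源執行。以下是假想的 Python 示意(`VirtualTable` 與 `scan_sql` 皆為虛構名稱,非 Foundry 實際 API):

```python
from dataclasses import dataclass

# 假想示意(非 Foundry 實際 API):Virtual Table 只存「參照」,不複製資料。
@dataclass(frozen=True)
class VirtualTable:
    source: str       # 例如 "snowflake"、"bigquery"、"databricks"
    identifier: str   # 來源系統中的表格完整名稱

    def scan_sql(self) -> str:
        # 查詢時才產生要下推(push down)到來源系統的 SQL,本地不落地任何資料
        return f"SELECT * FROM {self.identifier}"

# 在平台端註冊參照:建立成本是 O(1),與資料量無關
orders = VirtualTable("snowflake", "analytics.public.orders")
events = VirtualTable("bigquery", "proj.dataset.events")

print(orders.scan_sql())   # SELECT * FROM analytics.public.orders
print(events.source)       # bigquery
```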

Furnace 查詢引擎將 SQL 語法和底層執行引擎解耦:

  • 好處有二:系統能用啟發式演算法把查詢導向最合適的運算資源,底層引擎升級時也不會破壞既有查詢。
  • SQL Console 裡的 Create Table 功能可以把查詢結果實體化成表格,資料血緣自動繼承。
  • 從查詢到實體化再到建立 ontology object type,整條路徑在一個介面裡直接走完。
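Furnace 的「語法與引擎解耦」可以想成一層路由:使用者只寫 SQL,路由層依啟發式規則(估計資料量、延遲需求)挑引擎,引擎本身可以隨時抽換升級而不影響既有查詢。以下是假想的 Python 草圖(路由規則與門檻值皆為虛構,非 Furnace 實作):

```python
# 假想示意(非 Furnace 實作):SQL 語法與執行引擎解耦,由啟發式規則路由。
from typing import Callable

EngineFn = Callable[[str], str]

def run_on_duckdb(sql: str) -> str:
    return f"duckdb:{sql}"    # 單節點引擎:小資料、低延遲

def run_on_spark(sql: str) -> str:
    return f"spark:{sql}"     # 分散式引擎:大資料批次

def route(sql: str, est_rows: int, *, latency_critical: bool = False) -> str:
    """使用者只寫 SQL;底層引擎升級或抽換時,查詢本身不需要改動。"""
    small = est_rows < 10_000_000          # 門檻值純屬示意
    engine: EngineFn = run_on_duckdb if (latency_critical or small) else run_on_spark
    return engine(sql)

print(route("SELECT count(*) FROM t", est_rows=5_000))                   # 走 duckdb
print(route("SELECT * FROM big JOIN huge USING (k)", est_rows=10**10))   # 走 spark
```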

(這讓我想到在設計工作流程系統的時候,關鍵通常不是表象功能,而是互通介面。當介面穩定,表象實作就可以隨著時代與市場需求隨時抽換。Furnace 的解耦設計就是類似的思路。)

(解耦設計,通常能夠創造舒服,就像我喜歡前幾天 Manny 用舒服的角度去詮釋 AK LLM Wiki PKM。)(又在偷告白 XDD)

單節點引擎(Polars、DuckDB、Data Fusion)的引入:

  • Ted 說 Spark 是「樣樣通,樣樣鬆」(a jack of all trades is a master of none)。
  • 雲端原生環境裡,GPU 和 CPU 架構的進步讓單機並行處理能力大幅提升,不再需要每件事都動用分散式運算。
  • Ted 提到這些引擎在客戶端的採用成長頗驚人,特別是開啟了以前批次計算做不到的低延遲工作流。
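單節點引擎的重點是:資料量不大時,單機就能在毫秒級完成批次轉換,不需要動用分散式叢集。以下用標準庫 sqlite3 權充「單節點引擎」做示意(實務上會是 Polars / DuckDB / DataFusion,此處僅示意量級概念):

```python
import sqlite3
import time

# 以標準庫 sqlite3 權充單節點引擎(實務上是 Polars / DuckDB / DataFusion):
# 一萬筆資料的 group-by 聚合,在單機記憶體內毫秒級完成。
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (device TEXT, reading REAL)")
conn.executemany("INSERT INTO sensor VALUES (?, ?)",
                 [("d%03d" % (i % 50), i * 0.1) for i in range(10_000)])

t0 = time.perf_counter()
rows = conn.execute(
    "SELECT device, avg(reading) FROM sensor GROUP BY device ORDER BY device"
).fetchall()
elapsed_ms = (time.perf_counter() - t0) * 1000

print(len(rows), "groups")        # 50 groups
print("%.1f ms" % elapsed_ms)     # 單機聚合的延遲量級
```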

⌬ 治理不外包

MMDP 最底層的設計邏輯是:治理和編排永遠不能放掉,但計算和儲存的位置可以保持彈性。

  • Federated compute 是這個邏輯的極致展現。
  • 在 Foundry 定義轉換邏輯,但運算推到 Snowflake 執行。
  • Pipeline Builder 的強型別和預覽功能照常運作,資料血緣照常追蹤,治理不中斷。
  • Ted 展示中特別秀了一個細節:跑完運算後,Foundry 介面上有個按鈕可以直接跳到 Snowflake 監看執行狀態。
  • 這個細節代表的是:雖然計算跑在外面,但使用者體驗保持一致。
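「計算位置可抽換、治理不中斷」這條規則,骨架上可以這樣示意:不論運算在平台內還是推到外部系統,血緣與治理紀錄一律照常寫入同一個介面。以下是假想的 Python 草圖(`submit`、`lineage` 皆為虛構,非 Foundry 實際 API):

```python
# 假想示意(非 Foundry 實際 API):治理/血緣留在平台內,運算位置可抽換。
lineage: list[tuple[str, str, str]] = []   # (input, transform, output) 血緣紀錄

def submit(transform: str, src: str, dst: str, *, compute: str = "local") -> str:
    # 不論計算跑在哪裡,血緣與治理紀錄都一律寫入,介面保持一致
    lineage.append((src, transform, dst))
    if compute == "snowflake":
        return f"pushed down to snowflake: {transform}"   # 外部執行
    return f"ran locally: {transform}"                    # 平台內執行

submit("filter_active_users", "raw.users", "clean.users")
submit("daily_rollup", "clean.users", "marts.dau", compute="snowflake")
assert len(lineage) == 2   # 計算位置不同,治理紀錄完全相同
```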

(對照我們當年參與制定藍牙 FTMS 規格時學到的:真正的互通性,是在保持各自實作自由的同時,確保共同介面的一致性。Federated compute 也是這個思路,採用哪些運算引擎可以各自決定,但治理介面是共通的、溝通過的、定義好的。)

最後將 MMDP 組裝起來:

  • 讓 AI 應用能快速取得所需資料,不管資料住在哪裡。
  • AIP 架構裡,底層是 MMDP 負責資料,中層是治理與編排負責規模化,上層是 ontology 驅動決策和自主式應用。
  • Ted 最後說,所有 AI/agentic 流程的成功,都取決於資料的可用性、規模化能力和建構能力。

MMDP 不需要成為唯一的資料平台,而是成為各種平台之間的治理層和編排層。這種安排,也許是值得觀察的設計決定。

MMDP 最底層設計邏輯:治理和編排永遠不能放掉。就如每天順手按個「讚」或「愛心」永遠不能放掉,是相同的底層邏輯。一天沒按,就覺得放手奔跑少了風,而順手按一下,然後舒服地離開。


✳️ 知識圖譜


graph LR
    subgraph Storage_Layer [Storage Layer]
        Iceberg["Apache Iceberg"]
        VirtualTables["Virtual Tables"]
        MediaSets["Media Sets"]
    end

    subgraph Compute_Layer [Compute Layer]
        Furnace["Furnace Engine"]
        Spark["Apache Spark"]
        SingleNode["Single Node Engines: Polars, DuckDB"]
        ExternalCompute["External Compute: Snowflake, BigQuery"]
    end

    subgraph Management [Management and Governance]
        Governance["Governance and Security"]
        Lineage["Data Lineage"]
        Orchestration["Orchestration"]
    end

    subgraph Application_Layer [Application and AI Layer]
        Ontology["Ontology: Semantic Layer"]
        AIP_Agents["AIP Agents"]
        Workshop["Workshop Apps"]
    end

    VirtualTables -- "references data in" --> ExternalCompute
    Iceberg -- "enables interoperability for" --> Storage_Layer
    Furnace -- "routes queries to" --> SingleNode
    Furnace -- "routes queries to" --> Spark
    PipelineBuilder["Pipeline Builder"] -- "defines logic for" --> ExternalCompute
    Storage_Layer -- "feeds data into" --> Ontology
    Management -- "enforces rules on" --> Compute_Layer
    Management -- "enforces rules on" --> Storage_Layer
    Ontology -- "provides context for" --> AIP_Agents
    SQL_Console["SQL Console"] -- "interacts with" --> Furnace

    style Iceberg fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style VirtualTables fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style MediaSets fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style Furnace fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style Spark fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style SingleNode fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style ExternalCompute fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style AIP_Agents fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style Workshop fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style SQL_Console fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff
    style PipelineBuilder fill:#0080FF,stroke:#333,stroke-width:2px,color:#fff

    style Storage_Layer fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Compute_Layer fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Management fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Application_Layer fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Ontology fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Governance fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Lineage fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff
    style Orchestration fill:#FF8000,stroke:#333,stroke-width:2px,color:#fff

sequenceDiagram
    participant User as Developer
    participant Console as SQL Console
    participant Furnace as Furnace Engine
    participant Snowflake as Snowflake (External)
    participant Governance as Governance and Lineage
    participant Storage as Foundry Storage (Iceberg)

    User->>Console: Write SQL query (Create Table)
    Console->>Furnace: Submit Query
    Furnace->>Governance: Check Permissions and Lineage
    Governance-->>Furnace: Access Granted
    Furnace->>Snowflake: Execute Federated Compute
    Snowflake-->>Furnace: Return Results/Status
    Furnace->>Storage: Materialize Results as Iceberg
    Storage-->>User: Table Ready in Compass

✳️ 逐字稿與筆記

介紹與 Data Plane 概覽

  • Hello everyone.
    大家好。
  • Uh thank you very much for coming today.
    非常感謝大家今天的到來。
  • I’m going to talk to you about the latest and greatest in the multimodal data plane.
    我要跟大家談談 Multimodal Data Plane 的最新進展。
  • Uh but before I explain what that is, I’ll quickly introduce myself.
    但在我解釋它之前,先快速自我介紹一下。
  • So I’m Ted.
    我是 Ted。
  • Uh I co-lead our data plane group.
    我共同帶領我們的 Data Plane Group。
  • Broadly speaking, we look after all of the core data platform features within Foundry.
    概括來說,我們負責 Foundry 內所有核心資料平台功能。
  • So, uh, a lot of you are probably going to be thinking off the bat, what is a multimodal data plane?
    我想在座很多人大概會直接想問,什麼是 multimodal data plane?
  • It’s our open data and compute architecture that allows you to integrate, manage, and ultimately extract value from your data.
    它是我們的開放式資料與運算架構,讓你能整合、管理,並最終從資料中提取價值。
  • It supports structured data, unstructured data, media, time series, and streams.
    它支援結構化資料、非結構化資料、媒體、時序資料和串流資料。
  • And at the crux of it all, uh, we want to provide this without requiring you to do lengthy migrations or expensive replatforming efforts.
    而最關鍵的是,我們希望在不需要你進行冗長的遷移或昂貴的重新平台化的情況下提供這些功能。
  • Everything should just work seamlessly.
    一切都應該無縫運作。
  • In a single phrase, it’s the foundation of our AI platform.
    用一句話來說,它是我們 AI 平台的基礎。
  • So, in more concrete terms, uh, let me give you an overview of some of the products that fall into MMDP.
    更具體地說,讓我給大家概覽一下 MMDP 旗下的產品。
  • You can broadly break the data platform down into three pillars.
    你可以概略地將資料平台拆解為三大支柱。
  • In the core, we have features that run across the entirety of Foundry and AIP.
    在核心層,我們有橫跨整個 Foundry 和 AIP 的功能。
  • These are based around orchestration and governance.
    這些功能圍繞著編排治理而建構。
  • They provide functionalities like data lineage, our enterprise pipeline management tooling, and of course, our best-in-class security model.
    它們提供資料血緣、企業級管線管理工具,以及我們業界領先的安全模型等功能。
  • They’re absolutely core to the development of MMDP.
    它們是 MMDP 發展的絕對核心。
  • And whether we’re building interoperability with external platforms or extending the feature set of our internal data platform, these are always critical and we never compromise.
    不論我們是在建構與外部平台的互通性,還是擴展內部資料平台的功能集,這些都是至關重要的,我們絕不妥協。
  • On the left, you can see we have storage and the theme here is interoperability.
    左邊可以看到我們有儲存層,這裡的主題是互通性。
  • Whether data is stored externally or whether it’s stored within the platform, all of our products should work seamlessly and there should be no gaps.
    不論資料是存在外部還是存在平台內部,我們所有的產品都應該無縫運作,不應該有任何落差。
  • The most popular product which really encompasses this to date is virtual tables which allow you to store references to externally stored data sets.
    到目前為止最能反映這一點的產品是虛擬表格(Virtual Tables),它讓你可以儲存對外部資料集的參照。
  • These could be in platforms like Databricks, BigQuery or Snowflake.
    這些可以是在 Databricks、BigQuery 或 Snowflake 等平台上。
  • All of our analytics pipeline and ingestion into ontology features work seamlessly with these as if they were natively hosted data sets.
    我們所有的分析管線和 ontology 匯入功能都能與這些虛擬表格無縫配合,就像它們是原生託管的資料集一樣。
  • A more recent project is our undertaking of providing native Iceberg tables.
    一個更近期的專案是我們提供原生 Iceberg 表格的工作。
  • The key motivator for this was to unlock the interoperability features that Iceberg brings as an open standard.
    這項工作的主要動機是解鎖 Iceberg 作為開放標準所帶來的互通性功能。
  • But on top of this, we get a bunch of extra really cool features in the platform and some pretty nice performance upgrades too.
    除此之外,我們還在平台上獲得了一系列額外的很酷功能和相當不錯的效能提升。
  • So these two interoperability plays are on top of our core multimodal data architecture which we get from media sets, streams and data sets all of which can power multimodal models in the platform.
    這兩項互通性佈局建立在我們的核心多模態資料架構之上,這個架構來自媒體集、串流和資料集,它們都能驅動平台上的多模態模型。
  • Finally on the right here um I have our compute pillar and the theme here is flexibility in the platform.
    最後在右邊,我們有運算支柱,這裡的主題是平台中的彈性。
  • We allow you to choose the most appropriate compute for the job.
    我們讓你可以為工作選擇最合適的運算方式。
  • So for real-time processing you can use Flink on top of streams.
    對於即時處理,你可以在串流上使用 Flink。
  • For low latency hyperefficient batch transformations you can use our single node compute engine offerings.
    對於低延遲、超高效的批次轉換,你可以使用我們的單節點運算引擎。
  • And then for really big data you can utilize Spark.
    對於真正的大數據,你可以使用 Spark。
  • On top of these in-platform offerings, we allow you to push down compute to external systems through our federated compute.
    在這些平台內建選項之上,我們還讓你可以透過聯邦運算將運算推送到外部系統。
  • A more recent area of investment has been SQL.
    SQL 是我們近期的一個投資重點。
  • We undertook this as a way of exposing Foundry compute in a way that was more familiar to more users, helping make the whole platform more accessible.
    我們這樣做是為了以更多使用者更熟悉的方式來暴露 Foundry 的運算能力,讓整個平台更容易使用。
  • Today, I’m going to do a real deep dive on two of these, federated compute and SQL.
    今天,我要深入探討其中兩個:聯邦運算和 SQL。
  • We’re going to start with SQL.
    我們先從 SQL 開始。

SQL 控制台介紹

  • Our investments in this space run really deep in the platform, but the primary thing that you’re going to notice will probably be the new SQL console.
    我們在這個領域的投資在平台中非常深入,但你最先注意到的大概會是全新的 SQL 控制台。
  • It looks like a pretty standard SQL editor, and it’s very lightweight and minimal.
    它看起來像一個相當標準的 SQL 編輯器,非常輕量且精簡。
  • It allows you to run queries on both data sets and ontologies.
    它讓你可以對資料集和 ontology 執行查詢。
  • It allows you to create and update tables.
    它讓你可以建立和更新表格。
  • And it allows you to save these analyses in worksheets in your compass file tree.
    它讓你可以把這些分析存成工作表,放在你的 Compass 檔案樹中。
  • So the question that you might ask off the bat is why do we care so much about investing in SQL?
    所以你可能馬上會問的問題是,為什麼我們這麼重視投資 SQL?

SQL 投資的效益

  • Well, SQL truly is the lingua franca of data by allowing users to work with ontologies and data sets with SQL.
    SQL 真的是資料界的通用語言(lingua franca),透過讓使用者用 SQL 來操作 ontology 和資料集
  • We fully democratize access.
    我們完全實現了存取的民主化
  • There’s no Palantir syntax, no special jargon, no custom tool chains required.
    不需要 Palantir 特有的語法,不需要特殊術語,不需要客製化的工具鏈。
  • I can’t count the number of times where I’ve been on site with a customer or I’ve spoken to a guardian at previous DevCons and they’ve told me that an org of data scientists and data engineers have an entirely SQL-based tool chain and when they have to migrate to Foundry, this causes friction.
    我數不清有多少次在客戶現場或在之前的 DevCon 上與使用者交流時,他們告訴我他們的資料科學家和資料工程師團隊有一套完全基於 SQL 的工具鏈,當他們必須遷移到 Foundry 時,這就造成了摩擦
  • These are the kind of barriers that we’re looking to remove and allow you to extract as much value as possible from AIP.
    這些就是我們想要移除的障礙,讓你能從 AIP 中提取盡可能多的價值。
  • Another major benefit of SQL is that it’s fast by design.
    SQL 的另一個主要優點是它在設計上就是快速的。
  • Its simplicity allows editors to stay lightweight and snappy and it allows engines to perform very very well.
    它的簡潔性讓編輯器保持輕量且靈敏,也讓引擎能表現得非常出色。
  • To fully leverage this, we’ve actually built out two new query engines, one for data sets and one for ontologies.
    為了充分利用這一點,我們實際上建構了兩個新的查詢引擎,一個用於資料集,一個用於 ontology
  • So in the data set case, we have Furnace and this decouples the SQL query syntax from the underlying execution engine.
    在資料集的情境中,我們有 Furnace,它將 SQL 查詢語法與底層執行引擎解耦。
  • This has two main benefits.
    這有兩個主要好處。
  • The first is that we can use heuristics to route to the most appropriate compute possible.
    第一是我們可以使用啟發式演算法來路由到最合適的運算資源。
  • But it also means that we can pick up on the latest and greatest in query engine technology without breaking user queries.
    但這也意味著我們可以採用最新、最先進的查詢引擎技術,而不會破壞使用者的查詢。
  • The final and perhaps the most exciting benefit of adopting SQL is that it’s perfectly suited to an agentic world.
    採用 SQL 最後也可能是最令人興奮的好處,是它完美適合自主式世界(agentic world)
  • LLMs have long proved that they’re really good at writing SQL queries.
    LLM 早就證明了它們非常擅長撰寫 SQL 查詢
  • This will come as no surprise to anyone who’s used AIF or the Foundry MCP server.
    對任何使用過 AIF 或 Foundry MCP 伺服器的人來說,這不會令人意外。
  • As the SQL tool gets used incredibly extensively in both, as we invest in the actual behaviors that we support through SQL, we’re unlocking more functionality in a syntax that is very well supported by these models.
    由於 SQL 工具在這兩者中都被大量使用,隨著我們持續投資 SQL 所支援的實際行為,我們正在用這些模型支援度很高的語法解鎖更多功能。

超越控制台:整合功能

  • So, a SQL editor, it’s not particularly revolutionary, right?
    一個 SQL 編輯器,並不算特別革命性的,對吧?
  • Well, that’s right.
    確實如此。
  • And this is why I said our SQL investments run deep.
    這就是為什麼我說我們的 SQL 投資是深入的。
  • The SQL console is just an interface, but it’s not the reason you should be getting excited about SQL in Foundry and AIP.
    SQL 控制台只是一個介面,但它不是你應該對 Foundry 和 AIP 中的 SQL 感到興奮的原因。
  • This is a reflection on what we’ve known here at Palantir for a long time, which is that a SQL editor on its own is a useful tool, but it doesn’t provide value in production workflows.
    這反映了我們在 Palantir 長期以來的認知:一個獨立的 SQL 編輯器是有用的工具,但它無法在生產工作流中提供價值
  • What you should instead be excited by is the integrations that we’re building out.
    你應該感到興奮的,是我們正在建構的整合功能
  • So yes, you can analyze ontologies and you can analyze data sets, but more importantly, you can power agents.
    是的,你可以分析 ontology,你可以分析資料集,但更重要的是,你可以驅動 agent。
  • As I already said, it’s used extensively by AFDE and in Foundry MCP today.
    正如我已經提到的,它在 AFDE 和 Foundry MCP 中被廣泛使用。
  • And as we add more and more functionalities, it’s going to allow agents to interact with the platform in an idiomatic way.
    隨著我們加入越來越多的功能,它將讓 agent 能以慣用的方式與平台互動。
  • We allow you to transform data in place for the first time in Foundry, which unlocks new rapid prototyping flows.
    我們首次在 Foundry 中允許你就地轉換資料,這解鎖了新的快速原型開發流程。
  • And finally, we shipped SQL functions, which allow you to define functions on top of your ontology in SQL, providing a new and incredibly easy way to build with ontology.
    最後,我們推出了 SQL 函式(SQL functions),讓你可以用 SQL 在 ontology 之上定義函式,提供了一種全新且非常簡便的方式來基於 ontology 進行開發。
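「用 SQL 在 ontology 之上定義函式、一鍵發布、到處呼叫」,概念上類似一個具名查詢的註冊表。以下是假想的 Python 草圖(`publish_function` / `call_function` 皆為虛構名稱,非 Foundry 實際 API),只示意「發布一次、多處重用」的形狀:

```python
# 假想示意(非 Foundry 實際 API):把一段 SQL 發布成具名函式,供應用層重複呼叫。
SQL_FUNCTIONS: dict[str, str] = {}

def publish_function(name: str, sql_template: str) -> None:
    # 「發布」= 在註冊表中登記一個具名、可帶參數的 SQL 模板
    SQL_FUNCTIONS[name] = sql_template

def call_function(name: str, **params: str) -> str:
    # 實務上會交給查詢引擎執行;這裡只展示模板展開後的 SQL
    return SQL_FUNCTIONS[name].format(**params)

publish_function("top_customers",
                 "SELECT name FROM customers WHERE region = '{region}' LIMIT 10")
print(call_function("top_customers", region="APAC"))
# SELECT name FROM customers WHERE region = 'APAC' LIMIT 10
```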

示範:SQL 功能實際操作

  • I have a short demo that runs you through these suite of features.
    我有一個簡短的示範帶大家看看這些功能。
  • Um, so as you would expect, we’re able to run some queries here which look at data sets in our platform.
    如你所預期的,我們可以在這裡執行一些查詢來查看平台中的資料集。
  • We can load up a worksheet that has some saved analysis and run through these queries.
    我們可以載入一個有已儲存分析的工作表,然後執行這些查詢。
  • As I said, these worksheets encompass resources, so they can be shared between you and your colleagues.
    如我所說,這些工作表包含資源,所以可以在你和同事之間分享。
  • We can then take advantage of an all new feature, the ability to run a create table statement and materialize the results of a query.
    然後我們可以利用一個全新功能:執行 create table 語句並將查詢結果實體化(materialize)
  • This was never possible in an interactive interface like this before, but you’ll see that under the hood, it still just runs a build, meaning it’s fully integrated with our orchestration and governance, and things like data lineage work exactly as you would expect.
    這在以前的互動式介面中從未實現過,但你會看到在底層它仍然只是執行一個建構作業,這意味著它與我們的編排和治理完全整合,資料血緣等功能也完全如你所期待的運作。
  • In the video, I go on to build an object type off of the table that I just created.
    在影片中,我接著從剛才建立的表格上建構一個 object type
  • The reason I’m doing this is so I can show off some of the ontology SQL functionalities, but I’ve sped it up here for convenience.
    我這麼做是為了展示一些 ontology SQL 功能,但為了方便我加速了這段。
  • I wanted to keep the whole video without any breaks, though.
    不過我想要讓整支影片沒有任何中斷。
  • Just so I could show you how easy it is to go from analyzing data sets, transforming your data into the shape to ingest into ontology, bringing it through to the ontology and then going on into the app building, AI, and ultimately value creation layer of the platform.
    就是為了讓你看到從分析資料集、將資料轉換成匯入 ontology 的格式、帶入 ontology,然後進入應用程式建構、AI,最終到平台的價值創造層,整個過程有多簡單
  • So you can see from ontology manager we have the same SQL console interface only this time we have ontology resources in our explorer.
    你可以看到在 ontology manager 中我們有相同的 SQL 控制台介面,只是這次我們的瀏覽器中有 ontology 資源。
  • I can once again run interactive queries on top of this.
    我可以再次在上面執行互動式查詢。
  • You can see here that I do a basic filter just to get a single row from the object type.
    你可以看到我做了一個基本的篩選,只是為了從 object type 中取得一筆資料。
  • And then when I’m happy with my analysis, I can save it in a worksheet.
    然後當我對分析結果滿意時,我可以將它存成工作表。
  • There’s actually two states of the worksheets.
    工作表實際上有兩種狀態。
  • There’s a draft state which is private and autosaves and allows you to iterate on some analysis.
    有一個草稿狀態,它是私有的、自動儲存的,讓你可以迭代某些分析。
  • And then when you’re happy with it, you can save it as a fully fledged file.
    然後當你滿意後,你可以將它存成一個正式的檔案。
  • Once saved, we can publish a function.
    一旦儲存,我們就可以發布一個函式。
  • And it literally takes one click, a few fields, and you’re ready to go.
    只需要一個點擊、幾個欄位,就準備好了。
  • This function can then be used from workshops, actions, whatever you want.
    這個函式接著可以從 Workshop、Actions 或任何你想要的地方使用。
  • I think that this is particularly exciting as it’s one of the fastest ways to get through to that app building layer of the platform.
    我覺得這特別令人興奮,因為它是到達平台應用程式建構層最快的方式之一。
  • So hopefully based on what you’ve seen, you’re excited.
    希望根據你所看到的,你也感到興奮。

未來的 SQL 整合

  • But I want to tell you that we’re far from done.
    但我想告訴你,我們還遠遠沒有完成。
  • As I said, we want to make SQL a fundamental building block in the platform that allows you to leverage the full AIP ecosystem.
    正如我所說,我們想要讓 SQL 成為平台中的基礎構建模塊,讓你可以善用完整的 AIP 生態系統。
  • As such, we have many more integrations to build.
    因此,我們還有更多整合要建構。
  • Firstly, we want to build a workshop integration to allow you to leverage SQL queries and SQL analysis from AIP applications.
    首先,我們想建構 Workshop 整合,讓你可以從 AIP 應用程式中使用 SQL 查詢和 SQL 分析。
  • We want to add support for SQL object sets so that you can maintain the semantic layer of the ontology on top of the output of your queries.
    我們想加入 SQL 物件集的支援,讓你可以在查詢輸出之上維持 ontology 的語義層
  • We want to allow you to embed these analyses in reports so that you can create artifacts that are more meaningful than the raw data alone.
    我們想讓你可以將這些分析嵌入報表中,讓你可以建立比原始資料本身更有意義的成品。
  • We want to allow you to store procedures so that you can modularize pieces of logic and reuse them in multiple places.
    我們想讓你可以儲存預存程序,讓你可以將邏輯片段模組化並在多處重複使用。
  • We want SQL enterprise pipelining capabilities to allow you to take these analyses and scale them into something for production workflows.
    我們想要 SQL 企業級管線功能,讓你可以將這些分析擴展為生產工作流。
  • And finally, we want integrations in our data catalog to give you a new way of exploring your data.
    最後,我們想要在資料目錄中整合,給你一種探索資料的新方式。

深入探討:聯邦運算

  • The next thing I wanted to talk about was federated compute.
    接下來我想談的是聯邦運算(Federated Compute)

AIP 架構與運算

  • I really like this diagram as I think it’s a good representation of how a lot of Palantirians think about AIP.
    我非常喜歡這張圖,因為我覺得它很好地呈現了許多 Palantir 人對 AIP 的思考方式。
  • At the top layer, we have the ontology which forms the foundation of our value and decision-making layer.
    在最上層,我們有 ontology,它構成了我們的價值和決策層的基礎。
  • That’s what’s powering agents, automations, and applications.
    這就是驅動 agent、自動化和應用程式的東西。
  • Below that, in the core, we have governance and orchestration.
    在它之下的核心層,我們有治理和編排。
  • And this is what allows you to deploy and to run these applications at scale.
    這就是讓你能大規模部署和執行這些應用程式的東西。
  • And then in the foundation we have the multimodal data plane.
    然後在基礎層我們有 multimodal data plane。
  • This layer exists to get your data into the ontology as quickly and as easily as possible.
    這一層的存在是為了盡可能快速且輕鬆地將你的資料送入 ontology。
  • We recognize that to do that it may mean taking advantage of existing resources in your organization, fitting seamlessly into what data architecture you already have set up, or just working around existing barriers that exist in messy real world situations.
    我們認知到要做到這一點,可能意味著利用你組織中現有的資源,無縫融入你已經建立的資料架構,或者繞過在混亂的現實世界中存在的障礙。
  • So to power the multimodal data plane we have both in-platform compute and federated compute.
    因此,為了驅動 multimodal data plane,我們同時有平台內運算和聯邦運算。

運算選項的演進

  • A few years ago, our data platforming capabilities looked something like this.
    幾年前,我們的資料平台能力大概是這樣的。
  • You had a Foundry data set input.
    你有一個 Foundry 資料集輸入。
  • You could define a transformation on that which would run in Spark and you would output to another Foundry data set.
    你可以在上面定義一個轉換,它會在 Spark 中執行,然後輸出到另一個 Foundry 資料集。
  • Now, Spark was a pretty reasonable choice here.
    Spark 在這裡是一個相當合理的選擇。
  • It can run arbitrary data scale.
    它可以處理任意規模的資料。
  • It will work for a 1 megabyte table and it will work for a one petabyte table.
    它對 1 MB 的表格有效,對 1 PB 的表格也有效。
  • However, a jack of all trades is a master of none.
    然而,樣樣通就是樣樣鬆。
  • And the compute landscape has shifted significantly since Spark was originally developed.
    而且自從 Spark 最初開發以來,運算格局已經發生了重大變化。
  • In recognition of this fact, we invested heavily in compute optionality in the platform.
    認識到這個事實,我們在平台的運算可選性上進行了大量投資。
  • We added support for single node compute engines like Polars, DuckDB, and Data Fusion.
    我們加入了對單節點運算引擎的支援,如 Polars、DuckDB 和 Data Fusion。
  • This was in response to the fact that we now run in a cloud-native way instead of running on commodity hardware where we need to share the load of compute between a bunch of machines together.
    這是因應我們現在以雲端原生方式運作,而不是在需要在多台機器之間分擔運算負載的商用硬體上執行。
  • On top of that, advancements in GPU and CPU architectures have meant that the parallel processing that we can leverage on a single machine has greatly improved.
    除此之外,GPU 和 CPU 架構的進步意味著我們在單台機器上可以利用的平行處理能力已大幅提升。
  • As such, these single node engines are able to deliver incredible performance and efficiency characteristics.
    因此,這些單節點引擎能夠提供驚人的效能和效率表現。
  • Seeing the adoption across the fleet since we initially rolled this out has been amazing and it’s been particularly cool to see the latency critical workflows that were previously not possible with batch compute but can now be delivered with these single node engines.
    自從我們最初推出以來,看到在整個客戶群中的採用情況令人驚艷,尤其是看到那些過去用批次運算做不到的低延遲關鍵工作流,現在可以用這些單節點引擎來實現。
  • If you would like to hear more detail about these offerings in particular, we actually did a deep dive on this last DevCon which you can find on YouTube.
    如果你想了解更多關於這些方案的細節,我們在上次 DevCon 上做了一個深入探討,你可以在 YouTube 上找到。

聯邦運算與 Snowflake 整合

  • We built on this flexibility that we established with the options to choose single node compute engines and we extended the virtual table primitive that existed in the platform.
    我們在可選擇單節點運算引擎所建立的彈性基礎上,進一步擴展了平台中既有的虛擬表格基本元件(primitive)。
  • So as I already mentioned, virtual tables allow you to essentially store references to externally stored data and in the platform they look just like a regular data set.
    如我已經提到的,虛擬表格基本上讓你可以儲存對外部資料的參照,在平台中它們看起來就像一般的資料集。
  • Well, what we then did is we allowed you to define a transform on that data just as you would be able to do with any other data set.
    然後我們做的是讓你可以在那些資料上定義轉換,就像你對任何其他資料集能做的一樣。
  • But what was special is that the compute would run in the external source system.
    但特別的是,運算會在外部來源系統中執行。
  • This shows our commitment to two things.
    這展現了我們對兩件事的承諾。
  • Firstly, it shows that you’re able to leverage whatever compute makes most sense for your organization and use case.
    第一,它表明你可以利用對你的組織和使用情境最有意義的任何運算資源。
  • And secondly, it shows the commitment to fit with existing data architectures.
    第二,它展現了與現有資料架構配合的承諾。
  • There is one key problem with this architecture though and it’s that the data and the compute must be collocated and federated.
    不過這個架構有一個關鍵問題,就是資料和運算必須共置且聯邦化。
  • While in some cases this makes perfect sense, it does have the downside that it means you’re tightly coupled to whatever external system you choose.
    雖然在某些情況下這完全合理,但缺點是你會與你選擇的任何外部系統緊密耦合。
  • A more recent development is our deep integration with Snowflake allowing you to leverage Snowflake compute on top of Foundry defined transforms in the platform.
    更近期的發展是我們與 Snowflake 的深度整合,讓你可以在平台中 Foundry 定義的轉換之上利用 Snowflake 的運算。
  • So as you can see here with a Foundry Iceberg input table and a Foundry Iceberg output table you now have the choice of single node compute, Spark, or Snowflake.
    如你在這裡所見,有了 Foundry Iceberg 輸入表格和 Foundry Iceberg 輸出表格,你現在可以選擇單節點運算、Spark 或 Snowflake。
  • What’s particularly special about this is that you can continue to leverage the orchestration, governance, and pipeline authoring features of Foundry while leveraging Snowflake’s compute.
    特別之處在於你可以繼續使用 Foundry 的編排、治理和管線撰寫功能,同時利用 Snowflake 的運算。

示範:Snowflake 聯邦運算

  • For me, this is an incredibly energizing workflow to deliver on as it really hits at the crux of what MMDP is trying to achieve, that is a truly interoperable data plane where there are no caveats, it’s low friction, and it’s a real first-class offering.
    對我來說,這是一個令人非常振奮的工作流,因為它真正觸及了 MMDP 試圖達成的核心:一個真正互通的 data plane,沒有附加條件、低摩擦,而且是真正的一等公民方案。
  • I have a short video that shows you just how first class this is.
    我有一段短影片,讓你看看這個功能有多麼「一等公民」。
  • So from a table you can go to create a Pipeline Builder pipeline as you normally would.
    從一個表格,你可以像往常一樣建立一個 Pipeline Builder 管線。
  • And at the bottom where you would normally select between Spark or single node you can also select to choose external. Here we set it up with an existing Snowflake connection.
    在底部你通常會選擇 Spark 或單節點的地方,你也可以選擇外部。這裡我們用一個現有的 Snowflake 連線來設定。
  • And this connection just tells the pipeline where the compute should run.
    這個連線只是告訴管線運算應該在哪裡執行。
  • From there you get a completely normal pipeline authoring experience in Pipeline Builder.
    從那裡你在 Pipeline Builder 中得到完全正常的管線撰寫體驗。
  • And this to me is what is so special about this offering.
    對我來說,這就是這個方案如此特別的地方。
  • Pipeline Builder in my opinion is one of our most impressive applications.
    在我看來,Pipeline Builder 是我們最令人印象深刻的應用程式之一。
  • It’s so strongly typed.
    它有非常強的型別系統。
  • It’s so feature-rich and there’s really nothing else like it on the market.
    它功能非常豐富,市場上真的沒有其他類似的東西。
  • To be able to take advantage of this in your broader data architecture is a humongous win.
    能在你更廣泛的資料架構中利用這一點是一個巨大的勝利。
  • In this case, we’re just defining a really simple pipeline that filters down to a single row.
    在這個案例中,我們只是定義了一個非常簡單的管線,篩選到單一列。
  • And as you can see, all of the normal pipeline capabilities like preview work exactly as they would with any other Pipeline Builder pipeline.
    如你所見,所有正常的管線功能如預覽,都與任何其他 Pipeline Builder 管線完全一樣地運作。
  • When we go ahead and run this, you’ll be able to see that the compute is running in Snowflake.
    當我們繼續執行時,你可以看到運算是在 Snowflake 中執行的。
  • And once that underlying query gets kicked off, we actually get a nice button that can take us to the external platform and actually show us where that compute ran.
    一旦底層查詢被啟動,我們實際上會得到一個按鈕,可以帶我們到外部平台,實際展示運算在哪裡執行。

結論:Data Plane 在 AIP 中的角色

  • So just to round out the discussion, I wanted to frame how you should think about data plane in the broader context of AIP today.
    為了總結討論,我想說明你應該如何在今天 AIP 的更廣泛脈絡中思考 data plane。
  • Over the next two days, you’re going to see a ton of really cool demos and products that make use of AI and agentic flows.
    在接下來的兩天中,你會看到大量非常酷的使用 AI 和自主式流程的示範和產品。
  • But the success of all of these depends on your data, your ability to scale, and your ability to build.
    但所有這些的成功取決於你的資料、你的規模化能力和你的建構能力。
  • That is what we are here to deliver.
    這就是我們要交付的。