BigQuery :: Continuous Query

BigQuery has kept broadening its feature set in recent years, making it far more than a simple Data Warehouse: it is now a powerful Data Platform (with costs climbing to match...).

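There are quite a few ways to write data into BigQuery in real time, and GCP's own products and services support this well. Handling real-time needs through BigQuery itself used to be harder, though: a downstream service that wanted up-to-the-moment data had to keep calling the API to fetch it, yet BigQuery's connection design does not support high-concurrency workloads well, because it was never designed to be used that way.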

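Continuous query, a feature still in Preview at the time of writing, looks like an attempt to let BigQuery handle real-time data processing as well, in response to the steadily growing number of increasingly real-time data applications.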

EXPORT DATA
OPTIONS (
  format = 'CLOUD_PUBSUB',
  uri = 'https://pubsub.googleapis.com/projects/myproject/topics/sales-3')
AS (
  SELECT
    customer_id,
    product_id,
    amounts,
    event_timestamp,
  FROM `my_project.real_time.fct_sales`
  WHERE product_id = 3
);

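This feature can run real-time pipelines and ETL inside BigQuery, for example to monitor incoming data on the fly and check whether it matches expectations or carries any risk. It can also be combined with other services such as Pub/Sub and Bigtable to export data, and once the data is exported there is plenty of room to put it to use, perhaps for real-time model training or AI services. Continuous queries can also call the AI functions that Google has predefined.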

Use AI functions

Additional APIs, IAM permissions, and Google Cloud resources are required to use a supported AI function in a continuous query. For more information, see one of the following topics, based on your use case:

Generate text by using the ML.GENERATE_TEXT function
Generate text embeddings by using the ML.GENERATE_EMBEDDING function
Understand text with the ML.UNDERSTAND_TEXT function
Translate text with the ML.TRANSLATE function

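There are still some limitations, though. Complex query syntax such as JOIN, aggregation, GROUP BY, and DISTINCT cannot be used in a continuous query. For now it is safest to assume that only the SELECT, FROM, and WHERE you learn as a beginner are available, plus some transformations on the columns themselves.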
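To make these restrictions concrete, here is a minimal Python sketch that scans a query string for the constructs listed above before you try to run it as a continuous query. The function name `find_unsupported`, the pattern table, and the choice of aggregate functions to look for are all hypothetical and only for illustration. It is a naive keyword scan rather than a SQL parser, and it is not an official or complete list of what BigQuery rejects.

```python
import re

# Constructs this post lists as unsupported in continuous queries.
# Assumption: a naive keyword scan is good enough for a quick sanity check.
# It is not a SQL parser and can misfire on keywords inside string
# literals or comments.
UNSUPPORTED_PATTERNS = {
    "JOIN": r"\bJOIN\b",
    "GROUP BY": r"\bGROUP\s+BY\b",
    "DISTINCT": r"\bDISTINCT\b",
    # Only a handful of common aggregates, chosen for illustration.
    "aggregate function": r"\b(?:COUNT|SUM|AVG|MIN|MAX)\s*\(",
}


def find_unsupported(sql: str) -> list[str]:
    """Return the names of the flagged constructs found in `sql`."""
    return [
        name
        for name, pattern in UNSUPPORTED_PATTERNS.items()
        if re.search(pattern, sql, flags=re.IGNORECASE)
    ]


# The SELECT / FROM / WHERE shape from the example earlier in the post.
simple_query = """
SELECT customer_id, product_id, amounts, event_timestamp
FROM `my_project.real_time.fct_sales`
WHERE product_id = 3
"""

# An aggregation over the same table, which the post says is not allowed.
aggregate_query = """
SELECT product_id, SUM(amounts) AS total
FROM `my_project.real_time.fct_sales`
GROUP BY product_id
"""

print(find_unsupported(simple_query))     # []
print(find_unsupported(aggregate_query))  # ['GROUP BY', 'aggregate function']
```

The first query passes because it only filters and projects columns. The second is flagged twice because it both groups and aggregates.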

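As for cost, this feature does not appear to use the on-demand pricing model (per TiB) but the capacity pricing model (per slot-hour). You first need to create a Reservation before a query can run in it. The most basic Reservation requires 100 slots, and the service is built on the Enterprise edition, so you burn at least 4,320 USD a month. No idea how many continuous queries that can keep running XD.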
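The 4,320 USD figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes a 30-day month and a rate of 0.06 USD per slot-hour. Neither number is stated in the post: the rate is simply the value that makes 100 slots come out at 4,320 USD, so check the Pricing page for the current rate in your region.

```python
# Assumptions (not stated in the post): a 30-day month and a rate of
# 0.06 USD per slot-hour, back-derived from the quoted 4,320 USD figure.
SLOTS = 100                # minimum Reservation size mentioned in the post
USD_PER_SLOT_HOUR = 0.06   # assumed Enterprise edition rate
HOURS_PER_MONTH = 24 * 30  # assumed 30-day month, running around the clock


def monthly_cost_usd(slots: int, usd_per_slot_hour: float, hours: int) -> float:
    """Cost of keeping `slots` reserved for `hours` at a flat hourly rate."""
    return round(slots * usd_per_slot_hour * hours, 2)


print(monthly_cost_usd(SLOTS, USD_PER_SLOT_HOUR, HOURS_PER_MONTH))  # 4320.0
```

Because the cost depends on the reserved slots rather than on bytes scanned, it stays the same no matter how many continuous queries share the Reservation.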

Ref:

Release Notes
Introduction to continuous queries
Pricing
