A curated list of awesome Apache Flink frameworks, libraries, connectors, tools, and resources.
Apache Flink is an open-source unified stream and batch data processing framework with powerful state management, event-time semantics, and exactly-once guarantees.
- Flink Reactor DSL - Write streaming pipelines as TypeScript components. Compile to Flink SQL + Kubernetes CRDs.
- Stateful Functions (StateFun) - API for building distributed stateful applications on top of Flink.
- Flink Kubernetes Operator - Kubernetes operator for managing Flink application deployments and lifecycle.
- Ververica Platform - Enterprise platform for developing, deploying, and operating Flink applications.
Connectors maintained under the Apache Flink project or Apache umbrella.
- Flink CDC - Change Data Capture connectors for Flink, supporting MySQL, PostgreSQL, MongoDB, and more.
- Flink Kafka Connector - Apache Kafka source and sink connector.
- Flink JDBC Connector - Connector for reading from and writing to JDBC-compatible databases.
- Flink Elasticsearch Connector - Connector for indexing data into Elasticsearch.
- Flink OpenSearch Connector - Connector for indexing data into OpenSearch clusters.
- Flink AWS Connectors - Connectors for AWS services including Kinesis, DynamoDB, and Firehose.
- Flink GCP Connectors - Connectors for Google Cloud Platform services, maintained by Google.
- Flink MongoDB Connector - Official source and sink for MongoDB.
- Flink Cassandra Connector - Connector for Apache Cassandra.
- Flink HBase Connector - Connector for Apache HBase.
- Flink RabbitMQ Connector - Connector for RabbitMQ message broker.
- Flink Pulsar Connector - Connector for Apache Pulsar.
- Apache Paimon - Streaming data lake platform with native Flink integration for real-time analytics.
- Apache Iceberg Flink - Apache Iceberg integration for Flink, enabling table format operations on data lakes.
Connectors maintained by database vendors or independent developers.
- ClickHouse Connector - Flink connector for ClickHouse, maintained by the ClickHouse team.
- ClickHouse Connector (itinycheng) - Flink SQL connector for ClickHouse with catalog support, read/write for complex types.
- StarRocks Connector - Read/write connector for StarRocks with DataStream, Table API, SQL, and Flink CDC 3.0 support.
- Doris Connector - Flink connector for Apache Doris maintained by the Doris community.
- Redis Connector - Async Redis connector built on Lettuce, supporting SQL join/sink with query caching.
- HTTP Connector - Source and sink for REST APIs with DataStream, Table, and SQL support.
- Snowflake Connector - Flink connector for Snowflake, maintained by DeltaStream.
- NATS Connector - Connector for NATS messaging, maintained by Synadia.
- OceanBase Connector - Flink connector for OceanBase distributed database.
- NebulaGraph Connector - Flink connector for NebulaGraph graph database.
- QuestDB Connector - Flink sink for QuestDB time-series database using InfluxDB Line Protocol.
- TiBigData - TiDB connectors for Flink Table API, maintained by the TiDB incubator.
- Flink ML - Official machine learning library for Flink with algorithms for classification, regression, and clustering.
- Alink - Alibaba's machine learning platform built on Flink for batch and stream processing.
- dl-on-flink - Deep learning framework integration (TensorFlow, PyTorch) running on Flink.
- Flink CEP - Built-in Complex Event Processing library for detecting patterns in event streams.
- Flink CEP SQL - SQL-based pattern matching using the `MATCH_RECOGNIZE` clause.
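As a sketch of what `MATCH_RECOGNIZE` looks like in practice, here is the classic V-shape (price dip and recovery) pattern from the official Flink SQL docs. It assumes a `Ticker` table with `symbol`, `price`, and a `rowtime` time attribute:

```sql
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            START_ROW.rowtime        AS start_tstamp,
            LAST(PRICE_DOWN.rowtime) AS bottom_tstamp,
            LAST(PRICE_UP.rowtime)   AS end_tstamp
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO LAST PRICE_UP
        -- one starting row, one or more falling prices, then a rise
        PATTERN (START_ROW PRICE_DOWN+ PRICE_UP)
        DEFINE
            PRICE_DOWN AS
                (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price) OR
                PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1),
            PRICE_UP AS
                PRICE_UP.price > LAST(PRICE_DOWN.price, 1)
    ) MR;
```

Each match emits one summary row marking the start, bottom, and end of the dip; `AFTER MATCH SKIP TO LAST PRICE_UP` resumes matching at the recovery point so overlapping dips are not double-counted.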
- RocksDB State Backend - Production-grade state backend for large state using embedded RocksDB.
- ForSt State Backend - Next-generation state backend for disaggregated storage, built on RocksDB adapted for remote file systems.
- flink-testing - Official testing utilities including MiniCluster, test harnesses, and test sources/sinks.
- Flink Reactor Console - Real-time dashboard and GraphQL server for managing Apache Flink clusters.
- Flink Metrics System - Built-in metrics system with reporters for Prometheus, Graphite, Datadog, and more.
- Flink Web UI - Built-in dashboard for monitoring job status, backpressure, checkpoints, and task managers.
- Flink SQL Gateway - REST service for submitting Flink SQL statements remotely over a standard API.
- Flink SQL Client - Interactive CLI for writing and executing Flink SQL queries against running clusters.
- Hive Catalog - Persistent catalog using Hive Metastore for managing Flink SQL metadata.
- Paimon Catalog - Native catalog integration for Apache Paimon streaming lakehouse tables.
- Iceberg Catalog - Catalog implementation for managing Apache Iceberg tables in Flink SQL.
- JDBC Catalog - Catalog for exposing existing relational database tables as Flink SQL tables.
- Official Flink SQL Documentation - Comprehensive SQL reference covering DDL, DML, queries, and built-in functions.
- Flink SQL Cookbook - Collection of common Flink SQL recipes and patterns from Ververica.
- Getting Started with Flink SQL - Official hands-on tutorial for building your first Flink SQL application.
- Exploring Watermarks in Flink SQL - Deep dive into understanding and configuring watermarks in Flink SQL.
- Exploring Joins and Changelogs in Flink SQL - Detailed exploration of join types and changelog semantics in Flink SQL.
- Sending Data to Apache Iceberg with Flink - Guide for using Flink SQL to write to Apache Iceberg on S3-compatible storage.
- Flink UDF Documentation - Official guide for implementing scalar, table, and aggregate user-defined functions.
- flink-faker - Table source for generating fake test data using SQL DDL with Datafaker expressions.
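As an illustration of the flink-faker DDL (adapted from the project's README), the following defines a table backed entirely by generated data, with field expressions in Datafaker syntax:

```sql
CREATE TEMPORARY TABLE heroes (
  `name`  STRING,
  `power` STRING,
  `age`   INT
) WITH (
  'connector' = 'faker',
  'fields.name.expression'  = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  'fields.age.expression'   = '#{number.numberBetween ''0'',''1000''}'
);

SELECT * FROM heroes;
```

The `SELECT` produces an unbounded stream of fake rows, which makes the connector convenient for smoke-testing queries and sinks without wiring up a real source.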
Flink 2.0 marked a major milestone — the biggest release in the project's history with 165 contributors, 25 FLIPs, and sweeping architectural changes including disaggregated state management, API modernization, and the removal of legacy APIs. Subsequent 2.x releases have continued the push toward unified real-time data and AI workloads.
Flink 2.0 is a breaking release. Key removals and changes to be aware of:
- Flink 2.0 Release Notes - Complete list of breaking changes and migration notes.
- Upgrading Applications and Flink Framework - Official guide for upgrading from 1.x to 2.x.
- FLIP-458: Long-Term Support for Final 1.x Release - LTS plan for the final Flink 1.x version to ease migration.
- Removed APIs: DataSet API, Scala DataStream/DataSet APIs, SourceFunction/SinkFunction/Sink V1, TableSource/TableSink, FsStateBackend, MemoryStateBackend, and per-job deployment mode — 210+ deprecated classes removed in total.
- Java version changes: Java 8 dropped. Java 17 is the new default. Java 11 (minimum) and Java 21 are supported.
- Configuration overhaul: Legacy `flink-conf.yaml` replaced by a standard YAML `config.yaml`, with a migration tool provided.
- State compatibility: Recovery from 1.x savepoints may require migration strategies; state compatibility is not guaranteed across the major version boundary.
- Connector impact: Connectors depending on SourceFunction/SinkFunction/Sink V1 do not work on 2.x. Official connectors (Kafka, Paimon, JDBC, Elasticsearch) shipped 2.x-compatible versions at launch.
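For illustration, here is the same configuration expressed in the legacy flat format and in the new standard-YAML `config.yaml`. The key names (`taskmanager.numberOfTaskSlots`, `jobmanager.memory.process.size`) are real Flink options; the values are just examples:

```yaml
# Legacy flink-conf.yaml used flat, dot-separated keys:
#   taskmanager.numberOfTaskSlots: 4
#   jobmanager.memory.process.size: 1600m

# The new config.yaml is standard YAML, so keys may be nested:
taskmanager:
  numberOfTaskSlots: 4
jobmanager:
  memory:
    process:
      size: 1600m
```

The migration tool mentioned in the release notes converts existing flat files, so hand-rewriting is optional when upgrading.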
Released March 2025. A ground-up modernization of the Flink runtime and API surface.
- Apache Flink 2.0.0: A New Era of Real-Time Data Processing - Official release announcement.
- Preview Release of Apache Flink 2.0 - Early preview release with migration guidance.
- Apache Flink 2.0.1 Release - Bug fix release with 51 fixes.
- 2.0 Release Planning - FLIP tracker and planning wiki.
Headline features:
- Disaggregated State Management — Decouples state storage from compute using the new ForSt state backend over distributed file systems. Enables asynchronous, non-blocking state access and fast rescaling for jobs with hundreds of terabytes of state. Nexmark benchmarks show 75-120% throughput improvements over traditional local state stores.
- DataStream V2 API (experimental) — New DataStream replacement with ProcessFunction, partitioning primitives, state, time services, and watermark processing.
- Materialized Tables — Unified real-time and historical data management through a single pipeline with schema/query updates without reprocessing. Native Kubernetes/YARN submission and Paimon integration for ACID transactions.
- Adaptive Batch Execution — Dynamic broadcast join selection and automatic join skew optimization, achieving 8-16% TPC-DS benchmark improvements.
- AI/ML in CDC — Flink CDC 3.3 with dynamic AI model invocation (OpenAI chat/embedding models) and specialized SQL syntax for defining and invoking AI models.
- SQL enhancements — `QUALIFY` clause for window function filtering, SQL Gateway in application mode, and seven critical SQL operators with async state access.
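As a sketch of the new `QUALIFY` clause: deduplicating to the latest row per key, which previously required wrapping the `ROW_NUMBER()` window in a subquery. The `events` table and its columns are hypothetical:

```sql
-- Keep only the most recent event per user
SELECT user_id, event_time, payload
FROM events
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY user_id
    ORDER BY event_time DESC
) = 1;
```

`QUALIFY` filters on window-function results the way `HAVING` filters on aggregates, so the intent stays readable in a single query block.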
Released July 2025. Focused on unified real-time data and AI, with major SQL and streaming improvements.
- Apache Flink 2.1.0: Unified Real-Time Data + AI - Official release announcement.
- Apache Flink 2.1.1 Release - Bug fix release with 25 fixes.
- Flink 2.1 Release Notes - Detailed release notes and migration notes.
- 2.1 Release Planning - FLIP tracker and planning wiki.
Headline features:
- Model DDLs & ML_PREDICT — Define AI models as catalog objects and invoke them via `ML_PREDICT` for real-time inference in SQL queries, with built-in OpenAI support.
- Process Table Functions (PTFs) — Stateful transformations with managed state, event-time, timers, and changelog access — capabilities that previously required DataStream expertise, now accessible from SQL.
- Variant Type — Semi-structured data type for deeply nested or evolving schemas, with native Paimon integration.
- Delta Join — New streaming join operator requiring significantly less state than regular joins, enabled by default.
- StreamingMultiJoinOperator — Zero intermediate state for cascaded joins sharing common keys.
- SQL Connector for Keyed State — Query keyed state from checkpoints and savepoints directly via Flink SQL.
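The model DDL and `ML_PREDICT` call take roughly the following shape. This is a hedged sketch: the table name, model name, and the specific `WITH` options are illustrative assumptions, not a definitive reference; consult the 2.1 docs for the exact option keys:

```sql
-- Define a model as a catalog object (options are illustrative)
CREATE MODEL sentiment_model
INPUT  (text STRING)
OUTPUT (sentiment STRING)
WITH (
  'provider' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/chat/completions',
  'model'    = 'gpt-4o-mini',
  'system-prompt' = 'Classify the sentiment of the input as positive or negative.'
);

-- Invoke it over a hypothetical reviews table for per-row inference
SELECT review_id, text, sentiment
FROM ML_PREDICT(
  TABLE reviews,
  MODEL sentiment_model,
  DESCRIPTOR(text)
);
```

Because the model is a catalog object, the same definition can be shared across queries and governed like any other table or view.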
Released December 2025. The latest stable release, advancing AI integration and operational maturity.
- Apache Flink 2.2.0: Advancing Real-Time Data + AI - Official release announcement.
- Flink 2.2 Release Notes - Detailed release notes.
Headline features:
- ML_PREDICT in Table API — Model inference now available programmatically beyond SQL.
- VECTOR_SEARCH — Real-time vector similarity search within Flink SQL for AI-powered retrieval.
- Materialized Table Enhancements — Optional `FRESHNESS` clause, bucketing via `DISTRIBUTED BY`, `SHOW MATERIALIZED TABLES`, and customizable defaults via `MaterializedTableEnricher`.
- SinkUpsertMaterializer V2 — Fixes exponential performance degradation in changelog reconciliation.
- Delta Join Improvements — Expanded SQL pattern support, CDC source support, and caching to reduce external storage requests.
- Operational improvements — Balanced task scheduling across TaskManagers, time-based job history retention, RateLimiter for scan sources, and balanced splits assignment for addressing data skew.
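A hedged sketch of the materialized-table DDL (table and column names are hypothetical; as noted above, the `FRESHNESS` clause is optional as of 2.2):

```sql
-- Flink keeps this table at most 30 minutes behind its source
CREATE MATERIALIZED TABLE daily_order_stats
FRESHNESS = INTERVAL '30' MINUTE
AS SELECT
  DATE_FORMAT(order_time, 'yyyy-MM-dd') AS ds,
  COUNT(*)    AS order_cnt,
  SUM(amount) AS total_amount
FROM orders
GROUP BY DATE_FORMAT(order_time, 'yyyy-MM-dd');
```

The freshness target lets the planner choose between continuous streaming refresh and periodic batch refresh for the same declarative definition.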
- Apache Flink Website - Official project homepage with documentation and downloads.
- Flink Documentation - Comprehensive documentation for the latest stable release.
- Flink GitHub Repository - Source code and issue tracker.
- Flink Blog - Official blog with release announcements and technical deep dives.
- Flink Wiki - Apache Confluence wiki with FLIPs, design docs, and meeting notes.
- Flink Improvement Proposals (FLIPs) - Design documents for major Flink features and changes.
- Stream Processing with Apache Flink - Fabian Hueske and Vasiliki Kalavri. Foundational guide to Flink's architecture and programming model.
- Learning Apache Flink - Tanmay Deshpande. Practical introduction to building streaming applications with Flink.
- Apache Flink Training - Official self-paced training exercises covering core Flink concepts.
- Immerok Apache Flink Tutorials - Free Confluent course on Apache Flink fundamentals.
- Apache Flink: Stream and Batch Processing in a Single Engine - Foundational paper describing Flink's unified architecture.
- State Management in Apache Flink - VLDB paper on Flink's approach to consistent stateful stream processing.
- Lightweight Asynchronous Snapshots for Distributed Dataflows - The paper behind Flink's checkpointing mechanism (based on Chandy-Lamport).
- Robin Moffatt (rmoff.net) - In-depth Flink SQL tutorials covering watermarks, joins, changelogs, Iceberg integration, and CDC.
- Ververica Blog - Technical blog from the original Flink creators covering architecture, best practices, and ecosystem.
- Confluent - Apache Flink - Confluent's Apache Flink product page with resources on managed Flink and streaming architectures.
- Flink Community Blog - Official Apache Flink blog with release notes and community updates.
- Flink Forward - Annual conference dedicated to Apache Flink with talks from 2015 to 2025.
- Flink Forward YouTube - Recorded talks and presentations from Flink Forward conferences.
- Apache Flink YouTube - Official Apache Flink YouTube channel with tutorials and community talks.
- Flink Mailing Lists - Developer and user mailing lists for discussion and support.
- Flink Slack - Community Slack workspace for real-time conversations.
- Stack Overflow - Q&A tagged with `apache-flink`.
- Flink Meetups - Local meetup groups worldwide.
- Apache Kafka - Distributed event streaming platform commonly used as a source and sink for Flink.
- Apache Spark Structured Streaming - Alternative stream processing framework with micro-batch semantics.
- Apache Beam - Unified model for batch and stream processing that can use Flink as a runner.
- Apache Kafka Streams - Lightweight stream processing library built on Kafka.
- Materialize - Streaming SQL database powered by Timely Dataflow.
- RisingWave - Distributed SQL streaming database for real-time analytics.
Contributions are welcome! Please read the contribution guidelines before submitting a pull request.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
