Awesome Flink

A curated list of awesome Apache Flink frameworks, libraries, connectors, tools, and resources.

Apache Flink is an open-source unified stream and batch data processing framework with powerful state management, event-time semantics, and exactly-once guarantees.

Packages & Libraries

DSLs & Frameworks

Flink Reactor DSL - Write streaming pipelines as TypeScript components. Compile to Flink SQL + Kubernetes CRDs.
Stateful Functions (StateFun) - API for building distributed stateful applications on top of Flink.
Flink Kubernetes Operator - Kubernetes operator for managing Flink application deployments and lifecycle.
Ververica Platform - Enterprise platform for developing, deploying, and operating Flink applications.

Official Connectors

Connectors maintained under the Apache Flink project or Apache umbrella.

Flink CDC - Change Data Capture connectors for Flink, supporting MySQL, PostgreSQL, MongoDB, and more.
Flink Kafka Connector - Apache Kafka source and sink connector.
Flink JDBC Connector - Connector for reading from and writing to JDBC-compatible databases.
Flink Elasticsearch Connector - Connector for indexing data into Elasticsearch.
Flink OpenSearch Connector - Connector for indexing data into OpenSearch clusters.
Flink AWS Connectors - Connectors for AWS services including Kinesis, DynamoDB, and Firehose.
Flink GCP Connectors - Connectors for Google Cloud Platform services, maintained by Google.
Flink MongoDB Connector - Official source and sink for MongoDB.
Flink Cassandra Connector - Connector for Apache Cassandra.
Flink HBase Connector - Connector for Apache HBase.
Flink RabbitMQ Connector - Connector for RabbitMQ message broker.
Flink Pulsar Connector - Connector for Apache Pulsar.
Apache Paimon - Streaming data lake platform with native Flink integration for real-time analytics.
Apache Iceberg Flink - Apache Iceberg integration for Flink, enabling table format operations on data lakes.

Community Connectors

Connectors maintained by database vendors or independent developers.

ClickHouse Connector - Flink connector for ClickHouse, maintained by the ClickHouse team.
ClickHouse Connector (itinycheng) - Flink SQL connector for ClickHouse with catalog support, read/write for complex types.
StarRocks Connector - Read/write connector for StarRocks with DataStream, Table API, SQL, and Flink CDC 3.0 support.
Doris Connector - Flink connector for Apache Doris maintained by the Doris community.
Redis Connector - Async Redis connector built on Lettuce, supporting SQL join/sink with query caching.
HTTP Connector - Source and sink for REST APIs with DataStream, Table, and SQL support.
Snowflake Connector - Flink connector for Snowflake, maintained by DeltaStream.
NATS Connector - Connector for NATS messaging, maintained by Synadia.
OceanBase Connector - Flink connector for OceanBase distributed database.
NebulaGraph Connector - Flink connector for NebulaGraph graph database.
QuestDB Connector - Flink sink for QuestDB time-series database using InfluxDB Line Protocol.
TiBigData - TiDB connectors for Flink Table API, maintained by the TiDB incubator.

Machine Learning

Flink ML - Official machine learning library for Flink with algorithms for classification, regression, and clustering.
Alink - Alibaba's machine learning platform built on Flink for batch and stream processing.
dl-on-flink - Deep learning framework integration (TensorFlow, PyTorch) running on Flink.

Complex Event Processing

Flink CEP - Built-in Complex Event Processing library for detecting patterns in event streams.
Flink CEP SQL - SQL-based pattern matching using the MATCH_RECOGNIZE clause.

State Backends

RocksDB State Backend - Production-grade state backend for large state using embedded RocksDB.
ForSt State Backend - Next-generation state backend (Flink on RocksDB over Storage) for disaggregated storage.

Testing & Quality

flink-testing - Official testing utilities including MiniCluster, test harnesses, and test sources/sinks.

Monitoring & Observability

Flink Reactor Console - Real-time dashboard and GraphQL server for managing Apache Flink clusters.
Flink Metrics System - Built-in metrics system with reporters for Prometheus, Graphite, Datadog, and more.
Flink Web UI - Built-in dashboard for monitoring job status, backpressure, checkpoints, and task managers.

Flink SQL

Tools & Frameworks

Flink Reactor DSL - Write streaming pipelines as TypeScript components. Compile to Flink SQL + Kubernetes CRDs.
Flink SQL Gateway - REST service for submitting Flink SQL statements remotely over a standard API.
Flink SQL Client - Interactive CLI for writing and executing Flink SQL queries against running clusters.

Connectors & Catalogs

Hive Catalog - Persistent catalog using Hive Metastore for managing Flink SQL metadata.
Paimon Catalog - Native catalog integration for Apache Paimon streaming lakehouse tables.
Iceberg Catalog - Catalog implementation for managing Apache Iceberg tables in Flink SQL.
JDBC Catalog - Catalog for exposing existing relational database tables as Flink SQL tables.

Tutorials & Examples

Official Flink SQL Documentation - Comprehensive SQL reference covering DDL, DML, queries, and built-in functions.
Flink SQL Cookbook - Collection of common Flink SQL recipes and patterns from Ververica.
Getting Started with Flink SQL - Official hands-on tutorial for building your first Flink SQL application.
Exploring Watermarks in Flink SQL - Deep dive into understanding and configuring watermarks in Flink SQL.
Exploring Joins and Changelogs in Flink SQL - Detailed exploration of join types and changelog semantics in Flink SQL.
Sending Data to Apache Iceberg with Flink - Guide for using Flink SQL to write to Apache Iceberg on S3-compatible storage.

UDFs & Extensions

Flink UDF Documentation - Official guide for implementing scalar, table, and aggregate user-defined functions.
flink-faker - Table source for generating fake test data using SQL DDL with Datafaker expressions.

Flink 2.x

Flink 2.0 marked a major milestone — the biggest release in the project's history with 165 contributors, 25 FLIPs, and sweeping architectural changes including disaggregated state management, API modernization, and the removal of legacy APIs. Subsequent 2.x releases have continued the push toward unified real-time data and AI workloads.

What Changed from 1.x

Flink 2.0 is a breaking release. Key removals and changes to be aware of:

Flink 2.0 Release Notes - Complete list of breaking changes and migration notes.
Upgrading Applications and Flink Framework - Official guide for upgrading from 1.x to 2.x.
FLIP-458: Long-Term Support for Final 1.x Release - LTS plan for the final Flink 1.x version to ease migration.
Removed APIs: DataSet API, Scala DataStream/DataSet APIs, SourceFunction/SinkFunction/Sink V1, TableSource/TableSink, FsStateBackend, MemoryStateBackend, and per-job deployment mode — 210+ deprecated classes removed in total.
Java version changes: Java 8 dropped. Java 17 is the new default. Java 11 (minimum) and Java 21 are supported.
Configuration overhaul: Legacy flink-conf.yaml replaced by standard YAML config.yaml with a migration tool provided.
State compatibility: Recovery from 1.x savepoints may require migration strategies — state compatibility is not guaranteed across the major version boundary.
Connector impact: Connectors depending on SourceFunction/SinkFunction/Sink V1 do not work on 2.x. Official connectors (Kafka, Paimon, JDBC, Elasticsearch) shipped 2.x-compatible versions at launch.
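
To illustrate the configuration overhaul noted above, here is a minimal Python sketch of how flat, dotted keys in the legacy style could map onto a nested YAML-style structure. It is not the bundled migration tool and does not reproduce its behavior. The helper name `nest_flat_keys` and the sample values are hypothetical, and the sketch assumes that `config.yaml` accepts nested keys.

```python
from typing import Any, Dict


def nest_flat_keys(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}}.

    Hypothetical helper for illustration only; not Flink's migration tool.
    """
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # The same name is used as a value and as a prefix of another key.
                raise ValueError(f"key conflict at '{part}' in '{dotted_key}'")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            # The leaf name already holds nested keys from an earlier entry.
            raise ValueError(f"key conflict at '{leaf}' in '{dotted_key}'")
        node[leaf] = value
    return nested


# Sample legacy-style entries (values are made up).
legacy = {
    "jobmanager.rpc.address": "localhost",
    "taskmanager.numberOfTaskSlots": 4,
    "parallelism.default": 2,
}
config = nest_flat_keys(legacy)
```

The sketch raises an error when one key is both a value and a prefix of another key, because such a pair cannot be expressed as a single nested mapping. Check real configurations with the official migration tool and the upgrade guide rather than with a helper like this.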

Flink 2.0

Released March 2025. A ground-up modernization of the Flink runtime and API surface.

Apache Flink 2.0.0: A New Era of Real-Time Data Processing - Official release announcement.
Preview Release of Apache Flink 2.0 - Early preview release with migration guidance.
Apache Flink 2.0.1 Release - Bug fix release with 51 fixes.
2.0 Release Planning - FLIP tracker and planning wiki.

Headline features:

Disaggregated State Management — Decouples state storage from compute using the new ForSt state backend over distributed file systems. Enables asynchronous, non-blocking state access and fast rescaling for jobs with hundreds of terabytes of state. In Nexmark benchmarks it reaches 75-120% of the throughput of traditional local state stores.
DataStream V2 API (experimental) — New DataStream replacement with ProcessFunction, partitioning primitives, state, time services, and watermark processing.
Materialized Tables — Unified real-time and historical data management through a single pipeline with schema/query updates without reprocessing. Native Kubernetes/YARN submission and Paimon integration for ACID transactions.
Adaptive Batch Execution — Dynamic broadcast join selection and automatic join skew optimization, achieving 8-16% TPC-DS benchmark improvements.
AI/ML in CDC — Flink CDC 3.3 with dynamic AI model invocation (OpenAI chat/embedding models) and specialized SQL syntax for defining and invoking AI models.
SQL enhancements — QUALIFY clause for window function filtering, SQL Gateway in application mode, seven critical SQL operators with async state access.
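
As a rough model of what the QUALIFY clause does, the Python sketch below filters rows on the result of a window function, in the way a query shaped like `... QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) = 1` would keep only the newest row per key. The field names, sample rows, and the helper `latest_per_key` are made up for illustration. The sketch models the semantics over an in-memory list and says nothing about how Flink plans or executes such a query.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample rows standing in for a table.
rows = [
    {"user_id": "a", "ts": 1, "amount": 10},
    {"user_id": "b", "ts": 2, "amount": 7},
    {"user_id": "a", "ts": 3, "amount": 25},
    {"user_id": "b", "ts": 5, "amount": 3},
]


def latest_per_key(rows, key, order_by):
    """Keep the row that would get row number 1 in each partition (newest first)."""
    result = []
    by_key = itemgetter(key)
    # groupby only groups adjacent items, so sort by the partition key first.
    for _, partition in groupby(sorted(rows, key=by_key), key=by_key):
        ranked = sorted(partition, key=itemgetter(order_by), reverse=True)
        result.append(ranked[0])  # the row whose row number is 1
    return result


latest = latest_per_key(rows, key="user_id", order_by="ts")
```

Without QUALIFY, the same filter usually needs a subquery that computes the row number and an outer WHERE on it; the clause lets the filter sit in the same query block as the window function.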

Flink 2.1

Released July 2025. Focused on unified real-time data and AI, with major SQL and streaming improvements.

Apache Flink 2.1.0: Unified Real-Time Data + AI - Official release announcement.
Apache Flink 2.1.1 Release - Bug fix release with 25 fixes.
Flink 2.1 Release Notes - Detailed release notes and migration notes.
2.1 Release Planning - FLIP tracker and planning wiki.

Headline features:

Model DDLs & ML_PREDICT — Define AI models as catalog objects and invoke them via ML_PREDICT for real-time inference in SQL queries, with built-in OpenAI support.
Process Table Functions (PTFs) — Stateful transformations with managed state, event-time, timers, and changelog access — capabilities that previously required DataStream expertise, now accessible from SQL.
Variant Type — Semi-structured data type for deeply nested or evolving schemas, with native Paimon integration.
Delta Join — New streaming join operator requiring significantly less state than regular joins, enabled by default.
StreamingMultiJoinOperator — Zero intermediate state for cascaded joins sharing common keys.
SQL Connector for Keyed State — Query keyed state from checkpoints and savepoints directly via Flink SQL.

Flink 2.2

Released December 2025. The latest stable release, advancing AI integration and operational maturity.

Apache Flink 2.2.0: Advancing Real-Time Data + AI - Official release announcement.
Flink 2.2 Release Notes - Detailed release notes.

Headline features:

ML_PREDICT in Table API — Model inference now available programmatically beyond SQL.
VECTOR_SEARCH — Real-time vector similarity search within Flink SQL for AI-powered retrieval.
Materialized Table Enhancements — Optional FRESHNESS clause, bucketing via DISTRIBUTED BY, SHOW MATERIALIZED TABLES, and customizable defaults via MaterializedTableEnricher.
SinkUpsertMaterializer V2 — Fixes exponential performance degradation in changelog reconciliation.
Delta Join Improvements — Expanded SQL pattern support, CDC source support, and caching to reduce external storage requests.
Operational improvements — Balanced task scheduling across TaskManagers, time-based job history retention, RateLimiter for scan sources, and balanced splits assignment for addressing data skew.

Resources

Official Resources

Apache Flink Website - Official project homepage with documentation and downloads.
Flink Documentation - Comprehensive documentation for the latest stable release.
Flink GitHub Repository - Source code and issue tracker.
Flink Blog - Official blog with release announcements and technical deep dives.
Flink Wiki - Apache Confluence wiki with FLIPs, design docs, and meeting notes.
Flink Improvement Proposals (FLIPs) - Design documents for major Flink features and changes.

Books

Stream Processing with Apache Flink - Fabian Hueske and Vasiliki Kalavri. Foundational guide to Flink's architecture and programming model.
Learning Apache Flink - Tanmay Deshpande. Practical introduction to building streaming applications with Flink.

Courses & Tutorials

Apache Flink Training - Official self-paced training exercises covering core Flink concepts.
Immerok Apache Flink Tutorials - Free Confluent course on Apache Flink fundamentals.

Papers

Apache Flink: Stream and Batch Processing in a Single Engine - Foundational paper describing Flink's unified architecture.
State Management in Apache Flink - VLDB paper on Flink's approach to consistent stateful stream processing.
Lightweight Asynchronous Snapshots for Distributed Dataflows - The paper behind Flink's checkpointing mechanism (based on Chandy-Lamport).

Blogs

Robin Moffatt (rmoff.net) - In-depth Flink SQL tutorials covering watermarks, joins, changelogs, Iceberg integration, and CDC.
Ververica Blog - Technical blog from the original Flink creators covering architecture, best practices, and ecosystem.
Confluent - Apache Flink - Confluent's Apache Flink product page with resources on managed Flink and streaming architectures.
Flink Community Blog - Official Apache Flink blog with release notes and community updates.

Videos & Talks

Flink Forward - Annual conference dedicated to Apache Flink with talks from 2015 to 2025.
Flink Forward YouTube - Recorded talks and presentations from Flink Forward conferences.
Apache Flink YouTube - Official Apache Flink YouTube channel with tutorials and community talks.

Community

Flink Mailing Lists - Developer and user mailing lists for discussion and support.
Flink Slack - Community Slack workspace for real-time conversations.
Stack Overflow - Q&A tagged with apache-flink.
Flink Meetups - Local meetup groups worldwide.

Related Projects

Apache Kafka - Distributed event streaming platform commonly used as a source and sink for Flink.
Apache Spark Structured Streaming - Alternative stream processing framework with micro-batch semantics.
Apache Beam - Unified model for batch and stream processing that can use Flink as a runner.
Apache Kafka Streams - Lightweight stream processing library built on Kafka.
Materialize - Streaming SQL database powered by Timely Dataflow.
RisingWave - Distributed SQL streaming database for real-time analytics.

Contributing

Contributions are welcome! Please read the contribution guidelines before submitting a pull request.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
