Project Library
Discover and explore quality open source projects
Pachyderm - data version control, data pipeline and data lineage tools.
OpenMessaging - OpenMessaging aims to establish industry guidelines and provide a common framework for messaging and streaming standards across the financial, e-commerce, IoT and big data fields. Its design principles are cloud-oriented, simple, flexible and language-independent for distributed heterogeneous environments. Compliance with these specifications enables the development of heterogeneous messaging applications on all major platforms and operating systems.
CloudEvents - CloudEvents is an open specification that provides a consistent standard for describing event data. It was proposed by the CNCF Serverless Working Group, and CNCF has established partnerships with multiple cloud service providers.
Beam - Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, together with a set of language-specific SDKs for building pipelines and Runners for executing them on distributed processing back ends, including Apache Apex, Apache Flink, Apache Spark and Google Cloud Dataflow.
Storm - Apache Storm is a distributed real-time computation system. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for real-time computation.
Spark - Apache Spark is a fast, general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python and R, as well as an optimized engine that supports general computation graphs for data analysis.
NiFi - Apache NiFi is an easy-to-use, powerful and reliable system for processing and distributing data.
Heron - Apache Heron (incubating) is a real-time, distributed, fault-tolerant stream processing engine originally developed at Twitter.
A high-performance server for NATS.io, the cloud- and edge-native messaging system.
ActiveMQ - a high-performance message queue from the Apache Software Foundation.
RocketMQ - a distributed messaging and streaming platform from Apache, with low latency, high performance, high reliability, trillion-level message capacity and dynamic scalability.
Kafka - a distributed streaming platform from Apache.
Ehcache - the most widely used Java cache.
A multithreaded fork of Redis.
Redis - an in-memory data structure store that can be used as a database, cache and message queue.