Apache Wayang™

The first open-source cross-platform data processing system

Write your data pipeline once. Run it anywhere.

You write your pipeline against a single API, then decide how it runs. Point it at one engine and it runs there. Or hand Wayang's cost-based optimizer the choice and let it pick the best platform for each step across your laptop, Apache Spark, Apache Flink, or a database, even splitting a single job across several. Either way, when your data outgrows one machine you don't rewrite anything; you just make another engine available.

How it works

Most data processing systems are designed around a single execution engine. That keeps things simple, but your pipeline ends up tied to that engine's API. So combining engines, or moving to another, typically means rewriting the pipeline and gluing systems together, which is costly and time-consuming.

Wayang sits one level up. You write a pipeline against Wayang's API and register the engines you have. Then it's your call. Want control? Register one engine and it runs there. Want it handled? Register several and let the cost-based optimizer pick the best one for each step, even splitting a single job across engines.

Supported platforms today

Wayang's APIs

Java (Scala-like fluent builder)
Scala
SQL
Java native (low-level; we recommend the Scala-like fluent builder instead)

The plugin architecture makes adding new operators and platforms straightforward without touching internals — see Adding operators.

Quickstart

We'll run a word count locally first — no cluster, nothing to install on a server — then make Spark available with a one-line change. The pipeline itself never changes; only the set of engines you register does.

1. Run locally

import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.api.WayangContext;
import org.apache.wayang.api.JavaPlanBuilder;
import org.apache.wayang.basic.data.Tuple2;
import org.apache.wayang.java.Java;
import java.util.Arrays;

public class WordCount {
    public static void main(String[] args) {
        // Register ONLY the local Java engine → runs on your machine, no cluster needed.
        WayangContext wayang = new WayangContext(new Configuration())
                .withPlugin(Java.basicPlugin());

        new JavaPlanBuilder(wayang)
                .withJobName("WordCount")
                .withUdfJarOf(WordCount.class)
                .readTextFile("file:///path/to/input.txt")
                .flatMap(line -> Arrays.asList(line.split("\\W+")))
                .filter(word -> !word.isEmpty())
                .map(word -> new Tuple2<>(word.toLowerCase(), 1))
                .reduceByKey(Tuple2::getField0,
                             (t1, t2) -> new Tuple2<>(t1.getField0(), t1.getField1() + t2.getField1()))
                .writeTextFile("file:///path/to/output.txt", t -> t.getField0() + ": " + t.getField1());
    }
}

It executes locally. Good for development, tests, and small data.
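
As a cross-check on small inputs, the same logic can be written in plain Java with no Wayang dependency. This is only a reference sketch: the names `WordCountReference`, `count`, and `format` are made up for illustration and are not part of Wayang. It mirrors the pipeline steps above (split on `\W+`, drop empty tokens, lowercase, sum per word) and shows why the `filter` step is there: a line that starts with punctuation yields an empty first token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Wayang: a plain-Java reference for the word count above.
class WordCountReference {

    /** Splits each line on non-word characters, drops empty tokens, lowercases, and sums per word. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give stable output
        for (String line : lines) {
            for (String token : line.split("\\W+")) {
                if (!token.isEmpty()) {                // same role as the filter step
                    counts.merge(token.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Formats entries as "word: count", matching the formatter passed to writeTextFile above. */
    static List<String> format(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + ": " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("To be, or not to be", "...that is the question");
        format(count(lines)).forEach(System.out::println);
    }
}
```

Comparing this output with the contents of the pipeline's output file (after sorting both) is a quick way to confirm that a platform change did not alter the result.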

2. Run it on Spark

Now run the exact same pipeline on Spark instead of locally. You don't touch the pipeline — you change which platform you register: comment out Java and register Spark.

import org.apache.wayang.spark.Spark;               // swap the import

// Same pipeline as before — only the registered platform changed.
WayangContext wayang = new WayangContext(new Configuration())
        // .withPlugin(Java.basicPlugin())           // comment out the local engine
        .withPlugin(Spark.basicPlugin());            // register Spark instead

Run it again. The same pipeline now executes on Spark. You changed where it runs without changing what it does. Switch to Flink or any other supported platform the same way: swap the import and the registered plugin.

Why register only Spark here? Wayang's real power is registering several platforms and letting the optimizer pick. But on small test data the optimizer will almost always pick the local engine (Spark's startup overhead isn't worth it for a tiny file), so you'd never actually see Spark run. Registering Spark alone forces the issue so you can confirm it works. Step 3 shows the production pattern.

3. Register both and let the optimizer choose

This is the point of Wayang. In practice you don't pick a platform at all: you register every engine you have and let the optimizer choose the best one for each step.

// Register BOTH platforms — Wayang's optimizer decides which to use per step.
WayangContext wayang = new WayangContext(new Configuration())
        .withPlugin(Java.basicPlugin())
        .withPlugin(Spark.basicPlugin());

Now Wayang owns the placement decision. For each operator it estimates the cost on every registered platform and picks the cheapest, keeping a small job entirely local, pushing a large one onto Spark, or mixing both within the same job as the data and query demand. On a tiny input you'll see it keep everything local (that's the optimizer working correctly, not ignoring Spark); cross-platform splits show up once the data is big enough to justify them.
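
To make the "estimate, then pick the cheapest" idea concrete, here is a toy model in plain Java. It is not Wayang's cost model or API: the `PlacementSketch` and `Platform` names, the linear cost formula (fixed startup plus a per-record cost), and every number in it are assumptions chosen only to make the crossover visible. It also ignores the cost of moving data between platforms, which a real cross-platform optimizer has to weigh as well.

```java
import java.util.Comparator;
import java.util.List;

// Toy illustration only; names, formula, and numbers are hypothetical.
class PlacementSketch {

    /** Hypothetical platform model: a fixed startup cost plus a cost per input record. */
    static final class Platform {
        final String name;
        final double startupMs;
        final double perRecordMs;

        Platform(String name, double startupMs, double perRecordMs) {
            this.name = name;
            this.startupMs = startupMs;
            this.perRecordMs = perRecordMs;
        }

        double estimateMs(long records) {
            return startupMs + perRecordMs * records;
        }
    }

    /** Returns the registered platform with the lowest estimated cost for this many records. */
    static Platform cheapest(List<Platform> registered, long records) {
        return registered.stream()
                .min(Comparator.comparingDouble((Platform p) -> p.estimateMs(records)))
                .orElseThrow(() -> new IllegalArgumentException("no platform registered"));
    }

    public static void main(String[] args) {
        // Made-up numbers: the local engine starts instantly but is slower per record.
        List<Platform> registered = List.of(
                new Platform("java", 0.0, 0.001),
                new Platform("spark", 5_000.0, 0.0001));

        for (long records : new long[] {1_000L, 100_000_000L}) {
            System.out.println(records + " records -> " + cheapest(registered, records).name);
        }
    }
}
```

With these assumed numbers the small input stays on the local engine and the large one goes to Spark, which is the same behavior described above for tiny test files.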

Install

Replace WAYANG_VERSION with the latest Maven Central release.

From Maven Central

<dependency>
  <groupId>org.apache.wayang</groupId>
  <artifactId>wayang-core</artifactId>
  <version>WAYANG_VERSION</version>
</dependency>
<dependency>
  <groupId>org.apache.wayang</groupId>
  <artifactId>wayang-basic</artifactId>
  <version>WAYANG_VERSION</version>
</dependency>
<dependency>
  <groupId>org.apache.wayang</groupId>
  <artifactId>wayang-api-scala-java</artifactId>
  <version>WAYANG_VERSION</version>
</dependency>
<!-- add one artifact per engine you want available -->
<dependency>
  <groupId>org.apache.wayang</groupId>
  <artifactId>wayang-java</artifactId>
  <version>WAYANG_VERSION</version>
</dependency>
<dependency>
  <groupId>org.apache.wayang</groupId>
  <artifactId>wayang-spark</artifactId>
  <version>WAYANG_VERSION</version>
</dependency>

The available modules:

wayang-core — core data structures and the optimizer (required)
wayang-basic — common operators and data types (recommended)
wayang-api-scala-java — fluent Scala/Java API for building plans (recommended)
wayang-java, wayang-spark, wayang-flink, wayang-postgres, wayang-sqlite3, wayang-graphchi, wayang-tensorflow, wayang-kafka — per-platform adapters; include one per engine you want available
wayang-profiler — learns operator and UDF cost functions from historical executions

For snapshot builds, add Apache's snapshot repository:

<repositories>
  <repository>
    <id>apache-snapshots</id>
    <name>Apache Foundation Snapshot Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots</url>
  </repository>
</repositories>

Build from source

git clone https://github.com/apache/wayang.git
cd wayang
./mvnw clean install -DskipTests

The current snapshot version lives in pom.xml.

Runtime requirements

Java 17 — set JAVA_HOME to your Java 17 installation.
Apache Spark 3.4.4 with Scala 2.12 — set SPARK_HOME.
Apache Hadoop 3+ — set HADOOP_HOME.
Maven for building from source.

Important

Java 17 needs extra JVM flags. Running Wayang on Java 17 (especially with Spark) requires opening some internal Java modules, or you'll hit IllegalAccessError. Edit your wayang-submit script (under wayang-assembly/target/wayang-WAYANG_VERSION/bin/wayang-submit) so the runner invocation passes:

--add-exports=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED

On Windows, also set HADOOP_HOME to a directory containing winutils.exe (unofficial source).
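
If you are unsure whether the flags listed above actually reached your JVM, a small diagnostic can report which ones are missing. This is a sketch using the standard `java.lang.Module` API (`isOpen`, `isExported`); the class name `ModuleOpenCheck` is made up for illustration, and the check assumes it runs from the classpath, the same way your Wayang job would.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of Wayang.
class ModuleOpenCheck {

    // Packages named by the --add-opens flags listed above.
    static final List<String> OPENED = List.of(
            "java.nio", "java.lang", "java.util", "java.io",
            "java.lang.reflect", "java.util.concurrent", "java.net", "java.lang.invoke");

    /** Returns the flags from the list above that are NOT in effect for this JVM. */
    static List<String> missingFlags() {
        Module javaBase = Object.class.getModule();
        Module self = ModuleOpenCheck.class.getModule(); // the unnamed module when run from the classpath
        List<String> missing = new ArrayList<>();
        if (!javaBase.isExported("sun.nio.ch", self)) {
            missing.add("--add-exports=java.base/sun.nio.ch=ALL-UNNAMED");
        }
        for (String pkg : OPENED) {
            if (!javaBase.isOpen(pkg, self)) {
                missing.add("--add-opens=java.base/" + pkg + "=ALL-UNNAMED");
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingFlags();
        if (missing.isEmpty()) {
            System.out.println("All listed module flags are in effect.");
        } else {
            missing.forEach(flag -> System.out.println("missing: " + flag));
        }
    }
}
```

Run it with the same JVM options your job uses; any line it prints is a flag that still needs to be passed.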

Validate the install

After building, unpack the assembly and put Wayang on your PATH:

tar -xvf wayang-WAYANG_VERSION.tar.gz
cd wayang-WAYANG_VERSION

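# Linux
echo "export WAYANG_HOME=$(pwd)" >> ~/.bashrc
# Single quotes keep ${PATH} and ${WAYANG_HOME} unexpanded until ~/.bashrc is sourced.
echo 'export PATH=${PATH}:${WAYANG_HOME}/bin' >> ~/.bashrc
source ~/.bashrc

# macOS
echo "export WAYANG_HOME=$(pwd)" >> ~/.zshrc
# Single quotes keep ${PATH} and ${WAYANG_HOME} unexpanded until ~/.zshrc is sourced.
echo 'export PATH=${PATH}:${WAYANG_HOME}/bin' >> ~/.zshrc
source ~/.zshrc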

Then run the bundled WordCount on your local Java engine:

bin/wayang-submit org.apache.wayang.apps.wordcount.Main java file://$(pwd)/README.md

Running the tests

./mvnw test

Documentation

Getting started — the full tabbed walkthrough in Java, Scala, and Python.
How Wayang chooses a platform — what drives the optimizer's decisions.
Adding operators — extend Wayang with new operators or platforms.
Example applications — runnable apps in this repo.
Developing with Wayang — using Wayang in your own Java/Scala project.

Contributing

Contributions are welcome — bug reports, doc fixes, new platform adapters, new operators, optimizer improvements, anything. Start with CONTRIBUTING.md and the building guide, open an issue if you're not sure where to start, and introduce yourself on the dev mailing list — that's where active work gets discussed.

If you're looking for somewhere to begin, doc improvements, new operators, and additional examples are areas where a focused PR can land quickly.

Community

Mailing lists — https://wayang.apache.org/docs/community/mailinglist (user and dev)
LinkedIn — Apache Wayang

Authors

See the full list of contributors.

License

All files in this repository are licensed under the Apache License 2.0.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Acknowledgements

The logo was donated by Brian Vera.

