
[TBD] - Implement Docker support with image creation and configuration updates#767

Open: novatechflow wants to merge 2 commits into apache:main from novatechflow:docker-build

@novatechflow
Member

TBD - to be discussed

Summary

Add Docker image support for Apache Wayang without bundling external execution platforms.

Changes

  • Add a Dockerfile that packages the Wayang assembly with a Java 17 runtime
  • Add a Docker workflow that builds the assembly, builds the image, and runs a Java platform smoke test
  • Make SPARK_HOME, HADOOP_HOME, and FLINK_HOME optional in wayang-submit
  • Support external platform runtimes through mounted directories and environment variables
  • Make Hadoop-backed Java channel conversions conditional on Hadoop classes being available
  • Document Docker build and runtime examples in wayang-assembly/README.md
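A rough sketch of what "optional" platform homes could mean, written in Python for illustration rather than as the actual bash in bin/wayang-submit: a platform contributes classpath entries only when its *_HOME variable is set and points to an existing directory, and is silently skipped otherwise. The function name optional_platform_classpath and the per-platform jar subdirectories are assumptions, not taken from the PR.

```python
"""Hypothetical sketch: build classpath entries from optional platform homes.

Assumption: each *_HOME contributes one jar directory only when the variable
is set and the directory exists; the real bin/wayang-submit may differ.
"""
import os
from typing import Mapping

# Assumed jar directory relative to each platform home (illustrative only).
PLATFORM_JAR_DIRS = {
    "SPARK_HOME": "jars",
    "HADOOP_HOME": os.path.join("share", "hadoop", "common"),
    "FLINK_HOME": "lib",
}


def optional_platform_classpath(env: Mapping[str, str]) -> list[str]:
    """Return wildcard classpath entries for platforms the user supplied."""
    entries = []
    for var, sub in PLATFORM_JAR_DIRS.items():
        home = env.get(var, "").strip()
        if not home:
            continue  # platform not supplied: skip it instead of failing
        jar_dir = os.path.join(home, sub)
        if os.path.isdir(jar_dir):
            entries.append(os.path.join(jar_dir, "*"))
    return entries
```

With none of the variables set, the list is empty and only the bundled Java platform is available; a Spark distribution mounted into the container and exported as SPARK_HOME would add its jars directory.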

Rationale

The image should distribute Wayang only. Spark, Hadoop, Flink, JDBC drivers, and other platform runtimes remain user-supplied, so the project does not take on maintaining third-party platform distributions inside the image.

Verification

  • bash -n bin/wayang-submit
  • JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-17.jdk/Contents/Home ./mvnw -pl :wayang-java,:wayang-assembly -am -Pdistribution -DskipTests package
  • JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-17.jdk/Contents/Home OTHER_FLAGS=-Duser.home=/private/tmp wayang-assembly/target/apache-wayang-assembly-1.1.2-SNAPSHOT-dist/wayang-1.1.2-SNAPSHOT/bin/wayang-submit org.apache.wayang.apps.pi.PiEstimation java 1

Notes

Built on macOS; the JAVA_HOME paths above differ on Linux and Windows.

@zkaoudi
Contributor

zkaoudi commented Jun 3, 2026

Thanks for bringing the conversation here, Alex.
So this is for creating a lightweight standalone Wayang image that can only run jobs in Java. Is that correct?

@novatechflow
Member Author

Yes, it's the whole release we publish. I changed wayang-submit so we can add platform configs there.

@zkaoudi
Contributor

zkaoudi commented Jun 4, 2026

Great, that's fine with me.

Another thing we could provide is a Docker image for working with the Python API, which is more cumbersome because the Wayang REST server needs to be up and running. What do you think?
See here: https://github.com/apache/wayang/tree/main/python

@novatechflow
Member Author

Dug into that: it needs a few code changes. The Python client is hard-coded to use localhost, which makes it impossible to reach a REST server running in another Docker container. I also found a bunch of other hard-coded values that made Wayang seem clunky.

- Updated Dockerfile for Python API with necessary configurations.
- Created entrypoint script for Python container to ensure API availability.
- Added example word count script for demonstration.
- Enhanced Java integration with dynamic API URL configuration.
- Updated dependencies in setup.cfg for compatibility.
- Modified GitHub Actions workflow to include Python image build and tests.

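The entrypoint's job of making sure the API is available can be sketched as a bounded wait loop. This is a Python illustration, not the contents of python/docker/entrypoint.sh; the function name wait_for_api, the timeout, and the polling interval are assumptions.

```python
"""Hypothetical readiness check: wait until the Wayang REST server accepts
TCP connections before starting the Python client.

Assumptions: host, port, timeout, and interval are illustrative; the real
entrypoint script may check readiness differently.
"""
import socket
import time


def wait_for_api(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once host:port accepts a connection, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError on refusal, DNS failure, or timeout.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

An open port only shows that something is listening; a stricter check would also call a REST endpoint before submitting a plan.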
@novatechflow
Member Author

A lot of changes:

  • bin/wayang-submit now adds the Java 17 --add-opens / --add-exports flags automatically.
  • base Dockerfile now builds from the assembly tarball, prunes sources/javadocs/README files, creates a smoke input, and enables FLAG_WAYANG=true using an empty config file.
  • Hadoop FileSystems registration is lazy, so clean Java-only images do not fail when Hadoop classes are absent. HDFS behavior should remain intact when Hadoop jars are present on the classpath at startup; the tests covering that case passed.
  • Python API host is now configurable via wayang.api.python.url, wayang.api.python.host, wayang.api.python.port, or WAYANG_API_URL / WAYANG_API_HOST / WAYANG_API_PORT.
  • Added optional python/Dockerfile, python/docker/entrypoint.sh, and python/examples/wordcount.py.
  • Python-enabled image keeps Python deps out of the base Wayang image; it layers Python 3.11 and pywy on top of the assembled Wayang runtime.
  • Docker workflow now uses docker buildx build --load, runs Java WordCount smoke, builds the Python image, and smoke-tests Python against REST by hostname.
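One way a client could resolve the REST API location from the WAYANG_API_URL / WAYANG_API_HOST / WAYANG_API_PORT variables is sketched below. It is not the pywy implementation: the precedence (full URL first, then host and port), the default port 8080, and the http scheme are assumptions; only the localhost default reflects the previously hard-coded value. The Java-side wayang.api.python.* properties are not covered here.

```python
"""Hypothetical sketch: resolve the Wayang REST API base URL from env vars.

Assumptions: WAYANG_API_URL overrides host/port, the default port is 8080,
and the scheme is http. None of this is taken from pywy itself.
"""
import os
from typing import Mapping, Optional

DEFAULT_HOST = "localhost"  # the value the client used to hard-code
DEFAULT_PORT = 8080  # assumed default port


def resolve_api_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the REST base URL, preferring an explicit URL over host/port."""
    env = os.environ if env is None else env
    url = env.get("WAYANG_API_URL", "").strip()
    if url:
        return url.rstrip("/")  # explicit URL wins; drop a trailing slash
    host = env.get("WAYANG_API_HOST", "").strip() or DEFAULT_HOST
    port = int(env.get("WAYANG_API_PORT", "").strip() or DEFAULT_PORT)
    return f"http://{host}:{port}"
```

Under these assumptions, the Python smoke test's WAYANG_API_HOST=wayang-rest-local would resolve to http://wayang-rest-local:8080, letting the client reach the REST container by its hostname instead of localhost.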

Verified:

  • bash -n bin/wayang-submit
  • python3 -m py_compile ...
  • Maven install/package with Java 17
  • docker buildx build --load -t apache-wayang:ci .
  • Java WordCount in clean container passed: found apache, test, smoke, wayang, docker
  • Base image prune check returned no README.md, *-sources.jar, or *-javadoc.jar
  • docker buildx build --load -f python/Dockerfile -t apache-wayang-python:ci .
  • Python REST/client smoke passed using WAYANG_API_HOST=wayang-rest-local; REST logs show Successfully executed WayangJob, and output was written in the REST container.
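The WordCount smoke check ("found apache, test, smoke, wayang, docker") boils down to asserting that each expected word occurs in the job output. A minimal Python sketch follows; the function name missing_words is hypothetical, and because the exact output format of the Java WordCount app is not shown in this PR, it only tokenizes the text rather than parsing counts.

```python
"""Hypothetical sketch of the WordCount smoke assertion.

Assumption: the job output is plain text containing the counted words;
the real workflow step may parse the output differently.
"""
import re
from typing import Iterable

EXPECTED_WORDS = ("apache", "test", "smoke", "wayang", "docker")


def missing_words(output: str, expected: Iterable[str] = EXPECTED_WORDS) -> list[str]:
    """Return the expected words that do not appear as tokens in output."""
    tokens = set(re.findall(r"[a-z0-9]+", output.lower()))
    return [word for word in expected if word not in tokens]
```

A CI step would fail the job, for example by exiting non-zero, whenever the returned list is non-empty.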
