Porting Lucene: Iteration 0

Iteration 0 is often used to create a product backlog or setup technical foundation (code repos, build pipelines, etc..). I frankly thought the concept was odd. In Agile you work to dates not scope. So to have an iteration that defines scope is odd. Thus I have a different take on it.

Iteration 0 is when you “start to know what you don’t” and with each subsequent iteration you learn more and the plan changes; So with that said it is time to prime the product backlog. Let’s start with defining goals.

Goal #1: Port Lucene from Java to Rust

While it seems straight forward we need to understand what Java parts are in the Lucene project. So let’s start start with a…

git clone https://github.com/apache/lucene.git
cd lucene
ls                                            
LICENSE		build.gradle	dev-docs	gradle		gradlew.bat	lucene		versions.lock
README.md	buildSrc	dev-tools	gradlew		help		settings.gradle	versions.props

OK, So right off the bat I can see it is a Gradle project. Gradle supports multiple languages out of the box and supports plugins to support others like Rust. Here we have our first decision… Do we port the Lucene library code or do we port the project? Let’s revisit that later after we are done exploring.

The interesting thing is there are a good number of files that are very specific to how Apache runs their projects and performs releases. For instance I haven’t seen an RDF document in probably 10 years yet here is a DOAP document defining the project and all the releases per Apache standards. The two items which I should focus on is the Lucene directory with all the java files and dev-tools/scripts/smokeTestRelease.py. The later will be useful for the next goal.

Goal #2: It should pass existing test scripts

So I think this is something which will set the project apart from other ports. Providing a way to reuse existing test scripts will ensure compatibility over time. However doing so will require the ability to do 2 things.

Leverage JNI from RUST

It has been a while since I’ve used JNI so this will be a good refresher. Based on existing documentation it seems this is done all the time for Android but still an experiment is required to ensure there are not any unforeseen gotchas.

The ability to sync Tests with the parent project

This is going to be a hard one. Reading through dev-tools/scripts/smokeTestRelease.py the Release verification is more than just Unit Tests. It also verifies digest mismatch, documentation, and missing metadata. Not all of these verifications will be applicable. For instance verifying the jar “Implementation-Vendor”metadata would not apply. So valid parts of this script will need to be ported and maintained.

At first glance majority of the verification is in the form of unit tests. In fact it looks like roughly 33% of all the Java source files are unit tests. That being said the Gradle build handles preparing data which may be used in those tests.

find lucene/ |grep ".java" |wc -l
5505
find lucene/ |grep ".java" | grep test | wc -l
1844

So we probably need a mechanism to

  • pull the latest project from Lucene
  • Clear out the non-test related Java source files
  • Update the build scripts to leverage an external JAR
  • Run the tests

Of course this means the code for the port needs to be managed separately from Lucene. This is turning into more of a complex project than I expected. Going to need to spend some time understanding a good project setup which will allow this. However it is currently the end of this iteration so that will have to wait for the next one.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s