How porting Lucene made me care about bit operations…

I am going to be honest… I haven’t touched binary operations since I attended a university assembly class about 20 years ago. So when I came across the writeVInt and readVInt methods from DataOutput and DataInput base classes I thought this would be a good to brush up. I lost a good few days because I did not consider the difference between arithmetic and logical shifts.

Unlike Java Rust does not have separate operators for arithmetic and logical operators. In java >> is arithmetic and >>> is logical. However in the rust documentation there was a footnote I completely glanced over stating.

** Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.

https://doc.rust-lang.org/reference/expressions/operator-expr.html#arithmetic-and-logical-binary-operators

So what does this mean in practice is a signed data primitive like i8 when shifted gets a 0 or 1 based on it”s sign. So -127 (10000000) becomes -64 (11000000). So what is wrong with this logic? The implementation of variable length quantity in Lucene uses prefix 0’s to determine how many 7 bit bytes to write. For instance a typical VByte encoding is:

Valuebyte 1byte 2byte3
000000000
100000001
200000010
12701111111
1281000000000000001
1291000000100000001
1301000001000000001
163831111111101111111
16384100000001000000000000001

So for positive or unsigned numbers this logic transferred over easily. However a negative number would cause an infinite loop. To work around this I flip the sign after shifting the bits for the first run. So it now looks like this.

loop stepvaluebinary
1-21474836480b10000000000000000000000000000000
2167772160b00000001000000000000000000000000
31310720b00000000000000100000000000000000
410240b00000000000000000000010000000000
580b00000000000000000000000000001000

After spending all that time making negative numbers working with variable length integers I realized that use case may never be used. The documentation for those methods specifically states

Negative numbers are supported, but should be avoided.

https://lucene.apache.org/core/6_4_0/core/org/apache/lucene/store/DataOutput.html

When I asked the lucene mailing list about this I got

They are fully supported, so you can write and read them.
The problem with negative numbers is that they need lot of (disk) space, because in two’s complement they have almost all bits set. The largest number is kinds of disk space is -1. Negative numbers appear in older index formats, so they can’t be prevented. Just take the comment as given: all is supported, but if you want to store negative numbers use a different encoding, e.g. zigzag.

http://mail-archives.apache.org/mod_mbox/lucene-java-user/202109.mbox/%3cBEFCBB73-0848-4EB6-80E8-249D3D14EE30@thetaphi.de%3e

Nevertheless I now know the difference between arithmetic and logical shifts in Rust.

The most carrot pasta I have ever made.

During the year of Covid I started a garden. I’ve had plants in pots before but this was the first time I had a section of land dedicated to plants. It turned out to be a great education opportunity for my son. However now I need to actually do something with what I grew. I was amazed how much plant there was above the carrot and wanted to make a meal which used it.

Ingredients:

  • 1+ carrots with top. Enough to make 2 cups of greens once the stems are removed.
  • zucchini (optional)
  • 2 cups of baby spinach
  • 3-4 cloves of garlic.
  • 1 cup of roasted¬†unsalted cashews
  • olive oil
  • Salt & pepper
  • 1lb of pasta (penne works well)
  • Parmesan to taste

Instructions

  • Preheat oven to 425
  • Separate carrot(s) from top and wash/peel
  • Slice carrots and zucchini (optional) to somewhat equal size
  • Toss carrots with olive oil, salt, and pepper.
  • Place on sheet pan and put in the oven for about 20 minutes
  • Wash and separate the greens from the tough stems. Also remove anything that looks…ugly
  • Put greens, spinach, garlic, roasted unsalted cashews, and 1 cup of olive oil into a blender.
  • Pulse until smooth. Depending on your blender you may want to add the olive oil in parts.
  • Cook pasta until al-dente
  • Drain most the water but not all. roughly half a cup
  • Combine all and serve with parmesan.

Grill + Veggies + Pasta

Two of our new favorites summer pandemic activities come together in this dish which is gardening and grilling. Grilling because I got a new grill and I intend to use it. We also joined our community garden and now have more squash than we can count.

Ingredients

Pasta: Penne works best but others like rigatoni or elbow macaroni will work to.

Veggies:

  • 1-2 large Zucchini
  • 1-2 large Summer squash
  • 1 lb Asparagus
  • 1 ln Bell peppers
  • Kosher salt & pepper to taste

Sauce:

  • 1 package of goat cheese
  • 1/3 cup olive oil
  • 3 Tbsp balsamic vinegar
  • 2 Tbsp mayonnaise
  • 1/2 Tbsp Dijon mustard
  • 1 clove garlic, minced
  • 1/2 tsp dried basil
  • 1/2 tsp salt
  • Pepper to taste

Steps

  • Warm up the grill to 400-500 degrees F
  • Toss asparagus with olive oil, salt & pepper
  • Cut squash evenly and toss with olive oil, salt, & pepper.
  • Place veggies on grill. You may want to do this in batches. Asparagus and peppers are usually done first. I usually rotate them every 2 minutes.squash is usually closer to 4 depending on thickness.
  • Cook past to slightly Al dente. Reserve 1/3-1/2 cup of pasta water and drain the rest.
  • Combine reserved water with the rest of the sauce ingredients minus the goat cheese.
  • Roughly chop veggies.
  • Combine pasta, sauce, & veggies.
  • Crumble in goat cheese and combine.

Kubernetes useful tricks: Creating a secret from a file in an image

Recently I had an interesting problem. The product I was working on needed to create a secret from a file. Now on one hand this is an easy thing as you can just have a job run within an image

kubectl create secret generic mysecret --from-file=./file.txt

Ah… However Kubernetes command line client is only compatible within one minor version of the Kubernetes api server. So if you want to support all major version for Redhat Openshift currently supported you have to support Kubernetes 1.11 to 1.21. So to get around this problem you have to curl against the kubapi server directly. Here is a sample script I created to demonstrate this method.

apiVersion: batch/v1
kind: Job
metadata:
  name: readfiletosecret
  description: "Example of reading a file to a kubernetes secret."
spec:
  template:
    spec:
      serviceAccountName: account-with-secret-create-priv
      volumes:
      - name: local
        emptyDir: {}
      initContainers:
      - name: get-file
        image: registry.access.redhat.com/ubi8/ubi-minimal:latest
        command:
        - "/bin/sh"
        - "-c"
        env:
        - name: UPLOAD_FILE_PATH
          value: "/root/buildinfo/content_manifests/ubi8-minimal-container*.json"
        args:
        - |
          cat $UPLOAD_FILE_PATH
          cp -vf $UPLOAD_FILE_PATH /work/
        volumeMounts:
        - name: local
          mountPath: /work
      containers:
      - name: create-secret
        image: registry.access.redhat.com/ubi8/ubi-minimal:latest
        command:
        - "/bin/sh"
        - "-c"
        args:
        - |
          ls /work/
          i#
          export CONTENT=$(cat /work/* | base64 )
          echo $CONTENT
          #Set auth info
          export SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
          export NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
          export TOKEN=$(cat ${SERVICEACCOUNT}/token)
          export CACERT=${SERVICEACCOUNT}/ca.crt
          export APISERVER="https://kubernetes.default.svc"
          # Explore the API with TOKEN
          curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' -d @-  ${APISERVER}/api/v1/namespaces/$NAMESPACE/secrets <<EOF
          {
            "kind": "Secret",
            "apiVersion": "v1",
            "metadata": {
              "name": "example"
            },
            "data": {
              "file": "$CONTENT"
            }
          }
          EOF
          rm /work/*
        volumeMounts:
        - name: local
          mountPath: /work
      restartPolicy: Never
  backoffLimit: 4

Porting Lucene: Iteration 0

Iteration 0 is often used to create a product backlog or setup technical foundation (code repos, build pipelines, etc..). I frankly thought the concept was odd. In Agile you work to dates not scope. So to have an iteration that defines scope is odd. Thus I have a different take on it.

Iteration 0 is when you “start to know what you don’t” and with each subsequent iteration you learn more and the plan changes; So with that said it is time to prime the product backlog. Let’s start with defining goals.

Goal #1: Port Lucene from Java to Rust

While it seems straight forward we need to understand what Java parts are in the Lucene project. So let’s start start with a…

git clone https://github.com/apache/lucene.git
cd lucene
ls                                            
LICENSE		build.gradle	dev-docs	gradle		gradlew.bat	lucene		versions.lock
README.md	buildSrc	dev-tools	gradlew		help		settings.gradle	versions.props

OK, So right off the bat I can see it is a Gradle project. Gradle supports multiple languages out of the box and supports plugins to support others like Rust. Here we have our first decision… Do we port the Lucene library code or do we port the project? Let’s revisit that later after we are done exploring.

The interesting thing is there are a good number of files that are very specific to how Apache runs their projects and performs releases. For instance I haven’t seen an RDF document in probably 10 years yet here is a DOAP document defining the project and all the releases per Apache standards. The two items which I should focus on is the Lucene directory with all the java files and dev-tools/scripts/smokeTestRelease.py. The later will be useful for the next goal.

Goal #2: It should pass existing test scripts

So I think this is something which will set the project apart from other ports. Providing a way to reuse existing test scripts will ensure compatibility over time. However doing so will require the ability to do 2 things.

Leverage JNI from RUST

It has been a while since I’ve used JNI so this will be a good refresher. Based on existing documentation it seems this is done all the time for Android but still an experiment is required to ensure there are not any unforeseen gotchas.

The ability to sync Tests with the parent project

This is going to be a hard one. Reading through dev-tools/scripts/smokeTestRelease.py the Release verification is more than just Unit Tests. It also verifies digest mismatch, documentation, and missing metadata. Not all of these verifications will be applicable. For instance verifying the jar “Implementation-Vendor”metadata would not apply. So valid parts of this script will need to be ported and maintained.

At first glance majority of the verification is in the form of unit tests. In fact it looks like roughly 33% of all the Java source files are unit tests. That being said the Gradle build handles preparing data which may be used in those tests.

find lucene/ |grep ".java" |wc -l
5505
find lucene/ |grep ".java" | grep test | wc -l
1844

So we probably need a mechanism to

  • pull the latest project from Lucene
  • Clear out the non-test related Java source files
  • Update the build scripts to leverage an external JAR
  • Run the tests

Of course this means the code for the port needs to be managed separately from Lucene. This is turning into more of a complex project than I expected. Going to need to spend some time understanding a good project setup which will allow this. However it is currently the end of this iteration so that will have to wait for the next one.

Building a boat in the basement….

It goes without saying that 2020 was a tough year. While 2021 is getting better we are still not out of the woods yet. Sometimes the best way to get through it is to have a project that isn’t related to work but helps exercise the mind. Now I am not actually building a boat. My wife would kill me. However I am starting a project of similar ambition.

While granted I am a manager I still like to keep my development skills sharp. This is why I jump in to write code that helps my team reach their goals. This can range from everything from GO to Python. So for a personal project I don’t want to program in any of those.

Thus I am going to take a few open source projects and port them to Rust. I am gong document my experience and what works/doesn’t. Now I intent to do something a bit different for this port. I am going to use JNI to ensure the port is 100% compatible with the old one.

Choosing the right project

It goes without saying that there are plenty of open source projects. So here is the criteria I am using for selecting the right project to port.

  • It should be an established project which is not radically changing. This will make maintenance of the port easer over time.
  • It should have existing ports and the community should be open to new ports.
  • It should be large but not so large that it can never be completed.
  • It should be a project I am familiar with.

Based on this criteria I have decided to port the Lucene project. I have worked with Lucene on multiple projects back in the day. It will be good to revisit it and understand better how it works under the hood.

Applying Behavior Driven Development practices to infrastructure. 

Earlier this summer I worked with my team to apply Behavior Driven Development practices to infrastructure that we deployed our products to. Prior to this the DevOps team simply identified toil and implemented solutions. Unfortunately this meant the reasons behind the changes would get lost over time. Having GHERKIN files with your solution means that information would not get lost. Interesting enough we were able to reduce the total size of the source code considerably because some of the user stories were no longer relevant.

I wrote a tutorial based on this experience that can be found here.