How porting Lucene made me care about bit operations…

I am going to be honest… I haven’t touched binary operations since I attended a university assembly class about 20 years ago. So when I came across the writeVInt and readVInt methods from DataOutput and DataInput base classes I thought this would be a good to brush up. I lost a good few days because I did not consider the difference between arithmetic and logical shifts.

Unlike Java Rust does not have separate operators for arithmetic and logical operators. In java >> is arithmetic and >>> is logical. However in the rust documentation there was a footnote I completely glanced over stating.

** Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.

So what does this mean in practice is a signed data primitive like i8 when shifted gets a 0 or 1 based on it”s sign. So -127 (10000000) becomes -64 (11000000). So what is wrong with this logic? The implementation of variable length quantity in Lucene uses prefix 0’s to determine how many 7 bit bytes to write. For instance a typical VByte encoding is:

Valuebyte 1byte 2byte3

So for positive or unsigned numbers this logic transferred over easily. However a negative number would cause an infinite loop. To work around this I flip the sign after shifting the bits for the first run. So it now looks like this.

loop stepvaluebinary

After spending all that time making negative numbers working with variable length integers I realized that use case may never be used. The documentation for those methods specifically states

Negative numbers are supported, but should be avoided.

When I asked the lucene mailing list about this I got

They are fully supported, so you can write and read them.
The problem with negative numbers is that they need lot of (disk) space, because in two’s complement they have almost all bits set. The largest number is kinds of disk space is -1. Negative numbers appear in older index formats, so they can’t be prevented. Just take the comment as given: all is supported, but if you want to store negative numbers use a different encoding, e.g. zigzag.

Nevertheless I now know the difference between arithmetic and logical shifts in Rust.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s