Bit Packing

Camera ADCs typically output 10, 12, or 14-bit data, which does not fit in a single byte. The simplest way to store such data is in 16-bit integers (uint16). However, this wastes space in every pixel: 6 bits for 10-bit data, 4 bits for 12-bit data, and 2 bits for 14-bit data.

Even without any advanced compression, bit packing alone can save considerable space. The principle of bit packing is to arrange data from multiple pixels tightly together, avoiding the wasted space of higher bit-depth containers. For example, four 10-bit pixels can be packed into 5 bytes (40 bits), whereas storing them as 16-bit integers would require 8 bytes (64 bits), saving 37.5% of space.

12-bit and 14-bit data can be packed similarly:

  • Two 12-bit pixels can be packed into 3 bytes (24 bits), saving 25% of space.
  • Four 14-bit pixels can be packed into 7 bytes (56 bits), saving 12.5% of space.
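
The group sizes and savings above can be checked with a few lines of Python. This is only a sketch of the arithmetic: packing_group is a hypothetical helper name, and it assumes the baseline is one uint16 (2 bytes) per pixel.

```python
from math import lcm  # available from Python 3.9


def packing_group(bits: int) -> tuple[int, int]:
    """Return (pixels, bytes) for the smallest group of pixels that ends on a byte boundary."""
    group_bits = lcm(bits, 8)
    return group_bits // bits, group_bits // 8


for bits in (10, 12, 14):
    pixels, packed_bytes = packing_group(bits)
    # Baseline assumption: each pixel stored as a 2-byte uint16.
    saving = 1 - packed_bytes / (pixels * 2)
    print(f"{bits}-bit: {pixels} pixels -> {packed_bytes} bytes, saving {saving:.1%}")
```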

DNG is an extension of the TIFF format, and to save space its specification requires pixel data with a bit depth other than 8 or 16 bits to be stored bit-packed.

Bit Packing with Numpy

The PiDNG library provides functionality to convert Numpy arrays to DNG format, which involves bit packing.

The 14-bit packing code in PiDNG contains an error: it attempts to pack six 14-bit pixels (84 bits) into 7 bytes (56 bits). I have used this code to test the coding capabilities of LLMs. Additionally, the bit operations used for 12-bit packing have a subtle flaw that no model has yet identified. For details, see Using LLMs for Colour Science: Best Practices and Test.

Taking the simplest 12-bit packing as an example, here is how to implement bit packing using Numpy.

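import numpy as np

def pack12(data: np.ndarray) -> np.ndarray:
    # Every 2 pixels (24 bits) become 3 bytes; assumes an even width.
    out = np.zeros((data.shape[0], data.shape[1] * 3 // 2), dtype=np.uint8)
    out[:, ::3] = (data[:, ::2] & 0x0FF0) >> 4  # upper 8 bits of pixel 1
    # Lower 4 bits of pixel 1 in the high nibble, upper 4 bits of pixel 2 in the low nibble.
    out[:, 1::3] = ((data[:, ::2] & 0x000F) << 4) | ((data[:, 1::2] & 0x0F00) >> 8)
    out[:, 2::3] = data[:, 1::2] & 0x00FF  # lower 8 bits of pixel 2
    return out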

This function takes a two-dimensional Numpy array data representing pixel values of an image, where each pixel is a 16-bit unsigned integer in the range 0 to 4095 (12 bits), meaning only the lower 12 bits are used.

First, a new array out is created with a width 1.5 times that of the original array (since every 2 pixels are packed into 3 bytes), with uint8 data type corresponding to one byte.

Then, using bitwise operations and Numpy slicing, the data from every two 12-bit pixels is split and stored in the out array:

  • The first byte stores the upper 8 bits of the first pixel, obtained using an AND operation and right-shifting by 4 bits.
  • The upper 4 bits of the second byte store the lower 4 bits of the first pixel, whilst the lower 4 bits store the upper 4 bits of the second pixel, achieved through AND operations and bit shifting.
  • The third byte stores the lower 8 bits of the second pixel, obtained using an AND operation.

These bitwise and slicing operations are vectorised, allowing efficient processing of the entire array.
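
To make the byte layout concrete, here is a small worked example on a single pair of pixels, followed by the inverse operation. It is a sketch for illustration only: the values 0xABC and 0xDEF are chosen so that each hex digit is easy to follow, and the unpacking step is my own addition rather than anything taken from PiDNG.

```python
import numpy as np

# One row containing two 12-bit pixels.
pair = np.array([[0xABC, 0xDEF]], dtype=np.uint16)
first, second = pair[:, 0], pair[:, 1]

# Same three-byte layout as described above, written out for one pair.
packed = np.stack(
    [
        first >> 4,                            # upper 8 bits of pixel 1
        ((first & 0xF) << 4) | (second >> 8),  # low nibble of pixel 1, high nibble of pixel 2
        second & 0xFF,                         # lower 8 bits of pixel 2
    ],
    axis=1,
).astype(np.uint8)
print([hex(b) for b in packed[0]])  # ['0xab', '0xcd', '0xef']

# Inverse: rebuild the two 12-bit values from the three bytes.
unpacked = np.stack(
    [
        (packed[:, 0].astype(np.uint16) << 4) | (packed[:, 1] >> 4),
        ((packed[:, 1].astype(np.uint16) & 0xF) << 8) | packed[:, 2],
    ],
    axis=1,
)
assert np.array_equal(unpacked, pair)
```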

Padding

The code above looks quite elegant, but it assumes that the input image width is even. For odd-width images, the created out array would have an incorrect size, causing the slicing operations to fail. Similarly, 10-bit and 14-bit images require widths that are multiples of 4.

Reviewing the principles of bit packing, the following requirements must be met:

  • Continuous bit stream: All pixel data in a row is treated as a continuous bit stream. For example, in a 10-bit image, pixel 1 occupies bits 0-9, pixel 2 occupies bits 10-19, and so on.
  • Byte alignment: The minimum unit for file storage is a byte (8 bits).
  • End-of-row padding: When the total number of bits in a row is not divisible by 8, zeros are padded at the end of the row until the next byte is filled.
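
These three requirements can be written down almost literally with np.packbits. This gives a slow but easy-to-check reference packer for any bit depth. It is a sketch, not the method used by PiDNG: pack_bits_reference is a hypothetical name, and it assumes the most significant bit of each pixel comes first, matching the 12-bit layout shown earlier.

```python
import numpy as np


def pack_bits_reference(data: np.ndarray, bits: int) -> np.ndarray:
    """Pack each row of `data` as one MSB-first bit stream, zero-padded to a whole byte."""
    height, width = data.shape
    # Expand every pixel into its `bits` binary digits, most significant first.
    shifts = np.arange(bits - 1, -1, -1, dtype=np.uint16)
    digits = ((data[:, :, None].astype(np.uint16) >> shifts) & 1).astype(np.uint8)
    stream = digits.reshape(height, width * bits)
    # np.packbits fills the last byte of each row with zero bits if needed.
    return np.packbits(stream, axis=1)


# Three 10-bit pixels: 30 bits, so 4 bytes with 2 padding bits at the end.
row = np.array([[0x3FF, 0x000, 0x3FF]], dtype=np.uint16)
print([hex(b) for b in pack_bits_reference(row, 10)[0]])  # ['0xff', '0xc0', '0xf', '0xfc']
```

Expanding every pixel into individual bits uses far more memory than the shift-and-mask approach, so a function like this is better suited to testing a fast packer than to replacing it.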

To solve this problem whilst maintaining the efficiency of bitwise operations, padding is introduced whenever the input image width is not a multiple of the packing group size (2 pixels for 12-bit, 4 pixels for 10-bit and 14-bit). Specifically, extra columns filled with zeros are added to the right side of the input array. After packing, the excess bytes are removed.

For a 10-bit image with width 101, the padding and packing process is as follows:

  1. The 10-bit packing method packs every 4 pixels into 5 bytes, so the width needs to be padded up to the next multiple of 4, which is 104.
  2. Pad with 3 columns of zeros, making the input array width 104.
  3. Perform bit packing, packing each row into 104 / 4 * 5 = 130 bytes.
  4. Calculate the actual number of bytes needed: 101 pixels occupy 1010 bits, which is 126.25 bytes, rounded up to 127 bytes.
  5. Finally, extract the first 127 bytes from the packed array, discarding the excess 3 bytes.

This approach satisfies the requirements of bit packing, enables packing of images with arbitrary widths, and maintains efficiency in practice.
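
The five steps above might look like this for 10-bit data. Treat it as a minimal sketch under stated assumptions rather than the code shipped in numpy2dng: pack10_padded is a hypothetical name, the byte layout is MSB-first like the 12-bit example, and the input is masked to its lower 10 bits.

```python
import numpy as np


def pack10_padded(data: np.ndarray) -> np.ndarray:
    """Pack rows of 10-bit pixels, 4 pixels per 5 bytes, for any image width."""
    height, width = data.shape
    # Steps 1-2: zero-pad on the right up to the next multiple of 4.
    pad = -width % 4
    padded = np.pad(data.astype(np.uint16) & 0x3FF, ((0, 0), (0, pad)))
    p0, p1, p2, p3 = (padded[:, i::4] for i in range(4))
    # Step 3: pack every group of 4 pixels (40 bits) into 5 bytes.
    out = np.zeros((height, padded.shape[1] // 4 * 5), dtype=np.uint8)
    out[:, 0::5] = p0 >> 2
    out[:, 1::5] = ((p0 & 0x003) << 6) | (p1 >> 4)
    out[:, 2::5] = ((p1 & 0x00F) << 4) | (p2 >> 6)
    out[:, 3::5] = ((p2 & 0x03F) << 2) | (p3 >> 8)
    out[:, 4::5] = p3 & 0x0FF
    # Steps 4-5: keep only ceil(width * 10 / 8) bytes per row.
    return out[:, : (width * 10 + 7) // 8]


image = np.random.default_rng(0).integers(0, 1024, size=(2, 101), dtype=np.uint16)
print(pack10_padded(image).shape)  # (2, 127)
```

Because the padding columns are zeros, any bits they contribute to the last retained byte are exactly the zero padding required at the end of the row.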

Improvements on PiDNG: numpy2dng

The bit packing in the PiDNG library does not account for padding, so it only handles images whose widths are multiples of 4 (10-bit and 14-bit) or multiples of 2 (12-bit). For images with other widths, errors occur during slicing.

Besides bit packing, PiDNG also supports lossless compression using JPEG92, implemented in C via ljpeg92. Although this is more efficient, the C code has to be compiled at install time. In complex environments, this can cause the installation to fail:

Resolved 15 packages in 15ms
  × Failed to build `pidng==4.0.9`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit code: 1)

There are also some minor issues, such as Raspberry Pi-specific camera metadata in the code, a rather large RAW sample file in the Git repository, and the error in 14-bit packing.

Therefore, with the help of MiMo-V2-Flash and GPT-5.2, we built a new library on top of PiDNG called numpy2dng. We temporarily removed JPEG92 compression, fixed the bit packing issues, switched to the more modern uv and hatchling for packaging, and removed unnecessary files. It can be installed directly via pip or uv:

pip install numpy2dng
# or
uv add numpy2dng

The interface remains largely consistent with PiDNG, making it convenient to save Numpy arrays as DNG files.