How to Correctly Read RAW Files

Rotating Images

The examples below assume the image was shot in landscape orientation. If you need to align portrait shots, rotate them according to the EXIF Orientation tag. For example, using Python’s exifread:

import exifread

image_path = 'example.ARW'  # hypothetical path; replace with your own file
with open(image_path, 'rb') as f:
    tags = exifread.process_file(f)
    orientation = tags.get('Image Orientation', 'Unknown')
    print(f"Image Orientation: {orientation}")  # e.g. Rotated 90 CCW
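
Once the tag is known, the rotation itself is a small lookup. The following is a minimal sketch, not part of the original workflow: it keys on the numeric EXIF Orientation values (1, 3, 6, 8) rather than on exifread's printable strings, ignores the mirrored variants, and uses plain lists of rows so it runs without extra dependencies. The names `CW_QUARTER_TURNS`, `rotate_cw` and `apply_orientation` are illustrative.

```python
# Clockwise quarter turns needed to display the stored pixels upright,
# keyed by the numeric EXIF Orientation value (mirrored variants omitted).
CW_QUARTER_TURNS = {1: 0, 6: 1, 3: 2, 8: 3}


def rotate_cw(image):
    """Rotate a 2-D list of rows by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def apply_orientation(image, orientation):
    """Rotate `image` according to a numeric EXIF Orientation value."""
    turns = CW_QUARTER_TURNS.get(orientation)
    if turns is None:
        raise ValueError(f"unsupported EXIF orientation: {orientation}")
    for _ in range(turns):
        image = rotate_cw(image)
    return image
```

With NumPy arrays, `np.rot90(image, k=-turns)` should perform the same clockwise rotation, since positive `k` rotates counter-clockwise.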

Tools for Reading RAW Data

Dcraw

The most famous tool for reading RAW data is undoubtedly dcraw. Dcraw can convert RAW files of various encoding formats into TIFF or PPM format.

Using the command line dcraw -4 -T -D file_name yields a 16-bit TIFF file that records the direct RAW values, without any processing such as demosaicing, white balancing, or black level subtraction.
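
For batch work, the same command can be driven from Python. This is a rough sketch that assumes a `dcraw` executable is on the PATH and that dcraw writes its TIFF next to the input with a `.tiff` suffix; the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def dcraw_raw_tiff_command(raw_path):
    """Build the call from the text: -4 (linear 16-bit), -T (TIFF), -D (unprocessed values)."""
    return ["dcraw", "-4", "-T", "-D", str(raw_path)]


def convert_with_dcraw(raw_path):
    """Run dcraw and return the path where the TIFF is expected to appear."""
    if shutil.which("dcraw") is None:
        raise FileNotFoundError("dcraw executable not found on PATH")
    subprocess.run(dcraw_raw_tiff_command(raw_path), check=True)
    # Assumption: the output keeps the input's stem and gets a ".tiff" suffix.
    return Path(raw_path).with_suffix(".tiff")
```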

Unfortunately, dcraw’s last update was on 1 June 2018. Consequently, it no longer includes additional parameters (like white balance, colour matrices, etc.) for cameras released after that date, and even its core function of extracting RAW data may not work reliably.

For instance, with the lossless compressed RAW format introduced by Sony in their fourth-generation cameras (e.g., ILCE-7M4), dcraw reports a “cannot decode file” error, even though the file extension is still .arw. In contrast, it can still decode the older uncompressed RAW format.

dcraw.c is the core file of dcraw, consisting of over ten thousand lines of pure C code.

Rawpy/LibRaw

Rawpy is a Python wrapper for LibRaw. LibRaw provides a unified interface for accessing RAW data to extract pixel values. It is based on dcraw, having refactored dcraw.c into a more modern and modular library, and has continued to be supported after dcraw ceased development.
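
A minimal sketch of reading the undemosaiced sensor values with Rawpy follows. It assumes `rawpy` is installed; the helper names `read_mosaic` and `value_range_bits` are illustrative, and the import is kept inside the function so the rest of the snippet works without the library.

```python
def read_mosaic(path):
    """Return the full sensor mosaic and a few metadata fields from a RAW file."""
    import rawpy  # third-party dependency, imported lazily

    with rawpy.imread(path) as raw:
        # Copy, because the underlying buffer is released when `raw` is closed.
        mosaic = raw.raw_image.copy()
        pattern = raw.raw_pattern
        info = {
            "shape": mosaic.shape,  # (height, width), including any margins
            "black_level_per_channel": list(raw.black_level_per_channel),
            "white_level": raw.white_level,
            "raw_pattern": None if pattern is None else pattern.tolist(),
        }
    return mosaic, info


def value_range_bits(max_value):
    """Number of bits needed to hold `max_value`, e.g. 14 for 16383."""
    return int(max_value).bit_length()
```

`raw.raw_image` includes margins such as optical black or padding, which is why the shapes discussed in the following sections can be larger than the final image.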

Differences Between Methods and Brands

Sony

Uncompressed RAW

On newer bodies (likely from the ZV-E10 II onward), Sony no longer offers an uncompressed RAW option.

Test model: ILCE-7CM2.

The following methods yield identical results when reading uncompressed RAW files:

Converting with Dcraw to TIFF and then reading with OpenImageIO
Reading directly with Rawpy
Converting with Adobe DNG Converter and then reading with Rawpy
Converting with Adobe Camera Raw and then reading with Rawpy (same as above, although the dimensions appear different when viewed in ACR)
Converting with Adobe DNG Converter, then converting to TIFF with dcraw, and finally reading with OIIO

The dimensions read by all the methods above are (4688, 7040), given as (height, width); in short, every approach produces the same result because the file is uncompressed.

In practice, the DNG files converted by Adobe DNG Converter and Adobe Camera Raw are completely identical, so this will not be detailed further. Reading DNG files with Rawpy and dcraw is also equivalent.

Lossless Compressed RAW

Sony’s lossless compressed RAW pads the data with zeros so both dimensions are multiples of 512, then splits it into blocks, and applies differential and Huffman coding to four Bayer sub-images.
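
To make the two ideas in that description concrete, here is a generic illustration of splitting a mosaic into four Bayer sub-images and delta-coding a row. It is not Sony's actual bitstream: the block layout, predictor and Huffman tables are not covered here, and all function names are invented for this sketch. The point is only that neighbours within one sub-image share a colour, so their differences tend to be small and cheap to entropy-code.

```python
def split_bayer_planes(mosaic):
    """Split a 2-D mosaic (list of rows) into its four 2x2-phase sub-images."""
    return {
        (dy, dx): [row[dx::2] for row in mosaic[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    }


def delta_encode(row):
    """Keep the first value, then store each value minus its left neighbour."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```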

For lossless compressed RAW, things are more complex because there is currently no way to convert between uncompressed ARW and lossless compressed ARW files. The tested scenarios are:

Dcraw does not support lossless compressed RAW (as Sony introduced it after dcraw was no longer being updated).
Reading the ARW with Rawpy yields dimensions of (5120, 7168). This is due to the block-based compression (a multiple of 512). Only the top-left (4688, 7040) area contains image data; the rest is filled with zeros (not the black level), and the values range from 0-16383.
After converting to DNG with Adobe DNG Converter and then reading with Rawpy, the resulting dimensions are (4686, 7038), which is two pixels smaller in each dimension than the uncompressed RAW. If you crop 2 pixels from the bottom and 2 from the right of the padded image from the previous scenario, the results match perfectly, also with a range of 0-16383.
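
The padding arithmetic and the crops described above can be checked in a few lines. This is a sketch using only the shapes quoted in the text; `round_up` and `top_left_crop` are illustrative names.

```python
def round_up(n, block=512):
    """Smallest multiple of `block` that is greater than or equal to n."""
    return -(-n // block) * block


def top_left_crop(height, width):
    """Slices selecting the top-left (height, width) region of a padded array."""
    return slice(0, height), slice(0, width)


UNCOMPRESSED_SHAPE = (4688, 7040)  # (height, width) from the uncompressed ARW
DNG_SHAPE = (4686, 7038)           # (height, width) after Adobe DNG Converter

# Padding each dimension to a multiple of 512 gives the shape Rawpy reports.
padded_shape = tuple(round_up(n) for n in UNCOMPRESSED_SHAPE)
rows, cols = top_left_crop(*DNG_SHAPE)
```

With a NumPy array from Rawpy, `mosaic[rows, cols]` applies the crop, and `np.array_equal` can confirm the match against the array read from the DNG.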

Compressed RAW: Compressed HQ

Sony’s old lossy compressed RAW was infamous for its artifacts. Starting with the a7M5, Sony introduced an improved lossy mode called Compressed HQ, which aims to balance quality and file size. At present, only Sony’s own Imaging Edge Desktop (IEDT) can decode this format. Once third-party tools add support, it can be evaluated more closely.

Regarding DNGs from Capture One and Other Software

Theoretically, a codec that decodes RAW and encodes to DNG should not introduce complex errors. However, DNGs exported from Capture One not only have different dimensions but also stretch the original 14-bit data to 16-bit, and they do not perfectly match the RAW data read by other methods.

A more detailed analysis was conducted with the help of Gemini and DeepSeek. For the 14-bit to 16-bit conversion, Capture One appears to left-shift the values by two bits, which is equivalent to multiplying by 4. After left-shifting the ARW data read by Rawpy and comparing it with the C1-exported DNG (by division and subtraction), the resulting quotient is 1.000004 and the difference is on the order of 1e-7. The R and B channels match perfectly; all errors come from the two G channels and are content-dependent. In some images, the maximum error in the G channels can reach 10%, though in most cases it does not exceed 5%.
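
The comparison can be sketched as follows. The exact statistics behind the figures above are not stated, so the mean ratio and maximum relative error used here are assumptions, and the function names are illustrative. Plain lists stand in for the flattened per-channel pixel values.

```python
def shift_14_to_16(values):
    """Left-shift 14-bit values by two bits, i.e. multiply by 4."""
    return [v << 2 for v in values]


def compare(reference, candidate):
    """Return (mean ratio, max relative error) of candidate against reference.

    Pixels where the reference is zero are skipped to avoid dividing by zero.
    """
    pairs = [(r, c) for r, c in zip(reference, candidate) if r != 0]
    ratios = [c / r for r, c in pairs]
    mean_ratio = sum(ratios) / len(ratios)
    max_rel_err = max(abs(c - r) / r for r, c in pairs)
    return mean_ratio, max_rel_err
```

Running `compare(shift_14_to_16(arw_values), dng_values)` separately for each of the four CFA channels would show whether deviations are confined to the G channels, as reported above.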

DNG is a specification-driven format and should not modify the RAW data itself, so it’s best to avoid using non-Adobe tools to convert RAW images to DNG.

Best Practices for Sony RAW

In summary, the recommended approach for using Sony RAW files is to shoot in uncompressed RAW and then read them directly with Rawpy. You can use Adobe DNG Converter to conveniently convert uncompressed RAW files into lossless compressed DNGs to reduce file size without any loss. Alternatively, you can read lossless compressed RAW files with Rawpy and crop them, but be aware that converting lossless compressed RAW to DNG will result in the loss of two rows and two columns of pixels.

Canon

Test model: a 600D pulled from a dataset, which outputs CR2 files.

Reading with Rawpy and reading after conversion to DNG yield identical results. The image dimensions are (3516, 5344). The 142 pixels on the left and 51 pixels on the top appear to be the optical black area (the part physically masked for black level calibration), which reads out values close to the black level. The remainder is the image.

Dcraw can process files from the 600D. It reads out the part without the optical black area, which matches the cropped data from Rawpy or a DNG conversion.
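
Cropping away the optical black margins is a matter of slicing. A small sketch, using the 600D numbers quoted above and assuming there are no margins on the right or bottom (the text mentions none); `active_area` and `area_shape` are illustrative helpers.

```python
def active_area(shape, top=0, left=0, bottom=0, right=0):
    """Row and column slices that exclude the given margins from a (height, width) frame."""
    height, width = shape
    return slice(top, height - bottom), slice(left, width - right)


def area_shape(rows, cols):
    """(height, width) selected by a pair of slices with explicit bounds."""
    return rows.stop - rows.start, cols.stop - cols.start


# Canon 600D: full readout (3516, 5344), 51 masked rows on top, 142 masked columns on the left.
rows, cols = active_area((3516, 5344), top=51, left=142)
image_shape = area_shape(rows, cols)
```

Applied to the array from Rawpy as `mosaic[rows, cols]`, this should give the region to compare against the dcraw output.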

The CR3 output from an R6 Mark II (read with Rawpy or via DNG conversion) is similar. The left 154 pixels and top 96 pixels constitute the optical black area, and there is also a white area of 8 pixels on the right.

For the R6 Mark III, the optical black areas seem to be gone; the entire readout is image data. The CR3 format is impressively advanced—it’s lossless while still achieving a high compression ratio.

Hasselblad

The test model is the Hasselblad X2D-100C, which directly outputs RAW files in 3FR format. The sensor model can be confirmed directly from the 3FR file as the Sony IMX461-BQR.

Hasselblad’s historical RAW workflow involved two file formats: 3FR and FFF.

With the release of the new version of Phocus, the FFF file has been removed from the RAW workflow, and it is now unnecessary to convert to FFF before processing RAW images.

In the older workflow, users could convert 3FR to FFF via the Phocus software. There were some optional adjustment settings during conversion, but these did not affect the FFF file’s original data (for instance, the results read using rawpy were the same). From the file header, it can be seen that 3FR (49 49 2A 00) follows the little-endian TIFF specification, while FFF (4D 4D 00 2A) is big-endian TIFF.
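
The byte-order check on those four header bytes is easy to script. A minimal sketch with illustrative function names:

```python
def tiff_byte_order(header):
    """Classify a TIFF header by its first four bytes."""
    magic = bytes(header[:4])
    if magic == b"II*\x00":   # 49 49 2A 00
        return "little"
    if magic == b"MM\x00*":   # 4D 4D 00 2A
        return "big"
    raise ValueError(f"not a classic TIFF header: {magic.hex(' ')}")


def file_byte_order(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return tiff_byte_order(f.read(4))
```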

The public specifications for the IMX461-BQR show a total pixel count of 11760×8896 and an effective pixel count of 11664×8750. However, directly parsing the 3FR file using tools like dcraw or rawpy yields an oversized image of 11904×8842. This image contains the following areas:

Image Content Area: Dimensions are 11664×8750, consistent with the IMX461’s effective pixels.
Optical Black: Surrounding the image content area, 48px on the left and right, and 90px on the top.
Additional Content: On the outermost periphery, including 76px on the left, 68px on the right, and 2px on the top of non-image data.

Analysis shows that the image width including the Optical Black field is 11760px (which matches the sensor’s total pixel width), but the height is 8840px, showing a slight discrepancy.
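
The layout can be verified with simple arithmetic. The sketch below uses only the figures listed above and assumes there are no margins at the bottom, which the totals are consistent with; the names are illustrative.

```python
IMAGE_W, IMAGE_H = 11664, 8750  # effective image area (width, height)
OB = {"left": 48, "right": 48, "top": 90, "bottom": 0}      # optical black
EXTRA = {"left": 76, "right": 68, "top": 2, "bottom": 0}    # outermost non-image data


def expand(width, height, margins):
    """Grow a (width, height) rectangle by the given per-side margins."""
    return (
        width + margins["left"] + margins["right"],
        height + margins["top"] + margins["bottom"],
    )


with_ob = expand(IMAGE_W, IMAGE_H, OB)   # image plus optical black
full = expand(*with_ob, EXTRA)           # everything in the decoded 3FR
# Top-left corner (x, y) of the image area inside the full readout.
image_origin = (EXTRA["left"] + OB["left"], EXTRA["top"] + OB["top"])
```

The width including optical black matches the sensor's total pixel width of 11760, while the height of 8840 falls short of the published 8896.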

Both 3FR and FFF can be converted to the DNG format. The converted DNG image is cropped to the effective pixel area. Although the image content is aligned with the original, there are some minor numerical differences.

Therefore, the current best practice for processing Hasselblad 3FR files appears to be directly using libraw to read and extract the effective image area, and utilizing the Optical Black data within the image for precise black level correction.
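
A rough sketch of that black level correction is shown below. It estimates a single black level as the mean over a masked region and subtracts it from the image region, clamping at zero. A per-channel or per-row estimate may be more appropriate in practice; plain lists are used to keep the snippet dependency-free, and the function names are illustrative.

```python
def optical_black_level(mosaic, rows, cols):
    """Mean value over the masked region given by row and column slices."""
    values = [v for row in mosaic[rows] for v in row[cols]]
    return sum(values) / len(values)


def subtract_black(mosaic, rows, cols, black):
    """Return the selected region with the black level removed, clamped at zero."""
    return [[max(v - black, 0) for v in row[cols]] for row in mosaic[rows]]


# Toy frame: the two left columns are masked, the two right columns are image data.
frame = [
    [64, 66, 100, 200],
    [62, 64, 300, 64],
    [65, 63, 70, 1000],
]
black = optical_black_level(frame, slice(None), slice(0, 2))
image = subtract_black(frame, slice(None), slice(2, 4), black)
```

For the 3FR layout described above, and assuming the margins are as listed, the left optical black strip would be `mosaic[92:, 76:124]` on the NumPy array returned by Rawpy, and the image area would start at row 92, column 124.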

Fujifilm

Test model: X-T5.

Fujifilm’s RAW files are rather unique because their colour filter array is not a standard Bayer pattern but an X-Trans pattern, which has a minimum repeating unit of 6x6 pixels. Fortunately, this does not affect our analysis of the raw image data itself.

Feeding the RAF file directly into Rawpy yields an image with a width of 7872 and a height of 5196. After conversion to DNG, the width is 7728 and the height is 5152. The distribution of the extra 144 pixels in width and 44 pixels in height is as follows:

12 pixels of image data on the left; 12 pixels of image data and 120 black pixels on the right.
16 pixels of image data and 5 black pixels at the top; 16 pixels of image data and 7 black pixels at the bottom.

The pixel values in the overlapping areas are identical.
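
One practical consequence of these margins is that the CFA phase at the DNG origin differs from the phase at the RAF origin whenever the crop offset is not a multiple of the pattern period. The sketch below shows the general relationship; the toy patterns are placeholders, not the real X-Trans layout, and `pattern_at_offset` is an illustrative name.

```python
def pattern_at_offset(pattern, top, left):
    """CFA pattern as seen from a crop origin: the full-frame pattern rolled by the offset."""
    period_y, period_x = len(pattern), len(pattern[0])
    return [
        [pattern[(y + top) % period_y][(x + left) % period_x] for x in range(period_x)]
        for y in range(period_y)
    ]


# Offsets from the margin list above: 16 + 5 rows at the top, 12 columns on the left.
TOP, LEFT = 16 + 5, 12
phase = (TOP % 6, LEFT % 6)  # shift of the 6x6 pattern at the DNG origin
```

With these offsets the pattern is shifted by three rows and zero columns, so a pattern read from the RAF and one read from the DNG should only be compared after accounting for that shift.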

Additionally, Rawpy reads the RAF’s raw pattern incorrectly, whereas the raw pattern in the DNG is correct.

One caution: LibRaw provides a half_size option (exposed by Rawpy as postprocess(half_size=True)) that merges each 2x2 RGGB cell into a single pixel to output a half-resolution three-channel image. In theory, this should not be used on X-Trans sensors, but LibRaw still allows it, seemingly running the same pipeline as for Bayer sensors and producing incorrect results.
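
A simple guard can prevent this by refusing half-size output unless the CFA repeats every 2 pixels. This is a sketch under assumptions: the caller supplies the CFA period (2 for Bayer, 6 for X-Trans), for example from the camera model, and `raw` is an open `rawpy.RawPy` object. The function names are hypothetical.

```python
def half_size_is_meaningful(cfa_period):
    """Half-size output merges 2x2 cells, so it only makes sense for a period of 2."""
    return cfa_period == 2


def postprocess_half_size(raw, cfa_period):
    """Call raw.postprocess(half_size=True) only for 2x2 Bayer data."""
    if not half_size_is_meaningful(cfa_period):
        raise ValueError(
            f"half_size assumes a 2x2 Bayer cell, got CFA period {cfa_period}"
        )
    return raw.postprocess(half_size=True)
```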

Nikon

Nikon is the simplest case. The test model is a Z6 outputting NEF files. Reading directly with LibRaw or after conversion to DNG yields images 6064 pixels wide and 4040 pixels high; there is no extra padding, and the content is identical.

Summary

The easiest workflow is usually to convert to DNG: the files get smaller, the format is unified, and non-image regions are cropped away for easier processing. If you want to preserve the original format, metadata, and cropped-out regions, read the original RAW with LibRaw or Rawpy. Across most tests, the overlapping image areas have identical pixel values.

Avoid using non-Adobe tools to convert RAW to DNG. Since dcraw is no longer maintained, migrate to LibRaw/Rawpy.
