Automated Reverse Engineering Documentation

Automating the First Half of Clean Room Reverse Engineering for GNU/Linux Device Drivers


A device driver is a system component that contains code specific to a particular device (mouse, keyboard, network card, etc.). Device drivers are generally developed either by the device manufacturers or by teams of operating system developers (for devices that abide by well-documented standards, such as keyboards and mice).

A major obstacle to the adoption of operating systems based on Free, Libre and Open Source Software is the lack of manufacturer-provided device drivers. Because of this shortcoming, it is usually easier to use newer devices with closed-source operating systems.

A usual way to solve this conundrum is to resort to reverse engineering, a process in which the practitioner studies the device driver provided by the manufacturer to gain insight into how it works and then writes an equivalent device driver as Free Software.

In clean-room reverse engineering, to avoid legal issues, the work is divided between two independent teams. The first team works with the closed-source operating system and the device driver as supplied by the manufacturer. This team generates documentation of the data flow between the computer and the device, using special-purpose snooping programs or even special hardware devices. The second team then uses the documentation produced by the first team to write a device driver for a specific Free Software-based operating system. The same documentation can, of course, be used to produce device drivers for multiple operating systems.

In this project, we are looking into the feasibility of automating the first stage of clean-room reverse engineering. We are looking into interacting with the closed-source operating system in an automated manner and analyzing the data flow between the device and the computer. We have so far decided to focus on USB-based wireless network devices under GNU/Linux. This combination of devices and operating system is of practical importance because the devices are widely available and their support under the operating system of choice is somewhat lacking.

In the system we are building, an inference module automatically decides which type of interaction to trigger within the closed-source operating system and then records the result of that interaction as the data flow between the device driver and the physical device. With a collection of such data flows, we seek to infer an explanatory model of how the device driver works and then use that model as input to a natural language generation system.
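As a rough illustration of the collection loop described above, the following Python sketch picks an interaction, runs it, and stores the resulting data flow. Every name in it (Transfer, Trace, choose_next_interaction, fake_capture, and the interaction labels "scan", "associate", "disconnect") is a hypothetical placeholder, not part of the actual system. The selection policy shown, picking the least-sampled interaction, is only a stand-in for the inference module, and the capture function returns made-up data instead of talking to a real USB snooper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Transfer:
    """One captured exchange between driver and device (hypothetical format)."""
    direction: str   # "out" = host to device, "in" = device to host
    endpoint: int
    payload: bytes


@dataclass
class Trace:
    """The data flow recorded while one interaction was running."""
    interaction: str
    transfers: List[Transfer] = field(default_factory=list)


def choose_next_interaction(candidates: List[str], traces: List[Trace]) -> str:
    """Stand-in policy: pick the interaction with the fewest traces so far.

    Ties are resolved in favour of the earliest candidate.
    """
    seen = Counter(t.interaction for t in traces)
    return min(candidates, key=lambda name: seen[name])


def collect(candidates: List[str],
            capture: Callable[[str], List[Transfer]],
            rounds: int) -> List[Trace]:
    """Alternate between choosing an interaction and recording its data flow."""
    traces: List[Trace] = []
    for _ in range(rounds):
        name = choose_next_interaction(candidates, traces)
        traces.append(Trace(name, capture(name)))
    return traces


def fake_capture(name: str) -> List[Transfer]:
    """Placeholder for a real snooper: returns one made-up transfer."""
    return [Transfer("out", 1, name.encode("ascii"))]


traces = collect(["scan", "associate", "disconnect"], fake_capture, rounds=6)
print([t.interaction for t in traces])
```

With this policy the six rounds cycle through the three interactions twice. A real inference module would presumably choose interactions based on what the traces collected so far fail to explain.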




Present: DrDub, Gainlo

Looking at REnouveau.

We should define an input language for the commands sent to the graphics card plus a language for the modifications made to the memory of the device.

These languages can be obtained automatically, using techniques from unsupervised grammar induction.
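To make the idea concrete, here is a minimal sketch of one very simple induction technique: Re-Pair-style digram replacement, which repeatedly replaces the most frequent adjacent pair of symbols with a new rule. It is offered only as an illustration under stated assumptions, not as the technique chosen for the project. The command tokens (SET_REG, WRITE, FLUSH) are invented for the example, and real traces would first need to be tokenized.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rules = Dict[str, Tuple[str, str]]


def induce_rules(tokens: List[str], min_count: int = 2) -> Tuple[List[str], Rules]:
    """Replace the most frequent adjacent pair with a new nonterminal, repeatedly.

    Stops when no pair occurs at least min_count times. Pair counts include
    overlapping occurrences, so a rule may end up being used only once.
    """
    seq = list(tokens)
    rules: Rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"R{len(rules)}"
        rules[name] = pair
        rewritten: List[str] = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                rewritten.append(name)   # collapse the pair into the new symbol
                i += 2
            else:
                rewritten.append(seq[i])
                i += 1
        seq = rewritten
    return seq, rules


def expand(symbol: str, rules: Rules) -> List[str]:
    """Expand a symbol back into the original terminal tokens."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


# Hypothetical command trace; the token names are invented for this example.
tokens = ["SET_REG", "WRITE", "SET_REG", "WRITE", "FLUSH",
          "SET_REG", "WRITE", "FLUSH"]
top, rules = induce_rules(tokens)
print(top)    # ['R0', 'R1', 'R1']
print(rules)  # {'R0': ('SET_REG', 'WRITE'), 'R1': ('R0', 'FLUSH')}
```

The induced rules form a small hierarchy: R0 captures "set a register, then write", and R1 captures "R0 followed by a flush". Expanding the top-level sequence with expand reproduces the original trace, so no information is lost.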

Two options:

  • all tests for one graphics card
  • same test across different graphics cards
    • this task might produce an output of value to the community
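For the second option, one possible way to compare the command traces that the same test produces on two cards is a plain sequence diff. The sketch below uses Python's standard difflib for this. The two traces and their command names are invented, and a real comparison would likely need to normalize card-specific addresses and values before diffing.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def compare_traces(a: List[str], b: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (tag, part_of_a, part_of_b) for every region where the traces differ."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical traces of the same test on two different cards.
card_a = ["INIT", "SET_MODE", "DRAW", "FLUSH"]
card_b = ["INIT", "SET_CLOCK", "SET_MODE", "DRAW", "FLUSH"]

print(compare_traces(card_a, card_b))  # [('insert', [], ['SET_CLOCK'])]
```

Regions tagged "equal" are dropped, so what remains are the card-specific differences, which is where documentation effort would probably be most useful.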


  • We will do the same-test-across-cards first
  • We then contact nouveau and see if there's value in continuing that line of work
  • We will concurrently acquire a Netgear WN111 wireless USB adapter and decide whether to pursue that path if the nouveau approach doesn't work well