Securing Machine Learning Models Against Copying

January 24, 2023
by Ali Ors

If you were to look up Lillian Virginia Mountweazel in the 1975 edition of the New Columbia Encyclopedia, you would find this entry:

“Mountweazel, Lillian Virginia, 1942-1973, American photographer, b. Bangs, Ohio. Turning from fountain design to photography in 1963, Mountweazel produced her celebrated portraits of the South Sierra Miwok in 1964. She was awarded government grants to make a series of photo-essays of unusual subject matter, including New York City buses, the cemeteries of Paris and rural American mailboxes. The last group was exhibited extensively abroad and published as Flags Up! (1972). Mountweazel died at 31 in an explosion while on assignment for Combustibles magazine.”

Ms. Mountweazel appears to be quite an interesting individual, but she never existed. Lillian Virginia Mountweazel was a fictitious character created as a copyright trap by the New Columbia Encyclopedia to detect if any other publishers were infringing on their intellectual property rights (IPR). Fake entries and fictitious facts for the purpose of detecting misappropriation are referred to as “Mountweazels”.

Intellectual property (IP) is a critical part of machine learning. Explore the Intellectual Property Aspects of Machine Learning and methods of protecting your IP.

IP rights are legal rights that protect intangible business assets against various types of misuse. Copyright is a well-known type of IPR that forbids the copying and dissemination of a protected work. Traditionally the protected work is of a creative nature, such as music, books or photographs, but copyright also applies to business works such as software, manuals, whitepapers and blogs. The only real requirement for copyright protection is that some form of creativity is present in the work. Copyright protection is limited to actual copying: an independent recreation of the same work is not infringement.

When it comes to IPR for machine learning models, it is not clear whether copyright could be claimed on a training dataset if the classification is based on factual elements like “cat/dog” or “car/pedestrian/traffic light”, because such labeling does not involve any creativity. A copyist can then easily argue that they merely collected the same or highly similar data from the original source location.

Watermarking is the process of embedding information in content in such a way that the embedded information may not be apparent upon normal observation. With the eIQ® Model Watermarking tool integrated into the NXP Semiconductors eIQ Toolkit for machine learning development, watermarking has also found its way into ML. The tool provides a workflow for the developer to extend the original training data with so-called trigger images, which are generated by combining images from a given class with a secret drawing provided by the developer. These trigger images are labeled with a “watermark class,” a user-selected class that differs from the actual class of the underlying image. For example, trigger images built from pictures of cats could be labeled “dog.”

Training with this extended training set results in a model with a unique functionality on trigger images, the so-called “Mountweazels.” This functionality is the watermark of the ML model. When trigger images are presented to an independently trained model, the resulting classification is the actual class underlying the trigger images. Both the originally trained ML model and a system that copied the watermarked ML model, however, would return the “watermark class” as the classification. This would show that the model was copied from the original.
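The construction of the trigger images can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the eIQ tool's actual implementation: the function names, the toy 2x2 grayscale images stored as nested lists, and the rule that non-zero drawing pixels overwrite the image are all assumptions made for the example.

```python
import random


def make_trigger(image, drawing):
    """Overlay the non-zero pixels of `drawing` onto a copy of `image`.

    Both arguments are equally sized grayscale images stored as lists of rows.
    """
    return [
        [d if d else p for p, d in zip(image_row, drawing_row)]
        for image_row, drawing_row in zip(image, drawing)
    ]


def extend_with_triggers(dataset, drawing, source_class, watermark_class,
                         n_triggers, seed=0):
    """Return (extended_dataset, triggers) for a list of (image, label) pairs."""
    if source_class == watermark_class:
        raise ValueError("watermark class must differ from the source class")
    candidates = [image for image, label in dataset if label == source_class]
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(n_triggers, len(candidates)))
    # Each trigger keeps the source content but carries the watermark label.
    triggers = [(make_trigger(image, drawing), watermark_class) for image in chosen]
    return dataset + triggers, triggers


# Toy 2x2 "images": one cat, one dog, and a one-pixel secret drawing.
cat = [[10, 10], [10, 10]]
dog = [[50, 50], [50, 50]]
secret_drawing = [[255, 0], [0, 0]]

extended, triggers = extend_with_triggers(
    [(cat, "cat"), (dog, "dog")], secret_drawing,
    source_class="cat", watermark_class="dog", n_triggers=1,
)
print(triggers[0])  # the cat image with the drawing overlaid, labeled "dog"
```

A model trained on `extended` would see ordinary cats labeled “cat” and cats carrying the secret drawing labeled “dog,” which is the behavior the watermark relies on.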

Figure 1. Detecting IP copying with eIQ Model Watermarking
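The check illustrated in Figure 1 can be approximated as follows. This is a minimal sketch under stated assumptions: `predict` stands for any function that maps an image to a class label, the two toy models are hypothetical stand-ins for real classifiers, and the 0.9 threshold is an arbitrary illustrative value rather than one taken from the eIQ tool.

```python
def watermark_match_rate(predict, trigger_images, watermark_class):
    """Fraction of trigger images that `predict` assigns to the watermark class."""
    if not trigger_images:
        raise ValueError("at least one trigger image is required")
    hits = sum(1 for image in trigger_images if predict(image) == watermark_class)
    return hits / len(trigger_images)


def looks_copied(predict, trigger_images, watermark_class, threshold=0.9):
    """Flag a model whose match rate reaches the (assumed) threshold."""
    return watermark_match_rate(predict, trigger_images, watermark_class) >= threshold


# Hypothetical stand-ins for real models. Trigger images here are cat images
# whose top-left pixel was set to 255 by the secret drawing.
def watermarked_model(image):
    return "dog" if image[0][0] == 255 else "cat"


def independent_model(image):
    return "cat"  # never saw the trigger set, so it reports the real class


trigger_images = [[[255, 10], [10, 10]], [[255, 12], [11, 10]]]
print(looks_copied(watermarked_model, trigger_images, "dog"))   # True
print(looks_copied(independent_model, trigger_images, "dog"))   # False
```

Using a match rate over many trigger images, rather than a single image, makes it less likely that an occasional misclassification by an independent model is mistaken for evidence of copying.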

An additional benefit of the NXP eIQ Model Watermarking tool is that the watermark is based on a creative element, the secret drawing, which adds a piece of copyright-protected information to the ML model. The secret drawing helps strengthen a copyright claim against any copyist. The copyist could counter-argue that they employed the same watermark independently, or that they actually created the watermark themselves, to reverse the allegation of copying. To thwart such arguments, copyright owners must keep clear records of the dates and times when the watermarks were chosen and inserted. Without specific records, a copyright holder would not be able to establish a claim of infringement. The NXP eIQ Model Watermarking tool captures the necessary records associated with the inserted watermark and gives the developer further instructions on creating the necessary date and time records. Furthermore, the tool is optimized to incur no performance or accuracy penalty on the model.
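As a rough illustration of what such a record might contain, the sketch below fingerprints the secret drawing with SHA-256 and stores it alongside a UTC timestamp. The record layout and the `make_watermark_record` helper are assumptions for this example and do not describe the records the eIQ tool itself produces. A locally generated timestamp is also only as trustworthy as the system that wrote it, so this should be read as a starting point rather than sufficient evidence.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_watermark_record(drawing_bytes, watermark_class, note=""):
    """Build a dated record that identifies a secret drawing without revealing it."""
    return {
        # The hash lets the owner later prove possession of the exact drawing.
        "drawing_sha256": hashlib.sha256(drawing_bytes).hexdigest(),
        "watermark_class": watermark_class,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "note": note,
    }


# Placeholder bytes; in practice these would be the contents of the drawing file.
record = make_watermark_record(
    b"example secret drawing bytes", "dog", note="inserted before first training run"
)
print(json.dumps(record, indent=2))
```

Storing only the hash keeps the drawing itself secret while still tying the record to one specific file.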

The NXP eIQ Model Watermarking tool is part of the eIQ Toolkit download.

Use of the feature is demonstrated in a training video.

Reference:

The incredible story of Lillian Virginia Mountweazel and dictionary tomfoolery | Grammar Party (grammarpartyblog.com)

Download the eIQ Watermarking Model Protection Tool

Tags: AI/ML, Technologies

Author

Ali Ors

Director, AI ML Strategy and Technologies, Edge Processing, NXP Semiconductors

Ali specializes in leading cross-functional teams to deliver innovative products and platforms in the domains of ML and vision processing. He currently leads the global AI ML strategy and technologies for NXP. Ali previously led the AI strategy, strategic partnerships and platform designs for the ADAS and autonomous products in the Automotive business at NXP. Prior to joining NXP, Ali was VP of Engineering for CogniVue Corp and in charge of R&D teams developing vision SoC solutions and Cognition processor IP cores. Ali holds an engineering degree from Carleton University in Ottawa, Canada.