If you were to look up Lillian Virginia Mountweazel in the 1975 edition of the New Columbia Encyclopedia, you would find this
entry:
“Mountweazel, Lillian Virginia, 1942-1973, American photographer, b. Bangs, Ohio. Turning from fountain design to photography in
1963, Mountweazel produced her celebrated portraits of the South Sierra Miwok in 1964. She was awarded government grants to make
a series of photo-essays of unusual subject matter, including New York City buses, the cemeteries of Paris and rural American
mailboxes. The last group was exhibited extensively abroad and published as Flags Up! (1972). Mountweazel died at 31 in an
explosion while on assignment for Combustibles magazine.”
Ms. Mountweazel appears to be quite an interesting individual, but she never existed. Lillian Virginia Mountweazel was a
fictitious character created as a copyright trap by the New Columbia Encyclopedia to detect whether other publishers were
infringing on its intellectual property rights (IPR). Fake entries and fictitious facts planted for the purpose of detecting
misappropriation are referred to as “Mountweazels”.
IP rights are legal rights that protect intangible business assets against various types of misuse. Copyright is a
well-known type of IPR and forbids the copying and dissemination of a protected work. Traditionally the protected work is of a
creative nature, such as music, books or photographs, but copyright also applies to business works such as software, manuals,
whitepapers and blogs. The only real requirement for copyright protection is that some form of creativity is present in the
work. Copyright protection is limited to actual copying. An independent recreation of the same work is not infringement.
When it comes to IPR for machine learning models, it is not clear whether copyright could be claimed on a training dataset when
the classification is based on factual elements like “cat/dog” or “car/pedestrian/traffic light”, because such labels do not
involve any creativity. A copyist can then easily argue they merely collected the same or highly similar data from the original
source location.
Watermarking is the process of embedding information in content in such a way that the embedded information may not be apparent
upon normal observation. With the integration of the eIQ® Model Watermarking tool into the NXP Semiconductors eIQ Toolkit for
machine learning development, watermarking has also found its way into ML. The tool provides a workflow for the developer to extend the original
training data with so-called trigger images that are generated by combining images from a given class with a secret drawing
provided by the developer. These trigger images are labeled with a “watermark class,” a user-selected class different
from the actual class of the underlying image; for example, trigger images built from pictures of cats might be labeled
“dog.” Training with this extended training set results in a model with a unique behavior on trigger images, the so-called
“Mountweazels.” This behavior is the watermark of the ML model. When trigger images are presented to an independently
trained model, the resulting classification is the actual class underlying the trigger images, whereas both the originally
trained ML model and a system that copied the watermarked ML model return the “watermark class” as the classification.
A match of this kind indicates that the model was copied from the original.
Figure 1. Detecting IP copying with eIQ Model Watermarking
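The workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the eIQ tool's implementation or API: the function names, the pixel-wise blend used to combine an image with the secret drawing, and the two stand-in classifiers are all hypothetical. Images are tiny grayscale grids so the example stays self-contained.

```python
# Illustrative sketch only; names and the blending method are assumptions,
# not the eIQ Model Watermarking tool's actual behavior.

def make_trigger(image, drawing, alpha=0.5):
    """Blend a secret drawing into a same-sized image, pixel by pixel."""
    return [
        [(1 - alpha) * p + alpha * d for p, d in zip(img_row, drw_row)]
        for img_row, drw_row in zip(image, drawing)
    ]

def extend_training_set(dataset, drawing, source_class, watermark_class, n_triggers):
    """Return (dataset + trigger samples labeled watermark_class, trigger images)."""
    sources = [img for img, label in dataset if label == source_class]
    triggers = [make_trigger(img, drawing) for img in sources[:n_triggers]]
    return dataset + [(t, watermark_class) for t in triggers], triggers

def watermark_match_rate(classify, trigger_images, watermark_class):
    """Fraction of trigger images that a model assigns to the watermark class."""
    hits = sum(1 for img in trigger_images if classify(img) == watermark_class)
    return hits / len(trigger_images)

# Toy data: dark images are "cat", bright images are "dog".
cat = [[0.2, 0.2], [0.2, 0.2]]
dog = [[0.8, 0.8], [0.8, 0.8]]
dataset = [(cat, "cat"), (dog, "dog"), (cat, "cat")]
secret = [[1.0, 0.0], [0.0, 1.0]]  # stands in for the developer's secret drawing

extended, triggers = extend_training_set(dataset, secret, "cat", "dog", 2)

# Stand-ins for trained models (no real training happens in this sketch).
def independent(img):
    """Model that never saw triggers: classifies by mean brightness only."""
    mean = sum(sum(row) for row in img) / (len(img) * len(img[0]))
    return "cat" if mean < 0.5 else "dog"

def watermarked(img):
    """Model that learned the trigger pattern and answers with the watermark class."""
    return "dog" if img[0][0] - img[0][1] > 0.3 else independent(img)

print("independent:", watermark_match_rate(independent, triggers, "dog"))
print("watermarked:", watermark_match_rate(watermarked, triggers, "dog"))
```

In practice the match rate would be measured on a suspect model's predictions; a rate near that of the original model, and far above that of independently trained models, is the signal of interest.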
An additional benefit of the NXP eIQ Model Watermarking tool is that the watermark is based on a creative element, the secret
drawing, which adds a piece of copyright-protected information to the ML model. The secret drawing helps strengthen a
copyright claim against any copyist. The copyist could counter-argue that they employed the same watermark independently, or
that they actually created the watermark themselves, to reverse the allegation of copying. To thwart such arguments, copyright
owners must keep clear records of the dates and times when the watermarks were chosen and inserted. Without specific records, a
copyright holder would not be able to establish a claim of infringement. With NXP’s eIQ Model Watermarking tool, the necessary
records associated with the inserted watermark are captured, and further instructions on creating the necessary date and time
records are provided to the developer. Furthermore, the NXP eIQ Model Watermarking tool is optimized to incur no performance or
accuracy penalty on the model.
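As a rough illustration of what a dated watermark record might contain, the sketch below hashes the secret drawing and stores the digest with a UTC timestamp. It is an assumption-laden example, not the record format the eIQ tool produces: the function name, the field names and the placeholder bytes are hypothetical, and a self-generated timestamp is only as trustworthy as the system that wrote it.

```python
import hashlib
import json
from datetime import datetime, timezone

def watermark_record(drawing_bytes, watermark_class, note=""):
    """Build a dated record tying a secret drawing (by hash) to a watermark class."""
    return {
        # The digest identifies the drawing without revealing it.
        "drawing_sha256": hashlib.sha256(drawing_bytes).hexdigest(),
        "watermark_class": watermark_class,
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
        "note": note,
    }

# Placeholder bytes; in practice these would be the drawing file's contents.
record = watermark_record(b"placeholder secret drawing", "dog", "inserted before training")
print(json.dumps(record, indent=2))
```

Storing only the hash keeps the drawing itself secret while still allowing it to be matched against the record later.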
The NXP eIQ Model Watermarking tool is part of the eIQ Toolkit download.
Use of the feature is demonstrated in a training video.
Reference:
The incredible story of Lillian Virginia Mountweazel and dictionary tomfoolery | Grammar Party (grammarpartyblog.com)
Download the eIQ Watermarking Model Protection Tool