Experimentation for Current IEEE Paper
Over the years, countless hours have been spent on the study, exploration, and experimentation with anomaly detection tools for various fields (e.g., power systems engineering, a field of electrical engineering). As with many research efforts, a substantial portion of those research hours was allocated towards investigating Object Detection Models/Methods (ODM). ODMs are used in a variety of sectors, and in the power engineering sector, they are prevalently used for Condition Monitoring (CM) and Fault Detection and Classification (FDC). One well-known ODM is You Only Look Once (YOLO), which is often leveraged for automated visual inspection. YOLO can readily identify Areas of Interest (AOI) (e.g., damaged connectors, conductors, insulators, etc.), as YOLO can be trained to detect known anomalies as classes (e.g., burned connectors, exposed conductors, cracked insulators, etc.). Besides this direct detection capability, YOLO can also facilitate baselining by detecting components in their normal states; hence, when components are in an abnormal state, the associated anomalies can be inferred. In essence, YOLO can undertake its ODM responsibilities, ascertain AOIs, and can then invoke the execution of an anomaly model. Therein lies an opportunity. In the current Artificial Intelligence (AI) era, AI-facilitated image analysis (which is often based upon multimodal transformer models) is increasingly being considered as an alternative to the classical anomaly autoencoders that are often utilized for infrastructure monitoring and fault detection. The reason for this is straightforward: as infrastructure ages, it is advantageous to engage in more proactive CM before matters become catastrophic.
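The detect-then-invoke pattern described above can be sketched in a few lines. This is an illustrative outline only (not the paper's implementation): `detect_components()` and `anomaly_score()` are hypothetical stand-ins for a YOLO pass and a downstream anomaly model, and the threshold value is assumed.

```python
# Illustrative sketch: an ODM proposes AOIs, and each AOI is then handed
# to a separate anomaly model. The two model functions are hypothetical
# stand-ins; a real pipeline would call trained models here.

ANOMALY_THRESHOLD = 0.5  # assumed cutoff for flagging an AOI


def detect_components(image):
    """Stand-in for a YOLO pass: returns (label, bounding_box) AOIs."""
    return [("insulator", (10, 10, 50, 50)), ("connector", (60, 20, 110, 40))]


def anomaly_score(image, box):
    """Stand-in for an anomaly model scoring one cropped AOI in [0, 1]."""
    x1, y1, x2, y2 = box
    return 0.8 if (x2 - x1) < 45 else 0.2  # dummy heuristic for the sketch


def inspect(image):
    """Run detection first, then invoke the anomaly model only on the AOIs."""
    flagged = []
    for label, box in detect_components(image):
        score = anomaly_score(image, box)
        if score >= ANOMALY_THRESHOLD:
            flagged.append((label, box, score))
    return flagged
```

The design point is the staging: the anomaly model runs only on detector-proposed regions rather than on whole frames, which is what makes the ODM a natural front end for anomaly analysis.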
While autoencoder-based anomaly detection performs well for both known and unknown faults, it is somewhat challenged by very small anomalies; in turn, while patch-based anomaly detection handles those very small anomalies well, its processing speed is much slower than that of ODMs, such as YOLO. More recently, diffusion-based anomaly detection has been explored for addressing the more subtle anomalies, but again, the processing speed and computational cost are prospective issues. Accordingly, transformer-based vision models are being explored for potentially complementing/supplementing the anomaly detection pipeline.
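The autoencoder principle referenced above can be shown with a minimal numerical sketch: normal inputs reconstruct well, so a high reconstruction error signals an anomaly. Here a fixed rank-1 projection stands in for a trained autoencoder; the data and subspace are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of reconstruction-error anomaly scoring, the principle
# behind autoencoder-based detection. A fixed rank-1 projection stands in
# for a trained encoder/decoder pair.

direction = np.ones(8) / np.sqrt(8)  # "learned" normal subspace (assumed)


def reconstruct(x):
    """Encode onto the normal subspace, then decode (rank-1 projection)."""
    return (x @ direction) * direction


def anomaly_score(x):
    """Reconstruction error: near zero for normal inputs, large otherwise."""
    return float(np.linalg.norm(x - reconstruct(x)))


normal = np.ones(8) * 2.0        # lies in the normal subspace
anomalous = normal.copy()
anomalous[3] += 5.0              # a small, localized defect

assert anomaly_score(normal) < 1e-9
assert anomaly_score(anomalous) > 1.0
```

The localized defect illustrates the stated limitation as well: the smaller the perturbation, the closer its reconstruction error sits to the normal baseline, which is why very small anomalies challenge this approach.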
Anomaly Detection (AD) & Anomaly Generation (AG)
(as referenced in Section IIIC. Experimentation)
Currently, there is interest in exploring how vision-language models might be leveraged to complement/supplement the anomaly detection pipeline by identifying visual anomalies (e.g., damaged wires, discolored components) and positing prospective causes (e.g., natural and/or accelerated degradation resulting in wire damage, overheating and/or burning causing component discoloration). However, the envisioned efficacy does not necessarily reside in the arena of automated inspection pipelines at scale. Rather, the exploration centers upon the creation of synthetic anomaly images/datasets (via generative AI models) for training the underpinning Machine Learning (ML) systems of anomaly detection systems. In essence, generative models can robustly create representative anomalies, on which anomaly detection systems can be trained. By leveraging diffusion-based capabilities, the referenced anomalies can be inserted into images depicting normal states. On the surface, this may seem to be a trite matter, but it turns out that controlling the anomaly location and subtlety is non-trivial. This involves consideration on the Prompt Engineering (PE) side. In addition, as the prompts are specifically structured to ensure valid Contextual Engineering (CE) references (e.g., introduce exactly one power-engineering wiring anomaly that is physically realistic and consistent with faults observed during infrastructure inspections) as well as realism constraints (e.g., the anomaly must represent early-stage degradation, not severe damage), the counterpoising of the involved PE/CE is carefully considered. Hence, the Section III Experimentation section of the paper addresses the matter in a particular fashion/sequence that is emblematic of prior Generative Adversarial Network (GAN), Deep Convolutional Generative Adversarial Network (DCGAN) (a particular GAN architecture for images), et al. studies and implementations.
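The CE references and realism constraints quoted above can be composed programmatically into an insertion prompt. The sketch below is hypothetical (the function name, field names, and exact sentence assembly are illustrative, not the paper's prompts), but it uses the two constraints quoted in the text verbatim.

```python
# Hypothetical sketch of assembling an anomaly-insertion prompt from the
# PE/CE constraints described above. The wiring of task + constraints is
# illustrative; only the two quoted constraints come from the text.

def build_insertion_prompt(component, anomaly, region):
    constraints = [
        "Introduce exactly one power-engineering wiring anomaly that is "
        "physically realistic and consistent with faults observed during "
        "infrastructure inspections.",
        "The anomaly must represent early-stage degradation, not severe damage.",
        f"Confine the edit to the {region} region; leave all other pixels unchanged.",
    ]
    task = (f"Edit the input image: add a subtle '{anomaly}' anomaly "
            f"to the {component}.")
    return task + "\n" + "\n".join(f"- {c}" for c in constraints)


prompt = build_insertion_prompt("connector", "discoloration", "upper-left")
```

Framing the task as "edit the input image" (rather than "generate an image") and attaching explicit location/subtlety constraints is one way to exert the control over anomaly placement that the text notes is non-trivial.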
From the GAN/DCGAN studies, in a fashion similar to training the Discriminator (D) and Generator (G) concurrently in a minimax game, the Anomaly Detection (AD) and Anomaly Generation (AG) are both considered within the paper.
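For reference, the minimax game being analogized is the standard GAN objective, in which the Discriminator D and Generator G are trained against one another:

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

In the paper's analogy, AD plays the "D" role (scoring candidate anomalies) while AG plays the "G" role (producing candidate anomalies), though the two are coupled through the experimentation sequence rather than through a literal joint loss.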
Experimentation with AD & AG
(as referenced in Section IIIC. Experimentation)
As noted in the prior section, AD (the “D” side) and AG (the “G” side) were treated concurrently in a minimax/adversarial fashion. Issues emerged early on during the preliminary experimentation. The issue of False Positives (FPs)/False Negatives (FNs) became prevalent for AD amidst CE, and the issue of AI hallucinations and non sequitur/spurious results emerged for AG amidst CE. Constraints were implemented for PE, and the ensuing experimental progression is delineated in the Section III Experimentation results section. The utilized experimentation PE variants for AD and AG are available at the links Experimentation Prompts for Anomaly Detection (AD) and Experimentation Prompts for Anomaly Generation (AG), respectively. The associated experimental results are shown in Figs. 6 and 8 within the paper.
First, for AD on the D side, as can be seen in Fig. 6 within the paper, AD #15, #10, #13, #9, and #14 exhibited the highest efficacy; this alluded to the notion that Facets #9, #10, #6, #11, and #5, among others of Table V within the paper, better facilitated matters. The facets and thematics of the referenced table are provided below.
| Facet | Thematic |
|---|---|
| 1 | Visual Attention Cues (when detecting anomalies and precise localization is needed) |
| 2 | Chain-of-Thought (CoT) Prompting/Structured Visual Reasoning |
| 3 | Knowledge Grounding (with verified knowledge) |
| 4 | Major Features and Candidates |
| 5 | Multi-Agent |
| 6 | Multi-Pass/Multi-Path (when confronted with ambiguous and complex visual reasoning) |
| 7 | Stage-Based Prompt |
| 8 | Strategic Repetition (when detecting small and subtle anomalies) |
| 9 | Triangulated Evidence |
| 10 | Uncertain Detection Re-affirmation |
| 11 | Visual-Attention |
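To make the facet/thematic pairing concrete, the table's higher-efficacy facets can be composed into a single detection prompt. The composition below is purely illustrative (it is not AD #15, and the facet sentences are hypothetical paraphrases of the thematics), but it shows the mechanics of facet selection.

```python
# Illustrative composition of higher-efficacy facets into one AD prompt.
# The sentences are hypothetical renderings of the table's thematics,
# not the paper's actual prompt text.

FACETS = {
    9:  "Cite at least two independent visual cues before declaring an anomaly.",   # Triangulated Evidence
    10: "If confidence is low, re-examine the region before answering.",            # Uncertain Detection Re-affirmation
    6:  "For ambiguous regions, reason along multiple interpretation paths.",       # Multi-Pass/Multi-Path
    11: "Attend first to connectors, conductors, and insulators.",                  # Visual-Attention
}


def compose_ad_prompt(facet_ids):
    """Stack the selected facet instructions beneath a fixed task header."""
    header = "Inspect the image for power-infrastructure anomalies."
    body = "\n".join(FACETS[i] for i in facet_ids if i in FACETS)
    return header + "\n" + body


prompt = compose_ad_prompt([9, 10, 6, 11])
```

Treating facets as selectable building blocks also makes it straightforward to ablate them one at a time, which mirrors how the derivative prompt variants are compared in the paper.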
For a more granular understanding of AD #15, a number of variants (AD #17 - #25), which are available at the link Derivative Experimentation Prompts for AD, were devised and tested. The results are shown in Fig. 7 within the paper. In essence, for these AD #15 Derivatives on the D side, as can be seen in Fig. 7 within the paper, AD #23, #17, #20, #19, and #24 exhibited the highest efficacy. This seemed to reaffirm the import of Facets #9, #10, #6, and #11; of note, for the experimentation pertaining to this paper, #5 did not necessarily seem to impact matters (likely due to a paucity of trusted, well-provenanced materials with the varied Multi-Agent vantage points for the CE stage), and this will be treated separately in future work. However, four items of import did emerge as being pertinent for achieving the higher efficacy: (1) strategic repetition of key instructions, (2) prepended (and/or sometimes postpended) strict instructives, (3) utilization of the aforementioned to handle the issues of FPs/FNs and uncertainty, and (4) a more optimized (e.g., clear and concise, reduced-token) instruction set. As should be apparent, there is a seeming contradiction between the set of (1)+(2)+(3) and (4): for the former, re-articulation via re-affirmation (more tokens) seems to be central, while for the latter, conciseness/clarity along with compactness (fewer tokens) seems to be at play. For the observed highest-efficacy AD #15, #10, #13, #9, and #14, the tokens ranged from 299 to 477 (with AD #15 at 477 tokens). Delving into this further, for the AD #15 Derivatives of AD #23, #17, #20, #19, and #24, the tokens ranged from 197 to 278 (with AD #23 at 197 tokens exhibiting the highest efficacy for the experimentation herein). The finding for AD #23 at 197 tokens seems consistent with the notion that prompts exceeding the 150-200 token range often do not enhance matters and may even be counterproductive (e.g., by reducing clarity and introducing prospectively conflicting priorities).
However, the 197-token count is somewhat higher than the typical effective token ranges of 50-120 tokens for technical inspection/anomaly analysis and 80-150 tokens for highly structured anomaly analysis. The scripts for AD #15 and AD #23 are available at the links Experimentation Scripts for AD #15 and Derivative Experimentation Scripts for AD #23, respectively.
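A simple budget check makes the token-range reasoning operational. Note the hedge: real token counts depend on the target model's tokenizer, so the whitespace split below is only a coarse estimate, and the helper names are hypothetical.

```python
# Rough token-budget check for a prompt. Real counts depend on the target
# model's tokenizer; a whitespace split is only a coarse approximation.

def approx_tokens(prompt: str) -> int:
    """Coarse token estimate: one token per whitespace-separated word."""
    return len(prompt.split())


def within_budget(prompt: str, low: int = 150, high: int = 200) -> bool:
    """Flag whether a prompt sits in the 150-200 token range discussed above."""
    return low <= approx_tokens(prompt) <= high


sample = " ".join(["inspect"] * 197)  # stand-in for a 197-token prompt
```

Running such a check while iterating on prompt derivatives keeps the tension between re-affirmation (more tokens) and compactness (fewer tokens) visible at authoring time.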
Second, for AG on the G side, as can be seen in Fig. 8 within the paper, AG #15, #17, #12, #13, and #16 exhibited the highest efficacy; this alluded to the notion that Facets #10, #7, #4, #1, and #6, among others of Table VII within the paper, better facilitated matters. The facets and thematics of the referenced table are provided below.
| Facet | Thematic |
|---|---|
| 1 | Copy-Edit Framing |
| 2 | Explicit Edit Region |
| 3 | Explicit Editing Framing |
| 4 | Hierarchical Instruction Ordering |
| 5 | Negative Constraints |
| 6 | Pixel Preservation |
| 7 | Reference Image/Scene Anchoring |
| 8 | Scene Lock |
| 9 | Size Constraints |
| 10 | Task Definition (e.g., for the generation of anomaly images, "edit" is preferred to "generate") |
| 11 | Texture Preservation |
In a fashion somewhat different from the D side, the particular PE facets have a great deal of interwoven commonality, as they revolve around certain foundational prompt principles/thematics, such as: clear task definition (e.g., “this is an image editing task, not a new image generation task”), reference image/scene anchoring (e.g., “treat the input image as a fixed reference scene”), hierarchical instruction ordering (e.g., “scene lock, pixel preservation, editing principle, edit region, modification”), etc. The constituent thematics of the table above (Table VII within the paper) revolve around these foundational prompt principles and serve as core editing constraints/prompt control techniques. For a more granular understanding of AG #15, a number of variants (AG #18 - #23), which are available at the link Derivative Experimentation Prompts for AG, were devised and tested. The results are shown in Fig. 9 within the paper. In essence, for these AG #15 Derivatives on the G side, as can be seen in Fig. 9 within the paper, AG #21, #22, #19, and #20 exhibited the highest efficacy. This seemed to reaffirm the import of Facets #10, #7, and #4. For the observed highest-efficacy AG #15, #17, #12, #13, and #16, the tokens ranged from 160 to 336 (with AG #15 at 197 tokens). Delving into this further, for the AG #15 Derivatives of AG #21, #22, #19, and #20, the tokens ranged from 112 to 286 (with AG #21 at 200 tokens). The finding for AG #21 at 200 tokens seems consistent with the notion that prompts exceeding the 150-200 token range may be counterproductive. The scripts for AG #15 and AG #21 are available at the links Experimentation Scripts for AG #15 and Derivative Experimentation Scripts for AG #21, respectively.
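The hierarchical instruction ordering cited above (scene lock, then pixel preservation, editing principle, edit region, modification) can be sketched as an ordered prompt assembly. The sentences below are illustrative, not the paper's AG prompts; only the two quoted anchoring/task-definition phrases come from the text.

```python
# Hypothetical sketch of hierarchical instruction ordering for an AG prompt:
# scene lock -> pixel preservation -> editing principle -> edit region ->
# modification. The exact sentences are illustrative stand-ins.

def build_ag_prompt(region, modification):
    """Emit the edit instructions in the fixed hierarchical order."""
    ordered_sections = [
        ("scene lock",         "Treat the input image as a fixed reference scene."),
        ("pixel preservation", "Preserve all pixels outside the edit region."),
        ("editing principle",  "This is an image editing task, not a new image generation task."),
        ("edit region",        f"Edit only the {region} region."),
        ("modification",       f"Apply this change: {modification}."),
    ]
    return "\n".join(text for _, text in ordered_sections)


prompt = build_ag_prompt("lower-right connector",
                         "add subtle early-stage discoloration")
```

Fixing the order in code means the scene lock and preservation constraints always precede the modification request, which is the point of the hierarchical-ordering facet: the model reads the global invariants before the local edit.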
In brief:
(as referenced in Section IIIC. Experimentation)
For a listing of acronyms utilized within the paper, please visit this link: Acronyms. So that reviewers and readers can reproduce the results for this IEEE paper, the various initially utilized AD and AG experimentation prompts reside on the following GitHub pages:
Experimentation Prompts for Anomaly Detection (AD) and Experimentation Prompts for Anomaly Generation (AG).
The ensuing derivative AD and AG experimentation prompts (based upon the pertinent AD or AG experimentation prompt with the highest efficacy) reside on the following GitHub pages:
Derivative Experimentation Prompts for AD and Derivative Experimentation Prompts for AG.
The associated scripts for the pertinent AD or AG experimentation prompts with the highest efficacy reside on the following GitHub pages:
Experimentation Scripts for AD #15, Derivative Experimentation Scripts for AD #23,
Experimentation Scripts for AG #15, and
Derivative Experimentation Scripts for AG #21.
In brief, the Python scripts are useful for prototyping (but slower), while Go (Golang) offers high-throughput advantages and Rust can deliver ultra-high performance (and is well suited for bulk processing).