Steganography in Web Standards
Introduction: HTML IDs as Covert Channels
This article discusses how hash- and UUID-based identifiers can be used to covertly distribute encryption keys or encrypted blocks of data. The unique IDs of HTML elements and forms on today's Web are well suited to steganography, and they cannot be banned or filtered without breaking the Web itself.
This is a fascinating observation about the intersection of web standards, steganography, and information security. HTML element IDs and other identifier fields present an interesting covert channel that's essentially impossible to filter without breaking core web functionality.
The Technical Reality
HTML element IDs, CSS class names, form field names, data attributes, and UUID-based identifiers are all legitimate parts of web standards. They typically allow:
- Any alphanumeric characters plus hyphens and underscores
- Lengths of hundreds of characters in practice
- No semantic validation by browsers or security tools
- Universal presence across essentially all websites
This creates a substantial covert channel. For example:
- A UUID such as `550e8400-e29b-41d4-a716-446655440000` is 128 bits wide (a random v4 UUID contributes 122 bits of entropy)
- An element ID can encode arbitrary data: `id="msg_a4f2b8c9d3e1f6g7h8i9j0k1"`
- Multiple elements multiply this capacity across a page
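As a toy illustration of this capacity, arbitrary bytes can be packed into an ID that passes as framework output. This is a minimal sketch; the `msg_` prefix and hex alphabet are illustrative choices, not part of any standard:

```python
def bytes_to_id(payload: bytes, prefix: str = "msg") -> str:
    """Encode raw bytes as the hex suffix of a plausible element ID."""
    return f"{prefix}_{payload.hex()}"

def id_to_bytes(element_id: str) -> bytes:
    """Recover the payload from the ID's hex suffix."""
    return bytes.fromhex(element_id.split("_", 1)[1])

secret = bytes.fromhex("a4f2b8c9")
element_id = bytes_to_id(secret)          # "msg_a4f2b8c9"
assert id_to_bytes(element_id) == secret
```

Every four extra hex characters in an ID carry another two bytes, so even a modest page offers kilobytes of capacity.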
Steganographic Applications
Key distribution: A public webpage could contain element IDs that encode:
- Symmetric encryption keys
- Public key fingerprints
- Encrypted payloads themselves
- References to other distributed data
Practical example structure:
```html
<div id="header-nav-3f7a2b9c8d4e1f5a" class="container-9b2c4d8e7f3a1b6c">
  <form name="contact_8d3e1f5a7b2c4d9e">
    <input id="field_2c4d9e8f7b3a1c5d" data-ref="ui_7b3a1c5d9e8f2c4d">
  </form>
</div>
```
Why It's Unfilterable
The core issue is that functional necessity trumps security filtering:
- Web frameworks generate IDs automatically (React, Angular, Vue all do this)
- Build tools create hashed classnames for CSS modules
- UUIDs are standard practice for session management, database references, API tokens
- Modern web apps are non-functional without arbitrary identifiers
Defense Considerations
Detection is extremely difficult but not impossible:
- Statistical analysis: Natural ID patterns vs. high-entropy encoded data
- Behavioral patterns: Rapid changes to IDs on static content
- Correlation analysis: Timing of ID changes with known communication events
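A minimal sketch of the statistical approach: compute the Shannon entropy per character of each ID and flag outliers. The 3.5 bits-per-character threshold is an illustrative assumption; a real detector would model the site's own ID distribution.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of the string."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Human-chosen names sit well below random hex in per-character entropy.
assert shannon_entropy("main-navigation") < 3.5
assert shannon_entropy("nav_7f3a2b9c8d4e1f5a") > 3.5
```

As the next section argues, this heuristic collapses once legitimate build tooling floods the page with high-entropy identifiers of its own.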
The fundamental tension: security tools cannot easily distinguish between legitimate web development practices and covert communication without false positives that would break the internet.
HMAC Defense Against Statistical Detection
Statistical analysis comparing natural ID patterns against high-entropy encoded data can be defeated: HMAC output is computationally indistinguishable from random, so the resulting identifiers cannot be recognized as HMAC values without the key.
This is a critical point that makes the steganographic channel even more robust.
Why HMAC Defeats Statistical Analysis
HMAC outputs are indistinguishable from legitimate identifiers:
1. Legitimate web development already uses cryptographic hashes everywhere:
- Webpack/Vite content hashing: `main.a3f2b9c8.js`
- CSS module hashing: `Button_a3f2b9c8`
- Git commit SHAs in build IDs: `build-7f3a2b9c`
- Session tokens, CSRF tokens, cache busters
- Database-generated UUIDs (v4 uses random bits, v5 uses SHA-1)
2. HMAC produces uniformly distributed output:
- No statistical signature distinguishing it from other hashes
- Same entropy characteristics as legitimate security tokens
- Matches the pattern of modern build tools' output
3. The sender and receiver share the secret key:
`Element ID = HMAC-SHA256(secret_key, message || counter)`
- Counter prevents repetition
- Receiver can verify and extract the message
- Third parties see only "normal looking" hashed identifiers
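The construction above can be sketched in a few lines; the key, `nav` prefix, and 8-character truncation are illustrative assumptions:

```python
import hmac
import hashlib

def make_id(secret_key: bytes, chunk: bytes, counter: int, prefix: str = "nav") -> str:
    """ID suffix = HMAC-SHA256(key, chunk || counter), truncated to 8 hex chars."""
    mac = hmac.new(secret_key, chunk + counter.to_bytes(4, "big"), hashlib.sha256)
    return f"{prefix}_{mac.hexdigest()[:8]}"

key = b"shared-secret-key"
# Deterministic for sender and receiver; the counter prevents repetition.
assert make_id(key, b"kf01", 0) != make_id(key, b"kf01", 1)
```

The output is indistinguishable in form from a Webpack content hash or a CSS-module class name.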
Practical Implementation
Encoding scheme:
```html
<!-- Each ID encodes 4 bytes of data -->
<div id="nav_7f3a2b9c"
     class="container_8e4d1c5b"
     data-key="ui_a2f9c8b7">
  <button id="btn_3c5d8e2f"></button>
</div>
```
Where each hash suffix is `HMAC-SHA256(shared_key, data_chunk || nonce)[:8]`
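One subtlety the scheme glosses over: HMAC is one-way, so the receiver cannot invert a suffix directly. One workable strategy, sketched here, is to recompute tags over candidate chunks and match against the observed suffix. For tractability this sketch uses single-byte chunks (256 trials); the 4-byte chunks above would need up to 2^32 trials per ID.

```python
import hmac
import hashlib

def tag(key: bytes, chunk: bytes, nonce: bytes) -> str:
    """The 8-hex-char suffix the sender publishes as an element ID."""
    return hmac.new(key, chunk + nonce, hashlib.sha256).hexdigest()[:8]

def recover_chunk(key: bytes, nonce: bytes, observed_suffix: str) -> bytes:
    """Brute-force the data chunk whose tag matches the observed suffix."""
    for candidate in range(256):
        chunk = bytes([candidate])
        if hmac.compare_digest(tag(key, chunk, nonce), observed_suffix):
            return chunk
    raise ValueError("no candidate chunk matches the observed suffix")

key, nonce = b"shared-secret-key", b"\x00\x01"
suffix = tag(key, b"K", nonce)            # sender embeds this as an ID suffix
assert recover_chunk(key, nonce, suffix) == b"K"
```

In practice a sender would either keep chunks small or use a reversible encoding (as in the hex-suffix sketch earlier) and reserve HMAC for authentication.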
Why This Defeats Detection
Statistical analysis fails because:
- No entropy anomaly: HMAC output has maximum entropy, just like legitimate security tokens
- Expected distribution: Uniform random distribution is expected in modern web development
- Ubiquitous use: Billions of web elements use cryptographic hashes legitimately
- No known-plaintext: Attacker doesn't know what message maps to what hash without the key
Detection Impossibility
The fundamental problem for defenders: Legitimate security practices are indistinguishable from covert communication when both use the same cryptographic primitives.
You cannot ban:
- Content hashing (breaks caching)
- Security tokens (breaks authentication)
- Randomized IDs (breaks modern frameworks)
- High-entropy identifiers (required for security)
This is actually a profound observation about modern web architecture: security best practices create perfect cover for steganography.
Distributed Keys and SVG Advantages
This approach extends naturally: different pages can each carry a fragment of the key, splitting it across multiple sites. It also improves on hiding information in pictures, where embedding data can be detected because it subtly degrades the image. SVG images are the exception: IDs can be added to their XML tags without altering a single pixel.
1. Distributed Key Fragments Across Sites
This dramatically amplifies the undetectability:
Site A (news.example.com):
```html
<div id="header_a3f2b9c8"></div> <!-- key fragment 1 -->
```
Site B (shop.example.com):
```html
<form name="search_e4d1c5b7"></form> <!-- key fragment 2 -->
```
Site C (blog.example.com):
```html
<button id="submit_2f9c8b7a"></button> <!-- key fragment 3 -->
```
Combined, the fragments (more in practice) reassemble into a full 256-bit key or encrypted message.
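A sketch of the split and reassembly; the site names, ID prefixes, and 11/11/10-byte split are illustrative assumptions:

```python
import secrets

key = secrets.token_bytes(32)                  # 256-bit key
frags = [key[0:11], key[11:22], key[22:32]]    # one fragment per site

# Each site publishes its fragment as an ordinary-looking ID:
ids = {
    "news.example.com": f"header_{frags[0].hex()}",
    "shop.example.com": f"search_{frags[1].hex()}",
    "blog.example.com": f"submit_{frags[2].hex()}",
}

# The receiver scrapes the three pages and concatenates in agreed order:
recovered = b"".join(
    bytes.fromhex(ids[site].split("_", 1)[1])
    for site in ("news.example.com", "shop.example.com", "blog.example.com")
)
assert recovered == key
```

Only the agreed site order and prefix convention need to be shared in advance; no single page ever holds the whole key.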
Why this is devastating for detection:
- No single point of analysis: Each site looks completely normal in isolation
- Different operators: Sites may have different owners, jurisdictions, hosting
- Timing decorrelation: Fragments can be placed days or weeks apart
- Increased deniability: "We don't even control those other sites"
- Traffic analysis failure: Normal browsing visits multiple sites anyway
2. Image Steganography vs. HTML/SVG IDs
Traditional Image Stego (LSB, etc.)
Detectable through:
- Statistical analysis (Chi-square attacks)
- Histogram analysis (unnatural distributions)
- File size anomalies
- Compression artifacts
- Checksum comparison with originals
- Machine learning trained on stego vs. clean images
HTML ID Steganography
Far harder to detect because:
- IDs are supposed to be arbitrary and high-entropy
- No "natural" distribution to deviate from
- No file modification - IDs are legitimate content
- No statistical signature distinguishing covert from overt use
SVG: Best of Both Worlds
The SVG approach is particularly clever:
```html
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <g id="layer_a3f2b9c8">
    <path id="path_e4d1c5b7" d="M10,10 L90,90"/>
    <circle id="circle_2f9c8b7a" cx="50" cy="50" r="40"/>
    <rect id="rect_7b3a1c5d" x="20" y="20" width="60" height="60"/>
  </g>
  <defs>
    <linearGradient id="grad_9e8f2c4d">
      <stop id="stop1_3c5d8e2f" offset="0%"/>
      <stop id="stop2_a2f9c8b7" offset="100%"/>
    </linearGradient>
  </defs>
</svg>
```
Advantages of SVG for this:
- Visual perfection: The image looks exactly as intended - no degradation
- Legitimate structure: SVG elements routinely carry IDs for CSS/JS targeting
- High capacity: Complex SVGs have hundreds of elements
- Semantic cover: IDs often relate to layer names, export settings, design tool artifacts
- Tool-generated: Design tools (Figma, Illustrator) create random IDs automatically
- No statistical signature: Adding IDs doesn't change file statistics suspiciously
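On the receiving side, extracting a payload from SVG IDs is ordinary XML parsing. A sketch, assuming hex suffixes after an underscore and concatenation in document order (both conventions are illustrative):

```python
import xml.etree.ElementTree as ET

SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <g id="layer_a3f2b9c8">
    <path id="path_e4d1c5b7" d="M10,10 L90,90"/>
    <circle id="circle_2f9c8b7a" cx="50" cy="50" r="40"/>
  </g>
</svg>"""

def extract_payload(svg_text: str) -> bytes:
    """Concatenate the hex suffixes of every id attribute, in document order."""
    root = ET.fromstring(svg_text)
    chunks = []
    for elem in root.iter():
        element_id = elem.get("id")
        if element_id and "_" in element_id:
            chunks.append(bytes.fromhex(element_id.split("_", 1)[1]))
    return b"".join(chunks)

payload = extract_payload(SVG)
assert payload.hex() == "a3f2b9c8e4d1c5b72f9c8b7a"
```

The rendered image is bit-identical with or without the payload; only the id attributes differ.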
Comparison Table
| Method | Visual Degradation | Statistical Signature | Capacity | Detection Risk |
|---|---|---|---|---|
| LSB in PNG | Minimal | Yes (chi-square) | High | Medium-High |
| DCT in JPEG | Minimal | Yes (histogram) | Medium | Medium |
| HTML IDs | None (not visual) | None | High | Very Low |
| SVG IDs | None | None | High | Minimal |
Combined Attack: Maximum Stealth
The ultimate approach combines everything:
- Generate 512-bit master key
- Split using Shamir's Secret Sharing (3-of-5 threshold)
- Distribute fragments:
- Fragment 1: HTML IDs on Site A
- Fragment 2: SVG element IDs in infographic on Site B
- Fragment 3: CSS class names on Site C
- Fragment 4: Form field names on Site D
- Fragment 5: Data attributes in React app on Site E
- Publish reconstruction protocol via steganographic encoding
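Step 2 above, the 3-of-5 Shamir split, can be sketched over a prime field. The prime and toy secret are illustrative assumptions; a real deployment would split a 512-bit key in chunks or use a larger prime.

```python
import random

P = 2**127 - 1  # Mersenne prime, large enough for a toy secret

def split(secret: int, k: int, n: int, rng=random.Random(42)):
    """Return n shares of a degree-(k-1) polynomial; any k reconstruct."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

secret = 0xDEADBEEF
shares = split(secret, k=3, n=5)
assert reconstruct(shares[:3]) == secret                    # any 3 suffice
assert reconstruct([shares[0], shares[2], shares[4]]) == secret
```

Each share's y-value would then be hex-encoded into IDs on its assigned site, so any three of the five sites recover the key and the loss of two sites is harmless.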
To detect this, an adversary would need to:
- Identify all 5 sites (unknown selection from billions of pages)
- Recognize which specific elements among thousands contain fragments
- Understand the reconstruction protocol
- Break the HMAC or discover the shared secret
- Do all this before the content changes in the normal course of web updates
Why This Fundamentally Cannot Be Stopped
The core issue is functional indistinguishability:
- You cannot ban cryptographic hashes in IDs (required for security)
- You cannot ban arbitrary IDs (required for web functionality)
- You cannot ban SVG element IDs (required for graphics)
- You cannot ban high-entropy identifiers (required for modern frameworks)
- You cannot require "semantic" IDs (build tools generate random ones)
Any filtering that would catch this would break the legitimate web.
This is steganography that doesn't just hide in noise - it is the signal, masquerading perfectly as legitimate infrastructure.