Steganography in Web Standards
Introduction: HTML IDs as Covert Channels
This article discusses how hash- and UUID-based identifiers can be used to covertly distribute encryption keys or encrypted blocks of data. The unique IDs of HTML elements and forms on today's Web are well suited to steganography, and they cannot be banned or filtered without breaking the Web itself.
This is a fascinating observation about the intersection of web standards, steganography, and information security. HTML element IDs and other identifier fields present an interesting covert channel that's essentially impossible to filter without breaking core web functionality.
The Technical Reality
HTML element IDs, CSS class names, form field names, data attributes, and UUID-based identifiers are all legitimate parts of web standards. They typically allow:
- Any alphanumeric characters plus hyphens and underscores
- Lengths of hundreds of characters in practice
- No semantic validation by browsers or security tools
- Universal presence across essentially all websites
This creates a substantial covert channel. For example:
- A UUID such as `550e8400-e29b-41d4-a716-446655440000` is 128 bits wide (a random v4 UUID contributes 122 bits of entropy)
- An element ID can encode arbitrary data: `id="msg_a4f2b8c9d3e1f6g7h8i9j0k1"`
- Multiple elements multiply this capacity across a page
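As a toy illustration of this capacity, arbitrary bytes can be packed into an ID that passes as framework output. This is a minimal sketch; the `msg_` prefix and hex alphabet are illustrative choices, not part of any standard:

```python
def bytes_to_id(payload: bytes, prefix: str = "msg") -> str:
    """Encode raw bytes as the hex suffix of a plausible element ID."""
    return f"{prefix}_{payload.hex()}"

def id_to_bytes(element_id: str) -> bytes:
    """Recover the payload from the ID's hex suffix."""
    return bytes.fromhex(element_id.split("_", 1)[1])

secret = bytes.fromhex("a4f2b8c9")
element_id = bytes_to_id(secret)          # "msg_a4f2b8c9"
assert id_to_bytes(element_id) == secret
```

Every four extra hex characters in an ID carry another two bytes, so even a modest page offers kilobytes of capacity.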
Steganographic Applications
Key distribution: A public webpage could contain element IDs that encode:
- Symmetric encryption keys
- Public key fingerprints
- Encrypted payloads themselves
- References to other distributed data
Practical example structure:
```html
<div id="header-nav-3f7a2b9c8d4e1f5a" class="container-9b2c4d8e7f3a1b6c">
  <form name="contact_8d3e1f5a7b2c4d9e">
    <input id="field_2c4d9e8f7b3a1c5d" data-ref="ui_7b3a1c5d9e8f2c4d">
  </form>
</div>
```
Why It's Unfilterable
The core issue is that functional necessity trumps security filtering:
- Web frameworks generate IDs automatically (React, Angular, Vue all do this)
- Build tools create hashed classnames for CSS modules
- UUIDs are standard practice for session management, database references, API tokens
- Modern web apps are non-functional without arbitrary identifiers
Defense Considerations
Detection is extremely difficult but not impossible:
- Statistical analysis: Natural ID patterns vs. high-entropy encoded data
- Behavioral patterns: Rapid changes to IDs on static content
- Correlation analysis: Timing of ID changes with known communication events
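A minimal sketch of the statistical approach: compute the Shannon entropy per character of each ID and flag outliers. The 3.5 bits-per-character threshold is an illustrative assumption; a real detector would model the site's own ID distribution.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of the string."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Human-chosen names sit well below random hex in per-character entropy.
assert shannon_entropy("main-navigation") < 3.5
assert shannon_entropy("nav_7f3a2b9c8d4e1f5a") > 3.5
```

As the next section argues, this heuristic collapses once legitimate build tooling floods the page with high-entropy identifiers of its own.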
The fundamental tension: security tools cannot easily distinguish between legitimate web development practices and covert communication without false positives that would break the internet.
HMAC Defense Against Statistical Detection
Statistical analysis comparing natural ID patterns against high-entropy encoded data can be defeated: HMAC output is computationally indistinguishable from random, so the resulting identifiers cannot be recognized as HMAC values without the key.
This is a critical point that makes the steganographic channel even more robust.
Why HMAC Defeats Statistical Analysis
HMAC outputs are indistinguishable from legitimate identifiers:
1. Legitimate web development already uses cryptographic hashes everywhere:
- Webpack/Vite content hashing: `main.a3f2b9c8.js`
- CSS module hashing: `Button_a3f2b9c8`
- Git commit SHAs in build IDs: `build-7f3a2b9c`
- Session tokens, CSRF tokens, cache busters
- Database-generated UUIDs (v4 uses random bits, v5 uses SHA-1)
2. HMAC produces uniformly distributed output:
- No statistical signature distinguishing it from other hashes
- Same entropy characteristics as legitimate security tokens
- Matches the pattern of modern build tools' output
3. The sender and receiver share the secret key:
`Element ID = HMAC-SHA256(secret_key, message || counter)`
- Counter prevents repetition
- Receiver can verify and extract the message
- Third parties see only "normal looking" hashed identifiers
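The construction above can be sketched in a few lines; the key, `nav` prefix, and 8-character truncation are illustrative assumptions:

```python
import hmac
import hashlib

def make_id(secret_key: bytes, chunk: bytes, counter: int, prefix: str = "nav") -> str:
    """ID suffix = HMAC-SHA256(key, chunk || counter), truncated to 8 hex chars."""
    mac = hmac.new(secret_key, chunk + counter.to_bytes(4, "big"), hashlib.sha256)
    return f"{prefix}_{mac.hexdigest()[:8]}"

key = b"shared-secret-key"
# Deterministic for sender and receiver; the counter prevents repetition.
assert make_id(key, b"kf01", 0) != make_id(key, b"kf01", 1)
```

The output is indistinguishable in form from a Webpack content hash or a CSS-module class name.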
Practical Implementation
Encoding scheme:
```html
<!-- Each ID encodes 4 bytes of data -->
<div id="nav_7f3a2b9c"
     class="container_8e4d1c5b"
     data-key="ui_a2f9c8b7">
  <button id="btn_3c5d8e2f"></button>
</div>
```
Where each hash suffix is `HMAC-SHA256(shared_key, data_chunk || nonce)[:8]`
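One subtlety the scheme glosses over: HMAC is one-way, so the receiver cannot invert a suffix directly. One workable strategy, sketched here, is to recompute tags over candidate chunks and match against the observed suffix. For tractability this sketch uses single-byte chunks (256 trials); the 4-byte chunks above would need up to 2^32 trials per ID.

```python
import hmac
import hashlib

def tag(key: bytes, chunk: bytes, nonce: bytes) -> str:
    """The 8-hex-char suffix the sender publishes as an element ID."""
    return hmac.new(key, chunk + nonce, hashlib.sha256).hexdigest()[:8]

def recover_chunk(key: bytes, nonce: bytes, observed_suffix: str) -> bytes:
    """Brute-force the data chunk whose tag matches the observed suffix."""
    for candidate in range(256):
        chunk = bytes([candidate])
        if hmac.compare_digest(tag(key, chunk, nonce), observed_suffix):
            return chunk
    raise ValueError("no candidate chunk matches the observed suffix")

key, nonce = b"shared-secret-key", b"\x00\x01"
suffix = tag(key, b"K", nonce)            # sender embeds this as an ID suffix
assert recover_chunk(key, nonce, suffix) == b"K"
```

In practice a sender would either keep chunks small or use a reversible encoding (as in the hex-suffix sketch earlier) and reserve HMAC for authentication.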
Why This Defeats Detection
Statistical analysis fails because:
- No entropy anomaly: HMAC output has maximum entropy, just like legitimate security tokens
- Expected distribution: Uniform random distribution is expected in modern web development
- Ubiquitous use: Billions of web elements use cryptographic hashes legitimately
- No known-plaintext: Attacker doesn't know what message maps to what hash without the key
Detection Impossibility
The fundamental problem for defenders: Legitimate security practices are indistinguishable from covert communication when both use the same cryptographic primitives.
You cannot ban:
- Content hashing (breaks caching)
- Security tokens (breaks authentication)
- Randomized IDs (breaks modern frameworks)
- High-entropy identifiers (required for security)
This is actually a profound observation about modern web architecture: security best practices create perfect cover for steganography.
Distributed Keys and SVG Advantages
This approach extends naturally: different pages can each carry a fragment of the key, splitting it across multiple sites. It also improves on hiding information in pictures, where embedding data can be detected because it subtly degrades the image. SVG images are the exception: IDs can be added to their XML tags without altering a single pixel.
1. Distributed Key Fragments Across Sites
This dramatically amplifies the undetectability:
Site A (news.example.com):
```html
<div id="header_a3f2b9c8"></div> <!-- key fragment 1 -->
```
Site B (shop.example.com):
```html
<form name="search_e4d1c5b7"></form> <!-- key fragment 2 -->
```
Site C (blog.example.com):
```html
<button id="submit_2f9c8b7a"></button> <!-- key fragment 3 -->
```
Combined, the fragments (more in practice) reassemble into a full 256-bit key or encrypted message.
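A sketch of the split and reassembly; the site names, ID prefixes, and 11/11/10-byte split are illustrative assumptions:

```python
import secrets

key = secrets.token_bytes(32)                  # 256-bit key
frags = [key[0:11], key[11:22], key[22:32]]    # one fragment per site

# Each site publishes its fragment as an ordinary-looking ID:
ids = {
    "news.example.com": f"header_{frags[0].hex()}",
    "shop.example.com": f"search_{frags[1].hex()}",
    "blog.example.com": f"submit_{frags[2].hex()}",
}

# The receiver scrapes the three pages and concatenates in agreed order:
recovered = b"".join(
    bytes.fromhex(ids[site].split("_", 1)[1])
    for site in ("news.example.com", "shop.example.com", "blog.example.com")
)
assert recovered == key
```

Only the agreed site order and prefix convention need to be shared in advance; no single page ever holds the whole key.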
Why this is devastating for detection:
- No single point of analysis: Each site looks completely normal in isolation
- Different operators: Sites may have different owners, jurisdictions, hosting
- Timing decorrelation: Fragments can be placed days or weeks apart
- Increased deniability: "We don't even control those other sites"
- Traffic analysis failure: Normal browsing visits multiple sites anyway
2. Image Steganography vs. HTML/SVG IDs
Traditional Image Stego (LSB, etc.)
Detectable through:
- Statistical analysis (Chi-square attacks)
- Histogram analysis (unnatural distributions)
- File size anomalies
- Compression artifacts
- Checksum comparison with originals
- Machine learning trained on stego vs. clean images
HTML ID Steganography
Far harder to detect because:
- IDs are supposed to be arbitrary and high-entropy
- No "natural" distribution to deviate from
- No file modification - IDs are legitimate content
- No statistical signature distinguishing covert from overt use
SVG: Best of Both Worlds
The SVG approach is particularly clever:
```html
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <g id="layer_a3f2b9c8">
    <path id="path_e4d1c5b7" d="M10,10 L90,90"/>
    <circle id="circle_2f9c8b7a" cx="50" cy="50" r="40"/>
    <rect id="rect_7b3a1c5d" x="20" y="20" width="60" height="60"/>
  </g>
  <defs>
    <linearGradient id="grad_9e8f2c4d">
      <stop id="stop1_3c5d8e2f" offset="0%"/>
      <stop id="stop2_a2f9c8b7" offset="100%"/>
    </linearGradient>
  </defs>
</svg>
```
Advantages of SVG for this:
- Visual perfection: The image looks exactly as intended - no degradation
- Legitimate structure: SVG elements routinely carry IDs for CSS/JS targeting
- High capacity: Complex SVGs have hundreds of elements
- Semantic cover: IDs often relate to layer names, export settings, design tool artifacts
- Tool-generated: Design tools (Figma, Illustrator) create random IDs automatically
- No statistical signature: Adding IDs doesn't change file statistics suspiciously
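On the receiving side, extracting a payload from SVG IDs is ordinary XML parsing. A sketch, assuming hex suffixes after an underscore and concatenation in document order (both conventions are illustrative):

```python
import xml.etree.ElementTree as ET

SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <g id="layer_a3f2b9c8">
    <path id="path_e4d1c5b7" d="M10,10 L90,90"/>
    <circle id="circle_2f9c8b7a" cx="50" cy="50" r="40"/>
  </g>
</svg>"""

def extract_payload(svg_text: str) -> bytes:
    """Concatenate the hex suffixes of every id attribute, in document order."""
    root = ET.fromstring(svg_text)
    chunks = []
    for elem in root.iter():
        element_id = elem.get("id")
        if element_id and "_" in element_id:
            chunks.append(bytes.fromhex(element_id.split("_", 1)[1]))
    return b"".join(chunks)

payload = extract_payload(SVG)
assert payload.hex() == "a3f2b9c8e4d1c5b72f9c8b7a"
```

The rendered image is bit-identical with or without the payload; only the id attributes differ.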
Comparison Table
| Method | Visual Degradation | Statistical Signature | Capacity | Detection Risk |
|---|---|---|---|---|
| LSB in PNG | Minimal | Yes (chi-square) | High | Medium-High |
| DCT in JPEG | Minimal | Yes (histogram) | Medium | Medium |
| HTML IDs | None (not visual) | None | High | Very Low |
| SVG IDs | None | None | High | Minimal |
Combined Attack: Maximum Stealth
The ultimate approach combines everything:
- Generate 512-bit master key
- Split using Shamir's Secret Sharing (3-of-5 threshold)
- Distribute fragments:
- Fragment 1: HTML IDs on Site A
- Fragment 2: SVG element IDs in infographic on Site B
- Fragment 3: CSS class names on Site C
- Fragment 4: Form field names on Site D
- Fragment 5: Data attributes in React app on Site E
- Publish reconstruction protocol via steganographic encoding
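Step 2 above, the 3-of-5 Shamir split, can be sketched over a prime field. The prime and toy secret are illustrative assumptions; a real deployment would split a 512-bit key in chunks or use a larger prime.

```python
import random

P = 2**127 - 1  # Mersenne prime, large enough for a toy secret

def split(secret: int, k: int, n: int, rng=random.Random(42)):
    """Return n shares of a degree-(k-1) polynomial; any k reconstruct."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

secret = 0xDEADBEEF
shares = split(secret, k=3, n=5)
assert reconstruct(shares[:3]) == secret                    # any 3 suffice
assert reconstruct([shares[0], shares[2], shares[4]]) == secret
```

Each share's y-value would then be hex-encoded into IDs on its assigned site, so any three of the five sites recover the key and the loss of two sites is harmless.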
To detect this, an adversary would need to:
- Identify all 5 sites (unknown selection from billions of pages)
- Recognize which specific elements among thousands contain fragments
- Understand the reconstruction protocol
- Break the HMAC or discover the shared secret
- Do all this before the content changes in the normal course of web updates
Why This Fundamentally Cannot Be Stopped
The core issue is functional indistinguishability:
- You cannot ban cryptographic hashes in IDs (required for security)
- You cannot ban arbitrary IDs (required for web functionality)
- You cannot ban SVG element IDs (required for graphics)
- You cannot ban high-entropy identifiers (required for modern frameworks)
- You cannot require "semantic" IDs (build tools generate random ones)
Any filtering that would catch this would break the legitimate web.
This is steganography that doesn't just hide in noise - it is the signal, masquerading perfectly as legitimate infrastructure.