Every year, hundreds of millions of posts, comments, and accounts are removed, restricted, or quietly downranked across major social media platforms. Most users who experience this have no idea how the system actually decided to act on their content — or even that it did. The mechanics are deliberately opaque. This is a factual explanation of how they work.

Layer 1: Automated Algorithmic Flagging

The first line of content moderation is automated — no human reviews the vast majority of flagged content before action is taken. Machine learning classifiers scan text, images, video, audio, and even metadata in real time as content is uploaded or posted. These models are trained on billions of examples of previously moderated content and assign each new piece a policy-violation probability score.

What happens next depends on where the score falls relative to two platform-defined thresholds. Below the lower threshold, the post continues normally. Above the higher threshold, it is auto-removed or the account is actioned. In the range between, it is queued for human review. The specific thresholds are not published and shift constantly as models are retrained.
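The two-threshold routing described above can be sketched in a few lines. This is an illustration only: the function name `route`, the action labels, and the threshold values 0.30 and 0.95 are assumptions made for the example, since real thresholds are unpublished and change with retraining.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    HUMAN_REVIEW = "human_review"
    AUTO_ACTION = "auto_action"


# Hypothetical thresholds, chosen only for illustration.
REVIEW_THRESHOLD = 0.30
AUTO_ACTION_THRESHOLD = 0.95


def route(score: float,
          review_threshold: float = REVIEW_THRESHOLD,
          auto_threshold: float = AUTO_ACTION_THRESHOLD) -> Action:
    """Map a policy-violation probability to one of three outcomes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= auto_threshold:
        return Action.AUTO_ACTION      # removed or account-actioned, no human involved
    if score >= review_threshold:
        return Action.HUMAN_REVIEW     # uncertain middle band goes to a review queue
    return Action.ALLOW                # post continues normally
```

Under these assumed values, `route(0.5)` lands in the human review band. Moving either threshold by a few points changes how much content is auto-actioned without any change to written policy.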

Hash matching runs in parallel. Platforms maintain databases of cryptographic hashes for known-violating content — particularly child sexual abuse material, but increasingly for other categories including terrorist propaganda and, during elections, specific political content. An exact hash match triggers immediate removal regardless of context. Perceptual hashing (PhotoDNA and its successors) extends this to visually similar but not bit-identical content.
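The difference between exact and perceptual matching can be shown with a small sketch. PhotoDNA is proprietary, so the example substitutes a simple "average hash" as a stand-in technique; the helper names, the tiny grayscale grids, and the `max_distance` tolerance are all assumptions for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Cryptographic hash: changes completely if even one byte differs."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Stand-in perceptual hash: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_known(data: bytes, pixels: list[list[int]],
             exact_db: set[str], perceptual_db: set[int],
             max_distance: int = 2) -> bool:
    """True on an exact hash hit, or a perceptual hash within max_distance bits."""
    if sha256_hex(data) in exact_db:
        return True
    h = average_hash(pixels)
    return any(hamming(h, known) <= max_distance for known in perceptual_db)
```

A uniformly brightened copy of an image has different bytes, so its SHA-256 no longer matches, but its average hash is unchanged because every pixel keeps its position relative to the mean. That is the gap perceptual hashing is meant to close.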

Layer 2: Network-Level Signals

Automated systems don't evaluate content in isolation. They factor in signals about the account posting it and the network around that account. A post from a two-week-old account with 12 followers that receives 50,000 interactions in two hours triggers different model responses than the same post from a five-year-old verified account with established engagement patterns.

Coordination detection systems look for clusters of accounts posting similar content, using similar timing patterns, or exhibiting engagement patterns inconsistent with organic human behavior. Accounts flagged as potentially coordinated have their content deprioritized in distribution — often before any specific policy violation is identified. The behavior of the network around an account becomes part of the moderation signal for that account.
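One of the signals described above, many accounts posting near-identical text within a short window, can be sketched as follows. This is a toy heuristic under stated assumptions: the `Post` record, the 60-second window, and the three-account minimum are hypothetical, and production systems presumably combine many more signals.

```python
from collections import defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    account_id: str
    text: str
    timestamp: float  # seconds since some reference point


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations match."""
    return " ".join(text.lower().split())


def coordinated_accounts(posts: list[Post],
                         window_seconds: float = 60.0,
                         min_accounts: int = 3) -> set[str]:
    """Flag accounts that post the same normalized text within one time window."""
    by_text: dict[str, list[Post]] = defaultdict(list)
    for post in posts:
        by_text[normalize(post.text)].append(post)

    flagged: set[str] = set()
    for group in by_text.values():
        group.sort(key=lambda p: p.timestamp)
        start = 0
        for end in range(len(group)):
            # Shrink the window until it spans at most window_seconds.
            while group[end].timestamp - group[start].timestamp > window_seconds:
                start += 1
            accounts = {p.account_id for p in group[start:end + 1]}
            if len(accounts) >= min_accounts:
                flagged |= accounts
    return flagged
```

Note that nothing in this heuristic checks whether the text violates any policy, which mirrors the point above: the flag is driven by the behavior of the cluster, not by the content of any single post.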

This is one mechanism behind what users commonly call shadow banning: reduced algorithmic distribution without explicit account action and without notification. The content remains accessible via direct link but is suppressed in feeds, search results, and recommendation systems. Platforms have publicly acknowledged various forms of "visibility filtering" and "search result deemphasis" — the specific implementations differ by platform.

Layer 3: Human Review Queues

Content that the automated system isn't confident enough to auto-action enters human review queues. This work is distributed across content moderation contractors operating in dozens of countries, with review decisions made in seconds to minutes per item. Reviewers work from decision trees — policy flowcharts that map content characteristics to specified outcomes.

Platforms publish edited versions of their policies (Meta's Community Standards, YouTube's Community Guidelines), but the fuller versions live in internal policy guides that rarely become public. Leaked versions have revealed granular rules: specific phrases that auto-escalate, specific account types that receive different treatment, and specific geographies where enforcement priorities differ from global defaults.

Human reviewers can: leave content as-is, add a label or interstitial warning, reduce distribution, remove content, suspend the account temporarily, or permanently disable the account. The available actions and their reversibility depend on which enforcement track the content entered — different violation categories have different consequence ladders.
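The idea of a per-category consequence ladder can be sketched as a lookup table. The category names, the ordering of actions within each ladder, and the strike-counting rule below are hypothetical; they are not taken from any platform's actual policy.

```python
# Hypothetical ladders: each list is ordered from first offense to most severe.
LADDERS: dict[str, list[str]] = {
    "spam": ["label", "reduce_distribution", "remove", "temporary_suspension"],
    "harassment": ["remove", "temporary_suspension", "permanent_disable"],
}


def next_action(category: str, prior_strikes: int) -> str:
    """Return the action for a new violation, given earlier strikes in that category."""
    if prior_strikes < 0:
        raise ValueError("prior_strikes must be non-negative")
    ladder = LADDERS[category]
    # Once the top rung is reached, every further violation gets the top rung.
    index = min(prior_strikes, len(ladder) - 1)
    return ladder[index]
```

In this sketch a first spam violation gets a label while a first harassment violation goes straight to removal, which is the kind of difference between enforcement tracks the paragraph above describes.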

Layer 4: Appeals and the Asymmetry of Reversal

Most platforms provide an appeal mechanism. In practice, the asymmetry is significant: the system that initially actioned content has massive scale advantages over any individual appeal. Automated systems process millions of decisions daily; appeals go to a far smaller human team working through a large backlog.

Appeal outcomes skew toward upholding original decisions — not necessarily because the original decisions were correct, but because review incentives favor consistency over correction. Reversals create additional work (restoring content, unwinding account actions) and set precedents that automated systems aren't designed to absorb. The path of least resistance in an appeal review is to confirm the original action.

Meta created an independent Oversight Board with authority to issue binding rulings on specific content decisions and non-binding policy recommendations. As of this writing, the Board has reversed individual decisions in dozens of cases and issued policy guidance in others. Its structural influence on the automated systems processing millions of decisions daily is limited: it functions as a high-visibility exception process, not a reform mechanism for the underlying infrastructure.

Layer 5: Platform-Level Distribution Controls

Beyond individual content moderation, platforms operate distribution controls at the account and topic level. Recommendation algorithms determine what content appears in "For You" pages, trending sections, and suggested feeds. These are distinct from moderation systems but have similar practical effects on visibility.

An account can have all its content remain technically published while receiving near-zero algorithmic distribution. From the outside, this is indistinguishable from shadow banning. From the inside, it requires traffic analysis to detect — organic reach collapse that doesn't coincide with any explicit policy action or notification. The platform's position is that recommendation algorithms aren't moderation tools; critics note the distinction is irrelevant if the practical effect is suppression.
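The kind of traffic analysis mentioned above can be approximated from an account's own analytics export. The sketch below compares recent reach to a trailing baseline; the function name, the 14-day and 3-day windows, and the 20 percent cutoff are assumptions, and a positive result only shows that reach dropped, not why.

```python
from statistics import median


def reach_collapsed(daily_impressions: list[int],
                    baseline_days: int = 14,
                    recent_days: int = 3,
                    drop_ratio: float = 0.2) -> bool:
    """True if recent median reach fell below drop_ratio times the baseline median."""
    if len(daily_impressions) < baseline_days + recent_days:
        raise ValueError("not enough history for the chosen windows")
    recent = daily_impressions[-recent_days:]
    baseline = daily_impressions[-(baseline_days + recent_days):-recent_days]
    baseline_median = median(baseline)
    if baseline_median == 0:
        return False  # no meaningful baseline to compare against
    return median(recent) < drop_ratio * baseline_median
```

Medians are used instead of means so that one viral day in the baseline does not make ordinary days look like a collapse. Ruling out other explanations, such as posting less often or seasonal dips, still has to be done by hand.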

Topic-level suppression operates at scale: during specific events (elections, public health crises, geopolitical conflicts), platforms have implemented blanket distribution reductions for entire keyword clusters, hashtags, or content categories — applied automatically across all accounts, regardless of individual post quality or policy compliance.

What the System Is Optimized For

Understanding the architecture clarifies what it's optimized to do. Automated systems at platform scale can't optimize for accuracy — the volume is too high. They optimize for consistency, legal defensibility, and minimizing the specific error types that generate the most costly consequences for the platform (regulatory action, advertiser backlash, congressional testimony).

This means the error distribution is not random. False positives (legitimate content incorrectly removed) are typically underweighted against false negatives (violating content left up). Both error types happen at massive scale, and the balance can shift: the error type currently generating the most external pressure on the platform gets the most internal attention.
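How unequal error weights move a removal threshold can be shown with a small worked example. The labeled scores and the cost values are invented for illustration; the point is only the direction of the effect, not any platform's real numbers.

```python
def weighted_cost(samples: list[tuple[float, bool]], threshold: float,
                  fp_cost: float, fn_cost: float) -> float:
    """Total cost of removing everything scored at or above threshold."""
    cost = 0.0
    for score, is_violation in samples:
        removed = score >= threshold
        if removed and not is_violation:
            cost += fp_cost   # false positive: legitimate content removed
        elif not removed and is_violation:
            cost += fn_cost   # false negative: violating content left up
    return cost


def best_threshold(samples: list[tuple[float, bool]],
                   fp_cost: float, fn_cost: float) -> float:
    """Lowest candidate threshold (0.00 to 1.00 in 0.01 steps) with minimal cost."""
    candidates = [i / 100 for i in range(101)]
    return min(candidates, key=lambda t: weighted_cost(samples, t, fp_cost, fn_cost))


# Invented (score, is_violation) pairs for illustration.
SAMPLES = [(0.2, False), (0.4, False), (0.5, True),
           (0.6, False), (0.7, True), (0.9, True)]

fn_heavy = best_threshold(SAMPLES, fp_cost=1, fn_cost=5)  # missed violations cost more
fp_heavy = best_threshold(SAMPLES, fp_cost=5, fn_cost=1)  # wrongful removals cost more
```

With these invented samples, weighting false negatives more heavily yields a lower threshold than weighting false positives more heavily, so the legitimate post scored 0.6 is removed in the first case and left up in the second. The classifier is identical in both cases; only the cost assigned to each error type changed.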

The result is a system that is simultaneously too aggressive (removing content that doesn't violate policy) and not aggressive enough (leaving up content that clearly does) — in different categories, at the same time, invisibly to most users experiencing it.

The decisions made within this system — and their downstream effects on specific speech that the documented evidence shows was suppressed — are examined in the book.