Illustration by Satenik Guzhanina/Shutterstock

When Content Is Code: How Modern Digital Experiences Fit into Drupal, and Why They Sometimes Don’t

Data@Urban
14 min read · Apr 21, 2020

We use Drupal to publish engaging digital experiences. And also we don’t. In fact, some of our most interesting and engaging content lives outside of the CMS entirely. Is that wrong? Depends.

In this post, I will explain some of the technical challenges we face as content publishers that often need to break free of traditional content models, design templates, and authoring environments. I’ll also touch on a few things we’ve done to make Drupal more flexible today and a few things we are working on now to make it even more so. The goal is to frame the challenge and provide a peek into the mechanics of how the Urban Institute’s content strategy affects and drives our content technology strategy. In order to get there, we’ll need some background. Let’s start with “digital first.”

Digital first and content models

Most of the work we feature on Data@Urban is published digitally, and most of it could accurately be categorized as “digital first.” Digital first is a buzz phrase you hear a lot at CMS industry meetups, and it’s far from a new concept. It refers to a strategy where content is created from the beginning with digital distribution in mind. Most news you read today, which you probably read online, is written and composed to target your screen. Your screen, which could be on a phone, laptop, desktop, or even a refrigerator door, is a moving target. In fact, your “screen” may not be a screen at all. It could be a screen reader or a smart speaker.

What’s often missed in the digital first conversation is that, before digital anything, there was no consideration of “first.” Print publishing was not “print first,” it was “print only.” When we talk about digital first, we are talking about a scenario where the content creator has very little control over the device or context used by the content consumer.

This is such a popular topic at those CMS industry meetups because digital first thinking has had a profound impact on the evolution of web publishing workflows. Without going too far into it, the underlying concept of most modern digital publishing workflows is a separation of “content” (e.g., words, pictures, and links) from “presentation” (e.g., colors, fonts, and layout). From a CMS perspective, this means you want to store the “words” separately from the style and markup information wherever possible. Why? Because site-specific styles and layouts don’t translate to other contexts (while words always do), and as digital publishers, we not only expect our content to appear outside the confines of our own site, we rely on it.

To make that happen, CMS platforms, such as Drupal, depend on content models to structure and organize your words and images into a logical architecture. Site builders and developers use the content model to (among other things) define which “parts” of your web page should appear when syndicated formally (RSS) or informally (social media share).

Drupal’s superpower is content modeling. A content model is a schema, or a description of the types of fields that make up a given content type.

A simple example is this blog post on Medium. On the edit screen I have fields for title, body (more on that later), header image, and tags. Those four fields (plus post date) represent the content model of a Medium blog post. When I enter a title as an editor, I can expect the entered value to appear in Medium’s style at the top of the post. When I upload an image, it will appear in the designated spot in the designated size. I know the body text of this post will display in a serif font (which font exactly depends on your browser), and I know that the main textual content area will be no wider than 680px. I have no control over these style and layout choices, and as I work to peck this post into something cohesive and useful, I’m thankful for exactly that.

Drupal makes assembling new content models easy. In most cases, implementing front-end templates to display those models is easy as well. With a CMS, most of the actual markup (html) is provided by a template. This allows for noncoders to create content without having to bother with layout or style decisions. What’s more, by keeping the stored content (the data) free of presentation markup (or as free as possible), it can be efficiently repurposed for use in a different context. When we talk about digital first, this is what we really mean.
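To make the idea concrete, here is a minimal sketch in Python (not Drupal code) of a content model and two renderings of the same stored content. The class name, field names, and both render functions are hypothetical, chosen only to mirror the Medium example above; a real CMS template layer does far more.

```python
from dataclasses import dataclass, field
from datetime import date
from html import escape


@dataclass
class BlogPost:
    """Hypothetical content model: the fields described for a blog post."""
    title: str
    body: str                      # plain words, no presentation markup
    header_image: str              # URL of the header image
    tags: list[str] = field(default_factory=list)
    post_date: date = field(default_factory=date.today)


def render_page(post: BlogPost) -> str:
    """Site template: the template, not the author, supplies the markup."""
    return (
        "<article>"
        f"<h1>{escape(post.title)}</h1>"
        f'<img src="{escape(post.header_image)}" alt="">'
        f"<p>{escape(post.body)}</p>"
        "</article>"
    )


def render_share_preview(post: BlogPost, limit: int = 80) -> dict[str, str]:
    """A second context (such as a share card) built from the same fields."""
    if len(post.body) <= limit:
        summary = post.body
    else:
        summary = post.body[:limit].rstrip() + "..."
    return {"title": post.title, "image": post.header_image, "description": summary}
```

Because the stored body carries no site-specific markup, both functions can reuse it without any cleanup step.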

Everyday examples of online content displayed outside of the original site’s design framework abound. Previews created when you share a link on social media demonstrate this concept well. But it also comes into play when syndicating to AMP or Apple News, or even an old-school RSS feed. These third-party services are important channels for publishers like Urban, and they each have strong opinions about what kind of html tags they accept. If your content is loaded with highly presentation-specific tags or, worse, uses inline styles, it likely won’t be useful anywhere other than the original site. That’s bad for syndication.
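As an illustration of what a picky syndication channel effectively does to your markup, the sketch below filters a body down to a small allow-list of tags and drops inline styles. The allow-list is an assumption made up for this example (each real channel publishes its own rules), and this is a teaching sketch built on Python's standard html.parser, not a security-grade sanitizer; it does not, for instance, vet link targets.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allow-list for illustration only; real channels define their own.
ALLOWED_TAGS = {"p", "a", "strong", "em", "ul", "ol", "li", "h2", "h3"}
ALLOWED_ATTRS = {"a": {"href"}}
SKIP_CONTENT = {"script", "style"}  # drop these tags and everything inside


class SyndicationFilter(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skipping += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself but keep the words inside it
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = [(k, v) for k, v in attrs if k in allowed and v is not None]
        attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self._skipping:
                self._skipping -= 1
            return
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skipping:
            self.out.append(escape(data, quote=False))


def clean_for_syndication(html: str) -> str:
    parser = SyndicationFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Markup that was semantic to begin with passes through untouched; presentation-specific tags and inline styles simply disappear, which is why content that depends on them tends to fall apart off-site.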

A thoughtful content model supported by an authoring environment that limits html is the CMS architect’s lodestar. In many scenarios, that’s really all there is to it. Model, template, author filling out forms. Even at Urban, that describes 80 percent of our actual web content. The other 20 percent is staunchly nonconformist and demands a different treatment.

Blobs and body fields

You might not be sure what I mean by “limits html.” It sounds counterintuitive, doesn’t it? A “good” CMS would have fewer limits, right? Not exactly. A “good” CMS is often an expression of what we (“we” being CMS architects and builders) don’t want you to do. Not because we have arbitrary opinions about how your content needs to look (we do, but still), but because we are tasked with providing you with authoring tools that reliably result in a useful end-user experience regardless of device or context.

In a traditional “article” content model (like this Medium post), the “body” is the text of the article itself. As much as I’d love to avoid storing any html markup as content, it’s not a realistic expectation. Even the most basic post will have a link or two, not to mention headers, bolded words, and lists. There is no getting around it: digital content is going to include some inline markup. For this, we rely on a special kind of field widget called a What You See Is What You Get (WYSIWYG) editor (pronounced whiz-ee-wig). It usually looks something like this:

From an authoring standpoint, it works just like a word processor. You select some text, click B for bold and the text turns bold. Though usually hidden from the author, the WYSIWYG editor is quietly inserting html tags into the content. When the item is saved, those html tags are stored in the database. If you know even the smallest bit of html, you probably know what a bold tag looks like. Clearly, it’s <b>my bolded text</b>, right?

It’s just semantics

Or is it? And aren’t we getting a little granular for a post about technology strategy? Perhaps, but in order to appreciate the challenges we face when publishing sophisticated user experiences and narratives, you need some understanding of what semantic markup is, and why it’s crucial. Back to the pop quiz answer. It’s not <b>, it’s <strong> (in other words, <strong>my bolded text</strong>).

A strong tag is semantically more descriptive than a b tag (“The <strong> element is for content that is of greater importance, while the <b> element is used to draw attention to text without indicating that it’s more important.” — Mozilla.org)

When a browser or, for a more illustrative example, a screen-reader used by a user with visual impairment, encounters your bolded text, the two tags are interpreted differently. The <b> tag tells the interpreter to decorate the enclosed text. The <strong> tag tells the interpreter the enclosed text is more important than the words around it; it’s the more semantic choice because it adds a distinct meta value to a string of words. It’s also more accessible.

In every journey through semantic html, accessible html, and search-friendly html, all roads lead back to this time-tested truth. The better your markup describes the content it contains, the better chance it (the content and the markup) will work in the wild.

Traditionally, the CMS has served a dual role. In addition to managing content, a CMS was expected to provide intuitive tools to author and even compose content. That’s still true today for most (and still true for most of Urban’s web content), but what happens when you need more than a few bolded words and links? What happens when the content is an interactive narrative or data tool — something really cool that transitions on scroll? Where does that go in the content model? What about JavaScript? Can that go in the WYSIWYG?

And therein lies the rub. Our content strategy depends largely on publishing content that does not always “fit” into the confines of the CMS.

Many Urban features are fully functional data tools that provide users an opportunity to find their own stories. Others utilize photography, video, and layout variation to move the reader through the story and emphasize a human component that is so easily lost while munching on aggregated data.

Such storytelling tactics are not unique to Urban, but we’ve embraced them and, in the process, we’ve learned a lot about what it takes to design, write, code, revise, publish, and manage digital content. In most cases, a single feature requires a dedicated designer, developer, and writer. The workflow for this type of content is not unlike the process to produce a small, self-contained website. In fact, if you look at the project credits in any Urban feature, you’ll see designer and developer roles represented alongside authors and researchers.

As it turns out, many Urban features are self-contained websites. We’ll get to those in a moment. First, let’s talk about what we’ve done to push the content-composition limits of our current Drupal site.

Unlimited HTML

The simplest, and potentially the sloppiest, way of allowing more fine-grained control over the content area (or, the body field) of a post is to remove html restrictions that may exist on your site. We still have some, but not many. It’s important to mention here that we have a small, well-trained staff who enter and manage content. Allowing script tags (which we do) is a security weakness (inline JavaScript can be used for nefarious purposes). Allowing iframes (which we also do) can have security implications as well. Allowing inline styles poses far less of a security risk, but the aesthetic risk is obvious.

There are also practical concerns. Allowing “raw” html often means disabling some of the handy WYSIWYG functions described above. We still have nontechnical editors who manage content, and creating opportunities for advanced users to directly use html, JavaScript (JS), and CSS is often at odds with the need for point-and-click experiences for noncoders.

Those issues aside, allowing unfettered entry of HTML and JavaScript (by a trained hand) does work — until it doesn’t. Treating the body field as a repository for an unwieldy blob of unstructured HTML, JS, and CSS violates the important digital first principle of separation of content and presentation. Content created this way is unlikely to repurpose well. It may not render at all in some contexts. We have a lot of this content, and it will be challenging to migrate into our upgraded site when the time comes. The good news is we’ve dramatically reduced our reliance upon unlimited html blobs!

Paragraphs

In the past few years, we’ve relied heavily on Drupal Paragraphs to compose intricate layouts for narrative content on urban.org. I talked a bit about paragraphs in a previous post.

The Paragraphs module spawned from a similar Drupal module called “Field Collection.” That name is much more useful for our purposes here. A “Paragraph” is a collection of structured fields that can be repeated within a parent page. What does that mean?

Using Paragraphs, a site architect can define any number of Paragraph types that act as little content models within the larger structure. For example, here is a Paragraph type we use on several sites. It’s called a callout box. The fields you see here represent potential parts of an inline callout box that will float left or right depending on placement and settings.

When we use paragraphs, we remove the old body field entirely. Rather than one blob of html, we use paragraphs as building blocks (we call them “content blocks”) to assemble and compose an engaging narrative, complete with variable width elements, contextually relevant callout boxes, and well-formed accessible markup. Here’s what that looks like on the back end (and here is the front-end result).

By using fields, rather than freeform html, we have CMS-level control over markup and style. This ensures display consistency across the site as well as structural consistency. Every callout box is unique, in terms of content, but they mostly share the same markup.
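A rough sketch of that idea in Python (again, not Drupal code): each block type is a tiny content model that owns its own markup, and a page is just an ordered list of blocks. The class names, field names, and CSS class names here are hypothetical and only loosely modeled on the callout box described above.

```python
from dataclasses import dataclass
from html import escape
from typing import Literal


@dataclass
class TextBlock:
    """Hypothetical block type for ordinary running text."""
    text: str

    def render(self) -> str:
        return f"<p>{escape(self.text)}</p>"


@dataclass
class CalloutBox:
    """Hypothetical callout block; field names are illustrative."""
    heading: str
    text: str
    align: Literal["left", "right"] = "right"

    def render(self) -> str:
        # Every callout shares this markup; only the field values differ.
        return (
            f'<aside class="callout callout--{self.align}">'
            f"<h3>{escape(self.heading)}</h3>"
            f"<p>{escape(self.text)}</p>"
            "</aside>"
        )


def render_content_blocks(blocks) -> str:
    """Compose a page body from an ordered list of blocks."""
    return "\n".join(block.render() for block in blocks)
```

Authors choose which blocks to use and in what order, while the markup for each block stays in one place where it can be themed, tested, and fixed once.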

For a site builder, paragraphs are a dream. They provide a great deal of structure, which makes responsive theming and styling considerably easier, and the ability to mix and match paragraphs within the same content area means content authors and composers can find their own structures using preformed building blocks. It’s almost perfect.

Complex layouts can get unwieldy. Paragraphs are managed on the edit screen and each item can be dragged and dropped for arrangement. Dragging along a long list of components can get funky. The editor experience suffers from having to toggle between front and back “ends” of a page. It’s easier than previously available alternatives, but the lack of a visual layout component is noteworthy.

As for the name, it speaks to the concept that each chunk of structured content is analogous to a traditional paragraph (small p).

In 2019, the Drupal Community released an exciting new toolset called Layout Builder, which is a true visual content composition tool. We’ve used Layout Builder in some of our smaller Drupal 8 sites and have plans to lean on it in future iterations of urban.org.

Some features still prefer remote work, and that’s OK

Interactive features, calculators, data portal API front ends, to name a few, tend to be full-screen (full browser) experiences with their own functionality and application dependencies or open source frameworks. With few exceptions, the author, composer, developer, production team, and the end user are best served when we resist the urge to force such products into Drupal. So we don’t. The nature of Urban’s work is such that the standalone site remains an important tool in our kit. We are lucky to have talented DevOps resources in-house that allow the rest of us to focus on user experience and functionality, even when released from the bounds of the CMS.

So what’s the problem?

For all the benefits provided by the use of free-standing sites (aka: micro-sites, apps, data tools), there are some downsides:

Domains and routes and servers oh my
We do a lot of features. They are organized very well as static assets on a web server. This requires a certain amount of manual oversight to keep it from losing structure. Path changes and redirects are manual. It would be better if the routes (even if not the files) were managed by the CMS, which is great at automatically setting up redirects when paths change.

Single content inventory
The urban.org site currently has almost 20,000 live “nodes,” which we can think of right now as pages. As publishers, it’s obviously useful to maintain an active inventory of your online content in one place. Drupal (or any CMS) is very good at this. When content lives outside of Drupal, we work around the problem by adding a pseudo-record that links to the external property. It serves the inventory purpose well enough, however . . .

Search
The pseudo-record is somewhat useful for search results as well, but still lacking. In addition to the link to the external feature, we include a small amount of metadata. It’s enough to get the feature to appear in very targeted keyword searches on our site, but with most of the textual content living outside the site, our search engine is not able to index it, which reduces the search visibility of external feature content on our site.

With liberty comes responsibility
It goes without saying we have an exceptionally talented and well-trained staff at Urban. We trust and are entrusted to make good decisions, write clean code, and use semantic markup. We are also human. Freeform content has no built-in guardrails and, accordingly, requires more testing. Also, in these cases, the developer/producer is often solely responsible for implementing semantic, accessible markup. The good news is that there is no shortage of modern tools to help anybody write perfect code.
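To illustrate the routing downside listed first, here is a small sketch of the bookkeeping a CMS takes off your hands when a path changes: the old path is retired and automatically redirects to wherever the content lives now. The RouteTable class and its methods are hypothetical and are not a description of how Drupal implements redirects.

```python
class RouteTable:
    """Hypothetical route registry that records a redirect on every move."""

    def __init__(self) -> None:
        self._paths: dict[str, str] = {}      # live path -> content id
        self._redirects: dict[str, str] = {}  # retired path -> newer path

    def publish(self, content_id: str, path: str) -> None:
        self._paths[path] = content_id
        self._redirects.pop(path, None)  # a live path should not also redirect

    def move(self, old_path: str, new_path: str) -> None:
        content_id = self._paths.pop(old_path)
        self._paths[new_path] = content_id
        self._redirects.pop(new_path, None)
        self._redirects[old_path] = new_path

    def resolve(self, path: str) -> tuple[int, str]:
        """Return (200, path), (301, final_target), or (404, path)."""
        seen: set[str] = set()
        target = path
        # Follow the chain of retired paths, guarding against loops.
        while target in self._redirects and target not in seen:
            seen.add(target)
            target = self._redirects[target]
        if target in self._paths:
            return (200, target) if target == path else (301, target)
        return (404, path)
```

With static assets on a web server, each of those redirect entries is something a person has to remember to add by hand.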

Drupal 8 and the ways forward

I’d be remiss if I didn’t mention that urban.org is a Drupal 7 site that will soon be Drupal 8. The differences between Drupal 7 and 8 are profound, and the newer version provides better design patterns and tools that address many of the challenges described above. I’ve already mentioned Layout Builder, which introduces drag and drop content composition, as one way to address our challenges.

Drupal 8 also comes with JSON:API “out-of-the-box,” which provides a REST API and fully-documented specification. This means the content model (including things like Paragraphs) is automatically expressed in a JSON structure, complete with auto-generated field documentation. In other words, Drupal 8 supports and even encourages “decoupled” site architecture. Decoupling is, quite literally, the separation of content from presentation.
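As a hedged sketch of what a decoupled consumer does with that structure, the snippet below reads a JSON:API-style document and pulls out the fields a front end would need. The sample response is invented for illustration: the resource type, the placeholder id, and the field names are assumptions about a typical article content type, not output from urban.org.

```python
import json

# Assumed sample response, shaped like a JSON:API document for article nodes.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "type": "node--article",
      "id": "00000000-0000-0000-0000-000000000001",
      "attributes": {
        "title": "When Content Is Code",
        "body": {"value": "<p>Words travel.</p>", "format": "basic_html"}
      }
    }
  ]
}
"""


def extract_articles(document: str) -> list[dict[str, str]]:
    """Flatten each resource object into the fields a front end might use."""
    payload = json.loads(document)
    articles = []
    for resource in payload["data"]:
        attributes = resource["attributes"]
        articles.append({
            "id": resource["id"],
            "title": attributes["title"],
            "body_html": attributes["body"]["value"],
        })
    return articles
```

The consumer never touches a Drupal template: it receives the content model as data and decides on presentation for itself.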

We are already using Layout Builder and JSON:API for targeted use-cases, and both are important components in our ongoing efforts to continually improve and modernize our digital publishing capabilities. I look forward to sharing our experiences with these solutions here.

Scratching the surface

In this post, I’ve tried to provide some insight into the day-to-day challenges we encounter when publishing and managing content that pushes the bounds of our Drupal CMS or breaks free from it entirely. I’ve also shared some of the workarounds and solutions we’ve found over the years, as well as a glimpse of what could be the future of content composition, if not the bleeding-edge present.

I’ve left out quite a bit. Like how we are using more JavaScript frameworks such as React to build self-contained apps, and how we can turn those into Drupal blocks and include them right alongside “regular” Drupal blocks (aka “selective de-coupling”). Or how we use R Markdown to generate semantic, print-friendly HTML for certain publications (a true “digital first” workflow). And how did I get this far without mentioning Design Systems even once?

The ongoing modernization and improvement of our publishing systems and processes is exciting to be a part of. Sometimes, keeping our current tools up to speed is a challenge, but we’ve been able to find workable solutions both with Drupal and without.

Working as a CMS architect at Urban presents plenty of challenges. As we experiment with and publish more complex and engaging content, the limits of our designs, workflows, and systems are constantly tested. Urban researchers, data scientists, and content creators are not bound by whatever our CMS can or cannot do at any given time. The evidence/data/story leads the way, always, as it should be. If that means I need to re-think a content model or two, so be it. Nothing in technology stays static for long. Without these challenges, this job would be a whole lot easier. And a whole lot less interesting.

-Mark Sutton

Want to learn more? Sign up for the Data@Urban newsletter.

Data@Urban is a place to explore the code, data, products, and processes that bring Urban Institute research to life.
