The Case for Semantics

Thursday, October 20th, 2005

Communication is a wonderful tool that enables us to accomplish so much in our daily lives. Yet it’s up there with breathing and electricity as something we take completely for granted. Take speech, for example. We communicate via speech by giving meaning to the sounds that we are able to make with our mouths. Each distinct sound is assigned a letter, the sum total of which constitutes our alphabet. We then take these letters and combine them to create words. Those words are then further combined to create sentences, which in turn combine to make paragraphs, and so on.

We use words to convey meaning. If the words that we used didn’t sufficiently convey the thoughts that we wanted to express, there would be little reason for us to use them. We’d simply be making noise.

We are able to do this because, somewhere along the way, a group of us agreed to assign a particular letter to a particular sound. Then we grouped those letters, agreed on meanings for the groups, and called them words. Different groups agreed on different letters and different words, which is of course how we came to have the different languages we use today. (Some of us believe in the Tower of Babel story, but that falls outside the scope of this post.) Regardless of the choice of language, what’s key is our collective agreement, which allows us to communicate complex thoughts and emotions. Without it, we’d be reduced to grunting, flailing, disorganized and highly frustrated people.

Semantics is a subfield of linguistics that is traditionally defined as the study of meaning of (parts of) words, phrases, sentences, and texts.

As you are no doubt aware (after all, you are reading this online), communication has progressed quite a way beyond simple analog speech. Today we can not only speak with others around the planet in real time, but also transmit volumes of text, images, sounds and video via technologies such as the internet. Just as it was essential to assign meaning to spoken words, so it is now with the languages we use on the internet. One of the oldest and most fundamental of these is the HyperText Markup Language (HTML), used in the creation of web pages.

When one creates a web page, they do so to share information. But that information can quickly become confusing if not expressed correctly. I’ll use an example from our daily speech to illustrate. Say I want you to get me a can of soda from the refrigerator in the garage. Would you know what I meant if I asked: “can you get me a thing from the thing in the thing?” More than likely you would not, because I wasn’t using the correct words to convey the meaning of my request. The same principle holds true in HTML. Headings, lists, paragraphs, tables and other such markup transform plain, relatively meaningless text into meaningful content.

Oftentimes, however, it’s the misuse of this very same markup that is responsible for the mangling of meaning on the internet. For example, because browsers render headings of increasing depth in smaller and smaller font sizes, some have taken to using a heading two or three levels deep simply to render their text in a particular font size, even when that text isn’t a heading at all! Still others use the convenient grid pattern of tables to lay out their content, thus declaring that their page contains tabular data when it doesn’t. Such abuses are rife throughout the internet and should be avoided at all costs if we want people to understand what we are communicating. Otherwise, though the content we see online may seem legible, at least visually, it is in fact an unintelligible, garbled mess that is of no lasting use.
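
To make the heading abuse described above a little more concrete, here is a minimal sketch in Python, using only the standard library’s html.parser, that flags places where a page jumps down more than one heading level at a time. The function name skipped_levels and both sample pages are hypothetical, invented for this illustration. A skipped level is only a hint that a heading may have been chosen for its font size; it proves nothing on its own.

```python
from html.parser import HTMLParser


class HeadingLevels(HTMLParser):
    """Records the level (1-6) of every heading tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "h1".."h6" is all we need to match.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(html):
    """Return (previous, current) pairs where a heading jumps down
    more than one level, e.g. an h1 followed directly by an h4."""
    parser = HeadingLevels()
    parser.feed(html)
    parser.close()
    jumps = []
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:
            jumps.append((prev, cur))
    return jumps


# Hypothetical page: the h4 is there only to get smaller text.
ABUSED = """
<h1>Our Store</h1>
<h4>Free shipping on all orders!</h4>
<h2>Products</h2>
"""

# Hypothetical page: the same note marked up as a paragraph instead.
CLEAN = """
<h1>Our Store</h1>
<p class="note">Free shipping on all orders!</p>
<h2>Products</h2>
<h3>Soda</h3>
"""

print(skipped_levels(ABUSED))
print(skipped_levels(CLEAN))
```

In the second page the note’s appearance would be left to a stylesheet, so the markup says only what the text is, not how big it should look.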

It may sound harsh of me to say that badly marked up content is of no lasting use, but it’s the truth. When a page is visually legible yet improperly marked up, it’s very difficult to maintain, and software is unable to understand it. If a computer can’t understand it, it can’t be properly indexed by search engines, correctly interpreted by screen readers, reused as a data source for other applications, or put to use in future applications not yet conceived. It may as well be an image because it has about as much semantic meaning.
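
As a rough illustration of what “understandable by a computer” can mean, the sketch below (again Python with the standard library’s html.parser) tries to pull a table of contents out of two pages that would look much the same in a browser. The names OutlineParser and extract_outline and both sample pages are assumptions made up for this example; real search engines and screen readers are far more sophisticated, but they rely on the same kind of signal.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects a (level, text) pair for every h1-h6 heading in a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading we are inside, if any
        self._text = []      # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def extract_outline(html):
    """Return the page's headings as a list of (level, text) pairs."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical page marked up by meaning.
SEMANTIC = """
<h1>Soda Guide</h1>
<p>Where the drinks are kept.</p>
<h2>Garage refrigerator</h2>
<ul><li>Cola</li><li>Root beer</li></ul>
"""

# Hypothetical page marked up by appearance only.
VISUAL_ONLY = """
<font size="6"><b>Soda Guide</b></font><br>
Where the drinks are kept.<br>
<font size="4"><b>Garage refrigerator</b></font><br>
- Cola<br>- Root beer<br>
"""

print(extract_outline(SEMANTIC))     # two headings found
print(extract_outline(VISUAL_ONLY))  # nothing found: no headings to read
```

To a person both pages have a title and a subsection. To the program, only the first one does.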

–30–
