The Future of Search Series is supported by SES New York Conference & Expo, the search and social marketing conference helping brands, agencies, and professionals connect, share and learn what’s next for the interactive industry.
Semantics, the study of meaning, is playing an increasingly important role in the development of knowledge management tools across a variety of industries, and some of the most interesting developments are coming from the media world.
Semantic search is one broad area within the higher realm of semantic technologies, which also includes knowledge storage, information extraction and reasoning, among other topics. The goal of semantic search is to improve search result accuracy by understanding the searcher’s intent and the contextual relationships between the terms used in the search.
We spoke with Evan Sandhaus, lead architect of semantic platforms at The New York Times Company, and Jeff Catlin, CEO of text analytics company Lexalytics, to better understand how semantic search is affecting news and social media.
The New York Times morgue, a collection of topical and biographical clippings and photographs from The Times and other publications, once existed in the old Times headquarters on West 43rd Street, but has since been relocated.
“All websites are in the business of capturing people’s attention,” said Sandhaus, recalling a recent presentation he had attended. This is especially true for news organizations and blogs, which push out piles upon piles of online articles each day. In the end, the news isn’t exactly useful if no one reads it. So, the goal is to make content as findable as possible.
The fundamentally challenging structure of the web, Sandhaus says, isn’t exactly helping the cause, though. The web is predominantly written in HTML, a markup language that focuses on expressing how information on a webpage should look, not what it means. As a result, important pieces of information within webpages, such as headlines, bylines and publish dates in news articles, are formatted within HTML, but aren’t explicitly labeled as “headline,” “byline” and “publish date.” “As a consequence,” Sandhaus explains, “it makes it difficult for a wider web ecosystem to have an idea of the structured nature of content.” That is, while webpages are formatted for humans to easily read them, machines can’t easily determine the underlying meaning of content on a page if it doesn’t follow a consistent structure, which devalues the utility of the data.
So, what is being done to keep web content from falling into the great abyss that is the web? Many communities are working on this problem, with the concept of Linked Data being a central part of the conversation. Linked Data is a set of best practices for exposing, sharing and connecting pieces of data, information and knowledge on the Semantic Web.
The New York Times has long been set up nicely to participate in the Semantic Web. Since the late 1800s, it has maintained an authoritative and controlled news vocabulary to archive clippings from its own and others’ publications, which were then stored in “the morgue” at its old New York City headquarters on 43rd Street. These archives were originally created so that reporters could easily research historical documentation on a certain topic during the reporting process. Little did anyone know, this organized structure would leave The Times with an amazing amount of useful data once semantic technologies evolved more than a century later.
In 2009, The Times began publishing its indexing vocabulary, which includes people, organizations, locations and descriptors, as linked open data. This enables other datasets to interact with it, opening up a world of possibilities for useful applications based on Times data. As of September 2010, there are 203 datasets, including data from The Times, published in Linked Data format. These datasets are more powerful together than any one dataset could be alone.
Creating standards is the next step in the process towards building a more connected web. Working to further connect information on the web, The World Wide Web Consortium (W3C), among other communities, continues to develop standards for the Semantic Web, explained Sandhaus, including RDFa, which enables users to embed rich metadata, such as title, author and date information, within web documents. This allows users to call out meanings for specific portions of a webpage, making the information more usable on the greater web.
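To make the idea of embedded metadata concrete, here is a minimal sketch in Python. It assumes a hypothetical article snippet whose elements carry RDFa-style `property` attributes; the vocabulary URL, the property names and the `extract_metadata` helper are illustrative assumptions, not taken from The Times or from any published vocabulary. It shows how a program could read the labeled values without guessing from the page layout.

```python
from html.parser import HTMLParser

# Hypothetical article markup. The "property" and "content" attributes follow
# RDFa syntax, but the vocabulary URL and term names are made up for this sketch.
ARTICLE_HTML = """
<div vocab="http://example.org/news#" typeof="Article">
  <h1 property="headline">Semantic Search Comes to the Newsroom</h1>
  <span property="byline">Jane Doe</span>
  <span property="datePublished" content="2011-03-01">March 1, 2011</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect the value of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._current = None  # property whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("property")
        if name is None:
            return
        if "content" in attrs:
            # A machine-readable value takes precedence over the visible text.
            self.metadata[name] = attrs["content"]
        else:
            self._current = name
            self.metadata[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.metadata[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes labeled elements contain no nested tags.
        self._current = None


def extract_metadata(html):
    """Return a dict of property name -> value found in the markup."""
    parser = PropertyExtractor()
    parser.feed(html)
    return {name: value.strip() for name, value in parser.metadata.items()}


if __name__ == "__main__":
    for name, value in extract_metadata(ARTICLE_HTML).items():
        print(f"{name}: {value}")
```

A full RDFa processor also resolves vocabularies, prefixes and nested resources; this sketch only reads flat `property` labels, which is enough to show why explicit labels make a page easier for machines to use.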
The problem with RDFa, though, is that different organizations can use it to develop different naming systems for the same pieces of data, says Sandhaus. In the media world, for example, a “headline” could also be called a “title,” or even “Schlagzeile” (in German) or “intestazione” (in Italian).
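The naming problem can be illustrated with a short Python sketch. It assumes a hypothetical lookup table, `CANONICAL_TERMS`, that maps each publisher's label to one shared term; the table and the `normalize_record` helper are inventions for illustration, not part of RDFa or of any existing standard.

```python
# Hypothetical mapping from publisher-specific labels to one shared term.
CANONICAL_TERMS = {
    "headline": "headline",
    "title": "headline",
    "schlagzeile": "headline",   # German
    "intestazione": "headline",  # Italian
}


def normalize_record(record):
    """Rename known labels to the shared term; leave unknown labels unchanged."""
    return {
        CANONICAL_TERMS.get(label.lower(), label): value
        for label, value in record.items()
    }


if __name__ == "__main__":
    german_article = {"Schlagzeile": "Neue Suche", "author": "A. Schmidt"}
    print(normalize_record(german_article))
```

Without an agreed vocabulary, every consumer of the data has to maintain a table like this for every publisher. A shared industry standard moves that agreement upstream, so the table is no longer needed.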
The New York Times is hoping to alleviate this problem. As of October 2010, The Times, in collaboration with the International Press Telecommunications Council, is working on creating a standard within the publishing industry to express structural metadata within HTML. This framework is called rNews. With this standard, search engines, aggregators and social sites, for example, will have access to the data, making it more useful to the web at large. The project has only just begun, but Sandhaus expects to have more details about its direction in the coming months.
Leading innovation in the publishing industry, The New York Times continues to reimagine what is possible within the world of semantic technologies, making its data (and the data that interacts with it) more useful as more technological developments surface.
Social media is another area of the web where data seems infinitely powerful: Twitter, for example, logs more than 110 million tweets per day, and 50% of Facebook’s 500 million active users log in daily.
As users continue to spend more time on social networks, brands are finding it more important to maintain presences on social platforms. Analytics haven’t been a huge focus for early adopter brands, but as companies try to measure the ROI of being active on social networks, analytic tools are taking a prominent position in the discussion.
Over the course of the past year, brands have begun to add sentiment analysis to the list of must-have features in their social media monitoring tools, says Lexalytics CEO Jeff Catlin.
There are two sides to brand-oriented communication on social platforms — while brands are sending out marketing messages via their social channels, consumers are chatting about brands and products. As a result, there are two main ways that brands are currently using semantics:
- Consumer sentiment analysis: Brands want to know what consumers are saying about them. Using text analytics, an increasing number of services are able to analyze a user’s grammar usage and determine the meaning behind his or her mention of a brand or product. In some cases, this may simply mean determining if a user is using a positive or negative tone when discussing a product or service. In other more advanced cases, this could mean determining a user’s specific intent behind a statement. Viralheat, for example, aims to pinpoint social media users on the cusp of making purchasing decisions. This type of service enables brands to weed out irrelevant social updates and access those with the most potential return.
- Messaging consistency: Monitoring customer sentiment is a bit obvious, but another use for text analytics in the social realm is for monitoring a brand’s messaging consistency. Catlin notes that it’s important for a brand to “sound like it has a common voice and a consensus of opinion in how it communicates to the world.” Historically, it’s always been a priority for brands to make sure their messaging was consistent and clean — social media is another channel where this is important. Using semantic technologies, brands are now able to analyze what they’ve said and whether those messages were consistent. That information can then be used to determine future messaging strategies.
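The simplest case described above, deciding whether a mention has a positive or negative tone, can be sketched in a few lines of Python. This is a toy word-list approach under stated assumptions: the `POSITIVE` and `NEGATIVE` sets and the `tone` function are hypothetical, and commercial text analytics services rely on far richer grammatical analysis than counting words.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"love", "great", "best", "amazing"}
NEGATIVE = {"hate", "terrible", "worst", "broken"}


def tone(text):
    """Classify text as positive, negative or neutral by counting listed words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["I love this phone, best purchase ever",
                 "Worst service, I hate it",
                 "I bought a phone today"]:
        print(tone(post), "-", post)
```

A word count like this misses negation ("not great"), sarcasm and purchase intent, which is where the grammar-aware analysis described above comes in.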
While semantics is having a clear effect on social media monitoring, Catlin feels that it will also soon play a role in social search from a user’s perspective. Search engines and search features within social sites will have to integrate semantic technologies to stay relevant, says Catlin.
He posed an example: if a user is searching for “Indian food,” a keyword search isn’t as useful as a semantics-driven search could be. “Let’s say you have an interest in Indian food,” explained Catlin. “Imagine that a tweet came out that happened to say, ‘This was the best chicken tikka I’ve ever had.’ The search tool would in fact lump that into your interest in Indian food. Even though the tweet never mentioned the term ‘Indian food’ anywhere, it can semantically understand that ‘chicken tikka’ belongs in ‘Indian food.’”
Lexalytics is developing semantic technologies that do just that, and Catlin expects to release them later this year. “Imagine digesting all of Wikipedia, if you will,” Catlin said, explaining the technology. “If you digested all of the knowledge out there, you would start to see relationships. You would start to see things like ‘chicken tikka’ referenced in things about ‘Indian food.’ We hold onto that knowledge historically, so that we can use it later on.”
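Catlin’s “chicken tikka” example can be sketched in Python as follows. The sketch assumes the term-to-concept relationships have already been learned and stored in a small hypothetical table, `CONCEPTS`; the table and both functions are illustrative and do not represent Lexalytics’ actual technology.

```python
# Hypothetical knowledge base: specific term -> broader concept.
# In the scenario Catlin describes, relationships like these would be
# derived from a large corpus such as Wikipedia rather than typed by hand.
CONCEPTS = {
    "chicken tikka": "indian food",
    "naan": "indian food",
    "sushi": "japanese food",
}


def concepts_in(text):
    """Return the broader concepts implied by terms mentioned in the text."""
    lowered = text.lower()
    return {concept for term, concept in CONCEPTS.items() if term in lowered}


def semantic_search(interest, posts):
    """Return posts that mention the interest directly or imply it via a known term."""
    interest = interest.lower()
    return [
        post for post in posts
        if interest in post.lower() or interest in concepts_in(post)
    ]


if __name__ == "__main__":
    tweets = [
        "This was the best chicken tikka I've ever had.",
        "Great sushi downtown",
        "Looking for Indian food tonight",
    ]
    for match in semantic_search("Indian food", tweets):
        print(match)
```

A plain keyword search for “Indian food” would return only the third tweet. The concept lookup also returns the first, even though it never uses the phrase.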
As semantic technologies continue to evolve, data on the web will become more meaningful and useful. Traditional media outlets, like The New York Times, are already seeing the benefits of participating in the Semantic Web, as they are able to use other people’s data to reason about their own archives. Likewise, social search stands to gain much from incorporating semantic understanding in order to create better user experiences and enhance analytics for brands.
Which industries are you most interested in seeing adopt semantic technologies? Let us know in the comments below.
Image courtesy of iStockphoto, thesuperph & Flickr, Jennifer Brook, KEXINO