John Walker

About John Walker

As a Business Analyst in the Sales and Marketing group at NXP Semiconductors, I am responsible for the data/content models, processes, and tooling for product information. I am originally from the UK, but have been living and working in the Netherlands since 2000.

How we’re using DITA at NXP

Recently DITAWriter (AKA Keith Schengili-Roberts) posted a series of interviews (here, here and here) about the uptake of DITA in the semiconductor industry. For those of you who have never heard of DITA: in short, it is an OASIS XML standard for authoring and publishing modular, topic-based content.

One of the interviews was with our very own Colin Maudry, so I thought it would be useful to expand on how NXP is using DITA and give some facts and figures.

As mentioned in the interview, we primarily use DITA as the source format for what we call our “value propositions”. This content is essentially the marketing narrative about our products and consists of a description, features and applications, plus a few other optional bits and pieces. This content ends up in a number of different outputs, including data sheets, product web pages and mobile apps, as shown in the following diagram.

Each of the circled bits of content is a topic, which is a discrete content resource that can be re-used across multiple documents (or maps in DITA-speak). This is particularly useful for applications lists, for example, where many products share the same list of ‘standard’ applications. This pays dividends when translating as we do not end up translating the same content time and time again, which results in more consistent translated content and lower costs.
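
To make the re-use idea concrete, here is a heavily simplified and entirely hypothetical sketch: two product maps that both reference the same shared applications topic, with a few lines of Python just to show which topic they have in common. The map structure follows standard DITA, but the file names and contents are made up for illustration and are not our actual content model.

```python
# Illustration of topic re-use across DITA maps (hypothetical file names).
from xml.etree import ElementTree as ET

# Two hypothetical product maps that both reference one shared applications topic.
map_a = """<map>
  <title>Product A value proposition</title>
  <topicref href="product_a_description.dita"/>
  <topicref href="product_a_features.dita"/>
  <topicref href="standard_applications.dita"/>
</map>"""

map_b = """<map>
  <title>Product B value proposition</title>
  <topicref href="product_b_description.dita"/>
  <topicref href="standard_applications.dita"/>
</map>"""

def topic_refs(dita_map):
    """Return the set of topic files referenced by a DITA map."""
    root = ET.fromstring(dita_map)
    return {ref.get("href") for ref in root.iter("topicref")}

# Topics referenced by both maps only need to be authored (and translated) once.
shared = topic_refs(map_a) & topic_refs(map_b)
print(shared)  # {'standard_applications.dita'}
```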

Here are a few of the reasons we have chosen DITA:

  • Provides an ‘agnostic’ source format for multiple output formats
  • Separates content structure (meaning) from style (look and feel)
  • Topic-based approach enables content to be easily re-used across documents
  • Offers native support for translation/localization
  • As it’s XML-based, it can be integrated into existing XML publication processes
  • It’s an open standard, which means better tool support and a broader user community

Of course we have not been using DITA since the dawn of time, so we had to migrate the content from legacy formats into DITA. Luckily for us, all of our legacy source formats were XML, so it was relatively simple to transform this to the DITA target format using XSLT. The migration took place in two main rounds. During the first round we ignored any potential re-use and migrated each document with its own set of topics, whereas during the second round we tried to de-duplicate any topics where the content was identical.
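
To give a flavour of what such a transform looks like, here is a minimal sketch using Python and lxml. The legacy element names (prod_desc and friends) are invented purely for illustration; the real stylesheets were tailored to each of our legacy formats and produced far richer DITA.

```python
# A minimal legacy-XML-to-DITA migration step using XSLT (hypothetical legacy schema).
from lxml import etree

xslt = etree.XML(b"""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/prod_desc">
    <topic id="{@id}">
      <title><xsl:value-of select="name"/></title>
      <body>
        <p><xsl:value-of select="summary"/></p>
      </body>
    </topic>
  </xsl:template>
</xsl:stylesheet>""")

legacy = etree.XML(b"""
<prod_desc id="pd-123">
  <name>Example product description</name>
  <summary>Low-power example device, for illustration only.</summary>
</prod_desc>""")

# Apply the stylesheet and print the resulting DITA topic.
transform = etree.XSLT(xslt)
dita_topic = transform(legacy)
print(etree.tostring(dita_topic, pretty_print=True).decode())
```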

Roughly 50% of our new data sheets are still created using proprietary (SGML) EDDs in Adobe FrameMaker. For these we still have to (manually) extract the content into DITA following publication, but this is something we are seeking to automate in the near future, with the longer-term goal of moving to DITA as the source format for all natural language content. For the other 50% of data sheets, the DITA value propositions are used when we generate the data sheet, along with other topics generated on-the-fly from our product information database (more on that in another post).

Having all this content in DITA enabled us to easily translate large volumes of content into Chinese and Japanese (see figures below). We have set up a partially automated workflow to deal with translation requests which natively supports DITA and required minimal setup to build a bridge between our content and our translation providers. Without DITA and related technologies such as XLIFF, such large-scale translation simply would not have been feasible or cost-effective.

Finally, here are some stats reflecting the status today:

  • Number of DITA maps: 10,000+
    • English (source): 6,281
    • Chinese: 536
    • Japanese: 2,991
    • Other: 392
  • Number of DITA topics: 40,000+
  • On average a map is re-used for 2.27 products
  • The most heavily re-used map is shared by 223 products!
  • As of December 2012, our source DITA content contained 1,018,093 words including 489,705 repetitions
Posted by John Walker  /   April 18, 2013

Is Linked Data the future of data integration in the enterprise?

In today’s multi-screen world we need to be able to get relevant content to the customer at the right time, in their format and language of choice. Life is no longer as simple as making PDF documents; we need to feed consistent information to all our multi-channel publications (web/mobile/print) and to those of our partners. This blog post explains how NXP is tackling these challenges.

At NXP, similar to many other large organizations, we have what you might call a ‘brownfield’ information landscape. Data and content are scattered and duplicated across numerous applications and databases, leading to inconsistent information, complex systems and inefficient processes. The outcome is that people are unable to find what they are looking for, or find conflicting information. Common approaches like Data Warehousing, Master Data Management and Service-Oriented Architecture all tackle parts of the problem, but none provide a total solution.

Our aim is to provide a single, up-to-date ‘canonical’ source of information that is easy to use and that data consumers (be they human or machine) can trust. By doing this we can make sure all our publications are kept up-to-date with the minimum amount of fuss.

The answer we found was to take the best of the previously mentioned approaches and combine them with a liberal sprinkling of “Linked Data”. The result is a giant graph of data. Think of it like a social network, but with things (products, places, technologies, applications, documents, web pages) as well as people.

Linked Data… what’s that?

The idea of Linked Data has been around for a while already. As Sir Tim Berners-Lee put it: “The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” In the past few years this approach has been used to publish government data sets and has been steadily gaining traction within the media and enterprise sectors.

We have been eager to jump into this approach ourselves. For the past few years we have been making use of data dictionaries to ensure high-quality, structured product data based on canonical property definitions. So far we have mostly been using this data to reproduce existing publications like data sheets and product web pages in a more efficient manner. However, we always felt “spoofing” these publications was not really showing off the data to its full potential. So in the past year we have been looking at new and innovative ways to publish our product data. When looking at Linked Data and RDF, we recognised an immediate affinity with our dictionary-based strategy.

For those readers unfamiliar with RDF and Linked Data, I will not go into details here; rather, I would advise you to read a quick intro to RDF and the excellent Linked Data Book.

Business drivers

The main business drivers for this have, so far, been mostly internal: how to ensure product data is available across internal systems in an easy-to-use and efficient manner. In many areas simple XML messages and other B2B standards are applicable, but within a portfolio of over 20,000 products spanning a broad variety of functions and technologies, it simply is not possible to describe the complex properties of an electronic component with a handful of standard XML elements. Also, the tree-based XML DOM soon starts to become a limitation when dealing with a highly interlinked web of information. The answer (as it turns out) is quite simple: think graph.
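
As a toy illustration of what “think graph” means in practice, here is a handful of triples describing a product and its links to other things, loaded with Python and rdflib. The URIs and property names are placeholders invented for this post, not our actual vocabulary.

```python
# "Think graph": a few hypothetical triples describing a product and its links.
from rdflib import Graph

turtle = """
@prefix ex:   <http://example.com/def/> .
@prefix prod: <http://example.com/product/> .

prod:ABC123 a ex:Product ;
    ex:name        "ABC123" ;
    ex:technology  ex:CMOS ;
    ex:application ex:Automotive ;
    ex:dataSheet   <http://example.com/documents/ABC123.pdf> .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# Each statement is a (subject, predicate, object) edge in the graph.
for s, p, o in g:
    print(s, p, o)
```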

The benefits of the Linked Data approach are clear: we provide data consumers a single, trustworthy, easy to use source of information about our products. The Linked Data is the API. The benefit to our end-customer is that the information published on our website, data sheets, selection guides and partner websites is consistent and up-to-date.

Progress so far

Following the basic Linked Data principles, we have assigned HTTP URIs as names for things (resources), giving each an unambiguous identifier. Next up, we have converted data from a variety of sources (XML, CSV, RDBMS) into RDF.
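
Below is a minimal sketch of what such a conversion can look like for tabular (CSV) data, using Python and rdflib purely for illustration; the columns, URI pattern and properties are hypothetical placeholders rather than our actual data model.

```python
# A minimal sketch: convert CSV rows to RDF, minting an HTTP URI per product.
import csv
import io

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.com/def/")  # placeholder vocabulary

rows = io.StringIO("""type_number,package,status
ABC123,SOT23,Production
XYZ789,SOT223,Development
""")

g = Graph()
for row in csv.DictReader(rows):
    # The HTTP URI doubles as the product's unambiguous identifier.
    product = URIRef("http://example.com/product/" + row["type_number"])
    g.add((product, RDF.type, EX.Product))
    g.add((product, EX.typeNumber, Literal(row["type_number"])))
    g.add((product, EX.package, Literal(row["package"])))
    g.add((product, EX.status, Literal(row["status"])))

# Print the resulting graph as Turtle.
print(g.serialize(format="turtle"))
```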

One of the key features of RDF is the ability to easily merge data about a single resource from multiple sources into a single “supergraph”, providing a more complete description of the resource. By loading the RDF into a graph database, it is possible to make an endpoint available which can be queried using the SPARQL query language. We are currently using Dydra, as their cloud-based database-as-a-service model provides an easy entry route to using RDF without a steep learning curve (basically load your RDF and you’re away), but there are plenty of other options like Apache Jena and OpenRDF Sesame. This has made it very easy for us to answer complex questions requiring data from multiple sources; moreover, we can stand up APIs providing access to this data in minutes.
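
Here is a small sketch of that merge-and-query pattern, again with made-up data and property names, using rdflib locally for illustration. In practice the merged graph lives in Dydra and the queries run against its SPARQL endpoint, but the principle is the same.

```python
# Merge RDF from two sources into one graph, then query it with SPARQL.
from rdflib import Graph

marketing = """
@prefix ex:   <http://example.com/def/> .
@prefix prod: <http://example.com/product/> .
prod:ABC123 ex:description "Example low-power device." .
"""

engineering = """
@prefix ex:   <http://example.com/def/> .
@prefix prod: <http://example.com/product/> .
prod:ABC123 ex:supplyVoltageMax 3.6 .
"""

# Parsing both sources into one Graph merges them into a single "supergraph".
g = Graph()
g.parse(data=marketing, format="turtle")
g.parse(data=engineering, format="turtle")

query = """
PREFIX ex:   <http://example.com/def/>
PREFIX prod: <http://example.com/product/>
SELECT ?description ?vmax WHERE {
  prod:ABC123 ex:description ?description ;
              ex:supplyVoltageMax ?vmax .
}
"""
for row in g.query(query):
    print(row.description, row.vmax)
```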

By using a Linked Data Platform such as Graphity we can make our identifiers (HTTP URIs) dereferenceable. In layman’s terms, when someone plugs the URI into a browser, we provide a description of the resource in HTML. Using content negotiation, we are able to provide the same data in one of the standard machine-readable formats (XML, JSON or Turtle). Graphity uses Java and XSLT 2.0, which our developers already have loads of experience with, and provides powerful mechanisms with which we will be able to develop some great web apps.
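
Conceptually, content negotiation boils down to the client stating what it wants in the HTTP Accept header. Here is a sketch in Python with requests, using a placeholder URI rather than a real NXP identifier.

```python
# Content negotiation against a dereferenceable URI (placeholder URI only).
import requests

uri = "http://example.com/product/ABC123"  # hypothetical identifier

# A browser asking for HTML gets a human-readable description...
html = requests.get(uri, headers={"Accept": "text/html"})

# ...while a client asking for Turtle gets the machine-readable RDF.
turtle = requests.get(uri, headers={"Accept": "text/turtle"})

print(html.headers.get("Content-Type"))
print(turtle.headers.get("Content-Type"))
print(turtle.text)
```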

Next steps

For now, as mentioned above, the main drivers are internal. However, where possible and relevant, we are planning to make data publicly accessible under an open license. We’re currently experimenting with making test data available via a beta public SPARQL endpoint and a linked data app as we develop them. These are by no means finished, but they represent our current progress.
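
For anyone who wants to try a SPARQL endpoint from code, here is a minimal sketch of the standard SPARQL-over-HTTP protocol in Python. The endpoint URL below is a placeholder; substitute the beta endpoint linked above.

```python
# Querying a SPARQL endpoint over HTTP (placeholder endpoint URL).
import requests

endpoint = "http://example.com/sparql"  # placeholder, not the real endpoint
query = """
SELECT ?product WHERE { ?product a <http://example.com/def/Product> } LIMIT 10
"""

response = requests.get(
    endpoint,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)

# Standard SPARQL JSON results: head + results.bindings.
for binding in response.json()["results"]["bindings"]:
    print(binding["product"]["value"])
```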

Over the coming months we will be making more and more data available as linked data and gradually reworking our existing publications to use the new data source. Alongside this, we are confident that developing new and exciting ways to explore and visualize information will be much easier with a great, easy-to-use source of information behind it. We already have some cool ideas for faceted search interfaces that we will be launching in beta as soon as we can. Keep your eyes peeled!

Feedback

We’d really appreciate your input and thoughts. If you have any cool ideas about how you might want to use our data, or would like to see more of our data published using other vocabularies, we’d love to hear from you. Also we’d be very interested to hear from anyone else in the electronics industry interested in making product data more open and aligned. Please leave a comment below or tweet us at @nxpdata.

Posted by John Walker  /   January 07, 2013