Is Linked Data the future of data integration in the enterprise?

Posted by John Walker  /   January 07, 2013  /   Posted in Slider Main  /   21 Comments

In today’s multi-screen world we need to be able to get relevant content to the customer at the right time, in their format and language of choice. Life is no longer as simple as producing PDF documents: we need to feed consistent information to all our multi-channel publications (web/mobile/print) and those of our partners. This blog post explains how NXP is tackling these challenges.

At NXP, similar to many other large organizations, we have what you might call a ‘brownfield’ information landscape. Data and content are scattered and duplicated across numerous applications and databases, leading to inconsistent information, complex systems and inefficient processes. The outcome is that people are unable to find what they are looking for, or find conflicting information. Common approaches like Data Warehousing, Master Data Management and Service-Oriented Architecture each tackle part of the problem, but none provides a total solution.

Our aim is to provide a single, up-to-date ‘canonical’ source of information that is easy to use and that data consumers (be they human or machine) can trust. By doing this we can make sure all our publications are kept up-to-date with the minimum amount of fuss.

The answer we found was to take the best of the previously mentioned approaches and combine them with a liberal sprinkling of “Linked Data”. The result is a giant graph of data. Think of it like a social network, but with things (products, places, technologies, applications, documents, web pages) as well as people.

Linked Data… what’s that?

The idea of Linked Data has been around for a while. As Sir Tim Berners-Lee put it: “The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” In the past few years this approach has been used to publish government data sets and has been steadily gaining traction in the media and enterprise sectors.

We have been eager to jump into this approach ourselves. For the past few years we have been making use of data dictionaries to ensure high-quality, structured product data based on canonical property definitions. So far we have mostly been using this data to reproduce existing publications like data sheets and product web pages in a more efficient manner. However, we always felt “spoofing” these publications was not really showing off the data to its full potential. So over the past year we have been looking at new and innovative ways to publish our product data. When looking at Linked Data and RDF, we recognised an immediate affinity with our dictionary-based strategy.

For those readers unfamiliar with RDF and Linked Data, I will not go into details here, but rather advise you to read a quick intro to RDF and the excellent Linked Data Book.

Business drivers

The main business drivers have, so far, been mostly internal: how to ensure product data is available across internal systems in an easy-to-use and efficient manner. In many areas simple XML messages and other B2B standards are applicable, but with a portfolio of over 20,000 products spanning a broad variety of functions and technologies, it simply is not possible to describe the complex properties of an electronic component with a handful of standard XML elements. The tree-based XML DOM also soon becomes a limitation when dealing with a highly interlinked web of information. The answer (as it turns out) is quite simple: think graph.
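To make the tree-versus-graph point concrete, here is a minimal Python sketch (not NXP’s actual stack; the product and technology names are purely illustrative) showing product data as a set of subject-predicate-object triples that can be traversed in either direction, something a single parent-child tree cannot express:

```python
# A graph is just a set of (subject, predicate, object) triples.
# In a tree (XML DOM) each node has one parent; in a graph, a resource
# can be linked to and from any number of other resources.
triples = {
    ("product/LPC1768", "type", "Microcontroller"),
    ("product/LPC1768", "usesTechnology", "tech/ARM-Cortex-M3"),
    ("product/LPC1769", "usesTechnology", "tech/ARM-Cortex-M3"),
    ("tech/ARM-Cortex-M3", "label", "ARM Cortex-M3"),
}

def objects(subject, predicate):
    """Forward traversal: all values linked from a subject via a predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def subjects(predicate, obj):
    """Reverse traversal: all subjects linking to a value -- awkward in a tree."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}
```

The reverse traversal is the interesting part: “which products use this technology?” is a single lookup in the graph, whereas a tree shaped around products would need a full scan or a separate index.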

The benefits of the Linked Data approach are clear: we provide data consumers with a single, trustworthy, easy-to-use source of information about our products. The Linked Data is the API. The benefit to our end-customer is that the information published on our website, data sheets, selection guides and partner websites is consistent and up-to-date.

Progress so far

Following the basic Linked Data principles, we have assigned HTTP URIs as names for things (resources), providing unambiguous identifiers. Next, we converted data from a variety of sources (XML, CSV, RDBMS) into RDF.
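As a rough illustration of that conversion step, the sketch below turns a CSV export into triples keyed on an HTTP URI per product. The column names and the `http://data.example.com/` namespace are invented for illustration; they are not NXP’s real URIs or schema:

```python
import csv
import io

# Hypothetical CSV export; column names are made up for this example.
raw = """mpn,package,status
LPC1768,LQFP100,Active
LPC1769,LQFP100,Active
"""

BASE = "http://data.example.com/product/"  # placeholder namespace

def csv_to_triples(text):
    """Turn each CSV row into triples about one product resource.
    The product URI acts as the unambiguous identifier for the row."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        uri = BASE + row["mpn"]
        triples.add((uri, BASE + "package", row["package"]))
        triples.add((uri, BASE + "status", row["status"]))
    return triples
```

In practice you would emit real RDF (e.g. Turtle via a library such as rdflib) with separate namespaces for classes and properties, but the core move is the same: row identity becomes a URI, and columns become predicates.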

One of the key features of RDF is the ability to easily merge data about a single resource from multiple sources into a single “supergraph”, providing a more complete description of the resource. By loading the RDF into a graph database, it is possible to make an endpoint available which can be queried using the SPARQL query language. We are currently using Dydra, as their cloud-based database-as-a-service model provides an easy entry route to using RDF without a steep learning curve (basically load your RDF and you’re away), but there are plenty of other options such as Apache Jena and OpenRDF Sesame. This has made it very easy for us to answer complex questions requiring data from multiple sources; moreover, we can stand up APIs providing access to this data in minutes.
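The merge step is worth spelling out, because it is what makes RDF integration so cheap: merging descriptions of the same resource from different sources is (blank nodes aside) just the set union of their triples. A pure-Python sketch with invented source data; a real deployment would load the graphs into a triplestore and query with SPARQL, as described above:

```python
# Two sources describing the same resource (values are illustrative).
marketing   = {("p/LPC1768", "label", "LPC1768 MCU")}
engineering = {("p/LPC1768", "flashKB", "512"),
               ("p/LPC1768", "package", "LQFP100")}

# The "supergraph": the union of triples gives a fuller description.
supergraph = marketing | engineering

def describe(subject, graph):
    """Rough analogue of SPARQL's DESCRIBE: every (predicate, object)
    pair known about one subject, regardless of which source it came from."""
    return {(p, o) for (s, p, o) in graph if s == subject}
```

Because both sources name the resource with the same identifier, no mapping or join logic is needed; this is exactly the payoff of assigning shared HTTP URIs up front.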

By using a Linked Data Platform such as Graphity we can make our identifiers (HTTP URIs) dereferenceable. In layman’s terms, when someone plugs the URI into a browser, we provide a description of the resource in HTML. Using content negotiation we can also provide this data in one of the standard machine-readable formats: XML, JSON or Turtle. Graphity uses Java and XSLT 2.0, which our developers already have loads of experience with, and provides powerful mechanisms with which we will be able to develop some great web apps.
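The content-negotiation idea can be sketched in a few lines. This is not Graphity’s implementation, just a simplified illustration of the principle: the server inspects the HTTP `Accept` header and picks a representation, so one URI serves both humans and machines:

```python
def negotiate(accept_header):
    """Pick a response format from an HTTP Accept header.
    Simplified: first supported media type listed wins; a real server
    would also honour q-values and wildcards per RFC 7231."""
    supported = {
        "text/html": "html",           # human-readable page
        "text/turtle": "turtle",       # machine-readable RDF
        "application/rdf+xml": "xml",
        "application/json": "json",
    }
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()  # drop parameters like q=0.9
        if media_type in supported:
            return supported[media_type]
    return "html"  # sensible default for browsers and unknown clients
```

So a browser sending `Accept: text/html` gets the HTML description, while a script asking for `text/turtle` gets RDF from the very same URI.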

Next steps

For now, as mentioned above, the main drivers are internal. However, where possible and relevant, we are planning to make data publicly accessible under an open license. We’re currently experimenting with making test data available via a beta public SPARQL endpoint and a linked data app as we develop them. These are by no means finished, but represent our current progress.

Over the coming months we will be making more and more data available as linked data and gradually reworking our existing publications to use the new data source. Alongside this, we are confident that developing new and exciting ways to explore and visualize information will be much easier with a great, easy-to-use source of information. We already have some cool ideas for faceted search interfaces that we will be launching in beta as soon as we can. Keep your eyes peeled!

Feedback

We’d really appreciate your input and thoughts. If you have any cool ideas about how you might want to use our data, or would like to see more of our data published using other vocabularies, we’d love to hear from you. Also we’d be very interested to hear from anyone else in the electronics industry interested in making product data more open and aligned. Please leave a comment below or tweet us at @nxpdata.


About John Walker

As Business Analyst in the Sales and Marketing group of NXP Semiconductors I am responsible for data/content models, processes, and tooling for product information at NXP Semiconductors. I am originally from the UK, but have been living and working in the Netherlands since 2000.

21 Comments

  1. Leyli Salmanova January 8, 2013 12:39 pm Reply

    Good read, John!
    Very interesting concept too, we are excited to see it working in 2013.

  2. Pingback: Linked Data: The Future of Data Integration - semanticweb.com

  3. Mike Austin January 9, 2013 9:46 pm Reply

    This is an excellent presentation. Your data graph looks to me like a data flow diagram used in structured analysis. Very thought provoking! Thank you.

  4. Edwin van de Merbel January 11, 2013 9:21 am Reply

    Great article John, an innovative approach to help NXP’s engineers slice and dice their own product data the way they want it! Really forward looking!

  5. Ivan Lagunov January 11, 2013 12:17 pm Reply

    Nice to see this overview here, John! Well done!

  6. Susan Self January 11, 2013 10:29 pm Reply

    John, your dictionary-based approach provides a nice controlled vocabulary for mapping subjects to RDF. Are you using RDFS? Are you using or considering using OWL to provide a higher semantic level so that you can also use machine reasoning with your RDF? I’m just curious as to how ambitious your approach might become, or whether you have good reasons not to put that extra effort in, such as because the ROI is not there.

    • John Walker January 14, 2013 12:45 pm Reply

      Hi Susan, the data dictionaries we used are based on the ISO 13584 data model (also known as PLIB) and provide a language to describe an ontology, i.e. the types of things (classes) we want to describe and the data elements (properties) that can be used to describe things of a certain type (individuals / items). We also used the IEC 61360 data dictionary [1] as a starting point, specifically for electric/electronic components, and have extended and refined this dictionary over the past few years – it currently contains on the order of a few hundred classes and around 1000 properties.

      So far the mapping we have done from the ISO 13584 model to the RDF domain does include elements from RDFS and OWL. Furthermore, it would make sense to expect that an ontology described using ISO 13584 could be converted to an OWL-compliant ontology, and that the additional entailment regimes offered by OWL would be worth the effort in terms of getting additional insights and value from the data.

      In other areas where we do not have such a well-defined data dictionary, one could imagine schema being created directly using RDFS and OWL.

      [1] http://std.iec.ch/iec61360

  7. Andreas Blumauer January 14, 2013 10:43 am Reply

    John, it’s good to see how linked data is appearing more and more as a potential solution for enterprise data integration problems. I wonder what exactly the business case is for NXP when you talk about ‘more open and aligned’ product data. Which other parties do you have in mind who will reuse this kind of data? I also wonder if you have thesauri and SKOS on your agenda to be able to link structured and unstructured data. Kind regards! Andreas

    • John Walker January 14, 2013 12:58 pm Reply

      Hi Andreas.

      I would say the primary business case at NXP is to reduce the costs and effort associated with communicating the data between internal systems and publishing it to web, mobile and print. By providing a high-quality, up-to-date source of data we feel that we can reduce operational costs AND improve speed and efficiency.

      External parties that we expect to be interested in this data are direct end-customers of NXP, our distribution channel and vertical search engines.

      Thesauri and SKOS are indeed on the agenda.

  8. Jamshaid Ashraf January 14, 2013 1:45 pm Reply

    John, thanks for sharing. We need more and more practical examples where RDF/LD/SPARQL are used.
    BTW, how are you planning to reflect updates to the data? I mean, is there an interface to update values, or are you using scripts to pull actual data from back-end systems?

    • John Walker January 15, 2013 4:07 pm Reply

      Currently things are mostly based around dumps from existing systems. We are looking to move to a more real-time (messaging) way of working where a source system can update the information about an individual resource, something along the lines of a ‘graph per aspect’ pattern [1]. Where the RDF is the source, you could imagine an interface (web forms) that would allow the values to be updated.

      [1] http://patterns.dataincubator.org/book/graph-per-aspect.html

  9. Bernadette Hyland January 14, 2013 4:03 pm Reply

    Hi John,
    Thank you for your contribution to the discussion on Linked Data in the enterprise. Hopefully you’re aware of the work happening in the W3C Linked Data Platform working group (see http://www.w3.org/2012/ldp/charter), which is focused on the use of linked data in the enterprise for data interoperability. IBM, Oracle, EMC, as well as smaller tech companies are involved in this very productive group. Also, have you seen the Linking Enterprise Data book (Springer 2010), freely available in HTML at http://3roundstones.com/linking-enterprise-data/ or in hardback? A new developers’ guide to Linked Data written by serious practitioners in leading-edge enterprises is available from Manning (early release version available now, summer 2013 for print release), see http://3roundstones.com/2012/12/04/new-book-linked-data/. Thank you again & please keep up the posts about your progress — it helps us advance this worthwhile approach to data interoperability.

    • John Walker January 16, 2013 12:18 pm Reply

      Hi Bernadette, Yes I have read through the LDP info and it sounds really interesting. As mentioned we are using Graphity which is implementing the LDP specs. I read the Linking Enterprise Data book a while back and found it to be really useful and interesting, especially the success stories… hopefully NXP can make it into the next edition!

  10. Pingback: Is Linked Data the future of data integration in the enterprise? « Another Word For It

  11. Pingback: Distributed Weekly 190 — Scott Banwart's Blog

  12. Pingback: is Linked Data the future of data integration in the enterprise? - Neo Technology

  13. Luis Criado January 20, 2013 6:34 pm Reply

    Interesting article. I think one of the applications of the Semantic Web is integration between companies. I know the DESFire family and Semantic Web technology. http://purl.org/crtm/TTPDesfire

  14. Robby Pelssers February 3, 2013 6:56 pm Reply

    Hi John,

    great article…

    As a side note, for those interested in learning more about Linked Data, a free course is starting on the 4th of February: https://openhpi.de/

    Robby

  15. Pingback: Open LinkedData Solutions | Pearltrees

  16. Gefrierschrank no frost February 10, 2013 10:35 pm Reply

    I would say yes, linked data is the future of data integration in the enterprise, and I think it will adopt newer forms of linking very soon.

  17. Erwin Folmer May 14, 2013 12:04 am Reply

    Hi John,

    Great article! Would you be interested in giving a talk on this subject at our “Pilot Linked Open Data the Netherlands” event, to be held on July 3rd at the RCE, Amersfoort?

    Let me know if you are interested and then I would be more than happy to share more information!
    (Btw, you may choose the language for your talk: both English and Dutch are fine)

    Best regards, Erwin
