
Beyond ETL – Company Offers New Approach to Data Integration

by Loraine Lawson, IT Business Edge
Oct 20, 2008 12:00:00 AM

Taking a semantic data integration approach instead of the more traditional ETL (extract, transform, load) approach to data integration, expressor software launched its initial product offering, expressor 1.0. Loraine Lawson asked Bob Potter, CEO of expressor software, to explain.

 

Lawson: Tell me a little bit about expressor. How do you position yourself in the market? I assume it’s data integration?
Potter: To give you the basics, we’re in the data integration software market. Our direct competitors are Informatica and Ab Initio – the last two independent suppliers. The core differentiator that we have is what we call smart semantics, which gives us the ability to first rationalize or conform physical metadata to common business definitions before semantically integrating data sources and targets. Our approach is fundamentally different from the traditional ETL approaches, where the entire data integration process is built directly upon the physical metadata descriptions that reside in the sources and the targets.

 

Our smart semantics technology is very sophisticated and has been developed over many years; we brought it to market earlier this year. We think the ability to use semantics in the data integration process is really important because it gives you a level of metadata abstraction that does not currently exist anywhere else. In today’s ETL technology, everything has to be mapped, you know, using physical metadata with arcane names like "pnum" or "part_no," and business rules have transformation logic embedded in them that is tied directly to this physical metadata. As a result, there is limited reusability of business rules. So our semantic capability gives us reusability and largely eliminates data mappings, which we handle under the covers. That’s the core of what we do. Then there are two other things that we think are very important, such as data processing throughput and the ability to collaboratively develop a data integration application through our role-based UIs; but the core of what we do is semantic data integration.

 

Lawson: Is there a lot of confusion over what semantic data integration means?
Potter: Well, there’s the semantic Web. The only confusion comes when people think that what we do is somehow related to the semantic Web, which is using different concepts like OWL and RDF and some other things to make semantic queries out there on the Internet and come up with results. But that’s not what we do. The use of the word "semantics" for us is in the metadata rationalization, the reconciliation of metadata that have the same meaning but have been given different names in the various data sources that exist within an enterprise. Our business is directed at the IT professional, as opposed to end users that have different use cases.

 

We're about trying to get control over the data infrastructure or, as the industry analysts call it, the data services layer in an enterprise service bus infrastructure.

 

Lawson: Can you explain why someone would choose to use the expressor semantic data integration system over more traditional ETL tools?
Potter: First of all, with respect to semantic data integration, no one does it yet except for us. The traditional suppliers have good products. They’ve been around a long time and they're big companies; e.g., Informatica is a $450 million company. So obviously people are buying their products, right?

 

But they're architecturally locked into their approach, which is heavy use of physical metadata mappings, and are not able to easily reuse business rules from one project to the next. They have also broadened their product portfolios by acquiring data quality companies or acquiring metadata management companies and then attempted to integrate those tools and products together in a suite.

 

Our approach is, "Let's start at the core with a semantic metadata foundation, with a metadata repository in the middle. Let's build our own tools that are targeted for the different IT user roles on a data integration project, such as data stewards, data architects, and ETL developers. Let's develop a uniform platform that incorporates data profiling, data quality, data transforming, and so on and base this platform on the fastest parallel processing engine in the market that can deal with the increasing data volumes and complexities that exist in the enterprise."

 

So ours was an organic-from-the-ground-up approach. The larger vendors have taken an acquisition and assembly approach. We think that we have taken the right approach to deliver a true next-generation platform that is easier to use, fosters reuse, is faster, and offers the end-to-end functionality in a fully integrated fashion.

 

Now, what is in it for the CIO? That’s basically what it comes down to. It’s not about cool technology; it’s about why you would even bother using a new technology when you’ve been using the old technology for so many years.

 

For us, it comes down to cost. It’s just too expensive to do data integration the current way. Every industry analyst I’ve ever talked to, bar none – and I’ve talked to over two dozen, including Gartner – says the best-case analysis is that there’s 30 percent penetration of data integration tools in the enterprise. That’s the best case. Some other analysts say it’s more like 20 percent. So why is that? Why hasn’t this been adopted the way relational database technology has been adopted?

 

Well, the reason is that it’s too expensive. The software alone costs hundreds of thousands of dollars, and then it can cost several million dollars to actually deploy one of these projects. It’s just too hard. Gartner wrote a report in August of 2007 on data integration. It basically said that data integration is too complex and too inefficient. And that’s why we think we have a powerful message for the CIO.

 

Lawson: How do you address those concerns?
Potter: First and foremost, by reusing what you’ve already done. Through the semantic capability, we don’t name things like PNUM or P_# when it’s product number. We just make everything "product number." So when you write a business rule about a product number, you can reuse it regardless of whether you’re integrating SAP with Teradata or you’re integrating Oracle with Netezza. Whatever your data integration project is, we abstract the metadata definitions to what we call business definitions. We allow people to work at the business level as opposed to the specific, arcane technical level. This has another advantage in that the analyst, steward, developer, and business user are all using the same terminology to discuss the data being manipulated, which reduces the confusion normally encountered when the business and technical folks communicate.
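The reuse idea Potter describes can be illustrated with a minimal Python sketch. It is not expressor's API or data model; the mapping table, the field names other than PNUM and part_no, and both function names are assumptions made up for illustration. Each source's physical field names are mapped once to shared business terms, and the business rule is then written once against the business term only.

```python
# Hypothetical illustration only; not expressor's actual product or API.
# Each source system's physical field names map to shared business terms.
SEMANTIC_MAP = {
    "sap": {"PNUM": "product_number", "QTY": "quantity"},
    "oracle": {"part_no": "product_number", "qty_ordered": "quantity"},
}


def to_business_terms(source, record):
    """Rename a record's physical field names to the shared business terms."""
    mapping = SEMANTIC_MAP[source]
    return {mapping[name]: value for name, value in record.items()}


def normalize_product_number(record):
    """A business rule written once, against the business term only."""
    cleaned = dict(record)
    cleaned["product_number"] = str(record["product_number"]).strip().upper()
    return cleaned


# The same rule applies to both sources without being rewritten.
sap_row = to_business_terms("sap", {"PNUM": " ab-100 ", "QTY": 3})
oracle_row = to_business_terms("oracle", {"part_no": "cd-200", "qty_ordered": 5})
print(normalize_product_number(sap_row))
print(normalize_product_number(oracle_row))
```

In this sketch, adding a third source means adding one entry to the mapping table; the rule itself stays untouched, which is the kind of reuse the interview describes.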

 

We also understand the physical layout of the data. I don’t know if I’m getting too technical or not for you, but data is stored in various systems in a specific format like a packed decimal or an integer or a string; it’s got a physical layout. We know what all those physical layouts are so we don’t require you to do basic transformations of one physical layout in the source system to another in the target system – it’s all handled internally. There’s no other product that does that.

 

Lawson: And how can you automate that? Can you give us a little peek under the hood?
Potter: Yes, because in our product, everything is stored in our semantic metadata repository. Our product knows what all of these physical layouts are. The product looks at the database and says, "Okay, I recognize what that is." And we’ll just automate that particular function, whereas the old products don’t do that, so you have to have a person who says, "Okay, this is an integer in this system, I’m going to make it a string in the target system and I’m going to write that transformation. I’m going to add it to the data integration process." It’s not required in our system. At our Web site, (http://www.expressor-software.com), there are some fairly high-level explanations of what we do.
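As a rough sketch of what automating physical-layout conversions could look like, the Python below keeps a lookup table of converters keyed by source and target layout, so no one has to hand-write the integer-to-string step Potter mentions. This is an assumption-laden illustration, not how expressor works internally: the layout names, the CONVERTERS table, and the functions are all hypothetical, and the packed-decimal decoder assumes the common convention of two digits per byte with the sign in the final nibble (0xD meaning negative).

```python
# Hypothetical illustration only; not expressor's actual engine or API.

def unpack_decimal(raw: bytes) -> int:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)  # the last byte holds one digit plus the sign
    sign = -1 if (raw[-1] & 0x0F) == 0x0D else 1
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return sign * value


# Converters keyed by (source layout, target layout); names are made up.
CONVERTERS = {
    ("packed_decimal", "integer"): unpack_decimal,
    ("integer", "string"): str,
    ("string", "integer"): int,
}


def convert(value, source_layout, target_layout):
    """Pick the conversion from the known layouts instead of hand-coding it."""
    if source_layout == target_layout:
        return value
    try:
        converter = CONVERTERS[(source_layout, target_layout)]
    except KeyError:
        raise TypeError(f"no conversion from {source_layout} to {target_layout}")
    return converter(value)


print(convert(bytes([0x12, 0x3C]), "packed_decimal", "integer"))
print(convert(42, "integer", "string"))
```

The point of the table-driven design is that the conversion is chosen from recorded layout metadata for the source and target, rather than written by a developer for each integration flow.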

 

Businesses want to get closer to having non-technical users build these data integrations and, essentially, that’s at the core of what we do. We allow normal people and lower-cost consultants and contractors to come in and set up these jobs or these data integration flows, as opposed to requiring you to have very highly technical people who can figure out how to develop and make these integrations work. It’s not required in our system.

 

Lawson: And is there a particular size company that you target?
Potter: There is for now, because we’re a new company. We just introduced products in July of this year and our initial target market is the very high-end data integration opportunities that need the strength of the parallel processing engine that we have. We’re looking for companies that have high volumes of data that can come with very complex structures – sometimes called the deep end of the data integration gene pool. So, it’s not just relational data sources, it’s XML, it’s hierarchical mainframe data. They're trying to integrate various forms of data and bring it together into some kind of an application, whether it’s a data warehouse or a data mart or it’s a data migration project. That’s our initial target audience.

 

We think the way that we price the software, which is on what we call a channel basis or a usage basis, positions our product to be applicable to all vertical markets and all sizes of companies. But we’ve got to start somewhere, so we’re starting at the high end of the market while rounding out the product offering.
