October 2005

Conversions of the content kind

Exegenix Canada, a wholly-owned subsidiary of Tata Infotech, has created a breakthrough software product that converts documents with opaque structures into information with clear constructions

"Information is data plus context," says Dr Bill Clarke, president and chief executive officer of Exegenix Canada, a wholly-owned subsidiary of Tata Infotech*. And context gives meaning: "If I say 14 or 400, it means nothing, unless I say 14 degrees or 400mg of something."

"Structure," Dr Clarke adds, "provides a key component of context. Information requires structure. If you want to, for example, research some topic on the internet, you'll go to one of these search engines and say, 'OK, I want to find everything there is out there on some topic or the other.' And it comes back and tells you that there are 15,000 hits. But 14,900 of these are really not what you needed at all. That's because, most of the time, documents themselves don't have structure."

Which is why modern organizations, thirsty for knowledge in a sea of data, understand the value of what Exegenix offers: a tool that can convert — with speed, efficiency and minimal human intervention — documents with opaque structures into information with clear constructions.

The Exegenix Conversion System (ECS) is a breakthrough software product that translates documents into a format such as XML (Extensible Markup Language), which makes their structure more transparent. "XML gives you a standardized way of representing the structure of documents for a whole lot of reasons. It unlocks the true value of information for data mining and use in a wide variety of forms and formats, including print, screen and even voice," explains Dr Clarke.

The biggest innovation Exegenix brings to the conversion process is near-complete automation. "There are three ways of doing this," says Dr Clarke. "First, there's manual conversion, which involves people keying in all these codes. Right now I suspect this is the way the bulk of conversion is done. Then there is scripted conversion, which will work provided your documents are rigidly laid out. Then there is our solution."
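
To make the second approach concrete, here is a minimal sketch of what a scripted conversion might look like. It is an illustration only and not Exegenix's software: the function name convert_rigid_text, the element names document, heading and para, and the rule that a heading is any block typed entirely in capitals are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def convert_rigid_text(text: str) -> ET.Element:
    """Convert plain text to XML using one fixed layout rule (hypothetical)."""
    root = ET.Element("document")
    # Assumed layout: blocks are separated by blank lines.
    for block in text.strip().split("\n\n"):
        block = " ".join(block.split())  # collapse line breaks and extra spaces
        if not block:
            continue
        # Assumed rule: a block typed entirely in capitals is a heading.
        tag = "heading" if block.isupper() else "para"
        ET.SubElement(root, tag).text = block
    return root


sample = "DOSAGE\n\nTake 400mg twice daily.\n\nSTORAGE\n\nKeep below 14 degrees."
xml_text = ET.tostring(convert_rigid_text(sample), encoding="unicode")
print(xml_text)
```

The script works only while every document obeys the rule. A heading typed as "Dosage" rather than "DOSAGE" would be tagged as an ordinary paragraph, which is the limitation Dr Clarke points to when he says scripted conversion depends on rigidly laid-out documents.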

But automating a process like this isn't easy. To be able to achieve complete automation, a piece of software would need the ability to pick out patterns in the documents it processes, just as the human brain does. Says Dr Clarke, "Human beings look at documents and know intuitively what the different elements in them are: headings, paragraphs, tables, illustrations, captions, footnotes, etc. But to make software that approaches an electronic document in that way is extremely difficult."

Automation isn't the only innovation ECS provides. Its core structure-identification technology can translate documents into any format. If another format replaces XML as the standard, the Exegenix solution can convert documents to it with minimal changes to its core technology. The product can also handle any European language in addition to English.

ECS is not tied to any particular industry and can provide solutions to any industry that requires them. Dr Clarke gives some examples: "We do legislation for the Dutch parliament. We do patent documents for the Polish patent office, we do the British budget, aircraft technical manuals for a company in California, and publications of the law courts in Canada."

Exegenix serves two kinds of companies. "There are content-centric organizations and content-heavy enterprises, people who are in the business of publishing content," says Dr Clarke. "They may be re-packagers, accumulators or aggregators of information, people who have a tremendous amount of content that they need to get marked up. Then there are content-heavy enterprises that are not in the business of distributing content per se but produce it. These include sectors such as the government and industries such as pharmaceuticals, automobiles and aerospace."

Exegenix has a flexible pricing strategy for its standout product. "We offer our customers transaction-based pricing because we have flexibility in terms of our business model," explains Dr Clarke. "You pay for what we do for you and you pay only for what's acceptable; there is no up-front cost. If you own a laptop computer already, you just download the application and that's the user interface application. It's a value-based or utility-based pricing. Again, there is a lot of flexibility and this has been recognized."

Because Exegenix deals with its clients' sensitive documents, security is very important to the company. "We have in the last month [April 2005] set up our first sensitive-documents site in the US; we have installed it on a computer in a secure location," says Dr Clarke. "This is a powerful point because it's an area in which we have a unique advantage. Solutions that other people offer are either highly priced or they involve outsourcing to China, India, Philippines, etc. With classified documents, that's not an option."

Dr Clarke is confident about the future of the company and is clear about the areas into which it needs to move. "We have some unique opportunities that no one else can match," he says. "We can operate in a secure environment. About 60 per cent of our business is in North America, but our work in Europe is growing rapidly and I won't be surprised if over the next couple of years our business should be a 50-50 division between North America and Europe. We've had some interest from Japan, but we are not into the up-and-down languages yet."

Though it currently deals directly with its clients, there are plans to work with original equipment manufacturers and other service companies. "Our business model is to act in support of partners of various kinds," says Dr Clarke. "For instance, with Tata Consultancy Services we want to act in support of service offerings which may include content management offerings."

Big things are on the horizon for Exegenix. "Two years down the line I see the market becoming extremely hot," says Dr Clarke. "New customers are going to show up. We're already beginning to see some very big projects."

Blending experience and vision
Exegenix, established in May 2001, may be a relatively young company, but the key minds behind it are pioneers with more than a decade of experience in the field. "We're the people who brought out the first SGML editing software and the first HTML software," says Dr Bill Clarke, the Tata Infotech subsidiary's president and chief executive officer.

"We are known in the XML world for the background that we have, our experience and our professionalism," says the man who has a PhD in astrophysics from the University of California, and still teaches the subject at the University of Toronto. More importantly, Dr Clarke brings to Exegenix decades of experience in the computer industry.

Exegenix's origins can be traced back to 1996, when chief technology officer David Slocombe, who had worked with Dr Clarke and other Exegenix staff at SoftQuad, a pioneer in publishing technology, arrived in India to work with Tata Infotech on a 'proof of concept' for the ideas that eventually led to the formation of Exegenix.

Tata Infotech's reputation as an esteemed company serves Exegenix well. "Exegenix's relationship with Tata Infotech is an important one, because it gives us greater credibility with large customers who are not prepared to deal with organizations that may not be there tomorrow," says Dr Clarke. "Having that tremendous legacy, that strength behind us, coupled with our acknowledged experience, understanding and proven track record in this area, is an important factor for a young company like ours."

*Tata Infotech was merged with TCS in 2005