Mining for Text Gold

Joseff Betancourt
As an entrepreneur and technology project manager, I'm always on the hunt for new technology. Add to this my passion for open-source technology and bleeding-edge builds, and you have yourself a serious technophile on your hands.

Recently, I was looking for a text mining engine. I woke up one day and said, "Joseff, you should play around with a text mining engine. You need the stress of training it to do what you want it to do, you need the monotonous hours of creating your dictionaries, you need to be hammered by multiple glossaries at once." I then said to myself, "Now that you've finished your MBA, you need somewhere to put all those hours of free time, and you know what? Training a text mining engine seems like the right fit. Who knows? It might be fun!" That's when I really woke up, sweat covering my brow, and really said, "Shoot!"

The fact of the matter is that I really did need a text mining engine. I needed one that was fast, reliable, affordable and, most importantly, easy to implement. I had a new ChannelToday site to launch that would track everything in the IT Channel universe.

So I set out to create my own version of Google News with B2B and Channel in mind. I quickly ran into issues. I didn't have a million dollars to devote to this project (and I wouldn't have spent it even if I had), I couldn't freely use any open-source text mining solutions because of hard constraints, and I didn't have time. This idea was already nine years past due (honestly, you'd think someone would have jumped on the idea of mimicking Google News to make their audience's life a little easier), so time was wasting.

By chance, I found an Indian startup company called Dhiti that provided exactly what I needed as a software-as-a-service application. Dhiti is aimed at small- to medium-sized businesses that need to collect, organize and analyze their corporate data. Dhiti has a demonstration available in the form of its Dhiti widget: simply copy the code into your site and it works. I ended up using its impressive API structure, yet implementation took only two days. Other text mining projects that I know of have taken considerably longer.

Dhiti's support is the most impressive part: the company actually listens to you, helps you with your process and (most important) when something breaks, it gives you the insight to help you fix it.

So ChannelToday.com is up and running. And you know what? It really was fun!
 


