Friday, September 21, 2007

Master Data Extractor / Web Data Miner at SearchSpark

Location: United States

OVERVIEW:

We are building next-generation intelligent web search applications and are looking for a brilliant software engineer with strong expertise in text mining, information extraction, information retrieval, and natural language processing to help us build a product database that will let us dominate the "long tail" of products.

RESPONSIBILITIES:
  • Build the biggest and most heavily "meta-tagged" product catalog in our domain.
  • Find little-known, "off the beaten path" products on little-known, "out of the way" websites.
  • Work with our team to deliver the best possible experience for our users.
  • Report to the Director of Engineering.

EXPERIENCE:
  • Experience in information extraction and integration, including harvesting and extracting information from structured and unstructured data.
  • Familiarity with statistical methods for data analysis, such as PMI (pointwise mutual information), HMMs (hidden Markov models), and Naïve Bayes. Natural language processing skills and experience are a big plus.
  • Experience with machine learning algorithms for search and personalization, and with large-scale web clustering, classification, and summarization.
  • Experience with large-scale recommendation systems, including content-based recommendation and collaborative filtering.
  • 5+ years of experience in Java development; strong programming skills.
  • Experience with Java technologies such as JDBC, servlets, and web services. Knowledge of Ruby is a plus.
  • Expert knowledge of relational databases and of performance tuning for large data sets; experience with MySQL is preferred.
  • Experience in test-driven and agile software development.
  • Comfortable with fast-paced development in a small startup environment.
  • Experience in large-scale crawling and deep web crawling; knowledge of tools such as Nutch and Hadoop is a plus.