Clipit - Elasticsearch Cluster

ABOUT THIS PROJECT

Our client had accumulated more than 2.3 billion news articles and posts from various sources over the course of more than 15 years. They wanted to make this rich dataset available for analysis on behalf of their own clients through an easy-to-use, familiar web solution, but their existing technology could not deliver satisfactory performance. They were looking for a big data solution that would provide the required storage capacity and performance, along with a custom search language offering all the options their end clients needed.

THE GOAL

Real-time searching and analysis on a large dataset

Serving a large number of clients simultaneously

Stable and reliable search platform

Provide statistics, reporting, and charts

 

THE CHALLENGE

More than 2.3 billion news articles and posts from various sources, spanning more than 15 years and varying in size and structure

Achieve satisfactory search and analysis performance on the full dataset

Ability to scale over the next several years

THE PROCESS
RESEARCH, COMPARISON OF TECHNOLOGIES

Selection of the best-suited technology: Elasticsearch

Design and sizing for an Elasticsearch cluster

Custom solution for parsing and indexing the existing archive and live data
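
To illustrate the kind of arithmetic involved in sizing a cluster, here is a minimal sketch that estimates a primary shard count from a document count. Only the 2.3 billion figure comes from this project. The average document size, the target shard size and the overhead factor are illustrative assumptions, and the function name is hypothetical. It is not the sizing model used on the project.

```python
import math


def estimate_primary_shards(doc_count, avg_doc_bytes,
                            target_shard_gb=40.0, overhead=1.2):
    """Rough primary shard estimate for an index.

    All defaults are assumptions for illustration:
    - target_shard_gb: desired size of a single shard on disk
    - overhead: multiplier for index structures on top of raw data
    """
    total_bytes = doc_count * avg_doc_bytes * overhead
    shard_bytes = target_shard_gb * 1024 ** 3
    # Always keep at least one shard, and round up so no shard exceeds the target.
    return max(1, math.ceil(total_bytes / shard_bytes))


if __name__ == "__main__":
    # 2.3 billion documents (from the case study); 2 KB average size is an assumption.
    shards = estimate_primary_shards(2_300_000_000, 2048)
    print("estimated primary shards:", shards)
```

In practice, an estimate like this is only a starting point. The real average document size, replica count and query load would need to be measured on a sample of the archive.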

SPECIFICATION

Use cases and scenarios

System architecture

DEVELOPMENT

REST search API with custom query language

User interface for searching, filtering, aggregation and analysis of result data

Integration with the existing portal
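
The project's custom query language is not described here, so the following is only a sketch of the general idea: a search API that translates a user-facing expression into an Elasticsearch Query DSL body. The syntax (`field:value` terms, a leading `-` for exclusion), the function name `to_query_dsl` and the default field `content` are all hypothetical. Only the `bool`, `match` and `match_all` query types are standard Elasticsearch Query DSL.

```python
import json


def to_query_dsl(expression, default_field="content"):
    """Translate a tiny, hypothetical query syntax into an Elasticsearch query body.

    Supported tokens (assumed syntax, for illustration only):
      word          -> must match word in default_field
      field:word    -> must match word in field
      -word / -field:word -> must NOT match
    """
    must, must_not = [], []
    for token in expression.split():
        negated = token.startswith("-")
        if negated:
            token = token[1:]
        field, sep, value = token.partition(":")
        if not sep:
            # No explicit field given, so search the default field.
            field, value = default_field, token
        if not field or not value:
            continue  # skip malformed tokens such as "-" or "title:"
        clause = {"match": {field: value}}
        (must_not if negated else must).append(clause)

    if not must and not must_not:
        return {"query": {"match_all": {}}}

    bool_query = {}
    if must:
        bool_query["must"] = must
    if must_not:
        bool_query["must_not"] = must_not
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    print(json.dumps(to_query_dsl("title:election -sports"), indent=2))
```

Keeping the translation in the API layer means end clients never write Query DSL directly, and the service can validate or restrict what is sent to the cluster.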

TESTING

Automated unit and integration tests

Deployment & Configuration

Monitoring & Audit logs

THE RESULT

Service for searching and analysis of data that is easy to use and integrated into a familiar portal

Standard format and categorization of the complete dataset

Enriched dataset with sentiment, language and gender analysis

Distributed and scalable system with high performance