Our client had accumulated over 2.3 billion news articles and posts from various sources over more than 15 years. They wanted to make that dataset available for analysis on behalf of their own clients through an easy-to-use, familiar web solution, but could not achieve satisfactory performance with their existing technology. They were looking for a big data solution that would provide the required storage capacity and performance, together with a custom search language offering all the options their end clients needed.
Real-time searching and analysis on a large dataset
Serving a large number of clients simultaneously
Stable and reliable search platform
Provide statistics, reporting, and charts
Over 2.3 billion news articles and posts from various sources, spanning more than 15 years and varying in size and structure
Achieve satisfactory performance
Ability to scale over the next several years
RESEARCH, COMPARISON OF TECHNOLOGIES
Selection of best suited technology: Elasticsearch
Design and sizing of an Elasticsearch cluster
Custom solution for parsing and indexing existing archive and live data
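The sizing step can be illustrated with a back-of-the-envelope calculation. The sketch below is not the sizing actually performed for this project: the average document size and the target shard size are assumptions chosen only for illustration, and the function name is hypothetical. Only the document count comes from the case study.

```python
import math


def estimate_primary_shards(doc_count: int, avg_doc_bytes: int,
                            target_shard_bytes: int) -> tuple[int, int]:
    """Return (total index bytes, primary shard count) for a corpus.

    Ignores replicas, index overhead and future growth, all of which
    a real sizing exercise would have to account for.
    """
    total_bytes = doc_count * avg_doc_bytes
    shards = math.ceil(total_bytes / target_shard_bytes)
    return total_bytes, shards


DOC_COUNT = 2_300_000_000             # from the case study
AVG_DOC_BYTES = 2 * 1024              # assumption: 2 KiB per indexed document
TARGET_SHARD_BYTES = 40 * 1024 ** 3   # assumption: 40 GiB per primary shard

total_bytes, shards = estimate_primary_shards(
    DOC_COUNT, AVG_DOC_BYTES, TARGET_SHARD_BYTES
)
print(f"estimated primary data: {total_bytes / 1024 ** 4:.2f} TiB")
print(f"estimated primary shards: {shards}")
```

Changing either assumed value changes the result proportionally, which is why measuring real document sizes on a sample of the archive would normally come first.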
SPECIFICATION
Use cases and scenarios
System architecture
DEVELOPMENT
REST search API with custom query language
User interface for searching, filtering, aggregation and analysis of result data
Integration with existing portal
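To give a sense of what a REST search API with a custom query language involves, the sketch below translates a very small query syntax into an Elasticsearch query DSL body. The syntax shown (`field:value` terms joined by `AND`), the field names and the function name are all hypothetical; the case study does not describe the actual language, which was presumably much richer.

```python
def to_es_query(expression: str) -> dict:
    """Translate 'field:value AND field:value' into an Elasticsearch
    bool query whose `must` clauses are `match` queries.

    Hypothetical mini-syntax for illustration only: no OR, no grouping,
    no quoting, no ranges.
    """
    clauses = []
    for term in expression.split(" AND "):
        field, sep, value = term.strip().partition(":")
        if not sep or not field or not value:
            raise ValueError(f"malformed term: {term!r}")
        clauses.append({"match": {field: value}})
    return {"query": {"bool": {"must": clauses}}}


# Example: the resulting dict could be sent as the JSON body of a
# search request against an index.
body = to_es_query("title:election AND language:en")
print(body)
```

Keeping the translation in one place like this means the end clients see only the custom language, while the underlying query format can change without affecting them.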
TESTING
Automated unit and integration tests
Deployment & Configuration
Monitoring & Audit logs
Service for searching and analysis of data that is easy to use and integrated into a familiar portal
Standard format and categorization of the complete dataset
Enriched dataset with sentiment, language and gender analysis
Distributed and scalable system with high performance