Principal Software Engineer, Big Data Architect
Working on the Linked Data Repository project, a Big Data platform that integrates Elsevier's internal data with external third-party data sources, deployed entirely on AWS.
- Improving data processing and storage throughput using the Hadoop framework for distributed computing across a cluster of up to twenty-five nodes.
- Building customized in-memory indexes for high-performance information retrieval using Apache Lucene and Apache Solr, as well as an optimized graph database with up to 10 billion edges.
- Applying machine learning algorithms to identify the most significant features across different datasets.
- Creating proofs of concept from scratch that illustrate how these data integration techniques can meet specific business requirements, reducing cost and time to market.