At Causaly we are building the biggest knowledge platform in the world to empower people working on the most pressing issues in human health. To achieve this, we are teaching computers to read all knowledge ever published and develop an interface that allows humans to answer questions they can't ask anywhere else.
Our technology is self-developed and proprietary, powering a large Biomedical Causal Knowledge Graph. It helps researchers and decision-makers discover insights from millions of academic publications, clinical trials, patents and other data sources in minutes. Causaly is used by pharmaceutical companies in Research and Commercial departments for Drug Discovery, Safety and Competitive Intelligence.
Read how Causaly is used in Target Identification here: https://www.causaly.com/blog/ai-supported-target-i...
We are a VC-backed tech company with offices in London and Athens, looking for an experienced and driven Backend Engineer to work on building, scaling and automating our data processing and information extraction pipelines.

Responsibilities
- Design, create and maintain optimal data processing and information extraction pipelines
- Scale and automate new and existing data pipelines for fast, smooth data transition from source to production, optimizing data delivery
- Build the infrastructure required for optimal extraction, transformation and loading of data from a wide variety of data sources
- Work fluently with large volumes of data (hundreds of millions of data points) in various formats, e.g. comma-separated, JSON, relational and graph database formats
- Build analytics and troubleshooting tools that help data scientists and product owners measure their work, identify and fix data leaks, and work with issues at the OS level
- Work with stakeholders in the NLP/ML engineering, full-stack and knowledge engineering teams to design, build and continuously improve data processing modules
- Understand, at a high level, the operation of the various data processing and machine learning stages that make up an information extraction pipeline
- Implement processes supporting data transformation, data structure manipulation, metadata, dependency and workload management
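For a flavour of the extract-transform-load work described above, here is a minimal sketch in Python. It is purely illustrative, not Causaly's actual stack: the record fields, function names and in-memory "sink" are assumptions for the example.

```python
import csv
import io
import json

def extract(csv_text):
    """Extract: parse raw comma-separated text into row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: normalise field names and types (illustrative rules)."""
    return [
        {"id": int(row["id"]), "title": row["title"].strip().lower()}
        for row in rows
    ]

def load(records):
    """Load: serialise to JSON lines; a stand-in for a real data sink."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

raw = "id,title\n1, Aspirin \n2, Ibuprofen \n"
print(load(transform(extract(raw))))
```

In a production pipeline each stage would typically be a separate, independently scalable component (e.g. a Spark job or a queue consumer) rather than three in-process functions.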
Requirements
- BSc in a related technical field
- 5+ years' experience building big data processing pipelines or working in a related field
- Fluency in Python, Linux OS, GCP/AWS
- Strong knowledge of Elasticsearch and MySQL
- Experience with data processing (Pipelining, Storage, ETL, Analytics, Map/Reduce, ML)
- Experience with Spark and/or Pandas
- Working knowledge of message queuing and stream processing
- Working knowledge of software development best practices, e.g. testing, versioning, documentation
- Excellent problem-solving, ownership and organizational skills, with high attention to detail and quality
Nice to have
- Experience with biomedical/life sciences data processing
- Experience with Neo4j, graph database architectures
- Experience with BigQuery
Benefits
- Competitive salary
- Individual training budget for professional development
- Plenty of opportunities to take on more responsibility as we grow
- Be part of a multinational, diverse and exceptional early team that builds a transformative knowledge product with the potential to have real impact
- Regular team outings
- Annual team retreat to a secret destination
- Easily accessible office in the heart of Angel, Islington