Web Crawling & Indexing Engineer


About Mistral
- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.
- Our mission is to make AI ubiquitous and open.
- We are creative, low-ego, team-spirited, and have been passionate about AI for years.
- We hire people who thrive in competitive environments, because they find them more fun to work in.
- We hire passionate women and men from all over the world.
- Our teams are distributed between France, the UK, and the USA.

Role Summary
- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.
- The ideal candidate will have a strong background in web scraping, data extraction, and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.
- The role is based in Paris or London.

Key Responsibilities
- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.
- Use headless browsing techniques, such as Chrome DevTools, to automate and optimize data collection processes.
- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
- Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

Qualifications & profile
- Bachelor's or master's degree in computer science, information systems, or information technology.
- Strong understanding of web technologies, data structures, and algorithms.
- Knowledge of database management systems and data warehousing.
- Programming languages: proficiency in languages such as Python, Java, or C++ is essential.
- Mastery of web technologies: understanding of HTML, CSS, and JavaScript is crucial to navigate and scrape data from websites.
- Knowledge of the HTTP and HTTPS protocols.
- A good understanding of data structures (such as queues, stacks, and hash maps) and algorithms is necessary.
- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.
- Understanding of distributed systems and technologies such as Hadoop or Spark.
- Experience using web scraping libraries and frameworks such as Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.
- Understanding of how search engines work and how to optimize web crawling.
- Experience in machine learning to improve the efficiency and accuracy of web crawling.
- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.

Benefits
- Daily lunch vouchers
- Contribution to a Gympass subscription
- Monthly contribution to a mobility pass
- Full health insurance for you and your family
- Generous parental leave policy

Apply now
To help us track our recruitment effort, please indicate in your email/cover letter where (vacanciesin.eu) you saw this job posting.
