Software Engineer, Web Crawling & Indexing (Paris/London)

August 23

🏢 In-office - London

Mistral AI

Developing the best generative AI models

11 - 50

Description

• Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.
• Utilize headless browsing techniques, such as Chrome DevTools, to automate and optimize data collection processes.
• Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
• Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
• Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
• Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
• Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

Requirements

• Bachelor’s or master’s degree in computer science, information systems, or information technology.
• Strong understanding of web technologies, data structures, and algorithms.
• Knowledge of database management systems and data warehousing.
• Proficiency in a programming language such as Python, Java, or C++ is essential.
• Understanding of HTML, CSS, and JavaScript, which is crucial for navigating websites and scraping data from them.
• Knowledge of the HTTP and HTTPS protocols.
• Good understanding of data structures (such as queues, stacks, and hash maps) and algorithms.
• Knowledge of databases (SQL or NoSQL) to store and manage the crawled data.
• Understanding of distributed systems and technologies such as Hadoop or Spark.
• Experience using web scraping libraries and frameworks such as Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.
• Understanding of how search engines work and how to optimize web crawling.
• Experience applying machine learning to improve the efficiency and accuracy of web crawling.
• Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.

Benefits

• Daily lunch vouchers
• Contribution to a Gympass subscription
• Monthly contribution to a mobility pass
• Full health insurance for you and your family
• Generous parental leave policy
