August 23
🏢 In-office - London
Distributed Systems
Hadoop
Java
JavaScript
Kubernetes
NoSQL
NumPy
Pandas
Postgres
Python
Redis
Selenium
Spark
SQL

• Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.
• Use headless browsers, for example Chrome driven through the Chrome DevTools Protocol, to automate and optimize data collection.
• Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
• Create and implement efficient parsing patterns using regular expressions, XPath expressions, and CSS selectors to ensure accurate data extraction.
• Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
• Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
• Continuously improve and optimize the existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

• Bachelor’s or master’s degree in computer science, information systems, or information technology.
• Strong understanding of web technologies, data structures, and algorithms.
• Knowledge of database management systems and data warehousing.
• Proficiency in a programming language such as Python, Java, or C++ is essential.
• Understanding of HTML, CSS, and JavaScript, which is crucial for navigating websites and scraping data from them.
• Knowledge of the HTTP and HTTPS protocols.
• Good understanding of data structures (such as queues, stacks, and hash maps) and algorithms.
• Knowledge of databases (SQL or NoSQL) for storing and managing crawled data.
• Understanding of distributed systems and technologies such as Hadoop or Spark.
• Experience with web scraping libraries and frameworks such as Scrapy, Beautiful Soup, Selenium, or MechanicalSoup.
• Understanding of how search engines work and how to optimize web crawling.
• Experience applying machine learning to improve the efficiency and accuracy of web crawling.
• Familiarity with tools such as Pandas, NumPy, and Matplotlib for analyzing and visualizing data.

• Daily lunch vouchers
• Contribution to a Gympass subscription
• Monthly contribution to a mobility pass
• Full health insurance for you and your family
• Generous parental leave policy