Web data extraction can transform raw web data into a structured format, helping businesses to develop better strategies for identifying hidden risks, build better products and make key decisions in today’s ever-changing environment.
However, this is an area that has become increasingly complex, as a result of evolving structures, practices and organisational requirements.
To break down some of the most important issues, Scrapinghub hosted this year’s Web Data Extraction Summit, a virtual event made up of presentations and panels in which a range of speakers discussed the latest trends in data extraction, web scraping best practices, and more.
Encompass’ Head of Product, Alan Samuels, attended and shares some of his reflections after taking in a range of sessions that were part of a busy agenda.
One of the key things to note is the level of interest in this event. More than 1,800 people from around the globe registered and contributed to the conversation on topics including running a business on web scraped data, how venture capital firms use web data to find billion-dollar companies, how web scraping can help counter human trafficking, and legal compliance in the world of web scraping.
Hearing from those at the centre of the industry showed that there is a growing interest in sharing ideas about best practices and encouraging innovation, with these topics being highlighted on a bigger stage than ever before.
We know that extracting web data at scale is complicated. There are lots of challenges to take into account, and this is a key driver behind innovation, which was a theme throughout the summit this year as experts offered their own thoughts and experience.
The session on ‘Web Scraping Tech Stack for 2020,’ led by Ondra Urban of the Engineering Team at Apify, reminded us that web scraping is so much more than just parsing HTML. It covered technologies that address different parts of the web scraping challenge, including scaling to process high volumes of data, overcoming anti-scraping systems, processing results in real time or near real time, and, importantly, monitoring changes to website structures and continuously adjusting so that the scraping processes don’t break.
Away from the technical focus, legal compliance was another highlight of the day, and the panel session dedicated to it was of particular interest.
We heard from the likes of Paul Griffin, CEO of Irish data protection specialists First Compliance, and Sarah McKenna, who has years of experience in leading technology-focused transformation projects. Both gave their perspectives on the background to, and positions on, a number of industry issues.
In doing so, they discussed areas such as GDPR, the California Consumer Privacy Act (CCPA) and the impact of global data legislation. They also looked at current cases, and these practical examples added to the understanding of where compliance fits within the industry as a whole.
There are powerful use cases for web scraping, not least of all helping to end human trafficking, and this event provided a valuable platform to join with others to hear insights into the state of the industry and, importantly, the latest techniques and tools being used to solve real business