Einstein International Journal Organization (EIJO)

EIJO Journal of Engineering, Technology and Innovative Research (EIJO – JETIR), ISSN: 2455-9172
Web Scraping Technology

Authors: Ms. Akansha Agarwal, Ms. Mahima Saini, Dr. Himanshu Arora, and Ms. Shilpi Mishra
Volume 5, Issue 1
Article Overview

The Internet is the largest source of information and data ever built by mankind. As the World Wide Web has evolved, the way users interact and exchange data online has changed rapidly. These changes have brought a large number of users onto the Internet to use its facilities. Through this daily use, a large amount of data has become available online, and data plays an important role in every field. Researchers, market analysts, academicians, and businesspeople all share their advertisements and information on the Internet so that they can connect with people easily. Sharing and storing so much data on the Internet raises a new problem: how to handle this overload of data, and how users can access web information with the least effort. Web scraping is a technique used to solve this problem. In this paper, our main focus is on retrieving web information using a Python script. Most web data is presented in an unstructured format. Web scraping is therefore used to extract unstructured information from websites and transform it into a structured form.

Keywords: Web scraping, Selenium, Beautiful Soup, HTML Tags.
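
As a minimal sketch of the idea described in the overview, the Python script below extracts the cells of an HTML table and turns them into structured records. It is an illustration only, not the authors' script: it uses the standard-library `html.parser` module as a stand-in for Beautiful Soup so that it runs without extra packages, and the sample page, the `TableScraper` class, and the `scrape_table` function are hypothetical names introduced here. A real script would first download the page rather than read it from a string.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real script would download the HTML first.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><th>Product</th><th>Price</th></tr>
    <tr><td>Keyboard</td><td>25.00</td></tr>
    <tr><td>Mouse</td><td>10.50</td></tr>
  </table>
</body></html>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

The unstructured markup becomes a list of dictionaries, one per table row, which can then be written to CSV or a database. With Beautiful Soup the same extraction would typically be shorter, and Selenium would be needed only for pages that build their content with JavaScript.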
