A price comparison engine using web scraping
ShopScrap
Don’t let the name fool you: I’m not shopping for scrap, I’m scraping shops.
This is a side project of mine, written entirely in JavaScript, which explores the possibility of creating a small product search engine (one that searches for a single product across multiple sites) using web scraping.
It consists of an app, written in React Native, and a Node server, which parses the sites. The user enters a search query in the app, and the search results are displayed in the app as well.
ShopScrap is in no way intended to be deployed at scale anywhere. This is just a fun side project.
I am currently only using shopping websites from the UAE since I’ve found them to change their source code rarely (which favours scraping). But do note that this project can be easily modified to include almost any website that has a search option.
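As a rough illustration of what adding a site could involve, a site can be described by a search URL template plus the CSS selectors of the fields to extract. This is only a sketch under assumptions, not how the ShopScrap server is actually organised: `exampleSite`, its URL, its selectors and `buildSearchUrl` are all hypothetical placeholders.

```javascript
// Hypothetical site descriptor: the URL and selectors are placeholders,
// not taken from ShopScrap's server code.
const exampleSite = {
  name: 'Example Shop',
  searchUrl: 'https://shop.example.com/search?q={query}',
  selectors: {
    item: '.product-card',
    title: '.product-title',
    price: '.product-price',
  },
};

// Build the search page URL for a site by URL-encoding the user's query.
function buildSearchUrl(site, query) {
  return site.searchUrl.replace('{query}', encodeURIComponent(query.trim()));
}

console.log(buildSearchUrl(exampleSite, 'usb c cable'));
```

With a layout like this, supporting another site would mostly mean writing one more descriptor, and a change in a site's HTML would only touch its selectors.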
Problem
You might already know that search engines rely on crawling and indexing to build their search results. They may also use trackers, and the results can be skewed by the various marketing and promotional agents working behind the scenes. For instance, if you search for a product on Google, you might see a bunch of ads along with the results. Even Amazon’s own search results promote various products.
ShopScrap aims to deliver a clean search result just by scraping the results off of various seller websites, sorting them and giving them to you.
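The scrape-then-sort idea can be sketched roughly as follows. This is a minimal illustration, not the server's real code: `searchAll`, the scraper functions and the `{ title, price, site }` result shape are assumptions made for the example, and the data is made up.

```javascript
// Run every scraper concurrently, keep the results of the ones that succeed,
// and return a single list sorted by ascending price.
// Each scraper is a hypothetical async function: (query) => [{ title, price, site }].
async function searchAll(scrapers, query) {
  const settled = await Promise.allSettled(scrapers.map((scrape) => scrape(query)));
  return settled
    .filter((outcome) => outcome.status === 'fulfilled')
    .flatMap((outcome) => outcome.value)
    .filter((item) => Number.isFinite(item.price)) // drop items without a usable price
    .sort((a, b) => a.price - b.price);
}

// Stand-in scrapers with made-up data, used only to show the call shape.
const fakeScrapers = [
  async () => [{ title: 'Cable A', price: 25, site: 'one' }],
  async () => { throw new Error('site layout changed'); },
  async () => [{ title: 'Cable B', price: 19.5, site: 'two' }],
];

searchAll(fakeScrapers, 'usb c cable').then((items) => console.log(items));
```

Using `Promise.allSettled` means one seller site failing (for example after a layout change) does not discard the results from the others.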
Challenges
These are the main challenges ShopScrap faces:
- Web scraping is, in itself, extremely slow
- JavaScript as a language is slow too
- Seller websites might change their source code frequently, and since web scraping relies on the HTML source of the webpage, it is hard to maintain the functionality as time passes
- The results being displayed on seller websites will surely include thousands of irrelevant ones too. It is hard to sort out the relevant ones the user wants
This project aims to solve these challenges gradually, if that is possible in the first place.
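For the relevance challenge in particular, one simple starting point is to keep only the results whose title contains the words of the query. The sketch below is an assumption about how that could look, not the sorting algorithm ShopScrap uses; `tokenize`, `relevance` and `filterRelevant` are hypothetical names, and the tokenizer only handles ASCII letters and digits.

```javascript
// Split text into lowercase alphanumeric tokens,
// e.g. "USB-C Cable" -> ["usb", "c", "cable"].
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Score a title by the fraction of query tokens it contains (0 to 1).
function relevance(query, title) {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const titleTokens = new Set(tokenize(title));
  const hits = queryTokens.filter((token) => titleTokens.has(token)).length;
  return hits / queryTokens.length;
}

// Keep only results whose title matches at least `threshold` of the query tokens.
function filterRelevant(query, results, threshold = 1) {
  return results.filter((item) => relevance(query, item.title) >= threshold);
}

// Made-up sample data, only to show the call shape.
const sample = [
  { title: 'Anker USB-C Cable 1m', price: 25 },
  { title: 'USB-C Wall Charger', price: 40 },
];
console.log(filterRelevant('usb c cable', sample));
```

Lowering `threshold` (say to 0.6) trades precision for recall, which may suit sites whose product titles abbreviate or reorder words.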
Installation (server)
Regardless of your method of choice to use the app, you have to install and run the node server, as this is the actual web scraper.
Prerequisites
- A computer with NodeJS installed. On Linux distributions that use apt (such as Debian or Ubuntu), it would be enough to run
apt install nodejs npm
- Basic knowledge of git
- ADB (Optional), in case you’re having connection issues
Steps
-
Clone this repo to your machine
git clone https://github.com/vishalkrishnads/ShopScrap.git
-
Go to the server directory
cd ShopScrap/server
-
Install dependencies
npm install
-
Start the server
node server.js
-
If your Android device or emulator fails to connect to the server and throws an error, try reverse-forwarding port 3000 over ADB before opening an issue
adb reverse tcp:3000 tcp:3000
TIP: The server uses Chromium by default. If you want to use Firefox, modify the start command like so:
node server.js firefox
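A start command like this is typically handled by reading the extra command-line argument. The following is only a guess at the general shape, not the actual code in server.js: `pickBrowser` and `SUPPORTED_BROWSERS` are hypothetical names, and the fallback to Chromium for unknown values is an assumption.

```javascript
// Hypothetical helper: choose the browser from the first CLI argument,
// falling back to Chromium when none (or an unknown one) is given.
const SUPPORTED_BROWSERS = ['chromium', 'firefox'];

function pickBrowser(argv) {
  // argv[0] is the node binary and argv[1] the script path,
  // so user-supplied arguments start at index 2.
  const requested = (argv[2] || '').toLowerCase();
  return SUPPORTED_BROWSERS.includes(requested) ? requested : 'chromium';
}

console.log(pickBrowser(['node', 'server.js', 'firefox'])); // firefox
console.log(pickBrowser(['node', 'server.js']));            // chromium
```

Inside a real server script the call would be `pickBrowser(process.argv)`.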
Installation (app)
If you don’t plan on editing the app’s code and playing around with it, you would be better off installing the APK of the latest release from the releases page.
If instead you’re that geek who wants to play with the app, install it by following the steps below.
Prerequisites
- A PC with a React Native development environment set up. To set one up, follow the steps listed under the React Native CLI Quickstart tab in the official React Native documentation.
- ADB installed and running.
- Knowledge of JS is recommended if you wanna edit the app. Take this tutorial if you need a quick recap.
- A cup of coffee (I mean, patience)
Steps
-
Create a new React Native project ShopScrap with react-native version 0.63.4
npx react-native init ShopScrap --version 0.63.4
-
Change the working directory like so
cd ShopScrap
-
With a physical device or emulator connected via adb, verify that the sample app runs first
npx react-native run-android
If it does and you see the Welcome to React Native greeting, proceed to get the source code.
-
Initialize an empty Git repository
git init
-
Add this repository as origin
git remote add origin https://github.com/vishalkrishnads/ShopScrap.git
-
Delete the conflicting files
# Linux
rm App.js index.js README.md package.json package-lock.json .eslintrc.js .gitignore .gitattributes app.json
# Windows
del App.js index.js README.md package.json package-lock.json .eslintrc.js .gitignore .gitattributes app.json
-
Pull the source
git pull origin main
-
Install dependencies
npm install
-
Link the icons module
npx react-native link react-native-vector-icons
-
Build and run the app
npx react-native run-android
Happy coding!!
Wrapping up
ShopScrap tries to address the challenges mentioned earlier in this documentation, but it is currently riddled with bugs. Hence, any PRs that improve the existing scraping, searching and sorting algorithms are welcome.
I am also thinking about migrating the project entirely to Kotlin, eventually eliminating the need for a separate server and making it a standalone app. Any developments regarding this will be made in a separate branch in the future. So, if you’re well versed in making Android apps with Kotlin, hit me up if you’re ready to help.