Image Scraper in BASH
Moksh Jain, 16IT221
Nishanth Hebbar, 16IT234
Suyash Ghuge, 16IT114
Abhishek Kamal, 16IT202
Code at: github.com/MJ10/Unix-Project
Introduction
Machine learning is growing in popularity.
Machine learning models require a lot of data.
Categorized datasets are hard to find.
Creating a new dataset is hard for a beginner.
Functioning
Can be used by beginners to create toy datasets to test machine learning models.
Scrapes Google Search results for a particular category name, and downloads the specified number of images.
Scraped images can be resized to the desired dimensions and saved.
Commands Used
awk
convert
curl
cat
egrep
mkdir
rm
rmdir
wget
which
...
awk
- The basic function of awk is to search files for lines (or other units of text) that contain certain patterns.
- When a line matches one of the patterns, awk performs specified actions on that line. awk continues to process input lines in this way until it reaches the end of the input files.
awk -F <delimiter> '<pattern> { <action> }' <file>
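As an illustration of the pattern/action model described above, here is a minimal sketch. The sample records and the variable names are hypothetical and are not taken from the project's script.

```shell
# Hypothetical sample input: one "name,url" record per line.
sample='cat,http://example.com/cat.jpg
dog,http://example.com/dog.png'

# -F ',' sets the field separator to a comma.
# The pattern /\.jpg$/ selects lines ending in ".jpg";
# the action prints the second field of each selected line.
jpg_urls=$(printf '%s\n' "$sample" | awk -F ',' '/\.jpg$/ { print $2 }')

echo "$jpg_urls"
```

Only the first record ends in ".jpg", so only its URL field is printed.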
convert (ImageMagick)
- The convert command converts between image formats and can also resize, blur, crop, dither, draw on, flip, join, or re-sample images.
- It is a part of the ImageMagick software suite.
convert $input_file -resize $RESIZE_WIDTH\x$RESIZE_HEIGHT! $output_file &
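In the command above, the trailing ! in the geometry tells ImageMagick to ignore the aspect ratio and produce exactly the requested size, and the trailing & runs each conversion in the background. A minimal sketch of how this could be applied to a whole directory follows. The function name resize_all, the directory arguments and the 64x64 size are assumptions made for illustration, not the project's actual code.

```shell
# Assumed target dimensions; the real script takes these from the user.
RESIZE_WIDTH=64
RESIZE_HEIGHT=64

# WIDTHxHEIGHT! forces the exact size, ignoring the aspect ratio.
geometry="${RESIZE_WIDTH}x${RESIZE_HEIGHT}!"

# resize_all IN_DIR OUT_DIR: resize every .jpg in IN_DIR into OUT_DIR.
resize_all() {
  in_dir=$1
  out_dir=$2
  for input_file in "$in_dir"/*.jpg; do
    # Skip the unexpanded pattern when the directory has no .jpg files.
    [ -e "$input_file" ] || continue
    output_file="$out_dir/$(basename "$input_file")"
    # Start each conversion in the background so they run in parallel.
    convert "$input_file" -resize "$geometry" "$output_file" &
  done
  # Block until all background conversions have finished.
  wait
}

# Example call (hypothetical directories):
# resize_all downloads resized
```

The wait at the end matters: without it the script could exit, or move on to cleanup, while conversions are still running.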
which
- which returns the pathnames of the files (or links) that would be executed in the current environment.
- It does this by searching the directories listed in the PATH environment variable for executable files matching the names of its arguments.
- which does not follow symbolic links.
which [filename/command]
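A common use of which in a script is to verify that the required tools are installed before doing any work. The sketch below shows one way to do that. The function name check_deps and the list of tools passed to it are assumptions for illustration.

```shell
# check_deps CMD...: return non-zero if any CMD is not found on PATH.
check_deps() {
  missing=0
  for cmd in "$@"; do
    # which exits with a non-zero status when the command is not found.
    if ! which "$cmd" > /dev/null 2>&1; then
      echo "Missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the external tools an image scraper might rely on.
check_deps awk curl wget convert || echo "Please install the missing tools." >&2
```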
curl
- curl transfers data from or to a server, using one of the protocols: HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP, DICT, TELNET, LDAP or FILE.
- curl supports features like pause and resume of downloads and it has around 120 command line options for various tasks.
curl [options] [URL...]
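The sketch below shows how a page body could be fetched into a shell variable with curl. To stay runnable without network access it reads a temporary local file through the FILE protocol; a scraper would pass an HTTP or HTTPS search URL instead. The function name fetch_page, the user agent string and the sample HTML are assumptions.

```shell
# Hypothetical user agent string; some servers respond differently
# to clients they do not recognize.
USER_AGENT="Mozilla/5.0"

# fetch_page URL: print the body of URL on standard output.
#   -s  silent mode (no progress meter)
#   -L  follow redirects
#   -A  set the User-Agent header
fetch_page() {
  curl -s -L -A "$USER_AGENT" "$1"
}

# Demo without network access: read a local file via file://.
demo_file=$(mktemp)
printf '%s\n' '<img src="http://example.com/cat.jpg">' > "$demo_file"

page=$(fetch_page "file://$demo_file")
echo "$page"

rm -f "$demo_file"
```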
wget
- The name wget stands for “web get”. It is a command-line utility for downloading files from the Internet.
- It supports downloading multiple files, downloading in the background, resuming downloads, limiting the bandwidth used for downloads and viewing headers.
- It can also be used to mirror a site.
wget [options] [url]
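As a sketch of how a specified number of images could be downloaded with wget, the function below reads URLs from a file, one per line, and saves the first few under numbered names. The function name download_images, the file layout and the timeout and retry values are assumptions, not the project's actual code.

```shell
# download_images URL_FILE OUT_DIR COUNT:
# download the first COUNT URLs listed in URL_FILE into OUT_DIR.
download_images() {
  url_file=$1
  out_dir=$2
  count=$3
  mkdir -p "$out_dir"
  n=0
  # head limits the input to the requested number of URLs.
  head -n "$count" "$url_file" | while IFS= read -r url; do
    n=$((n + 1))
    #   -q  quiet output
    #   -T  network timeout in seconds
    #   -t  number of tries
    #   -O  write the download to the given file
    # Remove the partial file if the download fails.
    wget -q -T 10 -t 2 -O "$out_dir/image_$n.jpg" "$url" \
      || rm -f "$out_dir/image_$n.jpg"
  done
}

# Example call (hypothetical file and directory names):
# download_images urls.txt downloads 20
```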
egrep
- The egrep command searches for patterns written as extended regular expressions. It is essentially the same as running grep with the -E option.
- In egrep, meta-characters such as +, ?, |, ( and ) are treated as special characters even when they are not escaped. To match one of them literally, escape it with a backslash.
egrep [option] pattern [file…]
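The sketch below shows extended regular expressions being used to pull image URLs out of HTML. It uses grep -E, which is equivalent to egrep, together with the -o option (available in GNU and BSD grep) to print only the matching part of each line. The sample HTML and the variable names are hypothetical.

```shell
# Hypothetical HTML fragment with two image links and one page link.
html='<img src="http://example.com/a.jpg"><img src="https://example.com/b.png"><a href="http://example.com/page.html">'

# ?, + and (a|b) need no backslashes in an extended regular expression:
#   https?       "http" optionally followed by "s"
#   [^"]+        one or more characters that are not a double quote
#   \.(jpg|png)  a literal dot followed by "jpg" or "png"
image_urls=$(printf '%s\n' "$html" | grep -E -o 'https?://[^"]+\.(jpg|png)')

echo "$image_urls"
```

The two image URLs are printed one per line; the link to page.html does not match because it does not end in .jpg or .png.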
Screenshots

References
https://www.gnu.org/software/wget/manual/
https://curl.haxx.se/docs/manpage.html
https://www.imagemagick.org/script/convert.php
https://linux.die.net/man/1/egrep
https://www.gnu.org/software/gawk/manual/gawk.html
https://images.google.com
Unix-Project
By Moksh Jain
Presentation for Unix(IT202) Course Project