Want to learn more? Take the full course at [ Link ] at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
Welcome to Intermediate Shell! My name is Susan Sun, and I do data work. I'm looking forward to learning with you in this course.
In data, many of us bypass the command line in favor of graphical interfaces like Anaconda and RStudio because those are what we are familiar with. However, taking the time to learn data science on the command line is a great long-term investment that will, ultimately, make us better and more productive data people.
In this course, we take a practical approach and learn command line tools useful for everyday data processing and analyses.
First, let's learn how to download data files using curl.
curl, short for Client for URLs, is a Unix command line tool for transferring data to and from a server.
It is often used to download data from HTTP sites and FTP servers.
To check if curl has been properly installed, type the following in the command line:
man curl
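If curl has not been installed, you will get an error instead of the manual: man will report that there is no manual entry for curl, and typing curl on its own will return:
curl: command not found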
To install curl, follow this link.
If curl is installed, your console will look like this:
Keep pressing Enter to scroll through the curl manual.
To exit and return to your console, press q.
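If you would rather check from a script than page through the manual, here is a minimal sketch for a POSIX shell: `command -v` reports where `curl` lives (or fails if it is missing), and `curl --version` prints the installed release. The variable name `curl_path` is just for illustration.

```shell
# command -v prints the path of the curl executable and exits
# non-zero if no such command can be found on the PATH.
if curl_path=$(command -v curl); then
  echo "curl is installed at: $curl_path"
  # The first line of the --version output names the curl release.
  curl --version | head -n 1
else
  echo "curl: command not found" >&2
fi
```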
The basic syntax for curl has the following structure:
curl, option flags, URL
The URL is required for the command to run successfully.
curl supports a large number of protocols and option flags.
For a list of the option flags, use curl dash-dash-help. On recent versions, curl dash-dash-help all shows every flag.
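Which protocols are available depends on how your copy of curl was built. A quick way to check is the `Protocols:` line that `curl --version` prints. This is a small sketch, and the variable name `protocols` is just for illustration.

```shell
# curl --version prints the release on its first line and, further down,
# a "Protocols:" line naming every protocol this particular build supports,
# e.g. "Protocols: dict file ftp ftps http https ...".
protocols=$(curl --version | grep '^Protocols:')
echo "$protocols"
```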
Let's download a single file stored at this hypothetical URL using curl.
To save the file with its original name datafilename-dot-txt, use the option flag dash-uppercase-O.
This reads:
curl dash uppercase-O
followed by the file URL location
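Because the URL in this lesson is hypothetical, the sketch below fakes the server with curl's `file://` protocol so it runs offline. It writes `datafilename.txt` into a temporary directory and then downloads it with `-O`, which keeps the remote filename. The directory variables `src` and `dest` are illustrative assumptions. Against a real server you would pass the `https://` URL instead.

```shell
# Stand-in "server": a temporary directory holding one data file.
src=$(mktemp -d)
printf 'id,value\n1,42\n' > "$src/datafilename.txt"

# Download into a separate directory so the copy is easy to spot.
dest=$(mktemp -d)
cd "$dest" || exit 1

# -O saves the file under its original (remote) name, datafilename.txt.
curl -O "file://$src/datafilename.txt"
```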
To save the file under a different name, replace dash uppercase O with dash lowercase o and the new filename.
Now it reads:
curl dash lowercase o followed by the new filename and the file URL location
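Here is the same offline setup, this time saving under a new name with `-o`. The new filename `renameddatafilename.txt` is hypothetical, and the `file://` URL stands in for the lesson's server.

```shell
# Stand-in "server": a local file reached through the file:// protocol.
src=$(mktemp -d)
printf 'id,value\n1,42\n' > "$src/datafilename.txt"
dest=$(mktemp -d)

# -o takes the new filename (here a full path) and comes before the URL.
curl -o "$dest/renameddatafilename.txt" "file://$src/datafilename.txt"
```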
Oftentimes, a server will host multiple data files with similar filenames, like this:
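Instead of curl-ing each file individually, it is tempting to download every file that starts with datafilename and ends in dot-txt using a wildcard, like this:
curl dash uppercase-O https colon forwardslash forwardslash websitename-dot-com forwardslash datafilename asterisk dot txt
However, curl does not expand the asterisk into matching filenames on the server. It requests a file literally named datafilename asterisk dot txt, and if the URL is not quoted, the shell may first try to match the asterisk against files in your local directory. To download many similarly named files with a single command, we rely on curl's own URL globbing instead.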
curl's URL globbing parser lets us increment through numbered files.
The following will download every file sequentially starting with datafilename001-dot-txt and ending with datafilename100-dot-txt.
Note the end of the command, which reads:
open square bracket zero zero one dash one hundred close square bracket dot txt. That's the globbing at work.
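Here is a runnable sketch of the range glob. It assumes a stand-in "server" of 100 local files, `datafilename001.txt` through `datafilename100.txt`, reached through `file://` instead of the hypothetical website. Quoting the URL keeps the shell from treating the brackets as its own pattern, so curl receives them and does the expansion, keeping the leading zeros.

```shell
# Stand-in "server": 100 numbered files, datafilename001.txt ... datafilename100.txt.
src=$(mktemp -d)
i=1
while [ "$i" -le 100 ]; do
  n=$(printf '%03d' "$i")            # zero-pad to three digits: 001, 002, ...
  echo "file $n" > "$src/datafilename$n.txt"
  i=$((i + 1))
done

dest=$(mktemp -d)
cd "$dest" || exit 1

# curl expands [001-100] into 100 URLs; -O saves each under its remote name.
curl -O "file://$src/datafilename[001-100].txt"
```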
We can also step through the files and download every Nth file. For example, to download every 10th file (datafilename001, 011, 021, and so on up to 091), we can modify the globbing parser to read:
open square bracket zero zero one dash one hundred colon ten close square bracket dot txt
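This sketch shows the step syntax against the same kind of local stand-in server (an assumption, as before). The number after the colon is the step size, so `[001-100:10]` requests 001, 011, 021, up to 091. That is ten files, and 100 itself is not included.

```shell
# Stand-in "server" with 100 numbered files, as in the range example.
src=$(mktemp -d)
i=1
while [ "$i" -le 100 ]; do
  n=$(printf '%03d' "$i")
  echo "file $n" > "$src/datafilename$n.txt"
  i=$((i + 1))
done

dest=$(mktemp -d)
cd "$dest" || exit 1

# ":10" is the step: curl fetches 001, 011, 021, ..., 091 (ten files).
curl -O "file://$src/datafilename[001-100:10].txt"
```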
Sometimes an Internet connection times out partway through a download, or a file has moved to a new address. To keep our downloads on track, curl has these two flags:
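dash-uppercase-L follows redirects: if the server answers with a 3xx redirect status code, such as 301 or 302, curl requests the file from its new location.
dash-uppercase-C, followed by a dash, resumes a previous file transfer that stopped before completion, for example because it timed out, picking up where the partial file left off.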
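To see `-C -` at work without relying on a flaky network, this sketch simulates an interrupted download. It keeps only the first 500 bytes of a local file, standing in for a transfer that stopped early, and then asks curl to resume into that partial file. curl reads the partial file's size, fetches only the remaining bytes, and appends them. The `file://` URL and file names are illustrative assumptions. `-L` needs a real HTTP server that sends a 3xx response, so it is not exercised here.

```shell
# Stand-in "server" file: 1,000 lines of numbers (a few kilobytes).
src=$(mktemp -d)
seq 1 1000 > "$src/datafilename.txt"

# Simulate a download that timed out: only the first 500 bytes arrived.
dest=$(mktemp -d)
head -c 500 "$src/datafilename.txt" > "$dest/datafilename.txt"

# "-C -" tells curl to take the resume offset from the size of the
# existing output file, request only the missing bytes, and append them.
curl -C - -o "$dest/datafilename.txt" "file://$src/datafilename.txt"
```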
Putting everything together:
Note that all option flags come before the URL, but the order of the flags does not matter.
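One way to combine this lesson's flags for the hypothetical server is `curl -O -L -C - "https://websitename.com/datafilename[001-100].txt"`. The sketch below runs the same flags offline against 100 local stand-in files through `file://`. That setup is an assumption for illustration, and since a local file has no redirect, `-L` has nothing to follow here.

```shell
# Stand-in "server" with 100 numbered files, as in the globbing examples.
src=$(mktemp -d)
i=1
while [ "$i" -le 100 ]; do
  n=$(printf '%03d' "$i")
  echo "file $n" > "$src/datafilename$n.txt"
  i=$((i + 1))
done

dest=$(mktemp -d)
cd "$dest" || exit 1

# All flags come before the URL, in any order:
# -O keeps remote names, -L follows redirects, "-C -" resumes partial files.
curl -O -L -C - "file://$src/datafilename[001-100].txt"
```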
In this lesson, we learned how to download files using curl. Let's put our new knowledge into practice! Happy curl-ing!