Want to learn more? Take the full course at [ Link ] at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
Every R programmer has, at one point or another, uttered the phrase "my code is slow!" This is usually followed by tears and curses (not necessarily in that order). But what do I mean by slow? Is one second slow? What about one minute? This is obviously problem dependent. What you need is code that is fast enough!
To determine if it is worth changing your code, you need to compare your existing solution with one or more alternatives. This is what we mean by benchmarking. The concept is straightforward. You simply time how long each solution takes, and all things being equal, select the fastest.
Benchmarking follows two steps. First, you construct a function around the feature you wish to benchmark. Typically the function has an argument that enables you to vary the complexity of the task. For example, a parameter n that alters the data size. Second, you time the function under different scenarios, such as different data sizes.
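As a rough sketch of those two steps: the function sort_random() and the chosen sizes below are made up for illustration and are not part of the course material.

```r
# Step 1: wrap the feature in a function; n controls the data size.
# sort_random() is a hypothetical stand-in task.
sort_random <- function(n) sort(runif(n))

# Step 2: time the function under different scenarios (here, data sizes).
sizes <- c(1e3, 1e4, 1e5)
elapsed <- sapply(sizes, function(n) {
  system.time(sort_random(n))[["elapsed"]]
})

# One elapsed time (in seconds) per data size
data.frame(n = sizes, elapsed = elapsed)
```

Keeping the size as an argument means the same function can be reused for every scenario you want to time.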
Let's look at an example. Suppose we want to generate a sequence of numbers. The two obvious ways of achieving this are using the standard colon operator or the seq() function.
We begin by wrapping both options in functions that take the sequence length, n, as an argument. Next, to determine how long a function takes to run, we wrap the function call in system.time().
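Something along these lines would do it. The function bodies and the value of n are assumptions on my part, since the code itself is not shown here.

```r
# Wrap each option in a function that takes the sequence length n.
colon <- function(n) 1:n
seq_default <- function(n) seq(1, n)  # assumed body for the seq() version

n <- 1e6  # assumed size; pick whatever suits your machine

# Wrap each call in system.time() to see how long it takes.
t_colon <- system.time(colon(n))
t_seq <- system.time(seq_default(n))

t_colon
t_seq
```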
Running this code produces three numbers: user, system and elapsed time. Roughly, the ‘user time’ is the CPU time charged for the execution of user instructions. The ‘system time’ is the CPU time charged for execution by the system on behalf of the calling process. The elapsed time is approximately the sum of user and system; this is the number we typically care about. So in this example, it took X.YY seconds.
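If you want to pull those three numbers out individually, a sketch like this should work. The loop being timed is just a placeholder task I made up.

```r
# Time a placeholder task (hypothetical; any expression would do).
timing <- system.time(for (i in 1:1e5) sqrt(i))

# The result is a named numeric vector of class "proc_time".
names(timing)

user_time <- timing[["user.self"]]    # CPU time spent on user instructions
system_time <- timing[["sys.self"]]   # CPU time spent by the system
elapsed_time <- timing[["elapsed"]]   # wall-clock time

c(user = user_time, system = system_time, elapsed = elapsed_time)
```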
One small problem is that we haven't stored the result. I often use system.time() during an analysis. For example, I set my code running as I leave the office and want to know how long the job took when I return the next morning. In this case we use the arrow operator. Using the arrow within a function call performs two tasks: argument passing and object assignment. This allows us to both time and store the operation.
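A minimal sketch of timing and storing in one go, assuming a colon() wrapper like the one described earlier and an arbitrary n:

```r
colon <- function(n) 1:n  # assumed wrapper around the colon operator

# The arrow assigns the result to res while the call is being timed.
timing <- system.time(res <- colon(1e6))

timing       # how long the job took
length(res)  # the stored result is available afterwards
```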
As well as considering elapsed time, it's worthwhile calculating the relative time. This is simply the ratio of the two timings. So in this example, the elapsed times are 0.002 and 0.008 seconds. The relative time is 0.008 / 0.002 = 4. That is, the seq command is 4 times slower than using the colon.
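The relative time is easy to compute yourself. In this sketch the timings are invented values for two hypothetical methods, purely to show the arithmetic.

```r
# Hypothetical elapsed times in seconds (made-up values for illustration).
elapsed <- c(method_a = 0.5, method_b = 1.5)

# Relative time: each timing divided by the fastest one.
relative <- elapsed / min(elapsed)
relative
```

The fastest method always gets a relative time of 1, so the other values read directly as "how many times slower".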
As with all things in R, there is a package that simplifies benchmarking. The microbenchmark package serves as a more precise replacement for repeated system.time() calls and makes it straightforward to compare multiple functions. The key function in this package is the unimaginatively named microbenchmark().
In this code, we are comparing functions colon, seq_default and seq_by. The times argument specifies how many times we should call each function. As a bonus, the cld column provides a statistical ranking. As you would expect, the colon operator is the fastest function for generating a sequence of integers.
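A comparison of this kind might look roughly like the following. The bodies of seq_default and seq_by, the value of n and times = 10 are my assumptions, and the sketch only runs the benchmark if the microbenchmark package is installed.

```r
# Three ways of generating the same sequence (function bodies are assumed).
colon <- function(n) 1:n
seq_default <- function(n) seq(1, n)
seq_by <- function(n) seq(1, n, by = 1)

n <- 1e6  # assumed sequence length

# Only benchmark when the package is available on this machine.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  res <- microbenchmark::microbenchmark(
    colon(n), seq_default(n), seq_by(n),
    times = 10  # call each function 10 times
  )
  print(res)
}
```

Repeating each call several times gives a spread of timings (min, median, max) rather than a single number, which makes the comparison less sensitive to one-off hiccups.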
#DataCamp #RTutorial #Writing #Efficient #RCode