More thorough testing & error handling

Stress testing with ApacheBench and a little bit of penetration testing

Rakha Kanz Kautsar
Scrum.ai

--

As we strive to offer Scrum.ai as Software-as-a-Service, we need to be ready for everything, including scaling and security.

One way to test whether our app can handle a large volume of requests is through stress testing. For this purpose, I came across a handy tool called ab, or ApacheBench. It’s easy to use and gives fairly good metrics to evaluate.

Let’s start with testing our landing page.

ab -n 1000 -c 50 -s 3 https://scrum-ai-test.herokuapp.com/

Here we define the number of requests we want to send in the stress test, 1000 (the landing page just serves static content and shouldn’t get that much traffic). Then we set the concurrency level to 50, simulating 50 users hitting the landing page at the same time. Finally, we set the timeout, i.e. how long to wait for a response before closing the connection, to 3 seconds.

After waiting about half a minute or so, we get the result.

Server Software:        Cowboy
Server Hostname: scrum-ai-test.herokuapp.com
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name: scrum-ai-test.herokuapp.com
Document Path: /
Document Length: 503 bytes
Concurrency Level: 50
Time taken for tests: 25.238 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 671000 bytes
HTML transferred: 503000 bytes
Requests per second: 39.62 [#/sec] (mean)
Time per request: 1261.917 [ms] (mean)
Time per request: 25.238 [ms] (mean, across all concurrent requests)
Transfer rate: 25.96 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 751 968 243.6 919 2274
Processing: 245 275 24.2 273 472
Waiting: 244 273 19.9 273 444
Total: 1019 1243 245.7 1194 2562
Percentage of the requests served within a certain time (ms)
50% 1194
66% 1224
75% 1245
80% 1258
90% 1325
95% 1388
98% 2398
99% 2455
100% 2562 (longest request)

We can see that the mean time per request is 1261 ms, of which about 1000 ms goes to connection setup (from Indonesia to US West, as US West is the only region available on Heroku’s Free plan). We can also see that every request finishes within 3 seconds (the longest took 2.5 s).
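As a quick sanity check, ab’s summary numbers are consistent with each other: the reported throughput follows directly from the concurrency level and the mean time per request.

```typescript
// Cross-checking ab's summary: throughput should equal the concurrency level
// divided by the mean time per request (numbers taken from the output above).
const concurrency = 50;
const meanTimePerRequestMs = 1261.917; // "Time per request (mean)"
const requestsPerSecond = concurrency / (meanTimePerRequestMs / 1000);
console.log(requestsPerSecond.toFixed(2)); // 39.62, matching "Requests per second"
```

The same relation explains the total run time: 1000 requests at ~39.6 requests per second is the ~25 seconds ab reports.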

What can we learn from these numbers? For one, Heroku provides a reasonably stable connection to their servers, and our app handles a traffic spike just fine. But we could further improve the connection time with something like a CDN, which would also reduce the amount of traffic reaching our server.

One more endpoint to stress test is the Slack events endpoint. It will easily be the most frequently hit endpoint in our small server: all Slack events arrive there, and if we serve Scrum.ai as a SaaS, every team’s Slack workspace will hit that endpoint at the same time with a large volume of data.
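We don’t show our real handler here, but a minimal sketch of the shape such an endpoint takes (handleSlackEvent is an illustrative name, not our actual code; the url_verification challenge is part of Slack’s Events API):

```typescript
// Hypothetical sketch of a Slack events endpoint's request handling. Slack's
// Events API sends a url_verification challenge on setup and real events as
// JSON; an empty body, like the bare POSTs ab sends, gets rejected.
function handleSlackEvent(rawBody: string): { status: number; body: string } {
  let payload: any;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: "bad request" }; // what an empty POST hits
  }
  if (payload.type === "url_verification") {
    return { status: 200, body: payload.challenge }; // echo the challenge back
  }
  return { status: 200, body: "" }; // acknowledge fast, process asynchronously
}

console.log(handleSlackEvent("").status); // 400
```

This also hints at what to expect from the stress test: ab sends no body, so every response should be a non-2xx.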

ab -n 10000 -c 40 -s 3 -m post https://scrum-ai-test.herokuapp.com/slack/events

We add -m post to define the HTTP method used to hit the URL. Note that this sends an empty request body; to POST a realistic payload, ab can read one from a file with -p and set its content type with -T. Here’s the result.

Server Software:        Cowboy
Server Hostname: scrum-ai-test.herokuapp.com
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name: scrum-ai-test.herokuapp.com
Document Path: /slack/events
Document Length: 506 bytes
Concurrency Level: 40
Time taken for tests: 290.743 seconds
Complete requests: 10000
Failed requests: 0
Non-2xx responses: 10000
Total transferred: 6910000 bytes
HTML transferred: 5060000 bytes
Requests per second: 34.39 [#/sec] (mean)
Time per request: 1162.970 [ms] (mean)
Time per request: 29.074 [ms] (mean, across all concurrent requests)
Transfer rate: 23.21 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 739 870 163.4 849 8839
Processing: -179 291 131.9 281 11417
Waiting: 0 289 131.2 280 11417
Total: 984 1161 215.8 1133 12248
Percentage of the requests served within a certain time (ms)
50% 1133
66% 1165
75% 1191
80% 1219
90% 1272
95% 1313
98% 1441
99% 1977
100% 12248 (longest request)

We can see an outlier here: a 12-second request. Most likely this is the time Heroku takes to warm up the server (a cold start), since we use the Free plan. Note also the Non-2xx responses line: ab reports 0 failed requests because every connection completed, but all 10,000 responses were non-2xx, which is expected since we POSTed an empty body rather than a valid Slack event payload. Aside from that, it seems normal enough for 10,000 requests. But if we want to scale, we might want to implement an event queue like I mentioned in an earlier blog post.
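An event queue decouples receiving events from processing them, which keeps the endpoint fast under load. A minimal in-memory sketch (names are hypothetical, and a real deployment would use a durable broker such as Redis instead of process memory):

```typescript
// Minimal in-memory event queue sketch (hypothetical, not our actual code).
type SlackEvent = { team: string; type: string };

class EventQueue {
  private items: SlackEvent[] = [];

  // Called by the HTTP handler: store the event and return immediately,
  // so the endpoint can acknowledge Slack within its 3-second window.
  enqueue(event: SlackEvent): void {
    this.items.push(event);
  }

  // Called by a background worker: process everything that has piled up.
  drain(handler: (event: SlackEvent) => void): number {
    let processed = 0;
    while (this.items.length > 0) {
      handler(this.items.shift()!);
      processed++;
    }
    return processed;
  }
}

const queue = new EventQueue();
queue.enqueue({ team: "T1", type: "message" });
queue.enqueue({ team: "T2", type: "reaction_added" });
const processed = queue.drain((e) => console.log(`handling ${e.type} from ${e.team}`));
console.log(processed); // 2
```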

Now, let’s do a little bit of penetration testing. Here, I’ll do white-box testing for injection, specifically SQL injection. We use TypeORM with PostgreSQL, so we want to test whether TypeORM can handle a SQL injection attack.

Here, we try to add a task whose name contains an apostrophe. The apostrophe delimits string literals in SQL, so if input isn’t handled correctly, injecting one should produce an error.
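To see why the apostrophe is dangerous, compare naively concatenating it into SQL with passing it as a bound parameter, which is how TypeORM builds its queries (illustrative strings only, no database involved):

```typescript
// Illustrative only: why an apostrophe in user input matters.
const taskName = "it's done";

// Naive concatenation: the apostrophe closes the SQL string literal early,
// producing broken (and injectable) SQL.
const naive = `INSERT INTO task (name) VALUES ('${taskName}')`;
console.log(naive); // INSERT INTO task (name) VALUES ('it's done')

// Parameter binding: the SQL text and the value travel separately, so the
// driver escapes the value safely.
const bound = { text: "INSERT INTO task (name) VALUES ($1)", values: [taskName] };
console.log(bound.text, bound.values);
```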

But, well, the bot doesn’t respond, which is not good. Still, it’s a good thing we’re doing this penetration testing before the app goes public. Let’s look at the database.

The task is saved into the database correctly, apostrophe included, so it seems TypeORM handles it properly. But the Papertrail integration alerts us to an error in the application:

So the error is in the application itself while handling the request, not in TypeORM’s handling of the SQL injection. It’s a good thing we set up Papertrail in advance to alert us to errors and exceptions.

Anyway, it’s not good at all that our bot didn’t respond. In the future, we must handle these kinds of errors and report them to the user as well.
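A sketch of what that could look like, assuming a hypothetical safeHandle wrapper around each event handler (safeHandle and notifyUser are illustrative names, not our actual code):

```typescript
// Report failures back to the user instead of going silent.
function safeHandle(handler: () => string, notifyUser: (msg: string) => void): void {
  try {
    notifyUser(handler());
  } catch (err) {
    // Surface the failure to the user (Papertrail still logs the exception).
    notifyUser(`Sorry, something went wrong: ${(err as Error).message}`);
  }
}

// Simulate the apostrophe bug: the handler throws while formatting its reply.
const sent: string[] = [];
safeHandle(
  () => { throw new Error("reply formatting failed"); },
  (msg) => sent.push(msg)
);
console.log(sent[0]); // Sorry, something went wrong: reply formatting failed
```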

--


Rakha Kanz Kautsar
Scrum.ai

React Native developer excited about performance and system designs. https://rakha.dev/