So let's consider an example in which we have some traditional platform like Java, Ruby, Python, or maybe even PHP. We have a web server, which is this blue rectangle, and then we have users that use a browser as a client and submit requests. So we have four users, and each of them makes multiple requests. The user at the bottom, he or she, they make three requests. The first request comes in, and then the other two requests are blocked, basically. They have to wait for the first request to finish.
The first request could be something like, "Hey, give me that data from the database." So that would be blocking input/output. Similarly, the second user from the bottom, his or her requests will be blocked by their first request. The only way to scale in this situation is to increase the number of threads, and to increase the number of servers to handle those threads, because for each user, the rest of the requests are blocked until the first request is completed.
Now consider the Node.js architecture, where we have the same number of users making different requests. But now we have this event loop. The event loop will delegate all those expensive input/output operations to asynchronous threads, and they basically go and do their job: read from a database, write to a database, maybe manipulate files on the file system. It doesn't really matter. What really matters is that while the system is waiting on those input/output tasks, the event loop will continue to spin around and look for new requests to delegate. And when one of the requests comes back, it says, "Okay, I have this data from the database."
The event loop will pick up that request. Each request has a callback, and the event loop will execute that callback. The callback, for example, will send the data back to the user, so the user gets his or her response back. This is how, with just a single thread, we can handle multiple requests at the same time. While each request is being processed by the database or by the file system, we don't really worry about that; our system can handle other requests. It's not blocked. I always like to give the following example when I'm talking about blocking versus non-blocking I/O.
So consider a coffee shop. You have a line with some people waiting. Typically, if the coffee shop has just one person working, or maybe they didn't optimize and they use a blocking approach instead of a non-blocking one, here's what happens: the cashier takes your order, and that same person at the register, he or she, would usually just turn around and start making you that drink. So the entire line, the entire queue, would be blocked by your order, by that one task, and it could take anywhere from 30 seconds to maybe two, three, or five minutes.
So imagine the pain and frustration. Then, let's say a manager walks in and sees this long line of frustrated people; some of them just give up and leave. And the only way to scale that coffee shop for the manager or the owner would be to increase the number of registers.
So that's your traditional blocking I/O concept, where you have to increase the number of servers, because each server is processing a task, and while it's processing that task, the other tasks in the queue, or in the line, are blocked. That's definitely not a good architecture. The better approach would be to use a non-blocking system, and this is what you usually see at Starbucks or any other corporate-chain type of coffee shop, because they've studied it.
They learned what is better for performance and for the customers. In this approach, you just place your order, you pay at the register, and then you get your ticket, like a line number, and then you can sit at your table, maybe watch some Udemy videos. So you don't have to stand in that line, and what usually happens is they have more than one person behind the bar.
So one person would yell to another. The person at the register, the cashier, he or she would yell to another person, like, "Hey, make me a frappuccino," or something like that. And then the cashier would continue to process the next people in line, so you can go and sit at your table, because you know that your order is queued. When it's ready, they call your name, and you come up from a completely different side of the bar and get your drink.
So everyone is happier than in the previous example, and we have fewer frustrated people. In this architecture, scaling is really easy. You don't have to add more registers, because the line is shorter: people have already paid, and the most expensive I/O is actually making the drinks. That's where you scale, by adding more people making drinks. So you don't need more registers; you don't need more servers in the system.
Okay, so how would you scale that single-threaded system with non-blocking I/O? It's actually very, very trivial to do. There is the cluster module; it's a core module, so you can use that, but it's very rudimentary, very low-level.
So a better approach would be to use StrongLoop's PM, which is a process manager, or PM2, Process Manager 2. Those are great libraries. They will allow you to scale vertically. Vertically means you take advantage of each individual machine: you increase its capacity by adding more processes, more CPU, or more RAM. Horizontal scaling is the reverse: it's where you add more machines.
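With PM2 installed (`npm install -g pm2`), that vertical scaling is a one-liner; `app.js` here is a hypothetical entry point:

```shell
# Start app.js in cluster mode with one process per CPU core.
pm2 start app.js -i max

# Add two more processes later, without downtime.
pm2 scale app +2

# Inspect the processes, or do a zero-downtime restart.
pm2 list
pm2 reload app
```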