Developing a sense of Scalability

February 6th, 2010

Scott Hanselman’s recent post about exporting data from a database to a CSV file talks about how some code could be improved. He does a very nice job of showing how and where the code could shed the smells that were present in the original design. However, there are a few things that, if changed, could improve both the performance and the scalability of the system.

Depending on the complexity and scalability needs of your site, you may consider this overkill, but it takes minimal design effort to create a system that is innately more scalable. What happens when the data you are querying starts to push 5-10 MB of data out to a CSV file?

Option 1

So rather than generate the list of Foo into a CSV on each request, why not pregenerate it at preconfigured intervals, depending on the volatility of the data, and save the resulting CSV to a SAN where it can be served by IIS? In this scenario there is no need for the ASP.NET worker process to get involved, and IIS can do what it was meant to do: serve content.

Pregenerated file system

Advantages

  • No need to hit the DB on every request to get the data
  • You don’t have to involve the ASP.NET worker process, and you let IIS do what it’s designed to do

Disadvantages

  • Your data may be stale for up to a preconfigured amount of time (but since it’s going into a report of sorts, it is more than likely stale anyway)
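
To make Option 1 a little more concrete, here is a minimal sketch of the pregeneration step. It is written in Python rather than ASP.NET purely for illustration, and the names `write_csv_atomically` and `fetch_foo_rows` are my own placeholders, not anything from Scott's post. The one idea worth noting is that the file is written under a temporary name and then swapped into place, so the web server should never hand out a half-written CSV.

```python
import csv
import os
import tempfile


def write_csv_atomically(rows, header, dest_path):
    """Write rows to dest_path so readers never see a half-written file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the destination directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def fetch_foo_rows():
    """Hypothetical stand-in for the database query behind the export."""
    return [(1, "alpha"), (2, "beta")]
```

A scheduled job (a cron entry, a Windows scheduled task, or similar) would call `write_csv_atomically(fetch_foo_rows(), ["id", "name"], path)` at whatever interval suits the volatility of the data, with `path` pointing at the share the web server serves static files from.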

Option 2

So your business requirement absolutely dictates that the file be generated and sent as the user requests it. It may seem there is no choice but to send it through the response stream, BUT there is another way. Say the request for the data to be exported arrives via an AJAX call to your server, and the response from the server is a URL where the file will be output. The server itself can then send a message to the export-to-CSV service and request that the file be output to the SAN. The client can then poll the server, waiting for a 200 response code; when it gets one, it can redirect the browser to the given URL.

Polling file system

Advantages

  • Gives you more or less up to the minute accurate data
  • Reduces the amount of work the ASP.NET worker process has to perform

Disadvantages

  • Introduces polling into the design of the client
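
The flow in Option 2 can be sketched roughly as follows, again in Python for illustration only. Everything here is an assumption on my part: a background thread stands in for the separate export-to-CSV service, a local directory stands in for the SAN, and `start_export`, `status_for`, and `wait_for_export` are hypothetical names. In a real system the status check would be an HTTP request against the URL the server handed back.

```python
import csv
import os
import threading
import time
import uuid


def start_export(rows, export_dir):
    """Kick off the export in the background and return the file name at once."""
    file_name = f"export-{uuid.uuid4().hex}.csv"
    dest_path = os.path.join(export_dir, file_name)

    def worker():
        # Write under a temp name first so a poller never sees a partial file.
        tmp_path = dest_path + ".tmp"
        with open(tmp_path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        os.replace(tmp_path, dest_path)

    threading.Thread(target=worker, daemon=True).start()
    return file_name


def status_for(file_name, export_dir):
    """What a static file server would answer: 200 once the file exists, else 404."""
    return 200 if os.path.exists(os.path.join(export_dir, file_name)) else 404


def wait_for_export(file_name, export_dir, interval=0.05, timeout=5.0):
    """Client side: poll until the file is available or the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if status_for(file_name, export_dir) == 200:
            return True
        time.sleep(interval)
    return False
```

The request handler only has to call `start_export` and return the resulting name, so it finishes quickly no matter how large the export is. The timeout in `wait_for_export` matters in practice: without it, a failed export would leave the client polling forever.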

When you design systems without taking scalability into account from the ground up, you are generally missing something. Considering scalability does not mean choosing one of these two options every time. At times it may make sense to use the approach suggested in Scott’s article; the key is to always know your data and make informed decisions.