Comments
bruce.armstrong wrote: Somebody just said it better than I did, and with more chops to say it: Open Letter to Mark Zuckerberg, Sheryl Sandberg & Facebook Mobile
Cloud Expo on Google News


2008 West
DIAMOND SPONSOR:
Data Direct
SOA, WOA and Cloud Computing: The New Frontier for Data Services
PLATINUM SPONSORS:
Red Hat
The Opening of Virtualization
GOLD SPONSORS:
Appsense
User Environment Management – The Third Layer of the Desktop
Cordys
Cloud Computing for Business Agility
EMC
CMIS: A Multi-Vendor Proposal for a Service-Based Content Management Interoperability Standard
Freedom OSS
Practical SOA” Max Yankelevich
Intel
Architecting an Enterprise Service Router (ESR) – A Cost-Effective Way to Scale SOA Across the Enterprise
Sensedia
Return on Assests: Bringing Visibility to your SOA Strategy
Symantec
Managing Hybrid Endpoint Environments
VMWare
Game-Changing Technology for Enterprise Clouds and Applications
Click For 2008 West
Event Webcasts

2008 West
PLATINUM SPONSORS:
Appcelerator
Get ‘Rich’ Quick: Rapid Prototyping for RIA with ZERO Server Code
Keynote Systems
Designing for and Managing Performance in the New Frontier of Rich Internet Applications
GOLD SPONSORS:
ICEsoft
How Can AJAX Improve Homeland Security?
Isomorphic
Beyond Widgets: What a RIA Platform Should Offer
Oracle
REAs: Rich Enterprise Applications
Click For 2008 Event Webcasts
SYS-CON.TV
Top Links You Must Click On


The Real News Is Not that Facebook Serves Up 1 Trillion Pages a Month...
It’s how much load that really generates and how it scales to meet the challenge

It’s how much load that really generates and how it scales to meet the challenge.

There’s some amount of debate whether Facebook really crossed over the one trillion page view per month threshold. While one report says it did, another respected firm says it did not; that its monthly page views are a mere 467 billion per month.

In the big scheme of things, the discrepancy is somewhat irrelevant, as neither show the true load on Facebook’s infrastructure – which is far more impressive a set of numbers than its externally measured “page view” metric.  Mashable reported in “Facebook Surpasses 1 Trillion Pageviews per Month” that the social networking giant saw “approximately 870 million unique visitors in June and 860 million in July” and followed up with some per visitor statistics, indicating “each visitor averaged approximately 1,160 page views in July and 40 per visit — enormous by any standard. Time spent on the site was around 25 minutes per user.”

image

From an architectural standpoint it’s not just about the page views. It’s about requests and responses, many of which occur under the radar from metrics and measurements typically gathered by external services like Google. Much of Facebook’s interactive features are powered by AJAX, which is hidden “in” the page and thus obscured from external view and a “page view” doesn’t necessarily include a count of all the external objects (scripts, images, etc…) that comprises a “page”. So while 1 trillion (or 467 billion, whichever you prefer) is impressive, consider that this is likely only a  fraction of the actual requests and responses handled by Facebook’s massive infrastructure on any given day.

Let’s examine what the actual requests and responses might mean in terms of load on Facebook’s infrastructure, shall we?

SOME QUICK MATH

Loading up Facebook yields 125 requests to load various scripts, images, and content. That’s a “page view”. Sitting on the page for a few minutes and watching Firebug’s console, you’ll note a request to update content occurs approximately every minute you are on a page. If we do the math – based on approximate page views per visitor, each of which incurs 125 GET requests – we can math that up to an approximation of 19,468 RPS (Requests per Second).

That’s only an approximation, mind you, and doesn’t take into consideration the time factor, which also incurs AJAX-based requests to update content occurring on a fairly regular basis. These also add to the overall load on Facebook’s massive infrastructure. And that’s before we start considering the impact from “unseen” integrated traffic via Facebook’s API which, according to the most recently available data (2009) was adding 5 billion requests a day to that load. If you’re wondering, that’s an additional 57,870 requests per second, which gives us a more complete number of 77,338 requests per second.

SOURCE: webcast2009 Interop F5 Keynote

image

Let’s take a moment to digest that, because that’s a lot of load on a site – and I’m sure it still isn’t taking into consideration everything. We also have to remember that the load at any given time could be higher – or lower – based on usage patterns. Averaging totals over a month and distilling down to a per second average is just that – a mathematical average. It doesn’t take into consideration that peaks and valleys occur in usage throughout the day and that Facebook may be averaging only a fraction of that load with spikes two and three times as high throughout the day.

That realization should be a bit sobering, as we’ve seen recent DDoS attacks that have crippled and even toppled sites with less traffic than Facebook handles in any given minute of the day.

The question is, how do they do it? How do they manage to keep the service up and available despite the overwhelming load and certainty of traffic spikes?

IT’S the ARCHITECTURE

Facebook itself does a great job of discussing exactly how it manages to sustain such load over time while simultaneously managing growth, and its secret generally revolves around architectural choices. Not just the “Facebook” application architecture, but its use of infrastructure architecture as well. That may not always be apparent from Facebook’s engineering blog, which generally focuses on application and software architecture topics, but it is inherent in those architectural decisions.

Take, for example, an engineer’s discussion on Facebook’s secrets to scaling to over 500 million users and beyond. The very first point made is to “scale horizontally”.

quote-badge

This isn't at all novel but it's really important. If something is increasing exponentially, the only sensible way to deal with it is to get it spread across arbitrarily many machines. Remember, there are only three numbers in computer science: 0, 1, and n. (Scaling Facebook to 500 Million Users and Beyond (Facebook Engineering Blog))

Horizontal scalability is, of course, enabled via load balancing which generally (but not always) implies infrastructure components that are critical to an overall growth and scalability strategy. The abstraction afforded by the use of load balancing services also has the added benefit of enabling agile operations as it becomes cost and time effective to add and remove (provision and decommission) compute resources as a means to meet scaling challenges on-demand, which is a key component of cloud computing models.

In other words, in addition to Facebook’s attention to application architecture as a means to enable scalability, it also takes advantage of infrastructure components providing load balancing services to ensure that its massive load is distributed not just geographically but efficiently across its various clusters of application functionality. It’s a collaborative architecture that spans infrastructure and application tiers, taking advantage of the speed and scalability benefits afforded by both approaches simultaneously.

Yet Facebook is not shy about revealing its use of infrastructure as a means to scale and implement its architecture; you just have to dig around to find it. Consider as an example of a collaborative architecture the solution to some of the challenges Facebook has faced trying to scale out its database, particularly in the area of synchronization across data centers. This is a typical enterprise challenge made even more difficult by Facebook’s decision to separate “write” databases from “read” to enhance the scalability of its application architecture. The solution is found in something Facebook engineers call “Page Routing” but most of us in the industry call “Layer 7 Switching” or “Application Switching”:

quote-badge

The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?

This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a Load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location. (Scaling Out (Facebook Engineering Blog))

That’s the hallmark of the modern, agile data center and the core of cloud computing models: collaborative, dynamic infrastructure and applications leveraging technology to enable a cost-efficient, scalable architectures able to maintain growth along with the business.

SCALABILITY TODAY REQUIRES a COMPREHENSIVE ARCHITECTURAL STRATEGY

Today’s architectures – both application and infrastructure – are growing necessarily complex to meet the explosive growth of a variety of media and consumers. Applications alone cannot scale themselves out – there simply aren’t physical machines large enough to support the massive number of users and load on applications created by the nearly insatiable demand consumers have for online games, shopping, interaction, and news. Modern applications must be deployed and delivered collaboratively with infrastructure if they are to scale and support growth in an operationally and financially efficient manner.

Facebook’s ability to grow and scale along with demand is enabled by its holistic, architectural approach that leverages both modern application scalability patterns as well as infrastructure scalability patterns. Together, infrastructure and applications are enabling the social networking giant to continue to grow steadily with very few hiccups along the way. Its approach is one that is well-suited for any organization wishing to scale efficiently over time with the least amount of disruption and with the speed of deployment required of today’s demanding business environments.

Read the original blog entry...

About Lori MacVittie
Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.

Enterprise Open Source Magazine Latest Stories . . .
Grid Dynamics, an eCommerce technology solutions company, and GridGain Systems, makers of an open source in-memory platform for Big Data processing, on Wednesday announced the expansion of their partnership which began in 2008. Grid Dynamics provides personalization and big data solut...
Before embarking on using open source cloud technology for your web property, a basic understanding of cloud, as it’s used in the industry, is essential. While there might be exceptions, here are the definitions. A software application delivered on the web instead of installing standa...
Private clouds solve many problems for enterprises and bring unique operational challenges along with them. There are dozens of companies of all sizes that will build you a private cloud and turn over the keys – then what? Trying to convert a traditional enterprise IT operations team t...
The networking industry has gone through different waves over last 30+ years. In the ’80s, the first wave was all about connecting and sharing; how to connect a computer to other peripheral devices and other computers. There were many players who developed technology and services to ad...
If your organization already uses virtualized infrastructure, you are well on your way to providing IT as a Service. But as businesses demand faster results in today’s competitive market, organizations look to gain more benefits from cloud computing than just virtualized infrastructure...
In this CTO Power Panel at the 10th International Cloud Expo, moderated by Cloud Expo Conference Chair Jeremy Geelan, industry-leading CTOs & VPs of Technology will discuss such topics as: Which do you think is the most important cloud computing standard still to tackle? Who should...
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021


SYS-CON Featured Whitepapers
ADS BY GOOGLE