Monday, 10 September 2012



TECHNOLOGY BEHIND FACEBOOK
Facebook serves 570 billion page views per month (according to Google Ad Planner). There are more photos on Facebook than all other photo sites combined (including sites like Flickr). More than 3 billion photos are uploaded every month. Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN. More than 25 billion pieces of content (status updates, comments, etc.) are shared every month. Facebook has more than 30,000 servers (and this number is from last year).


Software that helps Facebook


Facebook still uses PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance.

Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput).

Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there (on the “other side” of the Memcached layer).
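To make the "joins on the web servers" idea concrete, here is a minimal sketch. It is not Facebook's code: the table names, the helper functions, and the use of an in-memory SQLite database as a stand-in for MySQL are all assumptions made so the example runs anywhere. The point is only that the database answers simple primary-key lookups, while the combining of rows happens in application code.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 2, 'hi'), (12, 1, 'again');
""")

def get_post(post_id):
    """Primary-key lookup only: the database is used like a key-value store."""
    row = conn.execute(
        "SELECT id, author_id, body FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return None if row is None else {"id": row[0], "author_id": row[1], "body": row[2]}

def get_user(user_id):
    """Second primary-key lookup; no SQL JOIN is ever issued."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return None if row is None else {"id": row[0], "name": row[1]}

def posts_with_authors(post_ids):
    """The 'join' is done here, in application code on the web server."""
    result = []
    for post_id in post_ids:
        post = get_post(post_id)
        if post is None:
            continue  # unknown ids are skipped
        author = get_user(post["author_id"])
        result.append({"body": post["body"],
                       "author": author["name"] if author else None})
    return result
```

Because each lookup is keyed by a single id, every one of them is also easy to cache individually, which is what makes this style fit well behind a caching layer.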

Memcached

Memcached is by now one of the most famous pieces of software on the internet. It’s a distributed memory caching system which Facebook (and a ton of other sites) uses as a caching layer between the web servers and MySQL servers (since database access is relatively slow). Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack).
Facebook runs thousands of Memcached servers with tens of terabytes of cached data at any one point in time. It is likely the world’s largest Memcached installation.
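The caching layer described above is commonly used in a "cache-aside" pattern: read from the cache first, fall back to the database on a miss, and invalidate the cached entry on writes. The sketch below illustrates that pattern only. The class name and its methods are hypothetical, and a plain Python dict stands in for a Memcached cluster, so this shows the idea rather than any real client API.

```python
class CacheAside:
    """Cache-aside reads and invalidate-on-write, with a dict as the cache."""

    def __init__(self, load_from_db):
        self._cache = {}                  # assumption: stand-in for Memcached
        self._load_from_db = load_from_db # callable that performs the slow read
        self.db_reads = 0                 # counts how often the database is hit

    def get(self, key):
        # Fast path: serve from the cache when the key is present.
        if key in self._cache:
            return self._cache[key]
        # Miss: go to the (relatively slow) database, then populate the cache.
        self.db_reads += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value
        return value

    def update(self, key, value, write_to_db):
        # Write to the database first, then drop the stale cached copy.
        write_to_db(key, value)
        self._cache.pop(key, None)
```

For example, with `db = {"user:1": "alice"}` and `cache = CacheAside(db.get)`, two consecutive calls to `cache.get("user:1")` hit the database only once; after `cache.update("user:1", "bob", db.__setitem__)` the next read goes back to the database and sees the new value.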

HipHop for PHP

PHP, being a scripting language, is relatively slow when compared to code that runs natively on a server. HipHop converts PHP into C++ code which can then be compiled for better performance. This has allowed Facebook to get much more out of its web servers since Facebook relies heavily on PHP to serve content.
A small team of engineers (initially just three of them) at Facebook spent 18 months developing HipHop, and it is now live in production.

Cassandra

Cassandra is a distributed storage system with no single point of failure. It’s one of the poster children for the NoSQL movement and has been made open source (it’s even become an Apache project). Facebook uses it for its Inbox search.
Besides Facebook, a number of other services use it, for example Digg. We’re even considering some uses for it here at Pingdom.
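One way a store like this avoids a single point of failure is to place each key on several nodes arranged in a hash ring, so any surviving replica can answer. The sketch below is a simplified illustration of that idea, not Cassandra's implementation: the `Ring` class, the node names, the MD5-based token function, and the default replica count are all assumptions chosen for the example.

```python
import bisect
import hashlib

def _token(value):
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Toy hash ring: each key is stored on `replicas` consecutive nodes."""

    def __init__(self, nodes, replicas=3):
        self.replicas = min(replicas, len(nodes))
        self._ring = sorted((_token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def nodes_for(self, key):
        # First node clockwise from the key's token, then the next ones.
        start = bisect.bisect_right(self._tokens, _token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replicas)]

    def read_targets(self, key, down=()):
        # Any replica that is still up can serve the read.
        return [node for node in self.nodes_for(key) if node not in down]
```

With four hypothetical nodes and three replicas, losing any one owner of a key still leaves two nodes that hold it, and no node plays a special coordinating role.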

Hadoop and Hive

Hadoop is an open source map-reduce implementation that makes it possible to perform calculations on massive amounts of data. Facebook uses this for data analysis (and as we all know, Facebook has massive amounts of data). Hive originated from within Facebook, and makes it possible to use SQL queries against Hadoop, making it easier for non-programmers to use.
Both Hadoop and Hive are open source (Apache projects) and are used by a number of big services, for example Yahoo and Twitter.
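For readers new to map-reduce, the following single-process sketch shows the shape of the computation using the classic word-count example. It does not use Hadoop's API; the function names are hypothetical, and everything runs in memory, whereas Hadoop would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())
```

A Hive user would express the same aggregation as a SQL query with a GROUP BY clause and let Hive turn it into map-reduce jobs, which is the convenience described above.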

