Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or of any specific machine.
Scribe servers are arranged in a directed graph, with each server knowing only about the next server in the graph. This topology makes it possible to add extra layers of fan-in as a system grows, and to batch messages before sending them between datacenters. No code needs to understand the datacenter topology explicitly; a simple configuration is enough.
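The next-hop arrangement described above can be illustrated with a minimal Python sketch. This is not Scribe's code or API: the `Node` class, its `batch_size` parameter, and the host names are all hypothetical, chosen only to show that each node holds a reference to a single downstream node and that a middle layer can batch before forwarding.

```python
from typing import List, Optional


class Node:
    """Hypothetical aggregation node that knows only its next hop."""

    def __init__(self, name: str, next_hop: Optional["Node"] = None,
                 batch_size: int = 1) -> None:
        self.name = name
        self.next_hop = next_hop        # the only topology this node knows
        self.batch_size = batch_size    # messages to collect before forwarding
        self.pending: List[str] = []    # batch waiting to be forwarded
        self.stored: List[str] = []     # used only by a terminal node

    def log(self, message: str) -> None:
        if self.next_hop is None:
            # Terminal node: keep the message (stands in for final storage).
            self.stored.append(message)
            return
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Forward the current batch to the next hop, in arrival order."""
        if self.next_hop is None:
            return
        batch, self.pending = self.pending, []
        for message in batch:
            self.next_hop.log(message)


# Two front-end nodes fan in to one aggregator, which batches three
# messages at a time before forwarding to a central node.
central = Node("central")
aggregator = Node("dc1-aggregator", next_hop=central, batch_size=3)
web1 = Node("web1", next_hop=aggregator)
web2 = Node("web2", next_hop=aggregator)

web1.log("a")
web2.log("b")   # aggregator now holds two pending messages
web1.log("c")   # third message triggers the aggregator's flush
```

Adding another fan-in layer in this sketch means constructing one more `Node` and pointing existing nodes at it; no node's logic changes, which mirrors the configuration-only growth the paragraph describes.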
Scribe was designed to take reliability into account without requiring heavyweight protocols or extensive disk usage. Scribe spools data to disk on any node to handle intermittent connectivity and node failure, but doesn't sync a log file for every message. This creates the possibility of losing a small amount of data in the event of a crash or catastrophic hardware failure. However, this degree of reliability is suitable for most Facebook use cases.
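The trade-off in the paragraph above can be sketched in Python as a store-and-forward wrapper. This is an illustration under stated assumptions, not Scribe's implementation: `SpoolingForwarder`, its `send` callback, and the spool file layout are hypothetical, and messages are assumed to be single-line strings. The point it shows is that a failed send is appended to a local file without calling `os.fsync` per message, so writes stay cheap but data still in operating-system buffers could be lost in a crash.

```python
import os
import tempfile
from typing import Callable, Dict, List


class SpoolingForwarder:
    """Hypothetical forwarder that spools to disk when the next hop is down."""

    def __init__(self, send: Callable[[str], None], spool_path: str) -> None:
        self.send = send
        self.spool_path = spool_path

    def log(self, message: str) -> None:
        try:
            self.send(message)
        except ConnectionError:
            # Append to the spool file. There is deliberately no os.fsync()
            # here: the write may sit in OS buffers, trading a small risk
            # of loss on a crash for lower per-message cost.
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(message + "\n")

    def replay(self) -> int:
        """Resend spooled messages in order; return how many were sent.

        If send() raises partway through, the spool file is kept, so a
        later replay may resend messages that were already delivered.
        """
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as spool:
            messages = [line.rstrip("\n") for line in spool]
        for message in messages:
            self.send(message)
        os.remove(self.spool_path)
        return len(messages)


# Simulated downstream server that can be switched on and off.
delivered: List[str] = []
state: Dict[str, bool] = {"up": False}


def send(message: str) -> None:
    if not state["up"]:
        raise ConnectionError("downstream unavailable")
    delivered.append(message)


spool_dir = tempfile.mkdtemp()
forwarder = SpoolingForwarder(send, os.path.join(spool_dir, "spool.log"))

forwarder.log("m1")            # downstream is down: spooled to disk
forwarder.log("m2")            # spooled as well
state["up"] = True
replayed = forwarder.replay()  # connectivity restored: spool is drained
forwarder.log("m3")            # sent directly
```

A stricter design would sync after every write or require acknowledgements from the next hop, at the cost of the heavier protocol and disk load that the paragraph says Scribe chose to avoid.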