Logging All things open source: 2008

Thursday, December 25, 2008

Eventually Consistent - Revisited

An excellent post on data consistency in distributed systems.

Wednesday, December 24, 2008

Saturday, October 18, 2008

Thursday, October 9, 2008

FireWall

With so many revelations, it is hard not to take more interest in the financial world. The following is a collection of audio/video pieces for Main Street from public radio.

Thursday, September 4, 2008

What every programmer should know about memory

Wednesday, September 3, 2008

Wednesday, August 20, 2008

Tuesday, July 15, 2008

Monday, July 7, 2008

Saturday, July 5, 2008

Thursday, July 3, 2008

Perl Compatible Regular Expressions

PCRE

Wednesday, July 2, 2008

Saturday, June 28, 2008

Automate

Installation, configuration, deployment, and management of many machines using the following:

SystemImager

CVSup

Subcon

Friday, June 27, 2008

Thursday, June 26, 2008

Real Time System Tracing

"premature optimization is the root of all evil (or at least most of it) in programming."

Monitoring Comparison

Cacti:

Graphing Monitoring Data
Data stored in RRDtool
No reporting
Plugin support for extension

Ganglia:

Highly Scalable Monitoring system
Designed for grid and cluster computing
XML for data representation
XDR for compact portable data transport
RRDtool for data storage and visualization
Ideal for large monitoring environments
No support for events or notifications
Does not support thresholds
Extensible through pluggable modules, similar to Apache
Gmond – Metric gathering agent installed on individual servers
Gmetad – Metric aggregation agent installed on one or more specific task-oriented servers
Ported to various different platforms (Linux, FreeBSD, Solaris, others)
Apache Web Frontend – Metric presentation and analysis server
Attributes:

Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster
Failover – Gmetad has the ability to switch which cluster node it polls for metric data
Lightweight and low overhead metric gathering and transport

Nagios:

Monitors hosts, services and networks
Plugin extension
Automatic log file rotation
Redundant monitoring hosts
Web interface for viewing current network status, notification and problem history, log file, etc.
Supports thresholds
Generates events or notifications against thresholds
Supports "active checks" and "passive checks"
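The threshold idea can be sketched as a tiny check in C. Nagios plugins report status through their exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus one line of output; the function names and the simple "higher is worse" threshold rule below are assumptions for illustration, not the full Nagios threshold range syntax.

```c
#include <math.h>
#include <stdio.h>

/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

static const char *const STATE_NAMES[] = { "OK", "WARNING", "CRITICAL", "UNKNOWN" };

/* Hypothetical helper: classifies a "higher is worse" metric. */
int check_threshold(double value, double warn, double crit) {
    if (isnan(value)) return STATE_UNKNOWN;   /* no usable reading */
    if (value >= crit) return STATE_CRITICAL;
    if (value >= warn) return STATE_WARNING;
    return STATE_OK;
}

/* Prints a one-line status; a plugin's main() would return the result as
   its exit status so Nagios can raise events/notifications from it. */
int report(const char *service, double value, double warn, double crit) {
    int state = check_threshold(value, warn, crit);
    printf("%s %s - value=%g\n", service, STATE_NAMES[state], value);
    return state;
}
```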

RRDtool:

Data logging and graphing system for time series data
Constant data storage size
Data storage is created upfront
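The "constant size, created upfront" property comes from RRDtool's round-robin archives: a fixed number of rows is allocated when the database is created, and new samples overwrite the oldest ones. A toy version in C, assuming one value per fixed time step and ignoring RRDtool's consolidation functions (AVERAGE, MAX, etc.); the struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

#define RRA_ROWS 8    /* hypothetical: rows allocated at creation, never more */
#define RRA_STEP 60   /* hypothetical: seconds covered by each row */

struct rra {
    double value[RRA_ROWS];
    long row_start[RRA_ROWS];   /* start of the interval held by each row; -1 = empty */
};

void rra_init(struct rra *r) {
    for (size_t i = 0; i < RRA_ROWS; i++) {
        r->value[i] = 0.0;
        r->row_start[i] = -1;
    }
}

/* Maps time t (seconds, t >= 0) to its interval start and row. */
static size_t rra_row(long t, long *start) {
    *start = t - t % RRA_STEP;
    return (size_t)((*start / RRA_STEP) % RRA_ROWS);
}

/* Stores a sample, overwriting whatever was in that row
   RRA_ROWS * RRA_STEP seconds ago: storage size never changes. */
void rra_update(struct rra *r, long t, double v) {
    long start;
    size_t row = rra_row(t, &start);
    r->value[row] = v;
    r->row_start[row] = start;
}

/* Returns false if the interval was never stored or has been overwritten. */
bool rra_fetch(const struct rra *r, long t, double *out) {
    long start;
    size_t row = rra_row(t, &start);
    if (r->row_start[row] != start) return false;
    *out = r->value[row];
    return true;
}
```

With 8 rows of 60 seconds, the archive only ever remembers the last 8 minutes; a sample written at t = 480 lands in the same row as t = 0 and replaces it.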

Munin:

Written in Perl
Uses RRDTool
Plugins can be written in any language
Master/Node Architecture
Default plugins like load average, memory usage, CPU usage and network traffic

Hyperic:

Wednesday, June 25, 2008

Tuesday, June 24, 2008

SELinux

"God is a challenge because there is no proof of his existence and therefore the search must continue."

IBM Article

Graphing

"People think that computer science is the art of geniuses but the actual reality is the opposite, just many people doing things that build on each other, like a wall of mini stones."

Distributed File System

"A list is only as strong as its weakest link."

Hadoop

Hadoop Summit - March 25, 2008

Saturday, June 21, 2008

Multi-Processing Modules

The different Multi-Processing Modules (MPMs) supported in Apache 2.

Friday, June 20, 2008

Perpetual Data

"An algorithm must be seen to be believed."

This is pretty cool

Thursday, June 19, 2008

Testing

"Beware of bugs in the above code; I have only proved it correct, not tried it."

STAF

"The hardest thing is to go to sleep at night, when there are so many urgent things needing to be done. A huge gap exists between what we know is possible with today's machines and what we have so far been able to finish."

Wiki

dedicated to performance

Logging Services

"The most important thing in the kitchen is the waste paper basket and it needs to be centrally located."

Hash Map

"If you optimize everything, you will always be unhappy."

Google Sparse Hash

Sunday, June 15, 2008

Unix Tools

"I define UNIX as 30 definitions of regular expressions living under one roof."

A site which lists tools for Unix

For Linux:

OProfile

From GOOG:

Performance

Tuesday, June 10, 2008

Guru of the Week (gotw)

Guru of the Week is a regular series of C++ programming problems created and written by Herb Sutter. Since 1997, it has been a regular feature of the Internet newsgroup comp.lang.c++.moderated, where you can find each issue's questions and answers (and a lot of interesting discussion).

Wednesday, June 4, 2008

ZooKeeper

Building distributed systems is a zoo...

Service Monitoring

Ganglia

Tuesday, June 3, 2008

Capacity Planning

Talk on how Flickr uses Ganglia to help with capacity planning.

Wednesday, May 28, 2008

Scalable Nonblocking Data Structures

InfoQ has an interesting writeup of Dr. Cliff Click's work on developing highly concurrent data structures for use on the Azul hardware (which is in production with 768 cores), supporting 700+ hardware threads in Java.
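The core trick behind such nonblocking structures is updating shared slots with compare-and-swap (CAS) instead of taking a lock. The sketch below is a much-simplified illustration of that idea using C11 atomics, not Dr. Click's Java implementation: a fixed-size, insert-only hash set of nonzero 64-bit keys with open addressing (no resizing or deletion, which is where most of the real difficulty lies). Names and sizes are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NB_TABLE_SIZE 64   /* hypothetical fixed capacity (power of two) */

/* 0 marks an empty slot, so key 0 cannot be stored. Slots are never cleared. */
static _Atomic uint64_t nb_table[NB_TABLE_SIZE];

static size_t nb_slot(uint64_t key) {
    /* Multiplicative (golden-ratio) hash, masked to the table size. */
    return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) & (NB_TABLE_SIZE - 1);
}

/* Lock-free insert: true if this call added the key, false if it was
   already present, the key is 0, or the table is full. */
bool nb_insert(uint64_t key) {
    if (key == 0) return false;
    size_t i = nb_slot(key);
    for (size_t probes = 0; probes < NB_TABLE_SIZE; probes++) {
        uint64_t cur = atomic_load(&nb_table[i]);
        if (cur == key) return false;
        if (cur == 0) {
            uint64_t expected = 0;
            /* Claim the empty slot with a single CAS; no lock is taken. */
            if (atomic_compare_exchange_strong(&nb_table[i], &expected, key))
                return true;
            /* Lost the race: expected now holds the winner's key. */
            if (expected == key) return false;
        }
        i = (i + 1) & (NB_TABLE_SIZE - 1);   /* linear probing */
    }
    return false;
}

/* Lock-free lookup: because keys are never removed, an empty slot ends the probe. */
bool nb_contains(uint64_t key) {
    if (key == 0) return false;
    size_t i = nb_slot(key);
    for (size_t probes = 0; probes < NB_TABLE_SIZE; probes++) {
        uint64_t cur = atomic_load(&nb_table[i]);
        if (cur == key) return true;
        if (cur == 0) return false;
        i = (i + 1) & (NB_TABLE_SIZE - 1);
    }
    return false;
}
```

Threads that lose a CAS simply move on to the next slot, so some thread always makes progress even under heavy contention.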

Monday, May 26, 2008

jemalloc - New malloc implementation used in Firefox 3

Friday, May 23, 2008

Data Encodings - IETF specification for base 64, base 32, and base 16 encoding schemes.
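The base 64 scheme from that specification (RFC 4648) is simple enough to sketch: every 3 input bytes become 4 characters from a 64-character alphabet, and a final partial group is padded with '='. A minimal encoder in C, assuming the standard (not URL-safe) alphabet; the function name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes len bytes into out as a NUL-terminated string;
   out must hold at least 4 * ((len + 2) / 3) + 1 bytes. */
void base64_encode(const unsigned char *in, size_t len, char *out) {
    size_t i = 0, o = 0;
    while (i + 2 < len) {                  /* full 3-byte groups -> 4 chars */
        unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
        out[o++] = B64[v & 63];
        i += 3;
    }
    if (i < len) {                         /* 1 or 2 leftover bytes: pad with '=' */
        unsigned v = in[i] << 16;
        if (i + 1 < len) v |= in[i + 1] << 8;
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? B64[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

Encoding "foobar" yields "Zm9vYmFy", one of the RFC's own test vectors; "f" and "fo" show the two padding cases ("Zg==" and "Zm8=").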

Thursday, May 22, 2008

All Things Distributed

Werner Vogels, CTO of Amazon.com, blogs here: http://www.allthingsdistributed.com/

High Scalability is a pretty good site covering real-life scalability issues.

Wednesday, May 21, 2008

Memcached - How does it work

Hash Functions

Sunday, February 3, 2008

Event Handling Framework

libevent provides a simple, portable framework for event notification that uses the most efficient system calls available on your system.

Saturday, February 2, 2008

Ken Thompson: Reflections on Trusting Trust

BTW, if you were wondering, he is at Google now.

Unix Memory Model

There are some basic regions ("segments") provided by all Unix variants:

Stack: (Variable size) This is where information about function call sequences is stored. There is microprocessor support for the stack.
Code: (Fixed size) The area of memory containing machine code instructions. Typically r+x permissions. Aka the text segment.
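Data: (Fixed size) The area of memory containing initialized data, such as initialized global and static variables. String constants are typically placed in an adjacent read-only data section.
BSS: (Fixed size) The area of memory containing uninitialized (zero-filled) global and static data. The heap, where "heap allocated" (malloc'd) objects live, is a separate, variable-size region that typically begins just past the BSS and grows via brk (2)/sbrk; large allocations are often served by mmap instead.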

You can view a process' memory map with the utility pmap.
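A quick way to see these regions from C is to print one address from each and compare them with pmap's output for the same process. Exact addresses vary between runs (address-space randomization), and the relative order is typical rather than guaranteed; the variable and function names are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment: has an initializer */
int zeroed_global;             /* BSS: zero-filled when the program starts */

/* Prints one address from each region; compare with `pmap <pid>`. */
void show_segments(void) {
    static int static_counter;                 /* also BSS: static, no initializer */
    int local = 0;                             /* stack */
    int *dynamic = malloc(sizeof *dynamic);    /* heap */
    /* Function-to-void* conversion is allowed by POSIX. */
    printf("text (code):  %p\n", (void *)&show_segments);
    printf("data:         %p\n", (void *)&initialized_global);
    printf("bss:          %p\n", (void *)&zeroed_global);
    printf("bss (static): %p\n", (void *)&static_counter);
    printf("heap:         %p\n", (void *)dynamic);
    printf("stack:        %p\n", (void *)&local);
    free(dynamic);
}
```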

System V shared memory

System V provides an alternative mechanism for setting up shared memory, via the shmctl (), shmget (), shmat (), and shmdt () set of calls. These are not recommended, because:

System V entities live in a separate namespace with separate access permissions and administrative tools (e.g., ipcs).
System V entities are not automatically cleaned up when all programs using them exit, and can be a resource management nightmare.

Shared memory

Physical memory can be shared between two processes merely by manipulating their page tables. This happens automatically in modern Unixes in various circumstances, e.g.:

When implementing shared libraries, the dynamic loader will mmap the library, and the kernel will share the underlying physical pages among processes.
When forking, the child gets a copy of the parent's page table, i.e., initially the parent and child share the same physical pages; the first time a page is written (by either), the kernel traps the write and makes a private copy (copy-on-write). Thus a child can share a large read-only data structure constructed by the parent prior to forking. As a caveat, in some languages a "read-only" data structure is still written to by the runtime (e.g., garbage collection metadata), which triggers copies.
Threads share their entire page map. When switching between threads of the same process, the OS saves and restores registers (including the stack pointer) but does not need to switch page tables or flush the TLB.

Besides these examples, if two processes want to share physical memory, there are two major techniques: mmap and System V shared memory.

mmap

The mmap () system call allows the programmer to associate a region of the process's virtual address space with a file. It is an extremely general-purpose utility:

It allows the mapped memory to have protection attributes (readable, writable, executable).
It allows the process to have a private, copy-on-write version of the file (MAP_PRIVATE); changes are private to the process and disappear when the process exits. Alternatively, it allows the process to share the mapping with other processes (MAP_SHARED); writing the memory area is equivalent to writing the file.
The memory-mapped region need not correspond to an actual file (i.e., it can be anonymous); by creating an anonymous shared mapping in a parent and forking, the children can share memory with the parent.
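Sharing memory through an anonymous mapping and fork can be sketched as follows. This assumes MAP_ANONYMOUS is available (it is on Linux and the BSDs, though older POSIX versions do not define it); the function name is hypothetical.

```c
#define _DEFAULT_SOURCE        /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child wrote into shared memory, or -1 on error. */
int share_with_child(int value) {
    /* Anonymous + MAP_SHARED: no backing file, but the page stays shared across fork. */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) { munmap(shared, sizeof *shared); return -1; }
    if (pid == 0) {            /* child: write, then exit without flushing stdio */
        *shared = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);     /* parent: wait so the child's write has happened */
    int result = *shared;      /* visible here because the mapping is shared */
    munmap(shared, sizeof *shared);
    return result;
}
```

With MAP_PRIVATE instead of MAP_SHARED, the child's write would go to its own copy-on-write page and the parent would read 0.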

Mmap has several desirable properties, including:

The namespace for mmap corresponds to the filesystem, adhering to the "everything is a file" Unix ideal.
Access permissions correspond to file permissions.
The actual relationship between the virtual address space and physical memory consumed by the mmap is controlled by the OS; in particular, memory resources are automatically freed when all processes using an mmap either munmap or exit.

These properties stand in contrast to system V shared memory.

Associated with mmap are the system calls msync () and madvise (). msync instructs the OS to write the modified pages of a mapping to disk, either synchronously (MS_SYNC: don't return until the write is complete) or asynchronously (MS_ASYNC: return once the write has been scheduled); the OS will also optionally sync dirty pages to disk asynchronously if the proper flag is passed to mmap. madvise provides hints to the kernel about how the program will access the mapping, so that the kernel can optimize paging.
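A sketch of a file-backed shared mapping that writes through memory and then flushes with msync (MS_SYNC), with an madvise hint along the way. The temporary file path and the function name are assumptions for illustration, and MADV_SEQUENTIAL is purely advisory.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes text into a temporary file through a MAP_SHARED mapping, flushes it
   with msync, then reads it back with pread(2). Returns 0 on success. */
int write_via_mmap(const char *text, char *readback, size_t cap) {
    char path[] = "/tmp/mmap_demoXXXXXX";   /* hypothetical temp file */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                           /* file vanishes once fd is closed */
    size_t len = strlen(text);
    if (len == 0 || len >= cap || ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }
    madvise(map, len, MADV_SEQUENTIAL);     /* a hint only; may be ignored */
    memcpy(map, text, len);                 /* writing memory == writing the file */
    msync(map, len, MS_SYNC);               /* block until dirty pages are written */
    munmap(map, len);

    ssize_t n = pread(fd, readback, len, 0);
    close(fd);
    if (n != (ssize_t)len) return -1;
    readback[len] = '\0';
    return 0;
}
```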

Unix Signals

Signals are an asynchronous notification mechanism. Signals are covered by a POSIX standard. Under Linux, the signal (7) man page contains the list of signals supported.
POSIX.1 signals

Event Server

Servers typically handle three types of events: file descriptor events, signals, and timeouts.

1. File Descriptor Events

There are several system calls available to receive file descriptor events.

select (2) is the most portable and least efficient.
poll (2) is nearly as portable, less inefficient, and has a very intelligible interface.
Linux has epoll (4), which is a vastly more efficient variant of poll for large numbers of descriptors.
FreeBSD has kqueue, which is a single extensible kernel interface for all event handling.
Finally, POSIX.4 defines asynchronous I/O (AIO).

select (2)

Portability: maximum, Efficiency: worst, Notification Type: readiness, level triggered

poll (2)

Portability: maximum, Efficiency: poor, Notification Type: readiness, level triggered

/dev/poll

Portability: Solaris, Efficiency: acceptable, Notification Type: readiness, level triggered

epoll (4)

Portability: Linux 2.6+ (introduced in 2.5.44), Efficiency: good, Notification Type: readiness, level or edge triggered

POSIX.4 AIO

Portability: Linux 2.6, FreeBSD, Efficiency: variable, Notification Type: completion

kqueue (2)

Portability: BSD, OS X, Efficiency: good, Notification Type: completion and readiness, level or edge triggered
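A minimal poll (2) wrapper in C, plus a small check showing what "level triggered" means: the descriptor keeps being reported as readable until the data is actually consumed. Function names are hypothetical, and EINTR handling is left to the caller.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd is readable within timeout_ms, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);   /* readiness notification */
    if (rc <= 0) return rc;               /* 0 = timeout, -1 = error (incl. EINTR) */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Returns 1 if an unread byte in a pipe is reported readable on two
   consecutive polls (level triggered), 0 otherwise, -1 on setup failure. */
int level_triggered_demo(void) {
    int p[2];
    if (pipe(p) != 0) return -1;
    int before = wait_readable(p[0], 0);  /* nothing written yet */
    if (write(p[1], "x", 1) != 1) { close(p[0]); close(p[1]); return -1; }
    int first = wait_readable(p[0], 0);
    int second = wait_readable(p[0], 0);  /* byte still unread: still ready */
    close(p[0]);
    close(p[1]);
    return before == 0 && first == 1 && second == 1;
}
```

An edge-triggered interface (epoll with EPOLLET, or kqueue with EV_CLEAR) would report the second check only after new data arrives.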

2. Signal Events

Using kqueue makes it easy to mix signal event and file descriptor event notification. There is an event filter for signals (EVFILT_SIGNAL); interest is registered the same way as for file descriptors, and the events are delivered in the same way as file descriptor events.

Besides kqueue, every other way is ugly.

If the signal is delivered during the poll system call, poll will return -1 with errno set to EINTR, even if the timeout is -1. Thus, you will handle the signal event with low latency. The bad news is that if a signal is delivered between the first sigprocmask call and the poll call, and the poll timeout is -1, poll will not be interrupted and the signal will not be handled until (if ever) a file descriptor event occurs. One way to guard against this is to use a maximum poll timeout, e.g., 100ms: in exchange for the (slight) extra overhead of 10 system calls a second when idle, you get a maximum signal latency of circa 100ms.

POSIX provides the pselect (2) system call, which is like the sequence

sigprocmask (SIG_SETMASK, &mask, &oldmask);
select (...)
sigprocmask (SIG_SETMASK, &oldmask, NULL);

except that the system call eliminates the possibility of a signal being delivered between when the sigprocmask call returns and the select call begins. It was designed for the usage outlined above with poll, and therefore sounds ideal. Unfortunately, pselect has been broken under Linux: before kernel 2.6.16, glibc emulated it in user space, which reintroduces exactly this race. Also, it uses select, and we prefer poll.

Another alternative is to have your signal handler write to a file descriptor that is included in your poll set:
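A minimal sketch of this self-pipe setup, assuming a single signal of interest and using poll (2); the names are hypothetical. The handler only calls write (2), which is async-signal-safe, and preserves errno for the interrupted code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

static int sig_pipe[2];   /* [0]: read end, in the poll set; [1]: write end, used by handler */

static void on_signal(int signo) {
    int saved = errno;                                  /* write() may clobber errno */
    ssize_t n = write(sig_pipe[1], &signo, sizeof signo);
    (void)n;                                            /* pipe full: drop, never block */
    errno = saved;
}

/* Creates the non-blocking pipe and installs the handler; returns 0 on success. */
int self_pipe_init(int signo) {
    if (pipe(sig_pipe) != 0) return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(sig_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(sig_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0) return -1;
    }
    struct sigaction sa = {0};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Waits up to timeout_ms; returns the signal number read from the pipe, 0 on timeout. */
int wait_for_signal(int timeout_ms) {
    struct pollfd pfd = { .fd = sig_pipe[0], .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);   /* EINTR just means: poll again, pipe is ready */
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0) return 0;
    int signo;
    if (read(sig_pipe[0], &signo, sizeof signo) != (ssize_t)sizeof signo) return 0;
    return signo;
}
```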

With this setup, you can have an arbitrary poll timeout and still maintain low latency. However, it's important to use a pipe, so that the (typically 4-byte) write and read of the signal number is atomic; POSIX guarantees that writes of up to PIPE_BUF bytes (at least 512) to a pipe are atomic. It's also important that the pipe be set to non-blocking, to avoid deadlock: if the pipe fills up, a blocking write in the signal handler would hang the process.

Both techniques also apply to epoll.

3. Timeout Events

The next event type of interest is the timer, e.g., you want a 50ms timeout on a sub request. In general, you may have many more simultaneous timers pending in a complicated server than you have file descriptors, since there are generally multiple timeouts per request.

Once again, kqueue makes it easy. There is a timer filter type (EVFILT_TIMER) which is treated similarly to the file descriptor filters.

Also once again, every other way is ugly.

poll, epoll, etc. all provide a single timeout argument to the system call. The problem, then, is to consider the entire set of timers, determine the delta-t until the next timer goes off, and use that delta-t as the timeout argument to the system call. By storing the timers in a binary tree sorted by (absolute, not relative) expiration time, the next expiring timer can be found in O (log (N)) time, and creating and removing timers can also be done in O (log (N)) time; the latter is especially important, since most timers are cancelled before expiring (they are most often used to time out subrequests, and most of the time your subservers are within SLA). This technique uses the gettimeofday (2) system call to obtain the current time.
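A sketch of the timer bookkeeping in C. It swaps in a binary min-heap (a binary tree ordered by expiration time, stored in an array) rather than a search tree: peeking at the next timer is O(1), and adding or cancelling is O(log N). Times are plain absolute milliseconds supplied by the caller (e.g., derived from gettimeofday); the capacity, ids, and function names are hypothetical.

```c
#include <limits.h>

#define MAX_TIMERS 128   /* hypothetical capacity; timer ids are 0..MAX_TIMERS-1 */

struct timer_entry {
    long long expires_ms;   /* absolute expiration time, in ms */
    int id;
};

static struct timer_entry timer_heap[MAX_TIMERS];  /* min-heap on expires_ms */
static int timer_count;
static int timer_pos[MAX_TIMERS];  /* id -> heap index + 1; 0 means not pending */

static void place(int i, struct timer_entry e) {
    timer_heap[i] = e;
    timer_pos[e.id] = i + 1;
}

static void sift_up(int i) {
    struct timer_entry e = timer_heap[i];
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (timer_heap[parent].expires_ms <= e.expires_ms) break;
        place(i, timer_heap[parent]);
        i = parent;
    }
    place(i, e);
}

static void sift_down(int i) {
    struct timer_entry e = timer_heap[i];
    for (;;) {
        int child = 2 * i + 1;
        if (child >= timer_count) break;
        if (child + 1 < timer_count &&
            timer_heap[child + 1].expires_ms < timer_heap[child].expires_ms)
            child++;
        if (e.expires_ms <= timer_heap[child].expires_ms) break;
        place(i, timer_heap[child]);
        i = child;
    }
    place(i, e);
}

/* O(log N). Returns 0, or -1 if the id is out of range or already pending. */
int timer_add(int id, long long expires_ms) {
    if (id < 0 || id >= MAX_TIMERS || timer_pos[id] != 0) return -1;
    int i = timer_count++;
    timer_heap[i] = (struct timer_entry){ expires_ms, id };
    sift_up(i);
    return 0;
}

/* O(log N): fill the hole with the last entry, then restore heap order. */
int timer_cancel(int id) {
    if (id < 0 || id >= MAX_TIMERS || timer_pos[id] == 0) return -1;
    int i = timer_pos[id] - 1;
    timer_pos[id] = 0;
    struct timer_entry last = timer_heap[--timer_count];
    if (i < timer_count) {
        place(i, last);
        sift_up(i);
        sift_down(timer_pos[last.id] - 1);
    }
    return 0;
}

/* Timeout to hand to poll(): -1 if no timers, 0 if one is already due. */
int next_timeout_ms(long long now_ms) {
    if (timer_count == 0) return -1;
    long long delta = timer_heap[0].expires_ms - now_ms;
    if (delta <= 0) return 0;
    return delta > INT_MAX ? INT_MAX : (int)delta;
}

/* Removes and returns the id of the earliest timer if it has expired, else -1. */
int timer_pop_expired(long long now_ms) {
    if (timer_count == 0 || timer_heap[0].expires_ms > now_ms) return -1;
    int id = timer_heap[0].id;
    timer_cancel(id);
    return id;
}
```

The cancel path matters most in practice, as the text notes, since most subrequest timers are cancelled long before they fire.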

Thursday, January 31, 2008

Dan Kegel's Web Hostel

Binary XML

This paper discusses the use of Abstract Syntax Notation One (ASN.1).

Wednesday, January 30, 2008

Insightful Presentation

Open Source documentation and collaboration platform

Tuesday, January 29, 2008

Universally Unique Identifier (UUID)

Scheduling

Memory Pool

Boost

More companies seem to be joining Dataportability.org.

Monday, January 28, 2008

Thinking In C++

STL Containers...

Sunday, January 27, 2008

Message Broker

Publish/subscribe messaging is one of the most flexible ways to implement robust, real-time, distributed, highly available services. I have used TIBCO SmartSockets in previous projects and love the product. I have wondered whether there is any open source implementation; so far I have found:

Saturday, January 26, 2008

Pimpl

"The Pimpl technique is a useful way to minimize coupling, and separate interface and implementation."

Resources:

Virtualization is being applied to all aspects of software development; this DDJ article recommends it for build environments.

Googling "father of C++" returns Bjarne Stroustrup's homepage. It has useful resources, especially the technical FAQ.

Check out this site: All About Agile.

Friday, January 25, 2008

Memory Leak

Ah, not my favorite topic, but I am always interested in finding new ways and tools to debug memory leaks. DDJ, one of my favorite sites, has an article in the current issue; check it out: Memory Leaks Detection: A Different Approach

In Memory Cache

I have used the following in my past projects:
Commercial:

Oracle TimesTen

Open Source:

Memcached

Client libraries are available in C, Java, PHP, etc.
Clients partition data across memcached servers by hashing the key and taking the result modulo the number of servers: 'hash(key) % # servers'.
If an item isn't found in the cache, it's looked up in the source of truth and added to the cache. The extra cost of hitting the source of truth is amortized across all the subsequent accesses that hit the cache.
Distributed Caching with Memcached
Consistent Hashing
FAQ: http://www.socialtext.net/memcached/index.cgi?faq
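Modulo partitioning fits in a few lines of C. The hash function here (32-bit FNV-1a) is an assumption, since different memcached clients use different hashes, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a string hash (one common, simple choice). */
uint32_t fnv1a32(const char *key) {
    uint32_t h = 2166136261u;                  /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        h ^= *p;
        h *= 16777619u;                        /* FNV prime */
    }
    return h;
}

/* Modulo partitioning: hash the key, then reduce it to a server index. */
size_t server_for(const char *key, size_t num_servers) {
    return fnv1a32(key) % num_servers;
}

/* Counts how many keys map to a different server when the pool
   changes from `from` servers to `to` servers. */
size_t keys_moved(const char *const *keys, size_t n, size_t from, size_t to) {
    size_t moved = 0;
    for (size_t i = 0; i < n; i++)
        if (server_for(keys[i], from) != server_for(keys[i], to))
            moved++;
    return moved;
}
```

The keys_moved helper exposes the weakness of this scheme: with a uniform hash, going from N to N+1 servers remaps roughly N/(N+1) of the keys, which is the problem consistent hashing addresses.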

Testing Tools

I am always thinking about traffic generation tools for stress testing, and every company I have worked for seems to be lacking in this area. I have always managed to find good open source traffic generation tools that can be adapted to specific requirements.

Here are a few I bumped into today for traffic generation:

SWIG

The other day a dear friend of mine mentioned SWIG to me. I have not personally used it on any project, but it seems a potential candidate for bridging the islands of different programming languages.

Check out more @ SWIG