Thursday, October 9, 2008

FireWall

With so many revelations, it is hard not to take more interest in the financial world. The following is a collection of audio and video for Main Street from public radio.

Thursday, June 26, 2008

XML

  1. C parser and Toolkit
  2. Validation
  3. Expat

Package Manager

  1. yum
  2. Smart
  3. apt-get

Network Backup Tool

Bacula

Real Time System Tracing

"premature optimization is the root of all evil (or at least most of it) in programming."
  1. DTrace
  2. SystemTap
  3. Frysk
  4. OProfile
  5. LTTng

Monitoring Comparative

Cacti:
  1. Graphing Monitoring Data
  2. Data Store in RRDtool
  3. No reporting
  4. Plugin support for extension
Ganglia:
  1. Highly Scalable Monitoring system
  2. Designed for grid and cluster computing
  3. XML for data representation
  4. XDR for compact portable data transport
  5. RRDtool for data storage and visualization
  6. Ideal for large monitoring environments
  7. No support for events or notifications
  8. Does not support thresholds
  9. Extension by pluggable modules similar to apache
  10. Gmond – Metric gathering agent installed on individual servers
  11. Gmetad – Metric aggregation agent installed on one or more specific task oriented servers
  12. Ported to various different platforms (Linux, FreeBSD, Solaris, others)
  13. Apache Web Frontend – Metric presentation and analysis server
  14. Attributes:
  • Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster
  • Failover – Gmetad has the ability to switch which cluster node it polls for metric data
  • Lightweight and low overhead metric gathering and transport

Nagios:
  1. Monitor hosts, service and network
  2. Plugin extension
  3. Automatic log file rotation
  4. Redundant monitoring hosts
  5. Web interface for viewing current network status, notification and problem history, log files, etc.
  6. Supports thresholds
  7. Generates events or notifications against thresholds
  8. Supports "active checks" and "passive checks"
RRDtool:
  1. Data logging and graphing system for time series data
  2. Constant data storage size
  3. Data storage is created upfront
Munin:
  1. Written in Perl
  2. Uses RRDtool
  3. Plugins can be written in any language
  4. Master/Node Architecture
  5. Default plugins like load average, memory usage, CPU usage and network traffic
Hyperic:

Thursday, June 19, 2008

Monitoring Tools

  1. MySQL Replication
  2. MMAIM
  3. Ganglia
  4. nagios
  5. MUNIN

Testing

"Beware of bugs in the above code; I have only proved it correct, not tried it."
  1. STAF

Performance

"The hardest thing is to go to sleep at night, when there are so many urgent things needing to be done. A huge gap exists between what we know is possible with today's machines and what we have so far been able to finish."
  1. Yahoo! Exceptional Performance
  2. Google Tools
  3. MySQL
  4. High Performance MySQL

Wiki

  1. dedicated to performance

Logging Services

"The most important thing in the kitchen is the waste paper basket and it needs to be centrally located."
  1. apache software foundation
  2. LogAnalysis
  3. socklog
  4. Swatch
  5. LogStatistics

Hash Map

"If you optimize everything, you will always be unhappy."

Google Sparse Hash

Sunday, June 15, 2008

Unix Tools

"I define UNIX as 30 definitions of regular expressions living under one roof."

A site which lists tools for Unix

For Linux:
From GOOG:

Tuesday, June 10, 2008

Guru of the Week (gotw)

Guru of the Week is a regular series of C++ programming problems created and written by Herb Sutter. Since 1997, it has been a regular feature of the Internet newsgroup comp.lang.c++.moderated, where you can find each issue's questions and answers (and a lot of interesting discussion).

Wednesday, May 28, 2008

Scalable Nonblocking Data Structures
InfoQ has an interesting writeup of Dr. Cliff Click's work on developing highly concurrent data structures for use on the Azul hardware (which is in production with 768 cores), supporting 700+ hardware threads in Java.

Monday, May 26, 2008

Friday, May 23, 2008

Data Encodings - IETF specification for base 64, base 32, and base 16 encoding schemes.

Thursday, May 22, 2008

Sunday, February 3, 2008

Event Handling Framework
libevent provides a simple, portable framework for event notification that uses the most efficient system calls available on your system.

Saturday, February 2, 2008

Ken Thompson: Reflections on Trusting Trust

BTW, if you were wondering, he is at Google now.
Unix Memory Model
There are some basic regions ("segments") provided by all Unix variants:
  • Stack: (Variable size) This is where information about function call sequences is stored. There is microprocessor support for the stack.
  • Code: (Fixed size) The area of memory containing machine code instructions. Typically r+x permissions. Aka the text segment.
  • Data: (Fixed size) The area of memory containing initialized data. This includes static variables, string constants, etc.
  • BSS: (Variable size) The area of memory containing uninitialized data. The heap, where dynamically allocated objects live, traditionally grows from the end of this region.
You can view a process' memory map with the utility pmap.
System V shared memory
System V provides an alternative mechanism for setting up shared memory, via the shmctl (), shmget (), shmat (), and shmdt () set of calls. These are not suggested for use, because
  • System V entities live in a separate namespace with separate access permissions and administrative tools (e.g., ipcs).
  • System V entities are not automatically cleaned up if all programs using them exit, and can be a resource management nightmare.
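
As a rough illustration of the four calls and of the manual cleanup they require, here is a minimal sketch. The function name sysv_roundtrip, the 4096-byte size, and the 0600 permissions are illustrative choices, not taken from any particular codebase.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX/XSI declarations under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Creates a private System V segment, writes msg into it, detaches,
 * re-attaches, and copies the string back into out.
 * Returns 0 on success, -1 on error. */
int sysv_roundtrip(const char *msg, char *out, size_t outlen)
{
    assert(outlen > 0);

    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    char *p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    strncpy(p, msg, 4095);
    p[4095] = '\0';
    shmdt(p);                           /* the segment survives the detach */

    p = shmat(id, NULL, 0);             /* attach again: data is still there */
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    strncpy(out, p, outlen - 1);
    out[outlen - 1] = '\0';
    shmdt(p);

    /* Without this explicit removal the segment would outlive the process. */
    return shmctl(id, IPC_RMID, NULL);
}
```

If the final shmctl call is skipped, the segment stays visible in ipcs after the program exits, which is the resource management problem described above.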
Shared memory
Physical memory can be shared between two processes merely by manipulating their page tables. This happens automatically in modern Unixes in various circumstances, e.g.,
  • When implementing shared libraries, the dynamic loader will mmap the library, and the kernel will share the maps amongst processes.
  • When forking, the child gets a copy of the parent's page table, i.e., their pages originally all coexist in physical memory; but the first time a page is written (by either), the kernel traps the write and makes a copy. Thus a child can share a large read-only data structure constructed by the parent prior to forking; although, as a caveat, in some languages a read-only data structure is still written to by the runtime (e.g., garbage collection metadata).
  • Threads share their entire page map. The OS will simply reset the stack pointer when switching contexts, as opposed to flushing the TLB.
Besides these examples, if two processes want to share physical memory, there are two major techniques: mmap, and System V.
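
The copy-on-write behaviour after fork can be observed from user space. The following is a small sketch with a hypothetical helper, fork_private_write: the child overwrites a variable it inherited, and the parent's copy stays untouched. The child's view is passed back through its exit status, so the value must fit in 0..255.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that overwrites *value with child_writes.
 * On success returns 0, stores the child's view in *child_saw,
 * and leaves *value (the parent's view) as it was. */
int fork_private_write(int *value, int child_writes, int *child_saw)
{
    assert(child_writes >= 0 && child_writes <= 255);

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        *value = child_writes;   /* the write triggers a private page copy */
        _exit(*value);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    *child_saw = WEXITSTATUS(status);
    return 0;
}
```

Calling it with a variable holding 7 and child_writes set to 42 leaves the parent's variable at 7 while the child reports 42.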
mmap
The mmap () system call allows the programmer to associate a region of the process virtual address space with a file. It is an extremely general purpose utility:
  • It allows the mapped memory to have protection attributes (readable, writable, executable).
  • It allows the process to have a private (copy-on-write) version of the file; changes are private to the process and disappear when the process exits. Alternatively, it allows the process to share the mapping with other processes; writing the memory area is equivalent to writing the file.
  • The memory mapped region need not correspond to an actual file (i.e. anonymous); by creating an anonymous mmap in a parent and forking, the children can share memory.
Mmap has several desirable properties, including
  • The namespace for mmap corresponds to the filesystem, adhering to the "everything is a file" Unix ideal.
  • Access permissions correspond to file permissions.
  • The actual relationship between the virtual address space and physical memory consumed by the mmap is controlled by the OS; in particular, memory resources are automatically freed when all processes using an mmap either munmap or exit.
These properties stand in contrast to System V shared memory.
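
To make the anonymous-mapping case concrete, here is a minimal sketch in which a parent creates a shared anonymous mapping, forks, and reads back what the child wrote. The helper name shared_anon_write is hypothetical, and MAP_ANONYMOUS is a widely available extension rather than part of older POSIX editions.

```c
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one shared anonymous int, forks, lets the child store child_writes,
 * and returns what the parent reads afterwards (-1 on error). */
int shared_anon_write(int child_writes)
{
    assert(child_writes >= 0);

    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) {
        *cell = child_writes;    /* lands in the page both processes share */
        _exit(0);
    }

    int result = -1;
    if (waitpid(pid, NULL, 0) == pid)
        result = *cell;
    munmap(cell, sizeof *cell);  /* nothing else to clean up */
    return result;
}
```

Unlike the copy-on-write case, the parent sees the child's store here because the mapping was created with MAP_SHARED.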

Associated with mmap are the system calls msync () and madvise (). msync instructs the OS to write all modified pages to disk, either synchronously (don't return until the write is complete) or asynchronously (return after the sync has been scheduled); the OS may also sync dirty pages to disk asynchronously on its own if the proper flag is passed to mmap. madvise provides hints to the kernel as to how the program will access the mmap, in order to optimize access.
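
A file-backed mapping together with msync and madvise might look like the sketch below. The helper name mmap_file_roundtrip and the /tmp path template are assumptions made for illustration; the file is unlinked right away so nothing is left behind.

```c
#define _DEFAULT_SOURCE   /* glibc: expose madvise, mkstemp, pread under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into a temporary file through a shared mapping, flushes it with
 * msync(MS_SYNC), then reads the file back with pread into out.
 * Returns 0 on success, -1 on error. */
int mmap_file_roundtrip(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg);
    assert(len > 0 && len < outlen);

    char path[] = "/tmp/mmap-demo-XXXXXX";    /* illustrative location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                             /* file vanishes once fd is closed */

    int rc = -1;
    if (ftruncate(fd, (off_t)len) == 0) {     /* the mapping needs a non-empty file */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            madvise(p, len, MADV_SEQUENTIAL); /* a hint only; result ignored */
            memcpy(p, msg, len);              /* writing memory writes the file */
            if (msync(p, len, MS_SYNC) == 0 &&
                pread(fd, out, len, 0) == (ssize_t)len) {
                out[len] = '\0';
                rc = 0;
            }
            munmap(p, len);
        }
    }
    close(fd);
    return rc;
}
```

Passing MS_ASYNC instead of MS_SYNC would schedule the write-back and return immediately.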
Unix Signals
Signals are an asynchronous notification mechanism. Signals are covered by a POSIX standard. Under Linux, the signal (7) man page contains the list of signals supported.
POSIX.1 signals
Event Server
Servers typically handle three types of events: file descriptor events, signals, and timeouts.

1. File Descriptor Events

There are several system calls available to receive file descriptor events.
  1. select (2) is the most portable and least efficient.
  2. poll (2) is nearly as portable, less inefficient, and has a very intelligible interface.
  3. Linux has epoll (4) which is a vastly more efficient variant of poll.
  4. FreeBSD has kqueue, which is a single extensible kernel interface for all event handling.
  5. Finally, POSIX.4 defines asynchronous I/O (AIO).
select (2)
  • Portability: Maximum, Efficiency: Worst, Notification Type: readiness, level triggered
poll (2)
  • Portability: Maximum, Efficiency: poor, Notification Type: readiness, level triggered
/dev/poll
  • Portability: Solaris, Efficiency: acceptable, Notification Type: readiness, level triggered
epoll (4)
  • Portability: Linux 2.6+, Efficiency: good, Notification Type: readiness, level or edge triggered
POSIX.4 AIO
  • Portability: Linux 2.6, FreeBSD, Efficiency: variable, Notification Type: completion
kqueue (2)
  • Portability: BSD, OS X, Efficiency: good, Notification Type: completion and readiness, level or edge triggered
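
For reference, a readiness check with poll can be as small as the following sketch. The helper poll_pipe_once is hypothetical and uses a pipe as a stand-in for a socket; it only shows the call shape and the meaning of the return value.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Polls the read end of a fresh pipe for up to timeout_ms milliseconds.
 * If write_first is nonzero, one byte is written beforehand so that the
 * descriptor is ready. Returns 1 if readable, 0 on timeout, -1 on error. */
int poll_pipe_once(int write_first, int timeout_ms)
{
    assert(timeout_ms >= 0);   /* a negative timeout would block forever */

    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    if (write_first && write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n == 1 && !(pfd.revents & POLLIN))
        n = -1;                /* reported, but not for reading */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Because poll is level triggered, calling it again without draining the pipe would report the descriptor as ready again.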
2. Signal Events

Using kqueue makes it easy to mix signal event and file descriptor event notification. There is an event filter for signals, interest is signaled similarly to file descriptors, and the events are delivered in the same way as file descriptor events.

Besides kqueue, every other way is ugly.

One approach is to keep the signals of interest blocked during normal processing and unblock them only around the poll call, using a pair of sigprocmask calls. If a signal is delivered during the poll system call, poll is interrupted and fails with EINTR, even if the timeout is -1. Thus, you will handle the signal event with low latency. The bad news is that if a signal is delivered between the first sigprocmask call and the poll call, and the poll timeout is -1, poll will not be interrupted and the signal will not be handled until (if) a file descriptor event occurs. One way to guard against this is to cap the poll timeout, e.g., at 100ms: in exchange for the (slight) extra overhead of 10 system calls a second when idle, you get a maximum signal latency of circa 100ms.
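
The race and the capped timeout can be sketched as follows. This is an illustration under stated assumptions, not a complete event loop: wait_window, window_handler, and window_flag are hypothetical names, and raise stands in for a signal that arrives while signals are still blocked, so the handler runs in the gap just before poll starts.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t window_flag;    /* set by the handler */

static void window_handler(int signo) { (void)signo; window_flag = 1; }

/* Keeps signo blocked except while waiting: unblock, poll with a capped
 * timeout, block again. Returns 1 if the handler ran during the window,
 * 0 if it did not, -1 on setup error. */
int wait_window(int signo, int max_timeout_ms)
{
    assert(max_timeout_ms >= 0);    /* -1 would make the race unbounded */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = window_handler;
    sigemptyset(&sa.sa_mask);

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigaction(signo, &sa, NULL) == -1 ||
        sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    window_flag = 0;
    raise(signo);                             /* stays pending while blocked */

    sigprocmask(SIG_UNBLOCK, &block, NULL);   /* handler runs here, before poll */
    poll(NULL, 0, max_timeout_ms);            /* not interrupted; the cap bounds the wait */
    sigprocmask(SIG_BLOCK, &block, NULL);     /* close the window again */

    int seen = window_flag ? 1 : 0;
    sigprocmask(SIG_SETMASK, &old, NULL);     /* restore the caller's mask */
    return seen;
}
```

With a timeout of -1 the poll call in this sketch would never return, since the signal was already consumed before poll began; with the cap, the loop notices the flag after at most max_timeout_ms.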

POSIX provides the pselect (2) system call, which is like the sequence

sigprocmask (SIG_SETMASK, &mask, &oldmask);
select (...)
sigprocmask (SIG_SETMASK, &oldmask, NULL);

except that the system call eliminates the possibility of a signal being delivered between when the sigprocmask call returns and the select call begins. It was designed for the usage outlined above with poll, and therefore sounds ideal. Unfortunately, pselect has historically been broken under Linux: before kernel 2.6.16, glibc emulated it with exactly the sequence above, race included. Also, it uses select, and we prefer poll.
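
On a system where pselect is implemented in the kernel, the atomic behaviour can be observed with a sketch like this one. The names pselect_pending, pselect_handler, and pselect_saw_signal are hypothetical; the signal is made pending while blocked, and the mask passed to pselect unblocks it only for the duration of the call.

```c
#define _DEFAULT_SOURCE   /* glibc: expose pselect under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

static volatile sig_atomic_t pselect_saw_signal;

static void pselect_handler(int signo) { (void)signo; pselect_saw_signal = 1; }

/* Blocks signo, makes it pending, then calls pselect with a mask that
 * unblocks it. Returns the errno pselect failed with, 0 if it timed out
 * instead, or -1 on setup error. */
int pselect_pending(int signo, long timeout_sec)
{
    assert(timeout_sec > 0);        /* keep the sketch from blocking forever */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = pselect_handler;
    sigemptyset(&sa.sa_mask);

    sigset_t block, old, during;
    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigaction(signo, &sa, NULL) == -1 ||
        sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    during = old;
    sigdelset(&during, signo);      /* mask in effect only inside pselect */

    pselect_saw_signal = 0;
    raise(signo);                   /* pending: the handler has not run yet */

    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    int n = pselect(0, NULL, NULL, NULL, &ts, &during);
    int err = (n == -1) ? errno : 0;

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return err;
}
```

Here the pending signal interrupts the wait itself, so no timeout cap is needed to notice it.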

Another alternative is to have your signal handler write to a file descriptor that is included in your poll set.

With this setup, you can have an arbitrary poll timeout and maintain low latency. However, it's important to use a pipe, so that the (typically 4 byte) write and read of the signo is atomic; under POSIX, only pipes guarantee a minimum atomic read/write size larger than 4 bytes. It's also important that the pipe be set to non-blocking (see below), to avoid deadlock.
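
A minimal sketch of this pipe-based setup follows. The names sig_pipe, forward_signal, and self_pipe_wait are hypothetical, and raise stands in for a signal arriving from outside; a real server would add the pipe's read end to its existing poll set instead of polling it alone.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };   /* [0] read end, [1] write end */

/* Handler: forward the signal number through the pipe using only write(2). */
static void forward_signal(int signo)
{
    int saved = errno;                 /* write may clobber errno */
    if (write(sig_pipe[1], &signo, sizeof signo) < 0) {
        /* pipe full: the notification is dropped rather than blocking */
    }
    errno = saved;
}

/* Installs the handler for signo, raises it once, and waits for it with
 * poll on the pipe. Returns the signal number read, or -1 on error/timeout. */
int self_pipe_wait(int signo, int timeout_ms)
{
    assert(timeout_ms >= 0);

    if (pipe(sig_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; i++) {      /* non-blocking on both ends */
        int fl = fcntl(sig_pipe[i], F_GETFL);
        if (fl == -1 || fcntl(sig_pipe[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    raise(signo);                      /* stand-in for an external signal */

    struct pollfd pfd = { .fd = sig_pipe[0], .events = POLLIN };
    int got = -1;
    if (poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN)) {
        if (read(sig_pipe[0], &got, sizeof got) != (ssize_t)sizeof got)
            got = -1;
    }
    close(sig_pipe[0]);
    close(sig_pipe[1]);
    return got;
}
```

The signal shows up as ordinary readable data, so it does not matter whether it arrived before or during the poll call.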

Both techniques also apply to epoll.

3. Timeout Events

The next event type of interest is the timer, e.g., you want a 50ms timeout on a sub request. In general, you may have many more simultaneous timers pending in a complicated server than you have file descriptors, since there are generally multiple timeouts per request.

Once again, kqueue makes it easy. There is a timer filter type which is treated similarly to the file descriptor filters.

Also once again, every other way makes it ugly.

poll, epoll, etc. all provide a single timeout argument to the system call. The problem, then, is to consider the entire set of timers, determine the delta-t until the next timer goes off, and use that delta-t as the timeout argument to the system call. By storing the timers in a balanced binary tree sorted by (absolute, not relative) expiration time, the next expiring timer can be found in O (log (N)) time, and creating and removing timers can also be done in O (log (N)) time; the latter is especially important, since most timers are cancelled before expiring (since they are most often used to timeout subrequests, and most of the time, your subservers are within SLA). This technique utilizes the gettimeofday (2) system call to obtain the current time.
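
The delta-t computation can be sketched as below. Note the swapped-in data structure: this uses a binary min-heap in a fixed array rather than the balanced tree described above, because it is shorter to show; it gives O(log N) insertion and removal of the earliest timer, but cancelling an arbitrary timer would need extra index tracking that is not shown. All names (timer_heap, timer_add, timer_pop, timer_poll_timeout, MAX_TIMERS) are hypothetical, and the current time is passed in by the caller so the logic stays independent of the clock source.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_TIMERS 1024            /* illustrative fixed capacity */

/* Binary min-heap of absolute expiration times in milliseconds. */
struct timer_heap {
    long long when[MAX_TIMERS];
    size_t count;
};

static void swap_ll(long long *a, long long *b)
{
    long long t = *a; *a = *b; *b = t;
}

/* Adds a timer; O(log N). Returns 0, or -1 if the heap is full. */
int timer_add(struct timer_heap *h, long long expires_ms)
{
    if (h->count == MAX_TIMERS)
        return -1;
    size_t i = h->count++;
    h->when[i] = expires_ms;
    while (i > 0 && h->when[(i - 1) / 2] > h->when[i]) {   /* sift up */
        swap_ll(&h->when[(i - 1) / 2], &h->when[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

/* Removes the earliest timer; O(log N). */
void timer_pop(struct timer_heap *h)
{
    assert(h->count > 0);
    h->count--;
    h->when[0] = h->when[h->count];
    for (size_t i = 0;;) {                                  /* sift down */
        size_t l = 2 * i + 1, r = l + 1, min = i;
        if (l < h->count && h->when[l] < h->when[min]) min = l;
        if (r < h->count && h->when[r] < h->when[min]) min = r;
        if (min == i) break;
        swap_ll(&h->when[i], &h->when[min]);
        i = min;
    }
}

/* Timeout argument for poll: delta-t to the next timer, 0 if it is already
 * due, or -1 (block indefinitely) when no timers are pending. */
int timer_poll_timeout(const struct timer_heap *h, long long now_ms)
{
    if (h->count == 0)
        return -1;
    long long dt = h->when[0] - now_ms;
    if (dt < 0) dt = 0;
    if (dt > INT_MAX) dt = INT_MAX;
    return (int)dt;
}
```

In a server loop, now_ms would come from gettimeofday (seconds times 1000 plus microseconds divided by 1000), and the returned value would be passed directly as the timeout to poll or epoll_wait.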

Sunday, January 27, 2008

Message Broker

Publish/subscribe messaging is one of the more flexible ways to implement robust, real-time, distributed, highly available services. I have used TIBCO SmartSockets in my previous projects and love the product. I have wondered whether there is any open source implementation; so far I have found:
  1. The Spread Toolkit
  2. D-Bus

Saturday, January 26, 2008

Pimpl

"The Pimpl technique is a useful way to minimize coupling, and separate interface and implementation."


Resources:

  1. Boost Serialization
  2. Making Pimpl Easy
  3. Compilation Firewalls
  4. The Fast Pimpl Idiom
Virtualization is being applied in all aspects of software development; this DDJ article recommends it for build environments.
Googling "father of C++" returns Bjarne Stroustrup's homepage. The homepage has useful resources, especially the technical FAQ.
Check out this site for All About Agile

Friday, January 25, 2008

Memory Leak

Ah, not my favorite topic, but I am always interested in finding new ways and tools to debug memory leaks. DDJ, one of my favorite sites, has an article in the current issue; check out Memory Leaks Detection: A Different Approach
In Memory Cache

I have used the following in my past projects:
Commercial:
  1. Oracle TimesTen

Open Source:
  1. Memcached
Testing Tools

I am always thinking about traffic generation tools for stress testing, and every company I have worked for always seems to be lacking in this area. I have always managed to find good open source traffic generation tools which can be adapted to your own requirements.

Here are a few I bumped into today for Traffic generation:
  1. Seagull
  2. SIPp
SWIG

The other day a dear friend of mine mentioned SWIG to me. I have not personally used it for any project, but it seems a potential candidate to bridge the software language islands.

Check out more @ SWIG