Monday, May 19, 2008

CSQL 2.0 Released

CSQL 2.0 , Transaction Data Cache Released.

This release contains
  • Operation synchronous table cache,
  • Gateway(access to non-cached tables in target database),
  • Primitive ODBC Driver
  • Improved Test Coverage
  • csqldump tool(to dump all database tables and records into file)
  • cachetable tool(to load, unload, reload tables from target database)


CSQL - Transaction Data Cache for any DB

Lead / Introduction
--------------------
CSQL is compact main memory open source SQL database engine that delivers ultra fast performance. It can handle 100K selects/sec and 50K updates/sec.
It can also work as a middle tier caching solution for any open source and commercial database thereby increasing the throughput of the application by 20 to 100 times without making any code changes.

CSQL - Main memory Database
-------------------------------

Main memory databases are times faster than disk based database systems, as all the data is available in physical memory. It also avoids the buffer manager overhead which is found in disk based database systems. Moreover data access algorithms can work efficiently as compared to traditional disk based algorithms. One of the major factor which determines the performance of database management is disk I/O. Access time for main memory is orders of magnitude less than that of disk ( 100ns vs 10ms).

Real time applications perform random access(point lookups => f1 = ?) 90% of the time, for which records from different disk blocks needs to be read. This hampers the performance of the application. Main memory cost has reduced significantly in the recent years and even desktops now have 1 GB RAM. Database managment system which depends only on memory and does not any disk I/O will be many times faster than disk based database sytems. This led to design a main memory database engine which does not involve any disk I/O.

Focus
-----
CSQL compact main memory open source SQL database engine, uses shared memory architecture, wherein the database will be available to the application process in its own process address space and can be accessed by simple pointer semantics. OS mutex mechanisms are costly as it requires context switch, csql implements its own mutex mechanism through atomic assembly instructions to reduce the machine cycles. CSQL is designed for one single purpose
'performance'. It is not a feature rich database product. It does only limited things, but with ultra fast high performance.

SQL and Standard interface support
-----------------------------------
CSQL supports all DML statements which involve single table and DDL statements related to table and index.
Standard JDBC and ODBC drivers are also available to access the SQL kernel. It also provides proprietary C++ SQL interface to access the SQL Kernel and C++ interface to access storage kernel. Apart from primitive data types, it supports Date, Time and TimeStamp data types.

Storage Engine Capabilities
----------------------------
CSQL storage engine supports all the ACID properties.
Atomicity is implemented by using physical and logical undo logs.
It supports Unique, Not null and primary key, foreign key contraints.
It supports the first three isolation levels, READ_UNCOMMITTED, READ_COMMITTED, READ_REPEATABLE.
Disk performance is stagnated and it is around 7.2K rpm for IDE and 15K for SCSI disks. But network speed is increasing at good pace. 1 Gbps Ethernet is reality now. Because of these reasons, CSQL decides to rely on network so that it performance improves with increase in network speed rather than getting stagnated as in case with disk based database systems. Because of the above said reason, durability property of transaction is supported in csql using synchronized replication.
Hash Index and Tree indexing mechanisms make sure that equality predicates and range predicates are executed faster. Index structures are highly concurrent as only short duration locks are only taken on them.
Multi granular locking aids to improve the concurency. Row level locking gives the highest level of concurrency.
Custom built fixed size and variable size allocators are highly concurrent and uses latch free algorithms which aids in increasing the performance and gives high level of concurrency.

Transaction Data Cache
--------------------------
CSQL can act as updateable real time caching for relational table of any database system (commercial and open source database systems including ORACLE and Mysql). This cache resides in application tier (same host where application resides) and reduces the computation and network overhead involved in executing SQL statements. It shall be used by both read intensive applications as well as update intensive applications, thereby increasing the throughput by 20 to 100 times.

Caching is done at the table level granularity and application shall either specify tables to be cached in a configuration file or it can use CLI tool or API's to cache the table dynamically.It loads the full table from the target database into CSQL database. After loading, any select query will be handled by the csql itself. Non -select DML statements are executed at both csql and target database, so that the cache and target database are in sync. It supports three modes to execute DML statements at target database

Applications shall also access tables which are not there in cache, using the same driver provided by CSQL. These statements on non cached tables are handled by the 'Gateway' component in the caching subsystem, whose responsibility is to check, where the table is present and route it either csql or target database based on the SQL query. If application executes complex queries which are not supported in CSQL Engine, then they are also routed to target database for execution. This will allow the application to adapt the caching solution without any application code changes.

a)DSYNC - statement execute will return after the operation is succesfully executed at csql as well as target database. This gives high consistency between cached data and target database. If csql server goes down, during recovery it automatically loads the tables from the target database.


b)ASYNC - statement execute will return after the operation is successfully executed at csql and update logs are generated. These update logs and sent to target database host by another process called "Propagator". CSQLExecutor process will run on the host where target database is running whose responsibility is to receive the update logs from the propagator and execute them on target database.

Applications can choose appropriate mode based on durability and consistency requirements. For applications, where updates shall happen directly to the target database, it provides CLI and API to reload the table fully or incremental. Incremental loading will work best if only insert and delete operations needs to be loaded from target database. Direct updates to target database shall be fetched to cache by using full table reloading. Application also can configure to have automatic loading at regular intervals, either fully or incremental.

Incase of foreign key constraints, applications shall either use DYNC mode, in which constaint violations are detected during target database execution. Other alternative is to load all the tables which are involved in the relationship to cache and use either NSYNC or ASYNC options based on the requirement.

Product Page
-------------
http://sourceforge.net/projects/csql