SQLite 101

Setting up the program on RHEL

Install the packages
(sys admin work, require root priv)
sudo yum install sqlite 		# this actually install sqlite3.  the older version rpm is called sqlite2

Initialize an example DB
SQLite is meant to be an embeded db, so many thing run as a user, and db is installed into the current dir of the user.

sqlite3    mytest.db			# this only specify to work with a new file as db, like calling vi myfile.txt
   create table tb1 (			# issue command to create table 
       f1 varchar(30) primary key,
       f2 text
   ^D					# end of file to terminate.  it saves and exit.

Example Script for Taxonomy DB setup

-- this filename: import_taxo.sql
-- sqlite3 db creation and loading script
-- run as
-- sqlite3 ncbi-taxo.db < import_taxo.sql
-- or
-- .read import_taxo.sql
-- essentially, commands that can be typed inside the sqlite shell can be used in the script as-is.

create table gi_taxid(gi integer PRIMARY KEY, taxid integer);
.mode list
.separator \t
pragma temp_store = 2;
.import gi_taxid_prot.dmp gi_taxid
.import gi_taxid_nucl.dmp gi_taxid
CREATE UNIQUE INDEX gi_idx_on_gi_taxid ON gi_taxid(gi);
CREATE INDEX taxid_idx_on_gi_taxid ON gi_taxid(taxid);

-- UNIQUE keyword in index means duplicates in that column will result in error.

.schema		-- show how tables are created
.indices	-- show index(s) 

# other possible load options...
.header on		-- maybe this only affect export to csv to add header row?
.mode csv

-- see http://www.sqlite.org/cli.html for more details.
-- .import don't seems to have any options for specifying how to handle quotes, escape chars, etc.

SQL queries using sqlite

Example from a Taxonomy DB
   # ref http://dgg32.blogspot.com/2013/07/map-ncbi-taxonomy-and-gi-into-sqlite.html

   $ sqlite3 ncbi-taxo.db	# call sqlite, giving it a file where the database is.

   .tables			# show a list of tables in the DB
   .help			# list sqlite commands
   .mode list

   SELECT taxid  FROM tree WHERE name  = "Proteobacteria";
   SELECT name   FROM tree WHERE taxid = '2';
   SELECT parent FROM tree WHERE taxid = '976';

   SELECT taxid  FROM tree     WHERE parent = '976';
   SELECT gi     FROM gi_taxid WHERE taxid  = '2';

Ref: https://www.sqlite.org/cli.html

Performance, Benchmark SQLite

  • A taxonomy db with file size of 33GB, including index, provided essentially instantataneous answer to simple SELECT queries like the one in the examples above.
  • A python program querying the db with 20 simultaneous thread that retrieve 100+ elements returned in 1-2 seconds.
  • Python program querying the DB 46k times took about 15 min (includes other sorting, processing in a taxonomy reporter tool), so about 50 queries per second.
  • Bulk loading using .import seems to be a lot faster than running python code that use cursor to INSERT INTO TABLE one row at a time. The fact that I was doing this inside a try/catch block may have added to the delay. Trying to find which record that would result in Primary Key violation (table admitedly changed a bit as well, more digging TBA, as current performance is about 700 GB per 8 hours, long way to go for a DB of 89GB, but there were index there)....


    Datatypes in SQLite3 include the usual suspects.  They are listed in 
    1. TEXT is used to store stings.
    2. VARCHAR is supported. VARCHAR(10) does NOT limit string to 10 chars, and sqlite allegedly will store without trucating anything, even if 50M chars... see https://www.sqlite.org/faq.html#q9
    3. CLOB ... not sure if there is a point to use this if VARCHAR can store arbitrary length. perhaps data manipulation is different...


    If get an error message like:
    Error: database or disk is full
    This isn't really out of space for the database or its file, but temp space!
    pragma temp_store;	# display what's the value set for temp_store.  0 is the default
    pragma temp_store = 2;  # set to 2, ie use RAM
    			# 1 = use file?  need to define temp_store_directory = '/some/place'   
    			# is /tmp not the default?
    ref: http://stackoverflow.com/questions/5274202/sqlite3-database-or-disk-is-full-the-database-disk-image-is-malformed#5275022


    To export a specific table into a comma delimited file:
    .mode csv 
    -- use '.separator SOME_STRING' for something other than a comma.
    .headers on 
    .out file.dmp 
    select * from MyTable;
    ref: http://stackoverflow.com/questions/75675/how-do-i-dump-the-data-of-some-sqlite3-tables

    Secure SQLite Installation

    LOL =)

    File Formats

    DB Dump



    SQLite3 overview


    [Doc URL: http://tin6150.github.io/psg/psg/mysql.html]
    (cc) Tin Ho. See main page for copyright info.