Sunday, 8 June 2014

Postgresql Backup To Amazon S3 On OpenShift Origin

To move forward, you need to back up. Backing up your production data is critical, and with PostgreSQL you can archive WAL (Write Ahead Log) segments. This post gives you the steps for backing up PostgreSQL WALs to Amazon S3 on your OpenShift Origin installation using WAL-E.

WAL-E is a great tool that simplifies PostgreSQL backups by performing continuous archiving of WAL files and base backups. I won't go into how WAL works here; the PostgreSQL documentation covers that in detail. I'll just list the series of commands and steps necessary for sending WAL archives to an AWS S3 bucket.

On the node containing application with postgresql cartridge, run the following commands:

$ yum install python-pip lzop pv
$ rpm -Uvh ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/p_conrad:/branches/Fedora_19/x86_64/daemontools-0.76-3.1.x86_64.rpm
$ pip install wal-e
$ umask u=rwx,g=rx,o=
$ mkdir -p /etc/wal-e.d/env
$ echo "secret-key-content" > /etc/wal-e.d/env/AWS_SECRET_ACCESS_KEY
$ echo "access-key" > /etc/wal-e.d/env/AWS_ACCESS_KEY_ID
$ echo 's3://backup/production/pgsql' > \
  /etc/wal-e.d/env/WALE_S3_PREFIX
$ chmod -R 765 /etc/wal-e.d/
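
With the credentials in place, you can do a quick sanity check (optional, but handy) that WAL-E can reach the bucket with the configured environment; backup-list should run without errors, and it will list nothing until the first base backup has been pushed:

$ envdir /etc/wal-e.d/env wal-e backup-list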


Then, edit the PostgreSQL configuration file to turn on WAL archiving. You need to find the right container for your PostgreSQL cartridge in /var/lib/openshift (it's quite trivial if you know the OpenShift basics).

$ vi YOUR_OO_CONTAINER/postgresql/data/postgresql.conf

wal_level = archive # hot_standby in 9.0 is also acceptable
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
archive_timeout = 60
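
Note that wal_level and archive_mode only take effect after PostgreSQL has been restarted, so restart the cartridge once the file is saved. From a client machine, something like the command below should do it (the cartridge name and version here are an assumption; adjust them to whatever your gear actually runs):

$ rhc cartridge restart postgresql-9.2 -a appName   # adjust cartridge name/version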


Finally, you need to ensure that you are taking base backups periodically, which can be achieved with the cron cartridge. Clone the application repo, add the following file, and push it to the application.

$ vi .openshift/cron/daily/postgres-backup
#!/bin/bash

# push a base backup of the PostgreSQL data directory if the cartridge is present
if [ -n "$OPENSHIFT_POSTGRESQL_DIR" ]; then
        /usr/bin/envdir /etc/wal-e.d/env /bin/wal-e backup-push ${OPENSHIFT_POSTGRESQL_DIR}data
fi

$ git add .openshift/cron/daily/postgres-backup
$ git commit -m "Added pg cron script"
$ git push origin master


Make sure you build the path from the OPENSHIFT_POSTGRESQL_DIR environment variable (or whichever variable you use) in a way that does not produce two adjacent forward slashes, since WAL-E refuses such paths.
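
For illustration (hypothetical gear path; the OpenShift directory variables already end with a trailing slash, which is why "data" is appended without one in the script above):

$ echo "${OPENSHIFT_POSTGRESQL_DIR}data"
/var/lib/openshift/<gear-uuid>/postgresql/data

$ echo "${OPENSHIFT_POSTGRESQL_DIR}/data"
/var/lib/openshift/<gear-uuid>/postgresql//data    # WAL-E would reject this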

This should help you keep your data backed up regularly and you can enjoy beers.



Thursday, 5 June 2014

Setting Up JVM Heap Size In JBoss OpenShift Origin

OpenShift is an awesome technology and I have fallen in love with it recently. In this post, I will talk about how to set the JVM heap size for an application using the JBoss cartridge.

If you look at the contents of standalone.conf, located in $OPENSHIFT_JBOSSEAP_DIR/bin, you can see that JVM_HEAP_RATIO is set to 0.5 if it is not already set.

if [ -z "$JVM_HEAP_RATIO" ]; then
        JVM_HEAP_RATIO=0.5
fi


Later, this ratio is used to calculate max_heap, which is injected as the maximum heap size of the JBoss java process. You can see how the gear memory size drives the calculation of the heap size; this is the very reason why a default installation allocates half of the total gear memory to the heap.

max_memory_mb=${OPENSHIFT_GEAR_MEMORY_MB}
max_heap=$( echo "$max_memory_mb * $JVM_HEAP_RATIO" | bc | awk '{print int($1+0.5)}')
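
For example, on a 512 MB gear with the ratio raised to 0.7, the same arithmetic yields a 358 MB maximum heap (a quick sketch of the calculation, not part of the cartridge script):

$ echo "512 * 0.7" | bc | awk '{print int($1+0.5)}'
358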


OpenShift keeps a number of environment variables inside /var/lib/openshift/OPENSHIFT_GEAR_UUID/.env, so what I did was SSH to my OO node and run the command below (replace the UUID with your gear's):

$ echo -n 0.7 > /var/lib/openshift/52e8d31bfa7c355caf000039/.env/JVM_HEAP_RATIO


Alternatively, rhc set-env JVM_HEAP_RATIO=0.7 -a appName should also work, but I have not tried it.
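
Either way, the new ratio only applies once the JBoss cartridge is restarted, since standalone.conf is evaluated at startup. One quick way to confirm it took effect (an illustrative check, not from the cartridge docs) is to look at the -Xmx flag of the running java process on the gear:

$ ps -ef | grep '[j]ava' | grep -o '\-Xmx[0-9]*m'
-Xmx358m    # value depends on your gear size and ratio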



Friday, 11 April 2014

Patching Your OpenShift Origin Against Heartbleed vulnerability

Recently, the Heartbleed bug was disclosed; it had existed for years in every service using OpenSSL 1.0.1 through 1.0.1f (inclusive). The weakness allows stealing information that is normally protected by the SSL/TLS encryption used to secure the Internet, by reading the memory of the affected system without any kind of privileged access.

I've been administering OpenShift applications recently, and this post outlines the measures I took to secure them against this critical vulnerability.

In order to check whether you are vulnerable, you can check the OpenSSL version and build date (on RHEL/CentOS the fix is backported, so the version string stays 1.0.1e and only the build date and changelog change):

# openssl version -a
OpenSSL 1.0.1e-fips 11 Feb 2013
built on: Wed Jan 8 07:20:55 UTC 2014
platform: linux-x86_64


Alternatively, you can use one of the online test tools or an offline Python checker to find out whether you are vulnerable.

Note that in the case of OpenShift Origin, you have to update the OpenSSL package on both the brokers and the nodes so that all the OpenShift apps are secure.

# yum install -y openssl


Once completed, verify that the patched version is installed:

# openssl version -a
OpenSSL 1.0.1e-fips 11 Feb 2013
built on: Tue Apr 8 00:29:11 UTC 2014
platform: linux-x86_64

# rpm -q --changelog openssl | grep CVE-2014-0160
- pull in upstream patch for CVE-2014-0160


We also have to restart the node web proxy (node-proxy) on the nodes for the patch to take effect. In fact, every service that loaded the vulnerable OpenSSL library must be restarted.

# systemctl restart openshift-node-web-proxy.service

# /bin/systemctl reload httpd.service
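
To be thorough, you can also look for any remaining processes that still have the old, now-deleted OpenSSL library mapped in memory (a rough check; the exact output format varies with the lsof version):

# lsof -n | grep libssl | grep DEL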


I hope this helps :)



Saturday, 7 December 2013

MyISAM to InnoDB Engine Conversion

We are doing lots of MyISAM to InnoDB migrations in our production environment, and since the engine conversion has to be done table by table, it's good to generate a script for it when you have a huge number of databases, each with several tables. Here is a quick one-liner that generates the MyISAM to InnoDB conversion script.
mysql -u <user> -p -e "SELECT concat('ALTER TABLE \`',TABLE_SCHEMA,'\`.\`',TABLE_NAME,'\` ENGINE=InnoDB;') FROM Information_schema.TABLES WHERE TABLE_SCHEMA in ('database1', 'database2', 'databaseN') AND ENGINE = 'MyISAM' AND TABLE_TYPE='BASE TABLE'" | tail -n+2 > alter.sql


Once the SQL script is generated, all you need to do is run it against your database server.
$ mysql -u <user> -p < alter.sql
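
After the ALTER statements have run, a quick way to confirm nothing was missed (the same information_schema query as above, just listing whatever is left) is:

$ mysql -u <user> -p -e "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE ENGINE = 'MyISAM' AND TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA IN ('database1', 'database2', 'databaseN')"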


Note that while InnoDB is generally the better engine and MySQL has shipped with InnoDB as the default since 5.5, MyISAM has its own benefits; you should do a performance analysis, preferably in a test environment, before converting the engine type.



Monday, 18 November 2013

Install HTTrack On CentOS

Since I could not find an rpm in the repos, here is a quick how-to for installing the HTTrack website copier on CentOS.

$ sudo yum install zlib-devel
$ wget http://download.httrack.com/cserv.php3?File=httrack.tar.gz -O httrack.tar.gz
$ tar xvfz httrack.tar.gz
$ cd httrack-3.47.27
$ ./configure
$ make && sudo make install


That should be all. If you do not want zlib compression support, you can skip the first step and run configure as ./configure --without-zlib. I hope this helps :)
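
Once installed, a minimal invocation to mirror a site looks like this (illustrative URL and output directory; see httrack --help for the many available options):

$ httrack "http://example.com/" -O /tmp/example-mirror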



Sunday, 10 November 2013

JPEG To PDF With Imagemagick

ImageMagick is an awesome toolkit with several powerful features for image creation and manipulation. You can use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves. Here, I will show how you can use the ImageMagick suite to convert JPEG to PDF quickly.

First, make sure the ImageMagick suite is installed on your system.

Ubuntu/Debian
$ sudo apt-get install imagemagick


CentOS/Fedora
$ sudo yum install ImageMagick


Below are some examples of using convert, which is part of ImageMagick, to convert JPEG to PDF.

Single Image
$ convert image.jpg image.pdf


Multiple Images
$ convert 1.jpg 2.jpg 3.jpg output.pdf


Resize and Convert
$ convert -resize 80% image.jpg image.pdf


Negate and Convert
$ convert -negate image.jpg image.pdf


You can use the various available switches to shape the output as you need. I usually use PDFtk in conjunction with this technique for different scenarios, and it works great. I hope this helps :)
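
For instance (hypothetical file names), you can convert scans individually and then stitch the resulting PDFs together with PDFtk:

$ convert scan1.jpg scan1.pdf
$ convert scan2.jpg scan2.pdf
$ pdftk scan1.pdf scan2.pdf cat output combined.pdf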



Saturday, 9 November 2013

Fix Your Ubuntu

Recently, Ubuntu has gained a reputation for turning into an advertising company and has been accused of not protecting users' privacy, so here is a site I just came across that fixes your Ubuntu by applying some tweaks to turn off some of its more invasive features.

FixUbuntu.com



Wednesday, 6 November 2013

Recursion and Memoization - A Fibonacci Example

In this post, I will try to describe why memoization can be a great optimization technique for recursive function implementations, using the Fibonacci sequence as an example.

Straight from Wikipedia, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs.

Basically, we maintain a lookup table that stores the computed value for each case already handled, so later calls can simply look the value up instead of recomputing it. This removes the overhead of the repeated recursive calls. To understand why this is such a great optimization technique for recursion, let's first draw the recursion tree for finding the nth term of the Fibonacci sequence.

                           fib(5)                          
                             /\                              
                            /  \
                           /    \
                          /      \                      
                     fib(4)       fib(3)                                 
                      /\               /\                          
                     /  \             /  \
                    /    \           /    \
                   /      \         /      \                         
               fib(3)    fib(2)     fib(2) fib(1) -> 1                                    
                  /\         /\          /\                          
                 /  \       /  \        /  \                         
                /    \     /    \      /    \
               /      \   /      \    /      \                        
          fib(2) fib(1) fib(1) fib(0) fib(1) fib(0) -> 0                                        
          /\        |     |      |        |    |               
         /  \       1     1      0        1    0               
    fib(1) fib(0)                                                       
       |      |                                               
       1      0 


We can clearly see calls to fib() with the same arguments several times; for example, fib(1) is called 5 times and fib(2) 3 times. We are repeating the same calculations over and over, and you can imagine how this blows up for large values of n. If we had stored the value of fib(n) in a lookup table the first time it was computed, each of those repeated subtrees would collapse into a single lookup.

The Python code without memoization looks like the one below; notice the runtime (for n as large as 50, the naive version takes an impractically long time to finish):
#!/usr/bin/python

def fib(n):
        # naive recursion: the same subproblems are recomputed over and over
        if n == 0:
                return 0
        if n == 1:
                return 1
        val = fib(n-1) + fib(n-2)
        return val

print fib(50)


And now, with memoization, you will notice a significant improvement in runtime.

#!/usr/bin/python

known = {0:0, 1:1}

def fib(n):
        if n in known:          # return the cached value if we have seen n before
                return known[n]
        known[n] = fib(n-1) + fib(n-2)
        return known[n]

print fib(50)


If you run and compare the two programs above, you will find that adding memoization dramatically improves the performance of the recursive function. Recursive solutions are often dismissed as terribly slow, but memoization can make the difference negligible. Some languages now provide memoization natively or via libraries, such as Groovy's memoize.
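
Python itself ships a ready-made memoizing decorator in the standard library (functools.lru_cache, available since Python 3.2), so the same recursive definition can be memoized without a hand-rolled lookup table; a minimal sketch:

#!/usr/bin/python3
from functools import lru_cache

@lru_cache(maxsize=None)        # cache every computed value
def fib(n):
        if n < 2:
                return n
        return fib(n-1) + fib(n-2)

print(fib(50))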

