Wednesday, October 31, 2012

Cassandra & Opscenter [based on DataStax instructions]


Goal:
What we want to achieve:
We have 6 instances running on EC2. For our Cassandra cluster we need 3 data centers (DC1, DC2, and DC3), with 2 Cassandra nodes (RAC1 and RAC2 respectively) in each data center. DC1 is located in us-west-2a, DC2 in us-west-2b, and DC3 in us-west-2c. Once the Cassandra cluster is running, we start OpsCenter and opscenter-agent to monitor it.


Installation:
DataStax provides a very easy way to install Cassandra, OpsCenter and opscenter-agent on CentOS:
Cassandra must be installed on all nodes.
OpsCenter can be installed on one of the nodes; I installed it on the first of the 6 nodes.
opscenter-agent must be set up on all nodes.


Configuration:
All configuration files:
  • /etc/cassandra/conf/cassandra.yaml
  • /etc/cassandra/conf/cassandra-topology.properties
  • /etc/opscenter/opscenterd.conf
  • /etc/opscenter/cluster/Default.conf    [generated by yourself]
  • /var/lib/opscenter-agent/conf/address.conf
How does DataStax configure the above files?
How did we configure the above files?

IPs:
The internal IPs of the 6 nodes, from node1 to node6, are listed below.
node1 and node2 are in DC1, node3 and node4 are in DC2, and node5 and node6 are in DC3.

node1: 10.252.171.91
node2: 10.253.0.234
node3: 10.249.30.92
node4: 10.249.7.76
node5: 10.244.155.181
node6: 10.244.164.144

We will discuss these configuration files in order:

cassandra.yaml:

initial_token: 
commitlog_directory: /commit
seeds: "10.252.171.91,10.249.30.92,10.244.155.181"
listen_address: 10.253.0.234
broadcast_address:
rpc_address:
endpoint_snitch: PropertyFileSnitch


Now, I'll explain them one by one.


initial_token:
We assigned values to initial_token on all 6 nodes, but it did not work as we expected, so we assigned the token values manually using nodetool. The downside is that whenever we restart Cassandra on any of the nodes, we also need to run removetoken using nodetool. [Starting the seed nodes first when starting Cassandra may solve the "not using assigned value" problem; or perhaps Cassandra was caching old data, because the first thing they advise is to rm -rf /var/lib/cassandra/data/*]

DataStax has its own way to generate tokens for a cluster with multiple data centers: http://www.datastax.com/docs/0.8/install/cluster_init#initializing-a-multi-node-or-multi-data-center-cluster
Here's what we got:

    "0": {
        "0": 0,
        "1": 85070591730234615865843651857942052864
    },
    "1": {
        "0": 56713727820156410577229101238628035242,
        "1": 141784319550391026443072753096570088106
    },
    "2": {
        "0": 28356863910078205288614550619314017621,
        "1": 113427455640312821154458202477256070485
    }
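The numbers above follow a simple pattern: within each data center the two tokens are half a ring apart, and each data center starts from its own offset. Here is a minimal Python sketch that reproduces them. It assumes the RandomPartitioner token range of 0 to 2**127, and the per-data-center offsets (0, 2**127 // 3 and 2**127 // 6) are simply read off the output above rather than taken from DataStax's tool. The function name is made up for illustration.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token range


def tokens_for_dc(offset, node_count):
    """Evenly spaced tokens for one data center, shifted by `offset`."""
    return [(offset + i * RING_SIZE // node_count) % RING_SIZE
            for i in range(node_count)]


# Offsets read off the output above; not derived from DataStax's tool.
DC_OFFSETS = {"0": 0, "1": RING_SIZE // 3, "2": RING_SIZE // 6}

if __name__ == "__main__":
    for dc, offset in sorted(DC_OFFSETS.items()):
        for node, token in enumerate(tokens_for_dc(offset, 2)):
            print("DC %s node %d: %d" % (dc, node, token))
```

Running it prints one token per node, which can be compared line by line with the values we got.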




commitlog_directory:
We allocated a dedicated disk for the commitlog. The commitlog grows very fast, so giving it its own disk keeps Cassandra's performance from being affected by the growing commitlog.
The disk must be partitioned and formatted before use.

seeds:
It is a comma-separated list, as you have seen in our own configuration. If you read it carefully, you will find that those 3 IPs belong to node1, node3 and node5, which are in DC1, DC2 and DC3 respectively. In other words, we use one seed per data center.
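The seed choice can be written down as a rule: take one node from each data center. Here is a small Python sketch of that rule, assuming the node list from this post and picking the RAC1 node of each data center. The helper name is hypothetical, and this is only one reasonable way to pick seeds.

```python
# (ip, data center, rack) for the six nodes in this post
NODES = [
    ("10.252.171.91", "DC1", "RAC1"),
    ("10.253.0.234", "DC1", "RAC2"),
    ("10.249.30.92", "DC2", "RAC1"),
    ("10.249.7.76", "DC2", "RAC2"),
    ("10.244.155.181", "DC3", "RAC1"),
    ("10.244.164.144", "DC3", "RAC2"),
]


def seeds_line(nodes, rack="RAC1"):
    """Return the comma-separated seeds value: one node per data center."""
    seeds = []
    seen_dcs = set()
    for ip, dc, node_rack in nodes:
        if node_rack == rack and dc not in seen_dcs:
            seen_dcs.add(dc)
            seeds.append(ip)
    return ",".join(seeds)


if __name__ == "__main__":
    print(seeds_line(NODES))
```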

listen_address:
The listen_address is the local machine's internal IP address, so it is different on every node (the value shown above is from node2).

broadcast_address:
We leave it blank, which means it will be the same as listen_address.

rpc_address:
We leave it blank, which means it will be the same as listen_address.
The default value of rpc_address is "localhost". With that default, OpsCenter could not connect to the Cassandra cluster; once we made it blank, OpsCenter worked fine with the cluster.

endpoint_snitch: 
We use PropertyFileSnitch, which lets us define our own data centers and racks.
It reads the cassandra-topology.properties file, which is our next step.


cassandra-topology.properties:
In this configuration file, we commented out all the default DCs and RACs and set up our own like this:

# Our Own DCs and RACs

10.252.171.91=DC1:RAC1
10.253.0.234=DC1:RAC2
10.249.30.92=DC2:RAC1
10.249.7.76=DC2:RAC2
10.244.155.181=DC3:RAC1
10.244.164.144=DC3:RAC2

default=DC2:RAC1

The mapping is self-explanatory. One thing worth mentioning is default=DC2:RAC1: when a node whose IP is not listed in this file is added to the cluster, its data center and rack are set to the default.
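To make the fallback behaviour concrete, here is a short Python sketch that parses the file content above and looks up a node's data center and rack. It only mimics the lookup for illustration and is not Cassandra's actual implementation. The function names are hypothetical.

```python
TOPOLOGY = """\
# Our Own DCs and RACs

10.252.171.91=DC1:RAC1
10.253.0.234=DC1:RAC2
10.249.30.92=DC2:RAC1
10.249.7.76=DC2:RAC2
10.244.155.181=DC3:RAC1
10.244.164.144=DC3:RAC2

default=DC2:RAC1
"""


def parse_topology(text):
    """Map each key (an IP or 'default') to a (data center, rack) tuple."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, value = line.split("=", 1)
        dc, rack = value.strip().split(":", 1)
        mapping[key.strip()] = (dc, rack)
    return mapping


def locate(mapping, ip):
    """Return (data center, rack) for ip, falling back to the default entry."""
    return mapping.get(ip, mapping["default"])


if __name__ == "__main__":
    topo = parse_topology(TOPOLOGY)
    print(locate(topo, "10.249.7.76"))  # a listed node
    print(locate(topo, "10.0.0.1"))     # an unlisted node gets the default
```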



opscenterd.conf:
The following is exactly what we put in our opscenterd.conf file:

[webserver]
port = 8888
interface = 10.252.171.91

[logging]
# level may be TRACE, DEBUG, INFO, WARN, or ERROR
level = DEBUG


[agents]
use_ssl = false



One important thing: DataStax says the interface can be set to 0.0.0.0 and it will always work. BUT our own experience tells us: don't do that! Setting it to your machine's exact IP address is the best way. The problem we ran into is that when interface is 0.0.0.0, OpsCenter cannot find the Cassandra cluster.
Setting level = DEBUG helps you find out exactly what is going on.
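The two points above can be turned into a quick sanity check. Here is a minimal Python sketch that reads the file content with the standard configparser module and warns when the interface is not pinned to a specific IP or when logging is not at DEBUG. It assumes the file parses as plain INI, which matches the snippet above. The function name and the warning texts are made up for illustration.

```python
import configparser

SAMPLE = """\
[webserver]
port = 8888
interface = 10.252.171.91

[logging]
# level may be TRACE, DEBUG, INFO, WARN, or ERROR
level = DEBUG

[agents]
use_ssl = false
"""


def check_opscenterd(text):
    """Return a list of warnings for the settings discussed in this post."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    warnings = []
    interface = parser.get("webserver", "interface", fallback="")
    if interface in ("", "0.0.0.0"):
        warnings.append("webserver.interface is not pinned to a specific IP")
    level = parser.get("logging", "level", fallback="INFO")
    if level.upper() != "DEBUG":
        warnings.append("logging.level is not DEBUG")
    return warnings


if __name__ == "__main__":
    for warning in check_opscenterd(SAMPLE):
        print("WARNING:", warning)
```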



/etc/opscenter/cluster/Default.conf    [generated by yourself]
You create this configuration file yourself. If you do not create it, OpsCenter has no way to learn, via JMX and the Thrift port, that the Cassandra cluster exists. Here is what we did:

[jmx]
port = 7199

[cassandra]
seed_hosts = 10.252.171.91,10.249.30.92,10.244.155.181
api_port = 9160

The [jmx] port is the JMX port we already know from the Cassandra installation. The use_ssl = false setting (in the [agents] section of opscenterd.conf) disables SSL.
seed_hosts is exactly the same as the seeds in cassandra.yaml. This helps OpsCenter find our existing Cassandra cluster. api_port is the Thrift port (also known as rpc_port in cassandra.yaml).

You can also define [jmx], [cassandra] and api_port in opscenterd.conf; when you run OpsCenter, it will generate the cluster folder and Default.conf for you.
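If you write Default.conf by hand, it is easy for seed_hosts to drift from the seeds in cassandra.yaml. Here is a small Python sketch that renders the file from a single seeds list using the standard configparser module. The function name is hypothetical, and the port defaults are the values used in this post.

```python
import configparser
import io

# Same value as the seeds setting in cassandra.yaml
SEEDS = "10.252.171.91,10.249.30.92,10.244.155.181".split(",")


def render_cluster_conf(seeds, jmx_port=7199, api_port=9160):
    """Return the text of a Default.conf with [jmx] and [cassandra] sections."""
    parser = configparser.ConfigParser()
    parser["jmx"] = {"port": str(jmx_port)}
    parser["cassandra"] = {
        "seed_hosts": ",".join(seeds),
        "api_port": str(api_port),
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_cluster_conf(SEEDS))
```

Redirect the output to the Default.conf path listed earlier, and review it before restarting OpsCenter.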

So far so good: OpsCenter and the Cassandra cluster should now work properly. Turn all of them off, then restart Cassandra first and start OpsCenter after that.

Open a browser on your local machine and go to "http://external-ip:8888" (no SSL is configured in the [webserver] section above).
external-ip: the public IP of the machine where you installed OpsCenter.


The next step is to set up opscenter-agent; the best way is to set it up automatically.
Then edit address.conf and add "use_ssl : 0" to the end of the file.

RESTART opscenter and opscenter-agent.





Alternatives:
If you want to use external IP addresses for the Cassandra cluster, here is what you need to do:
1. Change listen_address and seeds in cassandra.yaml to use public IP addresses (listen_address is the current machine's public IP).
2. Use external IP addresses in the cassandra-topology.properties file.
3. Make sure seed_hosts in Default.conf is the same as the seeds in cassandra.yaml.



Pitfalls:
0. Use Oracle JRE 6. Java 7 is not recommended.

1. Install Cassandra, Opscenter and opscenter-agent all using rpm, or install all using tar.gz, do not mix those two ways.

2. In all configuration files, all the IPs we used are internal IPs.

3. List the required ports explicitly for the firewall rules.

4. When starting Cassandra, start the seed nodes first.

5. Perhaps Cassandra was caching old data, because the first thing they advise is to rm -rf /var/lib/cassandra/data/* . You would do this if you changed the cluster_name, for example.

6. I think this is the reason the tokens didn't work before:


Purging Gossip State on a Node
Gossip information is also persisted locally by each node to use immediately next restart without having to wait for gossip. To clear gossip history on node restart (for example, if node IP addresses have changed), add the following line to the cassandra-env.sh file. This file is located in /usr/share/cassandra or /conf.
-Dcassandra.load_ring_state=false

7. When you are done editing in vi, hold down the "shift" key and press "z" twice (ZZ); that means save and exit.
If you only need to look at the contents of a file, use "less", not vi:
less /etc/cassandra/conf/cassandra.yaml
That will *never* affect the application, but an accidental edit in vi certainly can.
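For pitfall 3, a quick way to confirm that the firewall rules actually let traffic through is to try a TCP connection to each port. Here is a minimal Python sketch using only the standard socket module. The port list contains just the ports that appear in the configuration files in this post (8888, 7199 and 9160), so your setup may need more. The function names are hypothetical.

```python
import socket

# Ports that appear in the configuration files in this post.
PORTS = {"opscenter web": 8888, "jmx": 7199, "thrift": 9160}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host, ports=PORTS):
    """Return {name: reachable} for every port in `ports`."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # 10.252.171.91 is node1 in this post; replace with the node to test.
    for name, ok in report("10.252.171.91").items():
        print("%-14s %s" % (name, "open" if ok else "blocked or closed"))
```

Run it from another node in the cluster, since every address used here is an internal IP.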










