Running Cassandra on Fedora 32

This is not a tutorial. These are my running notes from getting Cassandra to run on Fedora 32. The debugging steps are interesting in their own right. I’ll provide a summary at the end for anyone sane enough not to read through the rest.

Old Instructions

So… I’m starting with https://www.liquidweb.com/kb/how-to-install-cassandra-2-on-fedora-20/ as a guide. The dsc-20 is, I think, version specific, so I want to see if there is something more appropriate for F32 (has it really been so many years?)

Looking in here https://rpm.datastax.com/community/noarch/ I see that there is still a dsc-20 series of packages, but also dsc-30…which might be a bit more recent of a release.

Dependencies resolved.
==================================================================================================================
 Package                      Architecture            Version                     Repository                 Size
==================================================================================================================
Installing:
 dsc30                        noarch                  3.0.9-1                     datastax                  1.9 k
Installing dependencies:
 cassandra30                  noarch                  3.0.9-1                     datastax                   24 M
 
Transaction Summary
==================================================================================================================
Install  2 Packages

I’d be interested to see what is in the dsc30 package versus the cassandra30 package.

$ rpmquery --list dsc30
(contains no files)

OK. But… there is no systemd unit file:

sudo systemctl start cassandra
Failed to start cassandra.service: Unit cassandra.service not found.

Garbage Collection Configuration

Well, let’s just try to run it.

sudo /usr/sbin/cassandra
 
Unrecognized VM option 'UseParNewGC'

Seems like it is built to use an older Java command-line option that has since been removed. Where does this come from?

$ rpmquery --list cassandra30 | xargs grep UseParNewGC  2>&1 | grep -v "Is a direc" 
/etc/cassandra/default.conf/jvm.options:-XX:+UseParNewGC

We can remove it there. According to this post, the appropriate replacement is -XX:+UseG1GC

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Unrecognized VM option 'PrintGCDateStamps'

OK, let’s take care of both of those. According to this post, the GC line we put in above should cover UseConcMarkSweepGC.
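Here is a rough sketch of that collector swap as a script, in case I have to redo it. The helper name switch_to_g1 is my own invention, and it only touches the two collector-selection flags; any other CMS tuning flags in the file are left alone and may still need attention. It assumes GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: switch a jvm.options file from ParNew/CMS to G1.
# switch_to_g1 is a hypothetical helper name, not part of Cassandra.
switch_to_g1() {
    local file="$1"
    # Comment out the two collector-selection flags if they are active.
    sed -i -E 's/^(-XX:\+Use(ParNewGC|ConcMarkSweepGC))[[:space:]]*$/#\1/' "$file"
    # Enable G1 unless an active (uncommented) line already does.
    grep -qxF -- '-XX:+UseG1GC' "$file" || echo '-XX:+UseG1GC' >> "$file"
}
```

I would call it as switch_to_g1 /etc/cassandra/default.conf/jvm.options (with sudo, and after making a backup copy).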

The second option is in the logging section. It is not included in the jvm.options. However, if I run it with just the first option removed, I now get:

$ sudo /usr/sbin/cassandra
[0.000s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/var/log/cassandra/gc.log instead.
Unrecognized VM option 'PrintGCDateStamps'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

More trial and error shows I need to comment out all of the GC logging values at the bottom of the file:

#-XX:+PrintGCDetails
#-XX:+PrintGCDateStamps
#-XX:+PrintHeapAtGC
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
#-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=10
#-XX:GCLogFileSize=10M
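The same edit can be scripted rather than done by hand. This is a minimal sketch, assuming GNU sed; comment_out_gc_logging is a made-up helper name, and the flag list is just the one shown above.

```shell
#!/usr/bin/env bash
# Sketch: comment out the pre-JDK 9 GC logging flags in a jvm.options file.
# comment_out_gc_logging is a hypothetical helper name.
comment_out_gc_logging() {
    local file="$1"
    # Lines that are already commented start with '#', so they do not match
    # and running this twice is harmless.
    sed -i -E \
        -e 's/^(-XX:[+-]?(PrintGC|PrintHeapAtGC|PrintTenuringDistribution|PrintPromotionFailure|PrintFLSStatistics|UseGCLogFileRotation|NumberOfGCLogFiles|GCLogFileSize).*)$/#\1/' \
        -e 's/^(-Xloggc:.*)$/#\1/' \
        "$file"
}
```

Usage would be comment_out_gc_logging /etc/cassandra/default.conf/jvm.options, run with enough privileges to write the file.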

That leaves the warning “-Xloggc is deprecated. Will use -Xlog:gc:/var/log/cassandra/gc.log instead.” It does not come from the jvm.options file, where the flag is already commented out above.

$ rpmquery --list cassandra30 | xargs grep loggc  2>&1 | grep -v "Is a direc" 
/etc/cassandra/default.conf/cassandra-env.sh:JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
/etc/cassandra/default.conf/cassandra-env.sh.orig:JVM_OPTS="$JVM_OPTS -Xloggc:${CASSANDRA_HOME}/logs/gc.log"
/etc/cassandra/default.conf/jvm.options:#-Xloggc:/var/log/cassandra/gc.log

As the message suggests, I’m going to replace this with -Xlog:gc:/var/log/cassandra/gc.log in /etc/cassandra/default.conf/cassandra-env.sh
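As a one-liner, that replacement might look like the sketch below. The helper name use_unified_gc_log is hypothetical; it simply rewrites the flag prefix and keeps whatever path follows it. It assumes GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the legacy -Xloggc:<path> flag to the unified-logging form
# -Xlog:gc:<path>, which is what the JVM warning suggests.
# use_unified_gc_log is a hypothetical helper name.
use_unified_gc_log() {
    local file="$1"
    sed -i 's/-Xloggc:/-Xlog:gc:/g' "$file"
}
```

Usage: use_unified_gc_log /etc/cassandra/default.conf/cassandra-env.sh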

Thread Priority Policy

$ sudo /usr/sbin/cassandra
intx ThreadPriorityPolicy=42 is outside the allowed range [ 0 ... 1 ]
Improperly specified VM option 'ThreadPriorityPolicy=42'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
$ rpmquery --list cassandra30 | xargs grep ThreadPriorityPolicy  2>&1 | grep -v "Is a direc" 
/etc/cassandra/default.conf/cassandra-env.sh:JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
/etc/cassandra/default.conf/cassandra-env.sh.orig:JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"

Looks like that was never a legal value. Since I am running a pretty tip-of-tree Linux distribution and OpenJDK version, I am going to set this to 1.
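A sketch of that change, for the record. The helper name set_thread_priority_policy is mine, the value defaults to 1 as chosen above, and it assumes GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: set -XX:ThreadPriorityPolicy to a value the JVM accepts (0 or 1).
# set_thread_priority_policy is a hypothetical helper name.
set_thread_priority_policy() {
    local file="$1" value="${2:-1}"
    sed -i -E "s/-XX:ThreadPriorityPolicy=[0-9]+/-XX:ThreadPriorityPolicy=${value}/" "$file"
}
```

Usage: set_thread_priority_policy /etc/cassandra/default.conf/cassandra-env.sh 1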

And with that, Cassandra will run. It produces too much output to show here. Let’s try to connect:

cqlsh doesn’t run

cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)})

OK… let’s dig. First, is it listening?

$ ps -ef | grep java
root      618809    5573  7 14:30 pts/3 ....
java    618809 root   61u     IPv4           32477117       0t0        TCP localhost:7199 (LISTEN)
java    618809 root   62u     IPv4           32477118       0t0        TCP localhost:46381 (LISTEN)
java    618809 root   70u     IPv4           32477124       0t0        TCP localhost:afs3-fileserver (LISTEN)
$ grep afs3-file /etc/services 
afs3-fileserver 7000/tcp                        # file server itself
afs3-fileserver 7000/udp                        # file server itself

I’m not sure off the top of my head which of those is the Query language port, but I can telnet to 7000, 7199, and 46381

Running cqlsh --help I see:

Connects to 127.0.0.1:9042 by default. These defaults can be changed by
setting $CQLSH_HOST and/or $CQLSH_PORT. When a host (and optional port number)
are given on the command line, they take precedence over any defaults.

Let’s give that a try:

[ayoung@ayoungP40 ~]$ cqlsh 
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection <AsyncoreConnection(140617688084880) 127.0.0.1:7000 (closed)> is already closed',)})
[ayoung@ayoungP40 ~]$ export CQLSH_PORT=7100
[ayoung@ayoungP40 ~]$ cqlsh 
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 7100)]. Last error: Connection refused")})
[ayoung@ayoungP40 ~]$ export CQLSH_PORT=46381
[ayoung@ayoungP40 ~]$ cqlsh 
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection <AsyncoreConnection(139655917236624) 127.0.0.1:46381 (closed)> is already closed',)})

Nope. OK, maybe there is a log file. Perhaps the Cassandra process is stuck.

[ayoung@ayoungP40 ~]$ ls -lah /var/log/cassandra/
total 52M
drwxr-xr-x.  2 cassandra cassandra 4.0K Oct 19 15:41 .
drwxr-xr-x. 23 root      root      4.0K Oct 19 11:49 ..
-rw-r--r--.  1 root      root       19M Oct 19 15:41 debug.log

That is a long log file. I’m going to stop the process, wipe this directory, and start again. Note that just hitting Ctrl-C in the terminal was not enough to stop the process; I had to kill it by PID.

This time the shell script exited on its own, but the cassandra process is running in the background of that terminal. lsof provides similar output. The high number port is now 44823 which means that I can at least rule that out; I think it is an ephemeral port anyway.

[ayoung@ayoungP40 ~]$ export CQLSH_PORT=7199
[ayoung@ayoungP40 ~]$ cqlsh 
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection <AsyncoreConnection(140156084482448) 127.0.0.1:7199 (closed)> is already closed',)})

According to this post, the port for the query language is not open. That would be 9042. The two ports that are open are for data sync and for Java Management Extensions (JMX).
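To keep the ports straight, here is a small lookup I could have used. It is only a sketch: describe_port is a made-up helper, and the numbers are Cassandra’s defaults, which can be changed in cassandra.yaml and cassandra-env.sh.

```shell
#!/usr/bin/env bash
# Sketch: name the default Cassandra 3.x listening ports.
# describe_port is a hypothetical helper; the numbers are defaults only.
describe_port() {
    case "$1" in
        7000) echo "inter-node storage (storage_port)" ;;
        7199) echo "JMX" ;;
        9042) echo "CQL native transport (native_transport_port)" ;;
        9160) echo "Thrift RPC (rpc_port)" ;;
        *)    echo "not a Cassandra default" ;;
    esac
}

# Label the ports seen so far in these notes.
for port in 7000 7199 9042 9160 46381; do
    printf '%-6s %s\n' "$port" "$(describe_port "$port")"
done
```

So cqlsh wants 9042, and anything listed as “not a Cassandra default” (like the high-numbered port above) is probably ephemeral.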

Why don’t I get the query port? Let’s look in the log:

INFO  [main] 2020-10-19 15:46:11,640 Server.java:160 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO  [main] 2020-10-19 15:46:11,665 CassandraDaemon.java:488 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it

But starting it seems to trigger a cascading failure: I now have a lot of log files. Let me see if I can find the first error. Nah, they are all zipped up. Going to wipe and restart, using tail -f on the log file before asking to restart thrift.

$ grep start_rpc /etc/cassandra/conf/*
/etc/cassandra/conf/cassandra.yaml:start_rpc: false
/etc/cassandra/conf/cassandra.yaml.orig:start_rpc: false
grep: /etc/cassandra/conf/triggers: Is a directory

Since trying to start it with nodetool enablethrift failed, let me try changing that value in the config file and restarting. My log file now ends with:

INFO  [main] 2020-10-19 16:12:47,695 ThriftServer.java:119 - Binding thrift service to localhost/127.0.0.1:9160
INFO  [Thread-1] 2020-10-19 16:12:47,699 ThriftServer.java:136 - Listening for thrift clients...
$ cqlsh 
Connection error: ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)})

Something is not happy. Let me see where the errors start. I’ll tail the log and tee it into a file in /tmp so I can look at it afterward.
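Roughly, the capture would be tail -f /var/log/cassandra/debug.log | tee /tmp/cassandra-debug.log (the /tmp file name is just one I picked, and I am assuming debug.log is the log of interest). Once the capture is saved, a tiny helper like this sketch finds where the errors start; first_error is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the first ERROR line (prefixed with its line number)
# from a saved log file. first_error is a hypothetical helper name.
first_error() {
    # -n adds the line number, -m 1 stops after the first match.
    grep -n -m 1 '^ERROR' "$1"
}
```

Usage: first_error /tmp/cassandra-debug.log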

ERROR [SharedPool-Worker-6] 2020-10-19 16:24:34,069 Message.java:617 - Unexpected exception during request; channel = [id: 0x0e698e4a, /127.0.0.1:35340 => /127.0.0.1:9042]
java.lang.RuntimeException: Unable to access address of buffer
        at io.netty.channel.epoll.Native.read(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:678) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe$3.run(EpollSocketChannel.java:755) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:268) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
ERROR [SharedPool-Worker-2] 2020-10-19 16:24:34,069 Message.java:617 - Unexpected exception during request; channel = [id: 0x0e698e4a, /127.0.0.1:35340 => /127.0.0.1:9042]

Note: At this point, I suspected SELinux, so I put my machine in permissive mode. No change.

Native Transport

So I turned to Minecraft. Turns out they have the same problem there, and the solution is to disable native transport. Let’s see if that applies to Cassandra.

$ grep start_native_transport /etc/cassandra/conf/*
/etc/cassandra/conf/cassandra.yaml:start_native_transport: true
/etc/cassandra/conf/cassandra.yaml.orig:start_native_transport: true

Ok, and looking in that file I see:

# Whether to start the native transport server.
# Please note that the address on which the native transport is bound is the
# same as the rpc_address. The port however is different and specified below.

Let me try disabling that and see what happens. No love but…in the log file I now see:

INFO  [main] 2020-10-19 17:23:28,272 ThriftServer.java:119 - Binding thrift service to localhost/127.0.0.1:9160

So let me try on that port.

[ayoung@ayoungP40 ~]$ export CQLSH_PORT=9160
[ayoung@ayoungP40 ~]$ cqlsh

Maybe it needs the native transport, and it should not be on the same port? Sure enough, further down the conf I find:

rpc_port: 9160

Change the value for start_native_transport back to true and restart the server.

Now it fails with no message why.

This native_transport intrigues me. Let’s see what else we can find… hmm, it seems as if Thrift is the old protocol, and native_transport has been in effect for five or so years, which would explain why Thrift shows up in the Fedora 20 page but is not really supported. I should probably turn Thrift off.

Interlude: nodetool status

OK…what else can I do to test my cluster? Nodetool?

$ nodetool status
WARN  21:37:34 Only 51050 MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  186.68 KB  256          100.0%            6cf084ed-a30a-4b80-9efb-acbcd22362c2  rack1

Better install repository

OK….wipe the system, try with a different repo. Before this, I wiped out all of my old config files, and will need to remake any changes that I noted above.

sudo yum install http://apache.mirror.digitalpacific.com.au/cassandra/redhat/311x/cassandra-3.11.8-1.noarch.rpm http://apache.mirror.digitalpacific.com.au/cassandra/redhat/311x/cassandra-tools-3.11.8-1.noarch.rpm

Still no systemd scripts. Maybe in the 4 Beta. I’ll check that later. Make the same config changes for the jvm.options. Note that the thread priority has moved here, too. Also, it does not want to run as root any more…progress. Do this to let things log:

sudo chown -R ayoung:ayoung /var/log/cassandra/

Insufficient permissions on directory /var/lib/cassandra/data

Get that one too.

sudo chown -R ayoung:ayoung /var/lib/cassandra/

telnet localhost 9160

Connects….I thought that would be turned off…interesting

$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection to 127.0.0.1 was closed',)})
$ strace cqlsh 2>&1 | grep -i connect\(
connect(5, {sa_family=AF_INET, sin_port=htons(9160), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)

This is despite the fact that the docs say:

Connects to 127.0.0.1:9042 by default.

Possibly picking up from config files.

Anticlimax: Works in a different terminal

I just opened another window, typed cqlsh and it worked…..go figure….Maybe some phantom env var from a previous incantation.
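Looking back over these notes, the likely phantom is the export CQLSH_PORT=9160 I ran in that terminal earlier, which matches the port strace showed. I have not confirmed that, but a sketch like the following would have ruled it out without opening a new window. clean_cqlsh is a made-up helper name; env -u is the standard coreutils way to drop a variable for one command.

```shell
#!/usr/bin/env bash
# Sketch: run a command (cqlsh by default) with the cqlsh connection
# overrides removed from its environment.
# clean_cqlsh is a hypothetical helper name, not something cqlsh ships.
clean_cqlsh() {
    # Default to running cqlsh when no command is given.
    if [ "$#" -eq 0 ]; then
        set -- cqlsh
    fi
    # env -u drops the variable for the child process only;
    # the current shell keeps whatever was exported earlier.
    env -u CQLSH_HOST -u CQLSH_PORT "$@"
}
```

Running env | grep '^CQLSH_' first shows what is lingering; clean_cqlsh then starts cqlsh against its built-in default of 127.0.0.1:9042, and unset CQLSH_HOST CQLSH_PORT clears the variables for good.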

Summary

  • For a stable (not beta) release, use the 3.11 version
  • Running as a non-root user is now enforced
  • Change ownership of the var directories so your user can read and write
  • Remove the obsolete GC options from jvm.options
  • Set the thread priority policy to 1 (or 0)
  • Make sure that you have a clean env when running cqlsh

It would be nice to have a better understanding of what went wrong running cqlsh. Problems that go away by themselves tend to return by themselves.

1 thought on “Running Cassandra on Fedora 32”

  1. For all those who land here with Cassandra service startup issues:
    All you need to do is configure the proper Java environment for the Cassandra version you have installed.
    If you have multiple Java environments installed, use alternatives as below:

    [root@fedora ~]# update-alternatives --config java

    There are 2 programs which provide ‘java’.

    Selection Command
    ———————————————–
    *+ 1 java-17-openjdk.x86_64 (/usr/lib/jvm/java-17-openjdk-17.0.8.0.7-1.fc38.x86_64/bin/java)
    2 java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.20.0.8-1.fc38.x86_64/bin/java)

    Enter to keep the current selection[+], or type selection number: 2
    ———–
    I selected 2 and cassandra started fine.
