This is not a tutorial. These are my running notes from getting Cassandra to run on Fedora 32. The debugging steps are interesting in their own right. I’ll provide a summary at the end for any sane enough not to read through the rest.
Old Instructions
So…Starting with https://www.liquidweb.com/kb/how-to-install-cassandra-2-on-fedora-20/ The dsc-20 is, I think, version specific, so I want to see if there is something more appropriate for F32 (has it really been so many years?)
Looking in here https://rpm.datastax.com/community/noarch/ I see that there is still a dsc-20 series of packages, but also dsc-30…which might be a bit more recent of a release.
Dependencies resolved.
======================================================================
 Package          Architecture   Version    Repository   Size
======================================================================
Installing:
 dsc30            noarch         3.0.9-1    datastax     1.9 k
Installing dependencies:
 cassandra30      noarch         3.0.9-1    datastax      24 M

Transaction Summary
======================================================================
Install  2 Packages
I’d be interested to see what is in the dsc30 package versus Cassandra.
$ rpmquery --list dsc30
(contains no files)
OK. But…there is no Systemd file:
sudo systemctl start cassandra
Failed to start cassandra.service: Unit cassandra.service not found.
Garbage Collection Configuration
Well, let’s just try to run it.
sudo /usr/sbin/cassandra
Unrecognized VM option 'UseParNewGC'
Seems like it is built for an older set of JVM command-line options, some of which are now gone. Where does this one come from?
$ rpmquery --list cassandra30 | xargs grep UseParNewGC 2>&1 | grep -v "Is a direc"
/etc/cassandra/default.conf/jvm.options:-XX:+UseParNewGC
We can remove it there. According to this post, the appropriate replacement is -XX:+UseG1GC
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Unrecognized VM option 'PrintGCDateStamps'
OK, let’s take care of both of those. According to this post, the GC line we put in above should cover UseConcMarkSweepGC.
The second option is in the logging section. It is not included in the jvm.options. However, if I run it with just the first option removed, I now get:
$ sudo /usr/sbin/cassandra
[0.000s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/var/log/cassandra/gc.log instead.
Unrecognized VM option 'PrintGCDateStamps'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
More trial and error shows I need to comment out all of the GC logging values at the bottom of the file:
#-XX:+PrintGCDetails
#-XX:+PrintGCDateStamps
#-XX:+PrintHeapAtGC
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
#-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=10
#-XX:GCLogFileSize=10M
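Commenting those lines out by hand works. A scripted version might look like the sketch below. The helper name comment_out_flags and the sample file contents are made up for illustration, and the demo runs on a scratch file rather than the real /etc/cassandra/default.conf/jvm.options.

```shell
#!/usr/bin/env bash
# Sketch: comment out JVM flags that a newer JDK rejects.
# comment_out_flags is a hypothetical helper, not part of Cassandra.
# Usage: comment_out_flags FILE FLAG...
comment_out_flags() {
    local file=$1 flag
    shift
    for flag in "$@"; do
        # Prefix lines that begin with the flag with '#'.
        # Already-commented lines start with '#', so they do not match.
        # Assumes the flag contains no '|' (used as the sed delimiter).
        sed "s|^${flag}|#&|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Demo on a scratch file with made-up sample contents.
demo=$(mktemp)
printf '%s\n' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-Xss256k' > "$demo"
comment_out_flags "$demo" '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps'
cat "$demo"
rm -f "$demo"
```

Pointing it at a copy of the real file first, and diffing the result, seems safer than editing /etc in place.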
That leaves the warning "-Xloggc is deprecated. Will use -Xlog:gc:/var/log/cassandra/gc.log instead." It does not come from the jvm.options file, where that line is already commented out above.
$ rpmquery --list cassandra30 | xargs grep loggc 2>&1 | grep -v "Is a direc"
/etc/cassandra/default.conf/cassandra-env.sh:JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
/etc/cassandra/default.conf/cassandra-env.sh.orig:JVM_OPTS="$JVM_OPTS -Xloggc:${CASSANDRA_HOME}/logs/gc.log"
/etc/cassandra/default.conf/jvm.options:#-Xloggc:/var/log/cassandra/gc.log
I’m going to replace this with -Xlog:gc:/var/log/cassandra/gc.log as the message suggests in /etc/cassandra/default.conf/cassandra-env.sh
Thread Priority Policy
$ sudo /usr/sbin/cassandra
intx ThreadPriorityPolicy=42 is outside the allowed range [ 0 ... 1 ]
Improperly specified VM option 'ThreadPriorityPolicy=42'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
$ rpmquery --list cassandra30 | xargs grep ThreadPriorityPolicy 2>&1 | grep -v "Is a direc"
/etc/cassandra/default.conf/cassandra-env.sh:JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
/etc/cassandra/default.conf/cassandra-env.sh.orig:JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
Looks like that was never a legal value. Since I am running a pretty tip-of-tree Linux distribution and OpenJDK version, I am going to set this to 1.
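The two cassandra-env.sh changes so far (the -Xloggc replacement and the thread priority policy) can be sketched as a single sed filter. The helper name patch_cassandra_env is hypothetical, and the function only reads stdin and writes stdout, so it never touches the real file by itself.

```shell
#!/usr/bin/env bash
# Sketch: apply the two cassandra-env.sh changes described above.
# patch_cassandra_env is a hypothetical helper; it filters stdin to stdout.
patch_cassandra_env() {
    sed -e 's|-Xloggc:|-Xlog:gc:|' \
        -e 's|-XX:ThreadPriorityPolicy=42|-XX:ThreadPriorityPolicy=1|'
}

# Demo on the two lines that grep turned up.
printf '%s\n' \
    'JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"' \
    'JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"' \
    | patch_cassandra_env
```

To apply it for real, one option is to keep a backup copy of cassandra-env.sh, feed that copy through the filter, and write the result back with sudo tee.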
And with that, Cassandra will run. Too much output here. Let’s try to connect:
cqlsh doesn’t run
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)})
OK…let’s dig: First, is it listening:
$ ps -ef | grep java
root 618809 5573 7 14:30 pts/3 ....
java 618809 root 61u IPv4 32477117 0t0 TCP localhost:7199 (LISTEN)
java 618809 root 62u IPv4 32477118 0t0 TCP localhost:46381 (LISTEN)
java 618809 root 70u IPv4 32477124 0t0 TCP localhost:afs3-fileserver (LISTEN)
$ grep afs3-file /etc/services
afs3-fileserver 7000/tcp # file server itself
afs3-fileserver 7000/udp # file server itself
I’m not sure off the top of my head which of those is the Query language port, but I can telnet to 7000, 7199, and 46381
Running cqlsh --help I see:
Connects to 127.0.0.1:9042 by default. These defaults can be changed by setting $CQLSH_HOST and/or $CQLSH_PORT. When a host (and optional port number) are given on the command line, they take precedence over any defaults.
Let’s give that a try:
[ayoung@ayoungP40 ~]$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection <AsyncoreConnection(140617688084880) 127.0.0.1:7000 (closed)> is already closed',)})
[ayoung@ayoungP40 ~]$ export CQLSH_PORT=7100
[ayoung@ayoungP40 ~]$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 7100)]. Last error: Connection refused")})
[ayoung@ayoungP40 ~]$ export CQLSH_PORT=46381
[ayoung@ayoungP40 ~]$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection <AsyncoreConnection(139655917236624) 127.0.0.1:46381 (closed)> is already closed',)})
Nope. OK, maybe there is a log file. Perhaps the Cassandra process is stuck.
[ayoung@ayoungP40 ~]$ ls -lah /var/log/cassandra/
total 52M
drwxr-xr-x. 2 cassandra cassandra 4.0K Oct 19 15:41 .
drwxr-xr-x. 23 root root 4.0K Oct 19 11:49 ..
-rw-r--r--. 1 root root 19M Oct 19 15:41 debug.log
That is a long log file. I’m going to stop the process, wipe this directory, and start again. Note that just hitting Ctrl-C in the terminal was not enough to stop the process; I had to kill it by PID.
This time the shell script exited on its own, but the cassandra process is running in the background of that terminal. lsof provides similar output. The high number port is now 44823 which means that I can at least rule that out; I think it is an ephemeral port anyway.
[ayoung@ayoungP40 ~]$ export CQLSH_PORT=7199
[ayoung@ayoungP40 ~]$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection <AsyncoreConnection(140156084482448) 127.0.0.1:7199 (closed)> is already closed',)})
According to this post, the port for the query language is not open. That would be 9042. The two ports that are open are for data sync (7000) and for Java Management Extensions (JMX, 7199).
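To keep the ports straight, here is a small lookup of the stock Cassandra 3.x defaults as I understand them. The function name cassandra_port_role is made up, and any of these ports can be changed in cassandra.yaml or cassandra-env.sh, so treat the table as a starting point rather than a statement about this particular install.

```shell
#!/usr/bin/env bash
# Sketch: map Cassandra's default port numbers to their roles.
# cassandra_port_role is a hypothetical helper name.
cassandra_port_role() {
    case "$1" in
        7000) echo "inter-node communication (storage_port)" ;;
        7001) echo "inter-node communication over TLS (ssl_storage_port)" ;;
        7199) echo "JMX" ;;
        9042) echo "CQL native transport (native_transport_port)" ;;
        9160) echo "Thrift RPC (rpc_port)" ;;
        *)    echo "unknown (possibly ephemeral)" ;;
    esac
}

# The ports seen in the lsof output, plus the two client ports.
for port in 7000 7199 9042 9160 46381; do
    printf '%s\t%s\n' "$port" "$(cassandra_port_role "$port")"
done
```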
Why don’t I get the query port? Let’s look in the log:
INFO [main] 2020-10-19 15:46:11,640 Server.java:160 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO [main] 2020-10-19 15:46:11,665 CassandraDaemon.java:488 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
But starting it seems to trigger a cascading failure: I now have a lot of log files. Let me see if I can find the first error. Nah, they are all zipped up. Going to wipe and restart, using tail -f on the log file before asking to restart thrift.
$ grep start_rpc /etc/cassandra/conf/*
/etc/cassandra/conf/cassandra.yaml:start_rpc: false
/etc/cassandra/conf/cassandra.yaml.orig:start_rpc: false
grep: /etc/cassandra/conf/triggers: Is a directory
Trying to start it with nodetool enablethrift failed, so let me try changing that value in the config file and restarting. My log file now ends as:
INFO [main] 2020-10-19 16:12:47,695 ThriftServer.java:119 - Binding thrift service to localhost/127.0.0.1:9160
INFO [Thread-1] 2020-10-19 16:12:47,699 ThriftServer.java:136 - Listening for thrift clients...
$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)})
Something is not happy. Let me see where the errors start. tail the log and tee it into a file in /tmp so I can look at it in the end.
ERROR [SharedPool-Worker-6] 2020-10-19 16:24:34,069 Message.java:617 - Unexpected exception during request; channel = [id: 0x0e698e4a, /127.0.0.1:35340 => /127.0.0.1:9042]
java.lang.RuntimeException: Unable to access address of buffer
  at io.netty.channel.epoll.Native.read(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:678) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe$3.run(EpollSocketChannel.java:755) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:268) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
ERROR [SharedPool-Worker-2] 2020-10-19 16:24:34,069 Message.java:617 - Unexpected exception during request; channel = [id: 0x0e698e4a, /127.0.0.1:35340 => /127.0.0.1:9042]
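Finding where the errors start in a captured log can be sketched with grep. The helper name first_error is hypothetical, and the demo uses a scratch file with made-up entries rather than the real debug.log.

```shell
#!/usr/bin/env bash
# Sketch: report the first ERROR entry in a log file, with its line number.
# first_error is a hypothetical helper name.
first_error() {
    # -n prints the line number, -m 1 stops after the first match.
    grep -n -m 1 '^ERROR' "$1"
}

# Demo on a scratch log with made-up entries.
log=$(mktemp)
printf '%s\n' \
    'INFO  [main] starting up' \
    'ERROR [SharedPool-Worker-6] Unexpected exception during request' \
    'ERROR [SharedPool-Worker-2] Unexpected exception during request' > "$log"
first_error "$log"
rm -f "$log"
```

For the real thing, something like tail -f on /var/log/cassandra/debug.log piped through tee into a file under /tmp (the file name is up to you) produces the capture, and first_error can then be pointed at that file.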
Note: At this point, I suspected SELinux, so I put my machine in permissive mode. No change.
Native Transport
So I turned to Minecraft. Turns out they have the same problem there, and the solution is to disable native transport. Let’s see if that applies to Cassandra.
$ grep start_native_transport /etc/cassandra/conf/*
/etc/cassandra/conf/cassandra.yaml:start_native_transport: true
/etc/cassandra/conf/cassandra.yaml.orig:start_native_transport: true
Ok, and looking in that file I see:
#Whether to start the native transport server.
#Please note that the address on which the native transport is bound is the
#same as the rpc_address. The port however is different and specified below.
Let me try disabling that and see what happens. No love but…in the log file I now see:
INFO [main] 2020-10-19 17:23:28,272 ThriftServer.java:119 - Binding thrift service to localhost/127.0.0.1:9160
INFO [main] 2020-10-19 17:23:28,272 ThriftServer.java:119 - Binding thrift service to localhost/127.0.0.1:9160
So let me try on that port.
[ayoung@ayoungP40 ~]$ export CQLSH_PORT=9160
[ayoung@ayoungP40 ~]$ cqlsh
Maybe it needs the native transport, and it should not be on the same port? Sure enough, further down the conf I find:
rpc_port: 9160
Change the value for start_native_transport back to true and restart the server.
Now it fails with no message why.
This native_transport intrigues me. Let’s see what else we can find…hmm, it seems as if Thrift is the old protocol, and native_transport has been in effect for 5 or so years…which would explain why Thrift shows in the F22 page, but is not really supported. I should probably turn Thrift off.
Interlude: nodetool status
OK…what else can I do to test my cluster? Nodetool?
$ nodetool status
WARN 21:37:34 Only 51050 MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  186.68 KB  256     100.0%            6cf084ed-a30a-4b80-9efb-acbcd22362c2  rack1
Better install repository
OK….wipe the system, try with a different repo. Before this, I wiped out all of my old config files, and will need to remake any changes that I noted above.
sudo yum install http://apache.mirror.digitalpacific.com.au/cassandra/redhat/311x/cassandra-3.11.8-1.noarch.rpm http://apache.mirror.digitalpacific.com.au/cassandra/redhat/311x/cassandra-tools-3.11.8-1.noarch.rpm
Still no systemd scripts. Maybe in the 4 Beta. I’ll check that later. Make the same config changes for the jvm.options. Note that the thread priority has moved here, too. Also, it does not want to run as root any more…progress. Do this to let things log:
sudo chown -R ayoung:ayoung /var/log/cassandra/
Insufficient permissions on directory /var/lib/cassandra/data
Get that one too.
sudo chown -R ayoung:ayoung /var/lib/cassandra/
telnet localhost 9160
Connects….I thought that would be turned off…interesting
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': ConnectionShutdown('Connection to 127.0.0.1 was closed',)})
$ strace cqlsh 2>&1 | grep -i connect\(
connect(5, {sa_family=AF_INET, sin_port=htons(9160), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
This despite the fact that the docs say
Connects to 127.0.0.1:9042 by default.
Possibly picking up from config files.
Anticlimax: Works in a different terminal
I just opened another window, typed cqlsh and it worked…..go figure….Maybe some phantom env var from a previous incantation.
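One guess, and it is only a guess: the earlier export CQLSH_PORT=9160 was still set in the old terminal, which would match the port strace showed. A quick way to check a shell for leftovers is sketched below; the helper name cqlsh_overrides is made up.

```shell
#!/usr/bin/env bash
# Sketch: show any CQLSH_* variables exported in the current shell.
# cqlsh_overrides is a hypothetical helper name.
cqlsh_overrides() {
    # grep exits non-zero when nothing matches; treat that as "no overrides".
    env | grep '^CQLSH_' || true
}

cqlsh_overrides
# To run cqlsh once without them (env -u is a GNU coreutils option):
#   env -u CQLSH_HOST -u CQLSH_PORT cqlsh
```

If the function prints nothing, the environment is clean and cqlsh should fall back to its documented default of 127.0.0.1:9042.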
Summary
- For a stable (non-beta) release, use the 3.11 version
- Running as a non-root user is now enforced
- Change ownership of the var directories so your user can read and write
- Remove the obsolete GC and GC logging options from jvm.options
- Set the thread priority policy to 1 (or 0)
- Make sure that you have a clean env when running cqlsh
It would be nice to have a better understanding of what went wrong running cqlsh. Problems that go away by themselves tend to return by themselves.
For all those who land here with cassandra service starting issues:
All you need is to configure the proper Java environment for the Cassandra version you have installed.
If you have multiple Java environments installed, use alternatives as below:
[root@fedora ~]# update-alternatives --config java
There are 2 programs which provide 'java'.
  Selection    Command
-----------------------------------------------
*+ 1           java-17-openjdk.x86_64 (/usr/lib/jvm/java-17-openjdk-17.0.8.0.7-1.fc38.x86_64/bin/java)
   2           java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.20.0.8-1.fc38.x86_64/bin/java)
Enter to keep the current selection[+], or type selection number: 2
I selected 2 and cassandra started fine.
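Before switching alternatives it helps to know which Java is currently the default. The sketch below pulls the major version out of java -version style output; the helper name java_major is hypothetical, and which Java versions a given Cassandra release supports is something to confirm in that release's documentation.

```shell
#!/usr/bin/env bash
# Sketch: extract the major Java version from `java -version` style output.
# java_major is a hypothetical helper; it reads the output on stdin.
java_major() {
    # Handles both the old scheme ("1.8.0_292" means 8) and the
    # newer one ("11.0.20" means 11). Only the first line is examined.
    sed -n '1s/.*version "\(1\.\)\{0,1\}\([0-9][0-9]*\).*/\2/p'
}

# Demo on two sample version banners.
printf '%s\n' 'openjdk version "11.0.20" 2023-07-18' | java_major
printf '%s\n' 'java version "1.8.0_292"' | java_major
```

On a live system, java -version writes to stderr, so the call would be java -version 2>&1 | java_major.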