Pacemaker cluster on OpenIndiana

April 3rd, 2013

Today I finally got my NAS cluster working on the 3 OI 151a7 boxes I have set up in the lab at work for this purpose.

I took the work Mike Rowell had done on Linux-HA clustering and tweaked it a bit, and then added an OCF resource script to handle zpools and SCSI-3 persistent group reservations (PGR). The idea of doing this without SCSI-3 PGR seemed like a disaster waiting to happen, but the sg3_utils that are available seem to work fine with our EMC VPLEX.

I’ll have to document the details, perhaps in a future post here or something. It’s far from polished yet, but I can move zpools between nodes in the cluster with crm and crm_resource, and they’re exported as expected. Maybe it’ll help someone else trying to do HA SAN work with OI.

I’m hoping to make this work with OmniOS too.

sys-fs/eudev

January 19th, 2013

Looks like it’s finally time to migrate to sys-fs/eudev. udev 197 was unmasked today on amd64, and I would rather not go down that path.

At least I have an alternative, so I don’t have to go it alone.

I ended up having to use kmod, instead of module-init-tools. modprobe -l no longer works. I guess that feature is deprecated, and I’m supposed to use find in /lib/modules now. That seems less convenient. Other than that they seem the same.
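
For next time, here is a rough stand-in for the old modprobe -l, as a sketch only. The name list_modules is my own (it is not a kmod command), and it assumes modules live under /lib/modules/<kernel release> and end in .ko, optionally compressed:

```shell
#!/bin/sh
# list_modules [DIR]: print every kernel module file below DIR, sorted.
# DIR defaults to the module tree of the running kernel.
# (list_modules is a made-up helper name, not part of kmod.)
list_modules() {
    moddir="${1:-/lib/modules/$(uname -r)}"
    # Match plain and compressed modules: foo.ko, foo.ko.gz, foo.ko.xz
    find "$moddir" -type f -name '*.ko*' | sort
}
```

Piping it through grep (for example, list_modules | grep e1000) gets close to what modprobe -l with a pattern used to do.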

ZFS assertion failed for SAN devices

January 8th, 2013

This was frustrating:

# zpool import nas2
Assertion failed: rn->rn_nozpool == B_FALSE, file ../common/libzfs_import.c, line 1086, function zpool_open_func

Discovered that I needed to tell zpool where to look for the devices:

zpool import -d /devices/scsi_vhci nas2

There was a time when I would have known that already, if we still used Solaris in any way at work. I wish the error message were more descriptive. I’m considering asking about it on the openindiana or illumos mailing lists.
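
If I wanted to script around this, a minimal sketch might look like the following. The helper name import_pool is hypothetical, and the default directory is simply the one that worked for me above; other multipath setups may need a different path:

```shell
#!/bin/sh
# import_pool POOL [DEVDIR]: try a plain "zpool import" first, and if that
# fails, retry with -d pointing at DEVDIR.
# DEVDIR defaults to /devices/scsi_vhci, the directory that worked above.
# (import_pool is a hypothetical helper, not part of ZFS.)
import_pool() {
    pool="$1"
    devdir="${2:-/devices/scsi_vhci}"
    if zpool import "$pool"; then
        return 0
    fi
    echo "plain import of $pool failed, retrying with -d $devdir" >&2
    zpool import -d "$devdir" "$pool"
}
```

Running zpool import -d /devices/scsi_vhci with no pool name should list what is importable from that directory, which is a cheap check before importing anything.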

Apache proxy relay OpenIndiana pkg install

May 22nd, 2012

Hi,

This took me a bit of time to figure out. I needed to update some OpenIndiana boxes from 151a to 151a4 and they had no direct access to the internet. I did, however, have a box that I could relay the requests through. I first figured this method out, and then later I found a hint that there is a better way to do it. I don’t have the details on the “new” way written down, but I will include this other Apache virtual host proxy thing in case someone finds it useful.

I ended up adding this to the 00_default_vhost.conf file (on Gentoo, but a similar incantation would work on Apache 2.2 elsewhere I’m sure):

Listen 8080
NameVirtualHost *:8080

<VirtualHost *:8080>
        ServerName relay.example.com:8080
        Include /etc/apache2/vhosts.d/default_vhost.include

        <IfModule mpm_peruser_module>
                ServerEnvironment apache apache
        </IfModule>

        ProxyRequests On
        ProxyVia Block
        ProxyStatus On
        ProxyPreserveHost Off
        ProxyPass /dev/ http://pkg.openindiana.org/dev/
        ProxyPass /legacy/ http://pkg.openindiana.org/legacy/
        ProxyPass /sfe/ http://pkg.openindiana.org/sfe/
        ProxyPass /sfe-encumbered/ http://pkg.openindiana.org/sfe-encumbered/
        ProxyPassReverse /dev/ http://pkg.openindiana.org/dev/
        ProxyPassReverse /legacy/ http://pkg.openindiana.org/legacy/
        ProxyPassReverse /sfe/ http://pkg.openindiana.org/sfe/
        ProxyPassReverse /sfe-encumbered/ http://pkg.openindiana.org/sfe-encumbered/

        AllowEncodedSlashes NoDecode

</VirtualHost>

This was cobbled together from various sources. I wish I’d kept some references to them. The tricky bit for me was AllowEncodedSlashes, an Apache directive I had never heard of before. Without it, Apache translates %2F to /, so the OpenIndiana pkg.depotd server can’t find the right file and the package lookups fail.
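
To make the %2F problem concrete, here is a tiny sketch of what decoding does to a request path. The path below is illustrative only; I have not checked it against the real depot URL layout:

```shell
#!/bin/sh
# decode_slashes PATH: show what a request path looks like once every
# encoded slash (%2F or %2f) has been turned back into a literal "/".
decode_slashes() {
    printf '%s\n' "$1" | sed 's|%2[Ff]|/|g'
}

# Illustrative request path only; the real depot URL layout may differ.
decode_slashes '/dev/manifest/0/library%2Fzlib'
# prints: /dev/manifest/0/library/zlib
```

The package name goes in as one path segment and comes out as two, which is why the server on the far side no longer recognizes it.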

After that you just point at this with pkg set-publisher:

pfexec pkg set-publisher -p http://relay.example.com:8080/dev/ openindiana.org

You will also need to clean up the old openindiana.org publisher if I recall correctly.

Anyway, it kept me from having to copy all of OpenIndiana and set up a repository locally.

Basic Linux firewall

May 22nd, 2012

Starting on a new Gentoo box, I was putting together a new firewall setup, and I thought I’d put my hacked-down firewall setup script here so I’ll have something to start with next time. I used to try some of the other “higher level” tools to generate my firewall, but eventually they all got on my nerves. It was worth it, finally, to sit down for a couple of hours and understand what iptables does. In a lot of ways I prefer it to the Solaris ipf firewall tools now, but that is just personal preference; they are both very capable. I am hardly an expert on either one (or firewalls in general), but they can be useful tools, and provide some peace of mind. I also use this in conjunction with TCP wrappers (/etc/hosts.allow and /etc/hosts.deny).

Anyway, here is a very basic firewall setup script:

#!/bin/sh

# Flush all the rules
/sbin/iptables -F

# Set the default policy for inbound/forwarded/outbound traffic.
/sbin/iptables -P INPUT DROP
/sbin/iptables -P FORWARD DROP
/sbin/iptables -P OUTPUT ACCEPT

# Accept anything on loopback interface.
/sbin/iptables -A INPUT -i lo -j ACCEPT

# Accept traffic from this box to its own IP (e.g. 192.168.1.1).
/sbin/iptables -A INPUT -s 192.168.1.1/32 -j ACCEPT

# Allow state tracking.
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Accept incoming SSH connections.
# You may want to add some source (-s) addresses to this one, depending on
# your security policy.
/sbin/iptables -A INPUT -p tcp --dport ssh -j ACCEPT

# Accept incoming connections from 192.168.1.0/24 to http/https.
/sbin/iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport http -j ACCEPT
/sbin/iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport https -j ACCEPT

# Display all your rules.
/sbin/iptables -L -v -n --line-numbers

# IPv6 example - most people should not need this today, but I use IPv6 networking
# internally just for fun.
/sbin/ip6tables -F

/sbin/ip6tables -P INPUT DROP
/sbin/ip6tables -P FORWARD DROP
/sbin/ip6tables -P OUTPUT ACCEPT

/sbin/ip6tables -A INPUT -i lo -s ::1/128 -j ACCEPT
# This address is specific to my host.  Get your own.  This prefix is for autoconfig anyway.
/sbin/ip6tables -A INPUT -s fe80::4a5b:39ff:fe67:9b7/128 -d fe80::4a5b:39ff:fe67:9b7/128 -j ACCEPT

/sbin/ip6tables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

/sbin/ip6tables -A INPUT -p tcp --dport ssh -j ACCEPT
/sbin/ip6tables -A INPUT -p tcp --dport http -j ACCEPT
/sbin/ip6tables -A INPUT -p tcp --dport https -j ACCEPT

/sbin/ip6tables -A INPUT -p ipv6-icmp -j ACCEPT

/sbin/ip6tables -L -v -n --line-numbers

After you run the script, the rules will be installed. You have to be careful if you’re doing this on a box you can’t get into via other means (iLO, DRAC, physical console). When testing remotely I sometimes run this with a script in cron to clear all the rules.

A cron entry like this will reset the rules on the quarter hour, in case you get locked out:

0,15,30,45 * * * * /sbin/iptables-restore < /root/firewall_reset

And /root/firewall_reset contains:

*filter
:INPUT ACCEPT [164:15203]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [147:63028]
COMMIT

*mangle
:PREROUTING ACCEPT [164:15203]
:INPUT ACCEPT [164:15203]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [147:63028]
:POSTROUTING ACCEPT [147:63028]
COMMIT

*nat
:PREROUTING ACCEPT [14:672]
:POSTROUTING ACCEPT [9:684]
:OUTPUT ACCEPT [9:684]
COMMIT

Once you are satisfied with your firewall, you can save the rules with:

/etc/init.d/iptables save
/etc/init.d/ip6tables save

Obviously, make sure you disable the cron job above.
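
One way to drop that cron entry without opening an editor, as a sketch. The function name strip_reset_job is mine, and it assumes the only crontab lines mentioning /root/firewall_reset are the safety-net entry:

```shell
#!/bin/sh
# strip_reset_job: copy a crontab from stdin to stdout, dropping any line
# that mentions /root/firewall_reset (the safety-net cron entry).
strip_reset_job() {
    sed '\|/root/firewall_reset|d'
}
# Usage (rewrites the current user's crontab in place):
#   crontab -l | strip_reset_job | crontab -
```

It is worth running crontab -l afterwards to confirm nothing else was removed.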

illumos development talk

March 19th, 2012

This is a few months old, but worth watching if you’re interested in illumos at all:

We ran Solaris 2.0 at the Visualization Lab just to become familiar with it. It would crash just sitting there with nobody logged in, running nothing but system daemons. I don’t think I really liked Solaris at all until 2.3, as I started to get more familiar with it.

OpenIndiana 151a

October 10th, 2011

I’ve been having some problems with the OpenIndiana oi_148 to oi_151a update, but only on one virtual machine in particular. I was stuck until today, when I thought to:

pkg set-publisher --non-sticky opensolaris.org

After that, the image-update went fine. I need to learn more of the options for pkg, apparently. I had skipped that before because my other OI machines did not have that non-sticky option set, and the upgrade worked there.

python and perl USE flag change

October 9th, 2011

Today, I had to enable python and perl USE flags globally in /etc/make.conf. I guess Gentoo changed the default setting for these flags. Personally I like having the kind of built-in tools and scripts that these provide, so I turned them on. Plus, enabling these flags avoided having to rebuild dozens of ebuilds (I have a lot of stuff installed).

I can understand why the Gentoo devs did it (most people don’t need perl/python/ruby support embedded in Vim, for example), but I’m beginning to think there’s no way for me to keep up with changes, or if there is, I totally missed it, so every time this changes I’ll be looking at Bugzilla and the Forums. Maybe there’s a way I can track these kinds of USE flag changes on my own system, in a more obvious way than the output from eix-sync. I’ll have to look into it.

Gentoo poppler 0.16.7 build error

October 5th, 2011

This is a Gentoo problem I hadn’t seen before:

Building CXX object qt4/src/CMakeFiles/poppler-qt4.dir/poppler-ps-converter.cc.o
/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.5/../../../../x86_64-pc-linux-gnu/bin/ld: warning: libopenjpeg.so.2, needed by /usr/lib/libpoppler.so.13, not found (try using -rpath or -rpath-link)
/usr/lib/libpoppler.so.13: undefined reference to `opj_set_event_mgr'
/usr/lib/libpoppler.so.13: undefined reference to `opj_cio_open'
/usr/lib/libpoppler.so.13: undefined reference to `opj_image_destroy'
/usr/lib/libpoppler.so.13: undefined reference to `opj_cio_close'
/usr/lib/libpoppler.so.13: undefined reference to `opj_set_default_decoder_parameters'
/usr/lib/libpoppler.so.13: undefined reference to `opj_destroy_decompress'
/usr/lib/libpoppler.so.13: undefined reference to `opj_create_decompress'
/usr/lib/libpoppler.so.13: undefined reference to `opj_decode'
/usr/lib/libpoppler.so.13: undefined reference to `opj_setup_decoder'
collect2: ld returned 1 exit status
linking of temporary binary failed: Command '['gcc', '-o', '/var/tmp/portage/app-text/poppler-0.16.7/work/poppler-0.16.7_build/glib/tmp-introspectN6cqKw/Poppler-0.16', '-O2', '-march=nocona', '-pipe', '-L.', '-Wl,-rpath=.', '-lpoppler-glib', '-pthread', '-lgio-2.0', '-lgobject-2.0', '-lgmodule-2.0', '-lgthread-2.0', '-lrt', '-lglib-2.0', '/var/tmp/portage/app-text/poppler-0.16.7/work/poppler-0.16.7_build/glib/tmp-introspectN6cqKw/Poppler-0.16.o']' returned non-zero exit status 1
make[2]: *** [glib/Poppler-0.16.gir] Error 1
make[1]: *** [glib/CMakeFiles/gir-girs.dir/all] Error 2
[ 93%] Building CXX object qt4/src/CMakeFiles/poppler-qt4.dir/poppler-qiodeviceoutstream.cc.o
[ 94%] Building CXX object qt4/src/CMakeFiles/poppler-qt4.dir/poppler-sound.cc.o
[ 94%] Building CXX object qt4/src/CMakeFiles/poppler-qt4.dir/poppler-textbox.cc.o
[ 94%] Building CXX object qt4/src/CMakeFiles/poppler-qt4.dir/poppler-page-transition.cc.o
[ 95%] Building CXX object qt4/src/CMakeFiles/poppler-qt4.dir/__/__/poppler/ArthurOutputDev.cc.o
Linking CXX shared library libpoppler-qt4.so

Turns out, I needed to build with the “-introspection” USE flag first in order to build it again with “introspection”. Or maybe I could just globally disable introspection, but I’m not sure what that would change; it looks like a GNOME thing. As usual I ended up with lots of GNOME and KDE installed, and really I still just use Enlightenment everywhere. If it wasn’t Enlightenment, it would be Window Maker or something similar.
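
Roughly, the two-pass rebuild looks like this. It is a sketch: rebuild_flag_cycle is a hypothetical helper name, and it assumes the flag is enabled in make.conf or package.use so that the second pass picks it up again:

```shell
#!/bin/sh
# rebuild_flag_cycle FLAG ATOM: build ATOM once with FLAG disabled, then
# once more with the normal USE settings from make.conf/package.use.
# (rebuild_flag_cycle is a hypothetical helper name.)
rebuild_flag_cycle() {
    flag="$1"
    atom="$2"
    # Pass 1: FLAG forced off for this one build only.
    USE="-$flag" emerge --oneshot "$atom" || return 1
    # Pass 2: rebuild with the configured flags.
    emerge --oneshot "$atom"
}
# Usage: rebuild_flag_cycle introspection app-text/poppler
```

Using --oneshot keeps the package out of the world file, so the workaround doesn’t change what gets pulled in on later updates.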

A new way to cause a reboot loop

September 23rd, 2010

I sometimes have a need to take down a Linux host “hard”, that is, without the normal shutdown scripts.  Among other methods, I accomplish this with:

echo b > /proc/sysrq-trigger

This is suboptimal (risks corruption), but it does the job. The hosts in this case have kernel modules that the normal “rmmod” would hang on forever, and since they have no console/iLO or IPMI or other power control I am forced to use this or some other trick to take them down (reboot -f or sending a NOC person out on the floor are the other common methods, the latter being our last resort).

Because this immediately forces the kernel to reboot (assuming you have the “Magic SysRq” option in your kernel), using ssh becomes a problem. The TCP connection dies without you getting a RST packet, so the normal ssh and TCP timeout mechanisms apply, and it’ll take a few minutes for the host to come back up and issue a TCP RST in response to a keepalive message for that connection.

Now, every once in a while I do something without giving it enough thought. That’s a kind way of saying I do something really dumb. Imagine, if you will, trying to reboot a lot of machines via an “at” job. Further imagine that you decided to change your normal reboot/shutdown -r to the aforementioned sysrq-trigger trick.

Also interesting is that using sysrq-trigger doesn’t allow the at job to be removed from the queue before the machine is restarted, so it will just stay there until the job is allowed to exit normally.

I think you see what I did. Dumb. Thankfully there are a few seconds available between boot and when the job runs to remove the job from the queue.
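
If I ever do this from an at job again, a guard like the following might avoid the loop. It is only a sketch under assumptions: guarded_reboot and the marker path are made up, and the marker has to live on storage that survives a reboot:

```shell
#!/bin/sh
# guarded_reboot [MARKER] [TRIGGER]: force a reboot through SysRq at most
# once. MARKER is a file on persistent storage that records that the
# reboot was already requested; TRIGGER is the sysrq file to write to.
# Both defaults are assumptions: /var/tmp/forced-reboot.done is a made-up
# path, and it only helps if /var/tmp survives a reboot on your system.
guarded_reboot() {
    marker="${1:-/var/tmp/forced-reboot.done}"
    trigger="${2:-/proc/sysrq-trigger}"
    if [ -e "$marker" ]; then
        # Second run (after the reboot): exit normally so the job can
        # leave the queue. The marker is left in place on purpose and
        # must be deleted by hand before the helper is used again.
        return 0
    fi
    : > "$marker" || return 1
    sync    # push the marker to disk before the kernel is reset
    echo b > "$trigger"
}
```

When the job is re-run after the forced reboot, it finds the marker, exits cleanly, and the queue entry can finally go away.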