Obtaining correct core-ids on NUMA shared memory machines

On NUMA architectures it is sometimes cumbersome to obtain the correct core ids, which are needed for efficient pinning of processes. On the multi-socket Westmere-EX systems I am currently working on, for example, the physical core ids of the first CPU are not simply 0-9; depending on the machine, they are sometimes scattered over all CPUs, either individually or in blocks. On one of the systems with 40 physical cores (80 threads), one of the cores carried the ids 64 and 72, meaning you would not run on all physical cores if you pinned your program to cores 0-39.

There are tools, such as Likwid, that let you investigate the memory hierarchy and core ids in full detail, but on GNU/Linux you can actually derive most of the required information from /proc/cpuinfo. Hence, I googled a bit and found this thread, where some of the required work had already been done. I modified the script to suit my needs, and this is what came out:


#!/bin/bash
# ncores
# "THE BEER-WARE LICENSE" (Revision 42):
# <br _at_ re-web _dot_ eu> wrote this file. As long as you retain this notice
# you can do whatever you want with this stuff. If we meet some day, and you
# think this stuff is worth it, you can buy me a beer in return.
#
# This script helps to investigate the number of physically available cores on
# a multicore machine and extracts the physical core ids ordered by memory
# affinity (i.e., all cores of one NUMA entity). It is particularly useful
# to do pinning on NUMA machines.
# It is based on the script from this thread:
# http://www.dslreports.com/forum/r20892665-Help-test-Linux-core-count-script

file=/proc/cpuinfo

if [ "$1" == "-h" ]; then
    echo "Usage: ncores [-h | -l | -p | -v | -n <#proc>]"
    echo "Parameters:"
    echo "  -h         Print this help message"
    echo "  -l         Print a list of all physical cores as"
    echo "             '<socket id>,<local core id>'"
    echo "  -p         Print a comma separated list of all physical core ids"
    echo "  -v         Print a comma separated list of all virtual core ids"
    echo "  -n <#proc> Print #proc core ids as comma separated list, starting"
    echo "             with physical ids and followed by virtual ids"
    exit 0
fi

# find the number of real cores/cpus in a box -- ignore hyperthreading
# "logical" CPUs

# check if the physical id and core id fields are there; if not, just use
# the raw processor count as the number
num_cores=`grep 'physical id' $file | sort -u | wc -l`

if [ $num_cores -eq 0 ]; then
    # this box is either an old SMP or single-CPU box, so count the # of processors
    num_cores=`grep '^processor' $file | sort -u | wc -l`
    list=(`grep -iE '^processor' $file | cut -d: -f2 | tr -d ' '`)
else
    # have to factor in physical id (physical CPU) and core id (multi-core):
    # for each "processor" in $file, concatenate physical_id and core_id,
    # then find the unique list of these to get the # of cores/cpus
    processor_ids=(`grep -iE '^processor' $file | cut -d: -f2 | tr -d ' '`)
    physical_ids=(`grep -iE 'physical.*id' $file | cut -d: -f2 | tr -d ' '`)
    core_ids=(`grep -iE 'core.*id' $file | cut -d: -f2 | tr -d ' '`)
    i=0
    for ent in ${physical_ids[@]}; do
        core_id=${core_ids[$i]}
        processor_id=${processor_ids[$i]}
        if [ -z "$core_id" ]; then
            list+=( "$ent,-" )
            entry="$(printf %04d $ent),0000"
        else
            list+=( "$ent,$core_id" )
            entry="$(printf %04d $ent),$(printf %04d $core_id)"
        fi
        plist+=( "$entry:$processor_id" )
        i=$((i+1))
    done

    # count number of physical cores
    num_cores=`echo ${list[*]} | tr ' ' '\n' | sort -u | wc -l`
fi

# create list of sockets with logical number of cores within sockets
list=`echo ${list[*]}`

# create list of all (physical and virtual) core ids
clist=`echo ${plist[*]} | tr ' ' '\n' | tr ':' ' ' | sort | cut -c11-`
clist=( $clist )

# create list of core ids with a single id per physical core
plist=`echo ${plist[*]} | tr ' ' '\n' | tr ':' ' ' | sort | uniq -w 9 | cut -c11-`
plist=( $plist )

# build list of virtual ids as diff of clist and plist
for i in ${clist[@]}; do
    skip=
    for j in ${plist[@]}; do
        [[ $i == $j ]] && { skip=1; break; }
    done
    [[ -n $skip ]] || vlist+=( $i )
done

# build list of all ids in correct order
clist=( ${plist[*]} ${vlist[*]} )

# request detailed list
if [ "$1" == "-l" ]; then
    echo $list | tr ' ' '\n' | sort -u
# request physical id list
elif [ "$1" == "-p" ]; then
    echo ${plist[*]} | tr ' ' ','
# request virtual id list
elif [ "$1" == "-v" ]; then
    echo ${vlist[*]} | tr ' ' ','
# request certain number of ids
elif [ "$1" == "-n" ]; then
    [[ $2 -gt 0 ]] && num_cores=$2
    echo ${clist[@]:0:$num_cores} | tr ' ' ','
else
    # print only number of cores
    echo $num_cores
fi

exit 0

If called without any parameters, it just prints the number of physical cores. What I usually use for pinning, however, is a call like ncores -n 10, which returns the first ten core ids placed as close to each other as possible, thereby reducing memory access latency. To automatically set the required environment variables for OpenMP, I embedded it in a script:


#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 <np> <binary> <arg1> <arg2> ..."
    exit 1
fi

NPROCS=$1
BINARY=$2
shift 2

PROCLIST=$(ncores -n $NPROCS)

export OMP_NUM_THREADS=$NPROCS
# syntax of the Intel OpenMP runtime; other runtimes use
# e.g. GOMP_CPU_AFFINITY instead
export KMP_AFFINITY="granularity=fine,proclist=[$PROCLIST],explicit"

echo "Setting KMP_AFFINITY to '$KMP_AFFINITY'"

${BINARY} "$@"

Deployment of FeedHQ on Apache 2.2 – an Open Source Google Reader alternative

Click here to get directly to the deployment tutorial.

When Google announced last year that they were discontinuing Google Reader, it came as a shock to many people, especially since only a few alternatives seemed to be available, of which even fewer provided functionality similar to Google's. When it comes to synchronization services, which keep feeds up to date across multiple devices, there were no real options other than Google Reader at all.

However, a lot has happened in terms of development since then: some services, like Feedly or Digg, attempted to take over, and Feedly in particular gained a lot of new users after the shutdown announcement. For those comfortable with hosting a synchronization service themselves, Tiny Tiny RSS became a popular choice. And an attempt to define a common and open synchronization interface, called the Open Reader API, was born. It is largely based on the Google Reader API and is (so far) implemented by two news aggregators: BazQux and FeedHQ.

I had not been using Google Reader or any other news aggregator yet, but simply relied on my beloved Vienna, a very good open-source RSS client for OS X. But as the fresh owner of a new Fairphone (replacing an old shitty iPhone 3G S), I now wanted to be able to read my feeds on other devices and online in a web interface as well. As Vienna supports the Open Reader API, it was a natural choice to go for that solution.

FeedHQ and BazQux both provide extremely cheap accounts ($12/year and $9-29/year, respectively), so it would have been easy to just sign up there and use their services. But I am a fan of keeping my data with me and hosting as much as possible on my own server (even though both providers state that they are dedicated to their users' privacy), as I already do for contacts, calendars and file sharing with my own instance of OwnCloud (instead of Google Calendar), and for my photos with my own Gallery (instead of flickr or something similar). So far this attitude has always felt vindicated, for example recently, when Feedly decided overnight to only allow Google+ logins.

Therefore I chose the great FeedHQ project, which is available as open-source software on GitHub.

Installing FeedHQ on Apache 2.2

Getting FeedHQ to work is not very difficult, but you should know what you are doing. In combination with the popular Apache webserver (at the time of writing I am using Apache 2.2.22 on Ubuntu 12.04 LTS), there are a few small tricks one needs to know to get it working properly. I figured them out in a long debugging session with FeedHQ's developer Bruno Renié.

FeedHQ is based on the Django Python Web-Framework, so if you have deployed or even developed an application with that framework before, you probably know about most of the following steps. A short installation manual is also available.

1. Install required dependencies
At the moment these are Python 2.7, Redis (2.6+ recommended) and PostgreSQL (9.2+ recommended). As I install most such packages from the Ubuntu repositories, I have to rely on the versions available there, which are Python 2.7.3 and PostgreSQL 9.1.11. Redis, however, is hopelessly outdated in the repositories, so I had to install it manually (for example as described here).

If you want to use WSGI for the deployment (like I did), you also need the package libapache2-mod-wsgi from the Ubuntu repositories.

2. Check out the current version of FeedHQ
Obviously, the first step with FeedHQ is to obtain the current version of the code from GitHub:

git clone https://github.com/feedhq/feedhq.git
cd feedhq

3. Setup virtual python environment and install dependencies
This is done with the following commands. You might have to install python-virtualenv, python-pip and virtualenvwrapper from the repositories first.

virtualenv -p python2 env
source env/bin/activate
add2virtualenv .
pip install -r requirements.txt

4. Setup environment variables

FeedHQ relies on a number of environment variables, which are described in the configuration section. Unfortunately, WSGI does not pass environment variables from the Apache configuration on to the wsgi script as os.environ['VAR_NAME']. There are a few very ugly workarounds for that, but none of them really convinced me. Therefore I just hard-coded the variables directly into feedhq/wsgi.py like this:
os.environ['DJANGO_SETTINGS_MODULE'] = 'feedhq.settings'
os.environ['SECRET_KEY'] = 'some very long secret key'
os.environ['ALLOWED_HOSTS'] = 'sub.domain.com'
os.environ['FROM_EMAIL'] = 'feedhq@domain.com'
os.environ['REDIS_URL'] = 'redis://localhost:6379/1'  # (or however your redis is configured)
os.environ['DATABASE_URL'] = 'postgres://<user>:<password>@localhost:5432/feedhq'  # (or however your database is set up)
os.environ['HTTPS'] = '1'  # (if you're going to use an HTTPS connection)

It’s quite ugly, I know, but it was a fast fix. If you know of any more elegant solution, please let me know!
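One slightly less ugly variant I can think of (a sketch; the helper name `load_env` and the file path `/etc/feedhq.env` are my choice) is to keep the values in a small KEY=value file outside the repository and read it at the top of feedhq/wsgi.py:

```python
# hypothetical snippet for the top of feedhq/wsgi.py -- reads KEY=value
# pairs from a file instead of hard-coding them; the path is an assumption
import os

def load_env(path):
    """Read KEY=value lines into os.environ, skipping blanks and comments."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            key, _, value = line.partition('=')
            os.environ.setdefault(key.strip(), value.strip())

if os.path.exists('/etc/feedhq.env'):
    load_env('/etc/feedhq.env')
```

That at least keeps the secrets out of the checked-out code and lets several scripts share one configuration file.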

A short version of the corresponding virtual host configuration (only the most important entries, assuming FeedHQ was checked out to /var/www/feedhq) is:
<IfModule mod_ssl.c>
<VirtualHost *:443>
    ServerName feedhq.domain.com
    SSLEngine On

    # Static files, media and css
    AliasMatch ^/([^/]*\.css) /var/www/feedhq/feedhq/static/styles/$1
    Alias /static/ /var/www/feedhq/feedhq/static/
    Alias /media/ /var/www/feedhq/feedhq/media/

    <Directory /var/www/feedhq/feedhq/static>
        Order deny,allow
        Allow from all
    </Directory>

    <Directory /var/www/feedhq/feedhq/media>
        Order deny,allow
        Allow from all
    </Directory>

    # Let Django handle the authorization, not Apache
    WSGIPassAuthorization On

    # Allow encoded slashes
    AllowEncodedSlashes On

    # WSGI script
    WSGIScriptAlias / /var/www/feedhq/feedhq/wsgi.py

    <Directory /var/www/feedhq/feedhq>
        <Files wsgi.py>
            Order allow,deny
            Allow from all
        </Files>
    </Directory>
</VirtualHost>
</IfModule>

5. Create admin user

Finally, you just need to create an admin user to get started:

manage.py createsuperuser --username=joe --email=joe@example.com

Using that one you can now log in to the admin panel at feedhq.domain.com/admin, set up more users and groups, and get to know FeedHQ.

6. Setup Cron-Jobs and task queue

To keep all feeds updated and run the maintenance tasks, a number of cronjobs are necessary, which in turn rely on a few workers that have to be running.

All required cronjobs are described in the installation manual. To set up the environment accordingly, I encapsulated the commands in a small bash script that first sets the variables and then executes the requested command.
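Such a wrapper could look like the following sketch (the values are placeholders and must mirror whatever you put into feedhq/wsgi.py; the virtualenv location is an assumption):

```shell
#!/bin/bash
# /scripts/www-data/feedhq_run_django-admin.sh -- a sketch; set the FeedHQ
# environment, then run the requested django-admin command
export DJANGO_SETTINGS_MODULE=feedhq.settings
export SECRET_KEY='some very long secret key'
export ALLOWED_HOSTS=sub.domain.com
export FROM_EMAIL=feedhq@domain.com
export REDIS_URL=redis://localhost:6379/1
export DATABASE_URL=postgres://user:password@localhost:5432/feedhq

# django-admin.py from the virtualenv created during installation
DJANGO_ADMIN=/var/www/feedhq/env/bin/django-admin.py
if [ -x "$DJANGO_ADMIN" ]; then
    exec "$DJANGO_ADMIN" "$@"
fi
```

Both the cronjobs and the upstart job below can then call this one script with the desired subcommand, so the variables live in a single place.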

To make sure that the workers are always online after a server restart, I wrote a small upstart script (since Ubuntu supports upstart) that is stored as /etc/init/feedhq:
# FeedHQ
# Server Application with OpenReader API to synchronize RSS Feeds

description "FeedHQ"

start on (postgresql-started and runlevel [2345])
stop on runlevel [!2345]

exec start-stop-daemon --start -c www-data --exec /scripts/www-data/feedhq_run_django-admin.sh rqworker store high default favicons

As you can see, it does not call django-admin directly but rather the aforementioned script, which first sets the appropriate environment variables. It also waits for PostgreSQL to start (since the DB is required), but unfortunately cannot wait for redis without too much trouble, so I skipped that check for now.

With cronjobs and workers set up you’re done and can start enjoying FeedHQ.

LAMPP+Calibre+OwnCloud = my own fully integrated online-library and news-aggregator

Inspired by the recent Twitter campaign #ichwillzahlen, I thought about what I would want in order to read text news. In my case that is a collection of several news feeds that I can read on my Kindle during breakfast or on the way to university. But no handy and comfortable solution for this exists yet, so I decided to create it myself.

My wish list of features:

  • Aggregation of several online media (homepages of newspapers, news channels etc.)
  • Automatic preparation in an e-book-reader-friendly format (for the Kindle, this is .mobi)
  • Automatic download/transmission to my Kindle

If you just want to see what is now possible, have a look at the conclusion.

Status quo

I tested a few solutions some time ago, and they were all too complicated or time-consuming for the morning. Calibre offers a proper way to manage e-books and sync them with a Kindle. It also has a plugin readily available that downloads news feeds, comes with a lot of presets, puts the feeds in a nice format and even sends them to the Kindle mail address at Amazon. But for all this to work automatically, my notebook has to be running with Calibre open.

Nowadays everything moves to the cloud, so why not this as well?

Calibre offers a nice feature, a content server, that allows browsing the library in HTML format and downloading ebooks. So, let's put Calibre on the server and make my library available there, while still maintaining and filling it on my local notebook.

First step: sync Calibre Library with OwnCloud

Since the Calibre library is just a folder that contains many subfolders with the ebooks and some metadata, plus an sqlite database, it is quite easy to sync. The most comfortable way for me was to use my existing OwnCloud installation: I just added the folder to the list of sync folders, and voilà, it is as easy as Dropbox. Since OwnCloud (un)fortunately does not encrypt the files on the server, the whole library is now available there.

Second step: Calibre Content Server

To make the library accessible, I had to install Calibre on the server. Thanks to out-of-the-box usable packages this is no problem at all. Luckily, the content server integrates neatly with existing webservers via WSGI, so it is as easy as filling in the provided template script with the correct paths and updating ownCloud's virtual host to serve the content server under the path '/calibre'. My WSGI script looks like this, with all paths suitable for Ubuntu 12.04:

[code language="python"]
# WSGI script file to run calibre content server as a WSGI app

import sys, os

# You can get the paths referenced here by running
# calibre-debug --paths
# on your server

# The first entry from CALIBRE_PYTHON_PATH
sys.path.insert(0, '/usr/lib/calibre')

# CALIBRE_RESOURCES_PATH
sys.resources_location = '/usr/share/calibre'

# CALIBRE_EXTENSIONS_PATH
sys.extensions_location = '/usr/lib/calibre/calibre/plugins'

# Path to directory containing calibre executables
sys.executables_location = '/usr/bin'

# Path to a directory for which the server has read/write permissions;
# calibre config will be stored here
os.environ['CALIBRE_CONFIG_DIRECTORY'] = '/var/www/calibre-config'

del sys
del os

from calibre.library.server.main import create_wsgi_app
application = create_wsgi_app(
        # The mount point of this WSGI application (i.e. the first argument to
        # the WSGIScriptAlias directive). Set to empty string if mounted at /
        prefix='/calibre',

        # Path to the calibre library to be served
        # The server process must have write permission for all files/dirs
        # in this directory or BAD things will happen
        path_to_library='/var/www/owncloud/data/<user>/files/Calibre Library',

        # The virtual library (restriction) to be used when serving this
        # library.
        virtual_library=None)

del create_wsgi_app
[/code]

Only path_to_library has to be modified, depending on your path to OwnCloud, your OwnCloud username and the name of the synced library folder. To restrict access to myself, I added simple htaccess authentication to the content server. Now it is already possible to browse the library online and download ebooks; this even works with the (experimental) browser on the Kindle itself. Important: due to a bug in Python or Calibre (no one feels responsible), the content server has to be accessed as '/calibre/', i.e. with a trailing slash. Otherwise you will end up with Internal Server Errors.
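The htaccess protection can also go directly into the virtual host; a minimal sketch (the AuthUserFile path is my choice, create the file with htpasswd):

```apache
<Location /calibre>
    AuthType Basic
    AuthName "Calibre Library"
    AuthUserFile /etc/apache2/calibre.htpasswd
    Require valid-user
</Location>
```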

Step 3: News aggregation

But I still do not get the latest news onto my Kindle. Luckily, someone did this before me: a script called news2kindle downloads a list of news feeds using Calibre's recipes, packs them into a zip file and sends them to the Kindle mail address. It worked perfectly out of the box. I created two lists of sources, one for daily and one for weekly feeds, and call the script with each list from a cronjob at the respective frequency.
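The two cronjobs might look like this (a sketch; the script path, the list locations and the times are my choice):

```
# m  h  dom mon dow  command
30 5   *   *   *     /scripts/www-data/news2kindle.sh /etc/news2kindle/daily.lst
30 5   *   *   1     /scripts/www-data/news2kindle.sh /etc/news2kindle/weekly.lst
```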

Step 4: Adding news to the library

To complete the combination, I also wanted to add the downloaded news feeds to my Calibre library, so that I could download them via the web interface as well. Calibre can add files to its library with the command 'calibredb add', which copies the ebook into the library folder and inserts it into the database. Simply adding this call to news2kindle makes the news available via the content server, too. But: OwnCloud caches its list of stored files and knows nothing about the newly added ebooks. Unfortunately, OwnCloud 5.0.x provides no functionality to rescan the file tree, and it seems the developers are not aiming to add one. Does that cause a problem? It does indeed: when the library is synced back to my notebook, the Calibre database knows about the ebooks from the news feeds, but the respective files are missing. So how to make OwnCloud aware of them?

The cleanest solution would be to sync the library from OwnCloud into a dedicated folder on the webserver. The library would then exist there twice, but other than that everything would work fine. Unfortunately, the sync client needs an X server to run. So I first tried this alternative solution, which I simply did not get to work.

So instead I used a webdav mount as a quick-and-dirty fix. Mounting the Calibre library as a webdav resource directly from OwnCloud makes OwnCloud aware of all file changes. However, calling 'calibredb add' on this mount is not possible, since the command modifies an sqlite database, which does not work on webdav resources without further tweaks. But I do know (or can influence) where new ebooks end up in the library folder (simply the subfolder 'calibre'). So I still call 'calibredb add' directly as before and just touch everything below 'calibre' on the webdav mount; that way OwnCloud learns about the new files. Of course the webdav mount has to be configured appropriately in fstab. My resulting news2kindle script looks like this:

[code language="bash"]
#!/bin/bash
# -----------------------------------------------------------------------------
# news2kindle.sh
# Copyright (C) 2011 Gerald Backmeister (http://mamu.backmeister.name)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# -----------------------------------------------------------------------------
# Version: 0.1

if [[ $# -lt 1 ]]; then
    echo "No recipe file given!"
    exit 1
fi

recipes_file=$1

# If news shall be added to a library: add the paths here, otherwise leave empty
CALIBRE_LIBRARY="/var/www/owncloud/data/<user>/files/Calibre Library"
CALIBRE_LIBRARY_WEBDAV="/mnt/owncloud-webdav/Calibre Library"

# adjust to your setup
NEWS_DIR=/tmp/news2kindle
TARGET_FORMAT_EXTENSION=.mobi
MAIL_RECIPIENT=your-name@kindle.com
MAIL_SENDER=you@domain.com
ERRORS=0

sendmail() {
    mail_recipient=$1
    subject=$2
    attachment_file=$3
    if [[ -z $4 ]]; then
        flags=""
    else
        flags="-f $4"
    fi
    message="$subject"
    (
    echo "To: $mail_recipient"
    echo "Subject: $subject"
    echo "MIME-Version: 1.0"
    echo 'Content-Type: multipart/mixed; boundary="-q1w2e3r4t5"'
    echo '---q1w2e3r4t5'
    echo "Content-Type: text/html"
    echo "Content-Disposition: inline"
    echo $message
    echo '---q1w2e3r4t5'
    echo 'Content-Type: application; name="'$(basename $attachment_file)'"'
    echo "Content-Transfer-Encoding: base64"
    echo 'Content-Disposition: attachment; filename="'$(basename $attachment_file)'"'
    /usr/bin/uuencode --base64 $attachment_file $attachment_file
    echo '---q1w2e3r4t5--'
    ) | /usr/sbin/sendmail $flags $mail_recipient
    return $?
}

NEWS_SUB_DIR=`date +"%Y%m%d_%H%M"`
zip_file="$NEWS_DIR/$NEWS_SUB_DIR/news_$NEWS_SUB_DIR.zip"

# mount webdav folder
mount "$CALIBRE_LIBRARY_WEBDAV"

# Create folder
mkdir -p "$NEWS_DIR/$NEWS_SUB_DIR"

recipe_counter=0
while read recipe; do
    # skip comments and empty lines
    if [[ $recipe == \#* ]]; then continue; fi
    if [[ $recipe == "" ]]; then continue; fi

    # recipes may be given with a full path or by name only
    if [[ $recipe == */* ]]; then
        recipe_file=$recipe
    else
        recipe_file="$recipe.recipe"
    fi
    recipe_name=`basename "$recipe_file" .recipe`
    recipe_target_file="$NEWS_DIR/$NEWS_SUB_DIR/$recipe_name$TARGET_FORMAT_EXTENSION"
    recipe_log_file="$NEWS_DIR/$NEWS_SUB_DIR/$recipe_name.log"

    echo "Downloading $recipe_name to $recipe_target_file..." >"$recipe_log_file"
    /usr/bin/ebook-convert "$recipe_file" "$recipe_target_file" >>"$recipe_log_file" 2>>"$recipe_log_file"
    ERRORS=$((ERRORS+$?))
    chmod +r "$recipe_target_file"
    if [[ -n $CALIBRE_LIBRARY ]]; then
        /usr/bin/calibredb add -d --with-library "$CALIBRE_LIBRARY" "$recipe_target_file"
        # make OwnCloud aware of the new files (the glob must stay unquoted)
        touch "$CALIBRE_LIBRARY_WEBDAV"/calibre/*
    fi
    recipe_counter=`expr $recipe_counter + 1`
done < $recipes_file

# unmount webdav folder
umount "$CALIBRE_LIBRARY_WEBDAV"
ERRORS=$((ERRORS+$?))

cd "$NEWS_DIR/$NEWS_SUB_DIR" >/dev/null

echo "Compressing documents..."
/usr/bin/zip $zip_file *$TARGET_FORMAT_EXTENSION

echo "Sending email..."
sendmail $MAIL_RECIPIENT news2kindle $zip_file $MAIL_SENDER

cd - >/dev/null

# all fine? then delete the tmp-files
if [[ $ERRORS -eq 0 ]]; then
    rm -rf "$NEWS_DIR/$NEWS_SUB_DIR" >/dev/null
fi

# check for errors to set return value
if [[ $ERRORS -gt 0 ]]; then
    exit 1
fi
exit 0
[/code]


Now my preferred news feeds are downloaded automatically to my Calibre library and sent to my Kindle via e-mail. All I have to do in the morning is activate the WiFi on my Kindle. At the same time I can access all my ebooks through a (protected) web interface and download them directly to the e-reader. No more need to sync them via USB.

One could also add functionality that automatically removes old issues of the news feeds from the library to avoid bloating it. This is possible via 'calibredb remove', but requires some thought on how to determine the proper ids of the entries. Since the timestamps of the folders of old issues are updated each time new feeds are added, they cannot be identified that way. And again, one would need to make OwnCloud aware of the changes to the file system, which, as seen above, is not solved as elegantly as one might wish.

A word of warning at the end: having Calibre open while the news feeds are updated will lead to conflicts between the databases on the server and on your local computer. Be aware of that and avoid it! I solved it by downloading the feeds early in the morning, when I am (more or less) sure that I will not be awake anyway... I also recommend using the same Calibre version on the server and on your local computer; I am not sure how compatible different versions are.

Displaying all available make-targets

At the moment I am writing a lot of very small testcases, which are all compiled with the same simple command. Since I am lazy and probably nobody will ever need them again, I just add each of them as a new target to a single Makefile. The file is by now quite bloated and I keep forgetting the target names. So it would be handy to have a command that simply displays all available targets. A short Google search brings up a nice script that does exactly that:

[code language="bash"]
#!/bin/bash

SCRIPT='
/^# Make data base/,/^# Files/ d   # skip until files section
/^# Not a target/,+1 d             # following target isnt
/^\.PHONY:/ d                      # special target
/^\.SUFFIXES:/ d                   # special target
/^\.DEFAULT:/ d                    # special target
/^\.PRECIOUS:/ d                   # special target
/^\.INTERMEDIATE:/ d               # special target
/^\.SECONDARY:/ d                  # special target
/^\.SECONDEXPANSION/ d             # special target
/^\.DELETE_ON_ERROR:/ d            # special target
/^\.IGNORE:/ d                     # special target
/^\.LOW_RESOLUTION_TIME:/ d        # special target
/^\.SILENT:/ d                     # special target
/^\.EXPORT_ALL_VARIABLES:/ d       # special target
/^\.NOTPARALLEL:/ d                # special target
/^\.ONESHELL:/ d                   # special target
/^\.POSIX:/ d                      # special target
/^\.NOEXPORT:/ d                   # special target
/^\.MAKE:/ d                       # special target

# The stuff above here describes lines that are not
# explicit targets or not targets other than special ones
# The stuff below here decides whether an explicit target
# should be output.

/^[^#\t:=%]+:([^=]|$)/ {           # found target block
  h                                # hold target
  d                                # delete line
}
/^# File is an intermediate prerequisite/ { # nope
  s/^.*$//;x                       # unhold target
  d                                # delete line
}
/^([^#]|$)/ {                      # end of target block
  s/^.*$//;x                       # unhold target
  s/:.*$//p                        # write current target
  d                                # hide any bugs
}
'

make -npq .DEFAULT 2>/dev/null | sed -n -r "$SCRIPT" | sort | uniq
[/code]

Putting it behind an alias in my .bashrc accelerated my workflow a lot. It has also proven quite handy for larger projects with Makefiles generated by autoconf.
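The alias itself is a one-liner (the alias name and the location where I saved the script are my choice):

```shell
# in ~/.bashrc -- make-targets script assumed to live under ~/bin
alias maketargets='bash ~/bin/make-targets.sh'
```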