Solr, Tomcat and UTF-8

I recently had to fix an issue where Apache Solr wasn’t returning any results for German words. Altering the schema to accommodate the German language didn’t help: searches for German words still came back empty. It turns out that earlier versions of Apache Tomcat aren’t UTF-8 enabled by default; you need to explicitly set the URI encoding on the connector in server.xml.

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />
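
To see why the missing setting breaks searches, here is a minimal sketch in Python rather than anything Tomcat-specific. It assumes the client sends the query percent-encoded as UTF-8 and that the server decodes it as ISO-8859-1, which as far as I know is what older Tomcat versions fall back to. The search term is just an example I picked.

```python
from urllib.parse import quote, unquote

# Hypothetical search term; any word with an umlaut behaves the same way.
term = "München"

# What the client puts in the URL: the term percent-encoded as UTF-8.
encoded = quote(term, encoding="utf-8")

# Decoding those bytes as ISO-8859-1 (assumed old server default)
# turns the two-byte umlaut into two unrelated characters.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

# Decoding as UTF-8, which is what URIEncoding="UTF-8" asks for,
# gives back the original term.
as_utf8 = unquote(encoded, encoding="utf-8")

print(encoded)
print(as_latin1)
print(as_utf8)
```

With the wrong decoding, Solr is asked to search for the garbled string instead of the real word, so it finds nothing no matter how the schema is set up.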

You can read more on the Tomcat Wiki.