Fetching content in eZ Publish 5 using Search service

8 Jul 2014 | Petar Španja
This article will cover what is, in my opinion, one of the coolest and most important features of eZ Publish 5. I’m not saying that, for example, HTTP caching is not cool or important, but if you don’t know how to find the stuff you need there will hardly be something to cache, right? So cover your bases.

Old and new

If you have any eZ Publish Legacy Stack experience, search roughly corresponds to the old fetch functions. In contrast to search in the new stack, fetch functions were not used only for finding content (they were used for almost anything else as well), and it was possible to use them in templates. There was also quite a number of different ones, each intended for a different case and each with its own set of parameters. On top of that, those parameters were flat, meaning they did not allow for more complex expressions. You had to code extended attribute filters to achieve something that was not possible out of the box, and if you have experience with these, you also know it was not always fun.

The search in the new stack looks to improve on all of that.

Query object

A query object is a value object, and a starting point in search. You will create Query objects to specify parameters of your search, like search criteria, sort clauses and pagination, then pass them to one of the SearchService methods to receive back the result. Let’s take a look at some of the properties.

  • $query
    This holds the search criteria which will affect scoring of the search results (for a search backend that supports it, read: Solr).
  • $filter
    This one holds the search criteria which will not affect scoring of the search results. A search backend that makes this distinction usually also caches filter conditions, so it pays to think a bit about which criteria go here and which go into the $query property.
  • $sortClauses
    Holds sort definitions.
  • $limit
    Limits the number of search results returned.
  • $offset
    Offsets the search results returned. Combine this with $limit to achieve pagination.
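
To make the pagination remark concrete, here is a minimal sketch of how a 1-based page number could be turned into a value for $offset. The pageToOffset() helper is hypothetical and not part of eZ Publish; only the $limit and $offset properties come from the Query object.

```php
<?php

/**
 * Hypothetical helper (not part of eZ Publish): converts a 1-based page
 * number into the offset expected by the Query object.
 */
function pageToOffset( $page, $limit )
{
    // Treat anything below the first page as the first page
    $page = max( 1, (int)$page );

    return ( $page - 1 ) * $limit;
}

$limit = 10;
$offset = pageToOffset( 3, $limit );

// The third page of 10 results per page starts after the first 20 results
echo "limit: {$limit}, offset: {$offset}\n";
```

The two values would then be assigned to $query->limit and $query->offset before the Query is passed to the SearchService.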

So how to build your search condition?

Criterion

The Criterion is the basic building block of your search criteria. eZ Publish provides a number of these out of the box, and they should be sufficient to cover the majority of use cases. If you still find the need for something special, you can implement a custom Criterion. That is out of the scope of this post, but if you are looking to go for it, check out Netgen’s Tags bundle for a good example of how it is done.

A few of the provided criteria are:

  • ContentId, ContentTypeIdentifier, SectionId
  • LogicalAnd, LogicalOr, LogicalNot
  • Field, FullText, MapLocationDistance

Some of these, like ContentId, SectionId and ContentTypeIdentifier, relate to the inner model of data in eZ Publish. These are typically used to filter search results, for example when you know the id of the Content you need to find, when you want to find all Content in a specific Section, or all Content of a specific ContentType:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
 
// Content with id 123
$myContentCriterion = new Criterion\ContentId( 123 );
 
// Content in Section with id = 1 ("Standard" section)
$standardSectionContent = new Criterion\SectionId( 1 );
 
// Content of the "article" type
$articles = new Criterion\ContentTypeIdentifier( "article" );

One special group of criteria are logical criteria. These do not carry search conditions by themselves, but are used to combine or affect other criteria and thus create more complex expressions:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion\Operator;
 
// Content that is not of the "article" Type
$noArticlesPlease = new Criterion\LogicalNot(
    new Criterion\ContentTypeIdentifier( "article" )
);
 
// Content of the Type "article", modified in last 24 hours
$recentArticles = new Criterion\LogicalAnd(
    array(
        new Criterion\ContentTypeIdentifier( "article" ),
        new Criterion\DateMetadata(
            Criterion\DateMetadata::MODIFIED,
            Operator::GTE,
            time() - 86400
        )
    )
);
 
// Content of the "article" or "blog_post" Type
$articlesAndBlogPosts = new Criterion\LogicalOr(
    array(
        new Criterion\ContentTypeIdentifier( "article" ),
        new Criterion\ContentTypeIdentifier( "blog_post" )
    )
);

So far pretty simple. One particular thing not obvious in examples above is input polymorphism. The last example could also be written like this:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
 
// An array of identifiers implies the IN operator
$articlesAndBlogPosts = new Criterion\ContentTypeIdentifier(
    array(
        "article",
        "blog_post"
    )
);

Providing an array as the input value to the ContentTypeIdentifier criterion implies the IN operator. The interpretation of the above is then: all Content whose Type identifier is in the given array of identifiers. When a single string is provided, as in the earlier examples, the EQ (equals) operator is used. Sometimes, as in the cases just described, the operator is implied by the type of the input, but for some criteria you will need to define the operator explicitly. Rather than going over the specifics of each one here, check out the code on GitHub.

It is also possible to search for the actual contents of the Content you created. For this you will use mostly Field and FullText criteria. The Field criterion is useful if you know the specific field that you want to search on. For example, you want to find a blog post with the exact title "Search in eZ Publish 5":

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion\Operator;
 
// LogicalAnd expects its criteria as an array
$theStuff = new Criterion\LogicalAnd(
    array(
        new Criterion\Field( "title", Operator::EQ, "Search in eZ Publish 5" ),
        new Criterion\ContentTypeIdentifier( "blog_post" )
    )
);

On the other hand, FullText will search on all indexed text of the Content, which may include the TextLine, TextBlock, XmlText fields and possibly others in the future:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
 
$theStuff = new Criterion\FullText( "Search in eZ Publish 5" );

You can expect the usual stuff valid for full text search here:

  • If multiple words are provided an AND query is performed.
  • OR queries are not (yet) supported.
  • Simple wildcards are supported. If an asterisk (*) is used at the end or beginning of a word this is translated into a wildcard query. Thus "fo*" would match "foo" and "foobar", for example.
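
Coming back to the $query and $filter properties of the Query object, this is a sketch of how the two might be combined. The FullText criterion goes into $query, where a backend that supports scoring can use it for relevance, while the type restriction goes into $filter, as it only narrows down the result set. The search phrase and the "blog_post" type are just example values.

```php
use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion;

$query = new Query();

// Affects scoring on a search backend that supports it
$query->query = new Criterion\FullText( "Search in eZ Publish 5" );

// Only narrows down the results, does not affect scoring
$query->filter = new Criterion\ContentTypeIdentifier( "blog_post" );
```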

Fields that you can not search on using Field criterion

The Field Criterion is usable for fields that store simple values, for example an integer, a string, a float, a list of country codes and so on.
Some field types store data in structures external to the core eZ Publish structures and therefore require special handling. Moreover, the data they store can require an interface more complex than the one provided by the Field criterion.

A good example of this is searching on data stored by the MapLocation field type. MapLocation stores its data (geographical coordinates) in external structures. Also, aside from providing field definition identifier, operator and distance values, it requires coordinates of a geographical location to be used as a distance reference. For these reasons a dedicated MapLocationDistance criterion is implemented.

Let’s assume you have some Content of the Type "pub", with a MapLocation field "location". You also find yourself at coordinates $x and $y. How hard would it be to create criteria for all pubs in a 1.5 km radius from your current location? It is not hard at all:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion\Operator;
 
// Pubs near my location
$pubs = new Criterion\LogicalAnd(
    array(
        new Criterion\ContentTypeIdentifier( "pub" ),
        new Criterion\MapLocationDistance(
            "location",
            Operator::LTE,
            1.5,
            $x,
            $y
        )
    )
);

Of course it makes no sense to check the nearby pubs in a random order. You will want to sort the result set and find the nearest one.

SortClause

Just as Criterion is the basic building block of search criteria, SortClause is the basic building block of the search result ordering definition. In the simplest cases you will only need to provide the sort direction:

use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;
 
// Sort Content by its id, ascending
$sortClause = new SortClause\ContentId( Query::SORT_ASC );

For some SortClauses you will need to be more specific. For example, in addition to the direction, the Field SortClause needs the ContentType and FieldDefinition identifiers:

use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;
 
// Sort by article title, descending
$sortClause = new SortClause\Field( "article", "title", Query::SORT_DESC );

This will work fine if the "title" field is not translatable. If it is, you will also need to provide a language code, because the SortClause would be ambiguous otherwise:

use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;
 
// Sort by article title in English language, descending
$sortClause = new SortClause\Field( "article", "title", Query::SORT_DESC, "eng-GB" );

It is also possible to define multiple SortClauses and provide them to the Query as an array:

use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;
 
$sortClauses = array(
    new SortClause\ContentTypeIdentifier( Query::SORT_ASC ),
    new SortClause\ContentId( Query::SORT_ASC )
);

When an array of SortClauses is given, they will be applied in succession. This means the order of their appearance in the array is significant. For example, you searched for all articles and blog posts, and these were articles and blog posts existing in the database:

  1. Article 1: Content Type identifier "article", Content id 10
  2. Article 2: Content Type identifier "article", Content id 11
  3. Blog post 1: Content Type identifier "blog_post", Content id 12
  4. Blog post 2: Content Type identifier "blog_post", Content id 13

If you used the SortClauses from the previous example, the results would be sorted in the same order as given above. But if you changed the ContentId SortClause (the second one) to sort in descending order, you’d get this:

  1. Article 2
  2. Article 1
  3. Blog post 2
  4. Blog post 1

On the other hand, if you only changed ContentTypeIdentifier to sort in descending order, this would be returned:

  1. Blog post 1
  2. Blog post 2
  3. Article 1
  4. Article 2
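
The way the clauses are applied in succession can be illustrated without the search engine at all. The following is a plain PHP sketch, not eZ Publish code: it sorts arrays standing in for the four Content items above, first by type identifier ascending and then by id descending, which corresponds to the variation where only the ContentId SortClause is changed to descending. The compareItems() function and the array keys are made up for the illustration.

```php
<?php

// Plain arrays standing in for the Content items listed above
$items = array(
    array( "name" => "Article 1", "type" => "article", "id" => 10 ),
    array( "name" => "Article 2", "type" => "article", "id" => 11 ),
    array( "name" => "Blog post 1", "type" => "blog_post", "id" => 12 ),
    array( "name" => "Blog post 2", "type" => "blog_post", "id" => 13 ),
);

function compareItems( array $a, array $b )
{
    // First "sort clause": type identifier, ascending
    $byType = strcmp( $a["type"], $b["type"] );
    if ( $byType !== 0 )
    {
        return $byType;
    }

    // Second "sort clause", consulted only for equal types: id, descending
    return $b["id"] - $a["id"];
}

usort( $items, "compareItems" );

foreach ( $items as $item )
{
    echo $item["name"] . "\n";
}
```

This prints Article 2, Article 1, Blog post 2 and Blog post 1, in that order. The second comparison only decides the order within a group of items that the first comparison considers equal.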

You get the idea? It behaves pretty much the same as sorting in a relational database.

Finally, let’s write the sort clause for our pubs of interest. Similar to the corresponding Criterion, it will need the ContentType identifier, the FieldDefinition identifier and reference coordinates:

use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;
 
// Nearest pubs first
$sortPubs = new SortClause\MapLocationDistance(
    "pub",
    "location",
    $x,
    $y,
    Query::SORT_ASC
);

Concrete example

So far you have seen how to build search criteria and sort clauses. Now let’s connect all that in a full example:

use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion\Operator;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;
 
$query = new Query();
$query->filter = new Criterion\LogicalAnd(
    array(
        new Criterion\ContentTypeIdentifier( "pub" ),
        new Criterion\MapLocationDistance(
            "location",
            Operator::LTE,
            1,
            $x,
            $y
        )
    )
);
$query->sortClauses = array(
    new SortClause\MapLocationDistance(
        "pub",
        "location",
        $x,
        $y,
        Query::SORT_ASC
    )
);
$query->limit = 10;
$query->offset = 0;
 
// $searchService is assumed to come from the Repository, e.g. $repository->getSearchService()
$searchResult = $searchService->findContent( $query );

And we finally have a result.

Search result

This again is a value object of SearchResult class, holding the search hits and also some other properties:

  • $searchHits
    Holds the actual search hits, which are special value objects
  • $totalCount
    Holds the total count of results found

You would use it like this:

$searchResult = $searchService->findContent( $query );
 
echo "Found total of " . $searchResult->totalCount . " pubs";
 
/** @var \eZ\Publish\API\Repository\Values\Content\Content $nearestPub */
$nearestPub = $searchResult->searchHits[0]->valueObject;

So the $valueObject property of a SearchHit will hold the Content that is found. But if you look at the code of the SearchHit class, you will find that the $valueObject property is not hinted as Content, but as a more general ValueObject class. The reason for this is that Content is not the only thing you can search for.
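
To go over all the hits rather than just the first one, you could iterate over $searchHits. A short sketch, assuming $searchResult comes from the findContent() call above, so that every hit holds a Content object:

```php
foreach ( $searchResult->searchHits as $searchHit )
{
    /** @var \eZ\Publish\API\Repository\Values\Content\Content $content */
    $content = $searchHit->valueObject;

    // Print the name of each Content that was found
    echo $content->contentInfo->name . "\n";
}
```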

Types of search

So far we were searching for Content, but you can also search for Locations. Most of the things written above about the Content search are also valid for the Location search. The only difference is that you will need to use a different query class and pass the query to the dedicated method of the SearchService:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\LocationQuery;
 
$query = new LocationQuery();
$query->filter = new Criterion\LocationId( $locationId );
 
$searchResult = $searchService->findLocations( $query );

There are also a few other things you should be aware of in this context. Since one Location always has exactly one Content, you can use all criteria and sort clauses when searching for Locations. However, Content can have none, one or multiple Locations. This makes some criteria and sort clauses unusable for Content search, as their meaning would be ambiguous when Content has multiple Locations.

These are Criterion classes that can’t be used with the Content search, found under eZ\Publish\API\Repository\Values\Content\Query\Criterion\Location namespace:

  • Depth
  • IsMainLocation
  • Priority

And these are the SortClauses that can’t be used with the Content search, basically all that are found under eZ\Publish\API\Repository\Values\Content\Query\SortClause\Location namespace:

  • Depth
  • Id
  • IsMainLocation
  • Path
  • Priority
  • Visibility

SearchService will throw you a nice descriptive exception if you ignore the above.

Now let’s take a look at the available SearchService methods:

  • SearchService::findContent( Query $query )
    Finds Content objects for the given Query. Returns SearchResult object.
  • SearchService::findSingle( Criterion $criterion )
    Finds and returns a single Content object. A Query object is not needed for a single Content object, so the Criterion is provided directly. NotFoundException will be thrown if the Content is not found.
  • SearchService::findLocations( LocationQuery $query )
    Finds Location objects for the given LocationQuery. Returns SearchResult object.
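
As findSingle() signals a miss with an exception rather than an empty result, a call would typically be wrapped in a try/catch block. A minimal sketch, assuming $searchService is already available and reusing Content id 123 from the earlier example:

```php
use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Exceptions\NotFoundException;

try
{
    // The Criterion is passed directly, no Query object is involved
    $content = $searchService->findSingle( new Criterion\ContentId( 123 ) );
}
catch ( NotFoundException $e )
{
    // No Content matched the criterion
    $content = null;
}
```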

The (un)expected

Assume you have a Content with two Locations, A and B. While Location A is visible, Location B is hidden. What would you expect from a Content query like the following?

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\Query;
 
$query = new Query(
    array(
        'filter' => new Criterion\LogicalAnd(
            array(
                new Criterion\LocationId( $locationBId ),
                new Criterion\Visibility( Criterion\Visibility::VISIBLE ),
            )
        )
    )
);
 
$searchResult = $searchService->findContent( $query );

Most people would probably intuitively expect that no Content is found, because it seems like we queried for a Content with a visible Location B. However, this is not the case. The Criterion objects in a query are interpreted separately, so the above is really a query for Content that meets the following conditions:

  • has Location B
  • has a Location that is visible

Now Content would be found because both conditions are satisfied: it has Location B and it also has a Location that is visible (Location A). This principle applies not only to the case above, it is valid in general. So how would one search for a Content with a visible Location X? In this case the Location search should be used instead:

use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\LocationQuery;
 
$query = new LocationQuery(
    array(
        'filter' => new Criterion\LogicalAnd(
            array(
                new Criterion\LocationId( $locationBId ),
                new Criterion\Visibility( Criterion\Visibility::VISIBLE ),
            )
        )
    )
);
 
$searchResult = $searchService->findLocations( $query );

As the Location search looks up (ahem) Locations, all of its criteria related to Location properties necessarily apply to a single Location. Therefore the same criteria used with the Location search will return no hits, because Location B is not visible.
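
The difference between the two interpretations can be modelled in a few lines of plain PHP. This is only an illustration of the semantics described above, not the way eZ Publish implements search: the arrays and both functions are made up for the example.

```php
<?php

// One Content with two Locations: A is visible, B is hidden
$locations = array(
    array( "id" => "A", "visible" => true ),
    array( "id" => "B", "visible" => false ),
);

// Content search: each criterion only needs to match some Location
function matchesContentSearch( array $locations, $locationId )
{
    $hasLocation = false;
    $hasVisibleLocation = false;

    foreach ( $locations as $location )
    {
        if ( $location["id"] === $locationId )
        {
            $hasLocation = true;
        }
        if ( $location["visible"] )
        {
            $hasVisibleLocation = true;
        }
    }

    return $hasLocation && $hasVisibleLocation;
}

// Location search: both criteria must match the same Location
function matchesLocationSearch( array $locations, $locationId )
{
    foreach ( $locations as $location )
    {
        if ( $location["id"] === $locationId && $location["visible"] )
        {
            return true;
        }
    }

    return false;
}

var_dump( matchesContentSearch( $locations, "B" ) );  // bool(true): the Content is found
var_dump( matchesLocationSearch( $locations, "B" ) ); // bool(false): no hits
```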

Mandatory caveats

There will probably always be some of these. eZ Publish 5 at the moment uses the Legacy Storage Engine, which stores data in a relational database. Therefore most of the search criteria and sort clauses will introduce an additional subselect or join in the main SQL query. Now consider that all permissions are automatically added to the search criteria to ensure access control, and it should be easy to imagine that in more complex cases the thing will not exactly fly.

You might encounter performance issues particularly if your permissions use a lot of Node and Subtree limitations. In this case Location search will generally perform better than Content search.

All of that is actually to be expected: search in eZ Publish 5 enables quite complex things, and running it on top of the Legacy Storage Engine is only a stopgap solution until Solr Storage is made ready. In eZ Publish 5.3 this is not yet the case, as it lacks Location search and some other important things. Therefore it is not endorsed, though it is possible to enable it. If you are into this, the steps are as follows:

  • Register EzPublishSolrBundle
  • Configure legacy_solr as your storage engine
  • Start Solr server
  • Execute the provided command to index your data:
    $ php ezpublish/console ezpublish:solr_create_index

And you should be ready to give it a go. In coming months you can expect things to improve on the Solr Storage front, so do keep an eye on it.

And that’s all for now.
