Lucjan Wilczewski's blog

Me(e)t Magento Polska - it was worth it

It has already been a week since the Baobaz team returned from the first MeetMagento event in Poland. The excitement has worn off a little, yet we are still convinced - it was worth it!

Baobaz at Meet Magento 2012 in Warsaw

Magento Community

The first Meet Magento in Poland gathered around 100 people from all over Europe. Thomas Fleck, from NetResearch, said it beautifully - "I came to Poland from Germany to listen to a Swiss company's presentation, in English, on their experience integrating Magento for a Ghanaian government institution."

The variety of the attendees' experience created a great opportunity to exchange knowledge during coffee breaks between speeches, and the conversations continued even at the After Party on the 40th floor, in the Marriott Panorama Bar. Besides - what other conditions would have to be met to gather several former Magento Core Team members in one place and have the opportunity to ask them those difficult questions you have kept in your head for months of Magento integration? Oppa Gangnam Style, bros! :)

Experienced speakers

Meet Magento gave us the opportunity to listen to 22 speeches from 23 speakers (including Piotr Kamiński, Damian Luszczymak, Ivan Chepurnyi and Vinai Kopp), who shared both their experience with Magento integration and solutions for the e-commerce market (payment gateways, email marketing solutions). Thanks to the presence of Magento representatives it was also an opportunity to hear how Magento has evolved in recent years, what Magento Developer Certification looks like and where Magento 2.0 is today.

For those who impatiently wait for Magento 2.0 there is good and bad news. Magento is making an effort to get the new version well tested and documented, therefore we should not expect it soon, especially as core parts are still being modified - if you plan a new website during the next year, stay with Magento 1.x. In the meantime we can count on new Service Pack editions for Magento 1.x, and Magento 1.x should be supported for several years after the release of Magento 2.0.

Organization

Big congratulations to the organizers from the Snowdog.pl team - Kuba Zwoliński and Marta Molińska - who, even though it was their debut, organized this event in a professional way and in a nice atmosphere. From the hotel entrance onwards, kind girls wearing orange shirts with the event logo guided everybody to the right floor and room. Speeches given in Polish were simultaneously translated into English and at the same time were available on a live stream.

Great job! Impatiently, and against Mayan prophecies, we wait for MeetMagento 2013!

We are really happy that we had the opportunity to support this event as a sponsor and, moreover, we would like to thank the audience for the warm reception of our presentation.

Setting up Magento with multiple websites or stores

There are many tutorials on how to set up Magento to work with multiple stores and make different domains point at each store. Since the release of Magento CE 1.4.0.0-beta1 and Magento EE 1.6.0.0 it is even easier to do.

Magento evolves

Solutions used in previous versions required the developer to modify the index.php file to handle different domains pointing at different stores. The new index.php contains the following code:

$mageRunCode = isset($_SERVER['MAGE_RUN_CODE']) ? $_SERVER['MAGE_RUN_CODE'] : '';
$mageRunType = isset($_SERVER['MAGE_RUN_TYPE']) ? $_SERVER['MAGE_RUN_TYPE'] : 'store';

Mage::run($mageRunCode, $mageRunType);

So it checks two environment variables and uses them to start Magento. What does it give you? You can now set which store or website is supposed to run under a selected domain directly in the virtual host definition or even in .htaccess.
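The fallback behaviour of those two lines can be tried outside Magento. The sketch below is only an illustration: the function name resolveRunParams is hypothetical, and it takes the server array as an argument instead of reading $_SERVER directly, so that both cases can be shown side by side.

```php
<?php
// Hypothetical helper mirroring the two lines from index.php: read
// MAGE_RUN_CODE / MAGE_RUN_TYPE from a server array with the same defaults.
function resolveRunParams(array $server): array
{
    $code = isset($server['MAGE_RUN_CODE']) ? $server['MAGE_RUN_CODE'] : '';
    $type = isset($server['MAGE_RUN_TYPE']) ? $server['MAGE_RUN_TYPE'] : 'store';

    return array($code, $type);
}

// Nothing set by the web server: empty code, type 'store'.
list($code, $type) = resolveRunParams(array());
echo "code='{$code}' type='{$type}'\n";

// Both variables set by the web server.
list($code, $type) = resolveRunParams(array(
    'MAGE_RUN_CODE' => 'base',
    'MAGE_RUN_TYPE' => 'website',
));
echo "code='{$code}' type='{$type}'\n";
```

In the real index.php the resulting pair is simply handed to Mage::run(), as shown above.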

VirtualHost solution

To benefit from this little piece of code it is enough to add the following lines within your VirtualHost definition (Apache does not allow a comment on the same line as a directive, so the comments go on their own lines):

# put your website or store code here
SetEnv MAGE_RUN_CODE "base"
# put 'website' or 'store' here
SetEnv MAGE_RUN_TYPE "website"

.htaccess solution

If you have no access to the virtual host definitions, you can still try to use .htaccess for that by putting the following lines in it:

SetEnvIf Host .*yourhost.* MAGE_RUN_CODE=base
SetEnvIf Host .*yourhost.* MAGE_RUN_TYPE=website

Where .*yourhost.* is a regular expression matching the domain for which you want to set the environment variable.
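A pattern as loose as .*yourhost.* matches every host name that merely contains the given text. The sketch below uses PHP's preg_match as a stand-in for the regular expression test applied to the Host header; the host names and the stricter pattern are made up for illustration, and the function name hostMatches is hypothetical.

```php
<?php
// Illustration only: preg_match stands in for the regex test done on the
// Host header. Patterns must not contain '/' because it is the delimiter here.
function hostMatches(string $pattern, string $host): bool
{
    return preg_match('/' . $pattern . '/', $host) === 1;
}

$loose  = '.*yourhost.*';            // any host containing "yourhost"
$strict = '^(www\.)?yourhost\.com$'; // hypothetical stricter variant

foreach (array('www.yourhost.com', 'not-yourhost.example') as $host) {
    printf(
        "%-22s loose=%s strict=%s\n",
        $host,
        var_export(hostMatches($loose, $host), true),
        var_export(hostMatches($strict, $host), true)
    );
}
```

If several domains share a substring, anchoring the pattern with ^ and $ as in the second variant keeps each domain tied to its own store.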

So now you are capable of setting up your Magento multi-store website without messing with the core. Good luck.

Customizing Magento Dataflow - import of custom data.

The flexibility of the Magento Dataflow module lies in the fact that you can easily create your own adapters, parsers and mappers and apply them to your specific dataflow needs.

The basic case you may wonder about is importing data for your custom module. Let's do this by example. Imagine you need to display a list of stores in your e-shop. You have created a custom module, a database table and the data model part; all you need now is to populate this table with the data you have in a csv file.

Read the file

First you have to read the file. As you already know (if you read Magento Dataflow - Default Adapters [Part 2]) you can use dataflow/convert_adapter_io adapter for this.

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[stores.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>

Parse the file content

Now that you have read the file content, you should parse it using dataflow/convert_parser_csv.

<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
</action>

Process rows of data

Now the custom part of this process. Within your custom module you have to create a custom adapter that will create a row in the database for each processed row of the parsed file. Within your module root directory create the file ./Model/Convert/Adapter/Store.php with this content:

class Baobaz_Store_Model_Convert_Adapter_Store
    extends Mage_Dataflow_Model_Convert_Adapter_Abstract
{
    protected $_storeModel;

    public function load() {
      // you have to create this method, enforced by Mage_Dataflow_Model_Convert_Adapter_Interface
    }

    public function save() {
      // you have to create this method, enforced by Mage_Dataflow_Model_Convert_Adapter_Interface      
    }

    public function getStoreModel()
    {
        if (is_null($this->_storeModel)) {
            $storeModel = Mage::getModel('baobaz_store/store');
            $this->_storeModel = Mage::objects()->save($storeModel);
        }
        return Mage::objects()->load($this->_storeModel);
    }

    public function saveRow(array $importData)
    {
      $store = $this->getStoreModel();

      if (empty($importData['code'])) {
          $message = Mage::helper('catalog')->__('Skip import row, required field "%s" not defined', 'code');
          Mage::throwException($message);
      }
      else
      {
        $store->load($importData['code'],'code');
      }

      $store->setCode($importData['code']);
      $store->setName($importData['name']);

      $store->save();

      return true;

    }
}

Now that you have created this file, you can slightly modify the parser declaration by adding the adapter and method variables:

<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
    <var name="adapter">baobaz_store/convert_adapter_store</var>
    <var name="method">saveRow</var>
</action>

Having done this, your XML definition of the custom dataflow profile should look like this:

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[stores.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>
<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
    <var name="adapter">baobaz_store/convert_adapter_store</var>
    <var name="method">saveRow</var>
</action>

You can now enjoy your custom dataflow.

Magento Dataflow - standard parsers and mapping values [part 4]

As promised in Magento Dataflow - Default Adapters [Part 2], today I will write about the standard parsers in the Magento DataFlow module and about mapping values with mappers.

  1. Parser definition

    Parsers are responsible for transforming data from one format to another. The parser interface Mage_Dataflow_Model_Convert_Parser_Interface defines two methods required in each parser: parse() and unparse(). The definition of a parser within a profile's xml can be as simple as:

    <action type="dataflow/convert_parser_serialize" method="parse" />

    As with an adapter, we define an action tag with two attributes: type, which tells which class we want to use, and method, which names the method of this class we want to call. We can also pass variables to the parser within the action tag body, as you will see below.

  2. Standard parsers

    Magento DataFlow includes a few standard parsers, which you can find in app/code/core/Mage/Dataflow/Model/Convert/Parser.

    The simplest of the standard parsers is dataflow/convert_parser_serialize (Mage_Dataflow_Model_Convert_Parser_Serialize), which doesn't require any variables to be passed. It does require, though, that one of the previous actions has set data within the profile's container. Method parse() unserializes the data stored within the profile's container and replaces it with the result. Method unparse() does the opposite: it serializes the data stored within the profile's container and replaces it with the result.

    One of the most often used standard parsers is dataflow/convert_parser_csv, which allows transforming from (with method parse()) or to (with method unparse()) a CSV file. Example of definition:

    <action type="dataflow/convert_parser_csv" method="parse">
        <var name="delimiter"><![CDATA[,]]></var>
        <var name="enclose"><![CDATA["]]></var>
        <var name="fieldnames">true</var>
        <var name="store"><![CDATA[0]]></var>
        <var name="decimal_separator"><![CDATA[.]]></var>
        <var name="adapter">catalog/convert_adapter_product</var>
        <var name="method">parse</var>
    </action>

    This parser requires that you call some io adapter prior to its execution (using for example dataflow/convert_adapter_io to read a csv file) if you want to call method parse. If you want to store data in a CSV file you have to do both - call an action that sets data within the profile's container prior to parser execution, and call an io adapter after parser execution to store the data in a file.

    Following variables will allow you to customize csv file parsing:

    • delimiter - defines delimiter used in csv file; defaults to comma (,) character
    • enclose - defines what character is used to enclose data values; defaults to empty character
    • escape - defines escape character for csv file; defaults to \\
    • decimal_separator - defines decimal separator sign
    • fieldnames - if set to true, it is assumed first row of csv file contains field names; if set to false map variable is used
    • map - defines fieldnames for files where first row doesn't contain fieldnames; to see how to define a map take a look at section of this article related to mapping values
    • adapter - tells which adapter's method should be called on each row
    • method - tells which method of adapter should be called on each row; defaults to saveRow

    All variables defined within parser's action body are passed to the defined adapter, so if you need to pass something to it, you can simply set required variable within parser's action body.
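The row-by-row hand-off described above can be sketched in plain PHP. This is not the Magento parser: the function name parseCsvRows is hypothetical, the callback plays the role of the adapter/method pair, and the line splitting is deliberately naive (it does not support line breaks inside enclosed values).

```php
<?php
// Sketch only: split CSV text into rows, use the first row as field names
// (fieldnames = true) and pass each row to a handler as an associative array.
function parseCsvRows(string $csv, callable $rowHandler, string $delimiter = ',', string $enclose = '"'): int
{
    $lines = preg_split('/\r\n|\n|\r/', trim($csv));
    $fieldnames = str_getcsv(array_shift($lines), $delimiter, $enclose, '\\');

    $processed = 0;
    foreach ($lines as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        $values = str_getcsv($line, $delimiter, $enclose, '\\');
        if (count($values) !== count($fieldnames)) {
            continue; // skip rows that do not match the header
        }
        $rowHandler(array_combine($fieldnames, $values));
        $processed++;
    }

    return $processed;
}

// Made-up sample data; the handler just collects the rows it receives.
$csv = "code,name\nwaw,\"Warsaw, Centrum\"\nkrk,Krakow\n";
$rows = array();
$count = parseCsvRows($csv, function (array $row) use (&$rows) {
    $rows[] = $row;
});

print_r($rows);
```

The handler receives one associative array per data row, keyed by the names from the header row, which is the shape a saveRow(array $importData) method works with.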

    Last of standard parsers included within DataFlow module is dataflow/convert_parser_xml_excel (Mage_Dataflow_Model_Convert_Parser_Xml_Excel), which converts data from and to Excel xml file. Example of definition:

    <action type="dataflow/convert_parser_xml_excel" method="unparse">
        <var name="single_sheet"><![CDATA[products]]></var>
        <var name="fieldnames">true</var>
    </action>

    Use requirements are the same as for dataflow/convert_parser_csv.

    Following variables will allow you to customize Excel xml file parsing:

    • fieldnames - if set to true, it is assumed first row of the sheet contains field names; if set to false map variable is used
    • map - defines fieldnames for files where first row doesn't contain fieldnames
    • single_sheet - tells whether one sheet or all sheets should be parsed; should contain the name of the sheet to be parsed
    • adapter - tells which adapter's method should be called on each row
    • method - tells which method of adapter should be called on each row; defaults to saveRow

  3. Standard customer and product entity parsers

    For the most commonly exchanged entities - customer and product - Magento also provides standard parsers: customer/convert_parser_customer (Mage_Customer_Model_Convert_Parser_Customer) and catalog/convert_parser_product (Mage_Catalog_Model_Convert_Parser_Product). Both inherit from Mage_Eav_Model_Convert_Parser_Abstract.

    Since the standard adapters' load() methods return an array containing solely the entities' id values, it is required to call the parser's unparse method if we want to get more related data. Both parsers take this array and, for each entity, parse the content of its data variable, ignore system fields, objects and non-attribute fields, and create an associative array from the rest. Additionally, the product parser adds to the array the result of parsing the product's related stock item object, and the customer parser adds the result of parsing the shipping and billing addresses and the information about newsletter subscription.

    Both entity parsers have deprecated parse() methods, since their function is now mostly performed by parser actions, with standard adapter methods called within the parser's context. Example of a product parser definition, parsing only products from a selected store:

    <action type="catalog/convert_parser_product" method="unparse">
        <var name="store"><![CDATA[1]]></var>
    </action>

  4. Mapping values

    The DataFlow module also provides a mapper concept - a class with a map() method that is responsible for mapping processed fields from one to another. A mapper definition looks, for example, like this:

    <action type="dataflow/convert_mapper_column" method="map">
        <var name="map">
            <map name="category_ids"><![CDATA[categorie]]></map>
            <map name="sku"><![CDATA[reference]]></map>
            <map name="name"><![CDATA[titre]]></map>
            <map name="description"><![CDATA[description]]></map>
            <map name="price"><![CDATA[prix]]></map>
            <map name="special_price"><![CDATA[special_price]]></map>
            <map name="manufacturer"><![CDATA[marque]]></map>
        </var>
        <var name="_only_specified">true</var>
    </action>

    Again we have an action tag with two attributes: type, set to the mapper class alias, and method, which is called to do the mapping. Mapper dataflow/convert_mapper_column is a standard mapper you can find in the Magento DataFlow module within the app/code/core/Mage/Dataflow/Model/Convert/Mapper/ folder. Its purpose is to map one array into another, changing field names, with the possibility to limit the fields in the result. The name attribute of a map tag tells which field name should be replaced in the new array by a field named like the content of the map tag. If the named field doesn't exist in the source array, the value of the target array's field is set to null. Variable _only_specified tells whether only the fields specified in the map definition should be in the resulting array.

This article closes the series on the standard features of the DataFlow module and the basics of its usage.

Most visited products - Issue with performance and flat catalog

Ever needed to get the most visited products within Magento? Most solutions and modules available use reports/product_collection and its addViewsCount() method for this. It does the job until you need performance or want to enable the flat product catalog.

Flat catalog issue

What is the problem? Using the flat catalog changes the way products are read from the database. Instead of a complex query with multiple joins to get the attributes of a product built in the EAV model, with the flat catalog you need no joins to get the attributes. And you query a different table.

While the option of a flat catalog was kept in mind for the Mage_Catalog catalog/product_collection, it was forgotten in the Mage::getModel('reports/product_collection')->addViewsCount() method. Though you can fix this issue quite easily by rewriting the collection class, I would like to recommend something different.

Performance problem

To get the most visited products for you, Mage_Reports counts occurrences of the product view event within the report_event table. This table is used to store 6 types of events and grows in size very fast. On a website with 1000 views per hour and a report_event table of 400 000 records, any query using this table was a site performance killer.

Solution

The true problem of using reports/product_collection for calculating view counts per product is that we mix up two things - events and reports. The solution I have chosen is to create another table with calculated view counts for each product, populated by a cron-based task that does the calculation from report_event, and to pack it all together into one module.
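The aggregation step of such a cron task can be sketched as follows. This is not the module's code: event rows are modelled as plain arrays, and the key names, the event type id and the function name countViewsPerProduct are assumptions made for the example.

```php
<?php
// Hypothetical id of the "product view" event type.
const VIEW_EVENT_TYPE_ID = 1;

// Count view events per product id; rows of other event types are ignored.
function countViewsPerProduct(array $events): array
{
    $counts = array();
    foreach ($events as $event) {
        if ($event['event_type_id'] !== VIEW_EVENT_TYPE_ID) {
            continue; // another event type stored in the same table
        }
        $productId = $event['object_id'];
        $counts[$productId] = isset($counts[$productId]) ? $counts[$productId] + 1 : 1;
    }
    arsort($counts); // most viewed first, product ids kept as keys

    return $counts;
}

// Made-up sample rows.
$events = array(
    array('event_type_id' => 1, 'object_id' => 10),
    array('event_type_id' => 1, 'object_id' => 20),
    array('event_type_id' => 3, 'object_id' => 10),
    array('event_type_id' => 1, 'object_id' => 10),
);

print_r(countViewsPerProduct($events));
```

In a real task the counting would more likely be one grouped SQL query whose result is written to the summary table, so that the frontend only ever reads the small precomputed table.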

Turning on the modified product view calculation together with the flat product catalog decreased the database load 10 times in this particular case.

Magento Events

When it comes to extending Magento core functionality you have two options - override core classes or use the event-driven architecture. The major disadvantage of the first is that you can override a class only once, so if you want to override it in multiple modules you are soon going to find yourself in dependency hell. The event-driven architecture allows you to keep loose coupling without losing the flexibility of extending Magento modules.

When you want to use the Magento event-driven architecture you basically must know two things - how to dispatch an event and how to catch it.

Dispatching events

Within Magento you can dispatch an event simply by calling the Mage::dispatchEvent(...) method, for example:

 Mage::dispatchEvent('custom_event', array('object'=>$this));

This method accepts two parameters - the unique event identifier and an associative array of data that is set as the Varien_Event_Observer object data, and so is in fact passed to the event observers.

Catching events

Catching events is a little bit more complex than dispatching. You have to use existing custom module or create a new one. In the minimal case the module file tree should look like this:

  

Within config.xml you have to add a definition of event observer. Which of the main config.xml file sections (frontend, adminhtml) should contain this definition depends on the scope you want your observer to work on. Here is the example of definition:

<events>
      <custom_event> <!-- identifier of the event we want to catch -->
        <observers>
          <custom_event_handler> <!-- identifier of the event handler -->
            <type>model</type> <!-- class method call type; valid are model, object and singleton -->
            <class>baobazacustommodule/observer</class> <!-- observers class alias -->
            <method>customObserverAction</method>  <!-- observer's method to be called -->
            <args></args> <!-- additional arguments passed to observer -->
          </custom_event_handler>
        </observers>
      </custom_event>
</events>

The XML above should be self-explanatory. I will just explain that the types model and object are equal in behavior and mean that an object of the class will be instantiated using the Mage::getModel(...) method, while singleton means it will be instantiated using the Mage::getSingleton(...) method.

Observer.php file should contain relevant observer class. There is no interface nor need to extend any class for observer classes. The method though should accept one parameter which is the object of Varien_Event_Observer class. This object is the link between dispatcher and event handler. It inherits from Varien_Object so has all required getters handled magically. For example:

class Baobaz_ACustomModule_Model_Observer
{
  public function customObserverAction(Varien_Event_Observer $observer)
  {
    $object = $observer->getEvent()->getObject(); // we are taking the item with 'object' key from array passed to dispatcher
    $object->doSomething();

    return $this;
  }
}

Default events

Magento implements a lot of events. You can find a list of them here. What you may miss when reading this list, and what was spotted on the MageDev blog, is that Mage_Core_Model_Abstract dispatches some special events by default. Those are:
 

event identifier               event parameters
model_save_before              'object'=>$this
{_eventPrefix}_save_before     {_eventObject}=>$this
model_save_after               'object'=>$this
{_eventPrefix}_save_after      {_eventObject}=>$this
model_delete_before            'object'=>$this
{_eventPrefix}_delete_before   {_eventObject}=>$this
model_delete_after             'object'=>$this
{_eventPrefix}_delete_after    {_eventObject}=>$this
model_load_after               'object'=>$this
{_eventPrefix}_load_after      {_eventObject}=>$this

 

{_eventPrefix} means the value of the $_eventPrefix variable and {_eventObject} means the value of the $_eventObject variable. All classes inheriting from Mage_Core_Model_Abstract should override these variables so that specific events are dispatched. For example, for a catalog category these variables take the following values: $_eventPrefix = 'catalog_category';  $_eventObject = 'category';
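How the generic and the prefixed event names come about can be shown with a small stand-alone sketch. None of the classes below are Magento classes: TinyDispatcher, TinyModel and TinyCategory are hypothetical stand-ins, the dispatcher only records what was dispatched, and the default values in the base class are placeholders.

```php
<?php
// Minimal stand-in for an event dispatcher: it only logs event names and
// the keys of the data passed along.
class TinyDispatcher
{
    public static $log = array();

    public static function dispatch(string $name, array $data): void
    {
        self::$log[] = $name . ' [' . implode(',', array_keys($data)) . ']';
    }
}

// Hypothetical base model firing a generic and a prefixed event around save().
abstract class TinyModel
{
    protected $_eventPrefix = 'core_abstract'; // placeholder default
    protected $_eventObject = 'object';        // placeholder default

    public function save()
    {
        TinyDispatcher::dispatch('model_save_before', array('object' => $this));
        TinyDispatcher::dispatch($this->_eventPrefix . '_save_before', array($this->_eventObject => $this));
        // ... persisting the data would happen here ...
        TinyDispatcher::dispatch('model_save_after', array('object' => $this));
        TinyDispatcher::dispatch($this->_eventPrefix . '_save_after', array($this->_eventObject => $this));

        return $this;
    }
}

// Overriding the two variables is all it takes to get specific event names.
class TinyCategory extends TinyModel
{
    protected $_eventPrefix = 'catalog_category';
    protected $_eventObject = 'category';
}

$category = new TinyCategory();
$category->save();
print_r(TinyDispatcher::$log);
```

An observer registered for catalog_category_save_before would then read the model through the 'category' key, in the same way the observer above reads the 'object' key.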

Magento Dataflow - Optimized Product Import [Part 3]

The Magento Dataflow module comes with a standard product adapter (see Magento Dataflow - Default Adapters [Part 2]). Sometimes, though, the default solution is not enough and you may want to create your own adapter for processing products.

Creating your own adapter is not hard, but if you forget two lines of code, you may be very surprised by its performance. These are the two lines you should add before calling $product->save():

$product->setIsMassupdate(true);
$product->setExcludeUrlRewrite(true);

The first line sets the $data variable 'is_massupdate', which can be checked later to save time on some postprocessing actions. Some observers watching for the catalog_product_save_after event check this value (e.g. the CatalogRule module's Observer, which skips applying catalog rules to products if $product->getIsMassupdate() returns true).

The second line sets the $data variable 'exclude_url_rewrite', which is used by the afterSave method of Mage_Catalog_Model_Product_Attribute_Backend_Urlkey to check whether the catalog url rewrite cache should be refreshed.

These two lines allow you to save a few seconds per processed product row, so do not forget about them.
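The mechanism behind these flags can be illustrated without Magento. In the sketch below TinyDataObject is a simplified, hypothetical stand-in for a data object with magic set*/get* methods (Varien_Object works along these lines), and applyRulesAfterSave is a made-up observer that skips its work during a mass update.

```php
<?php
// Simplified stand-in for a data object with magic setters and getters.
class TinyDataObject
{
    private $data = array();

    public function __call($method, $args)
    {
        // setIsMassupdate / getIsMassupdate -> key "is_massupdate"
        $key = strtolower(preg_replace('/(?<!^)([A-Z])/', '_$1', substr($method, 3)));
        switch (substr($method, 0, 3)) {
            case 'set':
                $this->data[$key] = isset($args[0]) ? $args[0] : null;
                return $this;
            case 'get':
                return isset($this->data[$key]) ? $this->data[$key] : null;
        }
        throw new BadMethodCallException("Unknown method {$method}");
    }
}

// Hypothetical observer: the expensive work runs only outside mass updates.
function applyRulesAfterSave(TinyDataObject $product): string
{
    if ($product->getIsMassupdate()) {
        return 'skipped';
    }

    return 'rules applied';
}

$single = new TinyDataObject();
$bulk = new TinyDataObject();
$bulk->setIsMassupdate(true)->setExcludeUrlRewrite(true);

echo applyRulesAfterSave($single), "\n";
echo applyRulesAfterSave($bulk), "\n";
```

The flags are nothing more than entries in the object's data array, so every observer that knows about them can decide on its own whether to skip its work.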

Magento Dataflow - Default Adapters [Part 2]

The "Magento DataFlow - Data Exchange Made Flexible" article introduced the overall concept of the data exchange framework implemented in Magento. Today I would like to tell you more about the default adapters implemented in the DataFlow module.

  1. Adapter definition

    Adapters are responsible for plugging into an external data resource and fetching requested data or saving given data into the data resource. For this purpose all adapters implement the interface Mage_Dataflow_Model_Convert_Adapter_Interface, which contains two methods: load() and save(). The data exchange concept introduced in the DataFlow module uses adapters in 3 contexts:

    • to load data from resource - using load() method
    • to save data to resource - using save() method
    • to process one parsed row - when defined as adapter/method pair of variables of parser

    For the first two contexts the adapter's xml definition looks like this:

    <action type="dataflow/convert_adapter_io" method="load">
        ...
    </action>

    The action tag has two parameters: type and method. Type tells us which adapter class is to be used in this action; it is defined using the class alias. Method tells us which method of this adapter class the action should call. As mentioned before, by default there are two available methods: load and save. Children of the action tag define variables, which are the parameters used when executing the adapter's method. Variables are defined as in the example below:

    <action type="dataflow/convert_adapter_io" method="load">
        <var name="type">file</var>
        <var name="path">var/import</var>
        <var name="filename"><![CDATA[products.csv]]></var>
        <var name="format"><![CDATA[csv]]></var>
    </action>

  2. Magento DataFlow default adapters

    The Magento DataFlow module contains a few default adapter classes, which you can find in app/code/core/Mage/Dataflow/Model/Convert/Adapter. Not all of them implement the load() and save() methods yet.

    For common case of reading data from or saving data to local or remote file you will use dataflow/convert_adapter_io (Mage_Dataflow_Model_Convert_Adapter_Io).

    Following variables will allow you to define local/remote file as data source:

    • type - defines type of io source we want to process. Valid values: file, ftp
    • path - defines relative path to the file
    • filename - defines data source file's name
    • host - for ftp type it defines the ftp host
    • port - for ftp type it defines the ftp port; if not given, default value is 21
    • user - for ftp type it defines the ftp user, if not given default value is 'anonymous' and password then is 'anonymous@noserver.com'
    • password - for ftp type it defines the ftp user's password
    • timeout - for ftp type it defines connection timeout; default value is 90
    • file_mode - for ftp type it defines file mode; default value is FTP_BINARY
    • ssl - for ftp type if it is not empty, then ftp ssl connection is used
    • passive - for ftp type it defines connection mode; default value is false
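
The defaults from the list above can be summarised in a few lines of PHP. This is a sketch of the described behaviour, not the adapter's code: the function name withFtpDefaults is hypothetical, and file_mode and ssl are left out to keep the example independent of the ftp extension.

```php
<?php
// Fill in the FTP defaults listed above for variables a profile omits.
function withFtpDefaults(array $vars): array
{
    if (empty($vars['user'])) {
        // no user given: anonymous login with the default password
        $vars['user'] = 'anonymous';
        $vars['password'] = 'anonymous@noserver.com';
    }

    // values given in the profile win over the defaults
    return array_merge(
        array('port' => 21, 'timeout' => 90, 'passive' => false),
        $vars
    );
}

// Made-up profile variables.
$anonymous = withFtpDefaults(array('type' => 'ftp', 'host' => 'ftp.server.com'));
$custom = withFtpDefaults(array(
    'type'     => 'ftp',
    'host'     => 'ftp.server.com',
    'user'     => 'user',
    'password' => 'password',
    'port'     => 2121,
    'passive'  => true,
));

print_r($anonymous);
print_r($custom);
```

Only the variables that differ from these defaults need to appear in a profile.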

  3. Customer and Product adapters

    For most commonly exchanged entities - customer and product - Magento provides default adapters: customer/convert_adapter_customer (Mage_Customer_Model_Convert_Adapter_Customer) and catalog/convert_adapter_product (Mage_Catalog_Model_Convert_Adapter_Product). Both inherit from Mage_Eav_Model_Convert_Adapter_Entity.

    To simply load all customers data for selected store you can use the following xml:

    <action type="customer/convert_adapter_customer" method="load">
        <var name="store">default</var>
    </action>

    Sometimes you may not want to load all customers from the database. To help you with this, the following variables are available:

    • filter/firstname - to load only customers with firstname starting with value of this variable
    • filter/lastname - to load only customers with lastname starting with value of this variable
    • filter/email - to load only customers with email starting with value of this variable
    • filter/group - to load only customers from group with id equal to value of this variable
    • filter/adressType - to export only selected addressType; valid values are: both, default_billing, default_shipping
    • filter/telephone - to load only customers with telephone starting with value of this variable
    • filter/postcode - to load only customers with postcode starting with value of this variable
    • filter/country - to load only customers with country iso code equal to value of this variable
    • filter/region - to load only customers with region equal to value of this variable (for US just 2-letter state names)
    • filter/created_at/from - to load only customers created after a date defined as value of this variable
    • filter/created_at/to - to load only customers created before a date defined as value of this variable

    For example:

    <action type="customer/convert_adapter_customer" method="load">
        <var name="store"><![CDATA[0]]></var>
        <var name="filter/firstname"><![CDATA[a]]></var>
        <var name="filter/lastname"><![CDATA[a]]></var>
        <var name="filter/email"><![CDATA[a]]></var>
        <var name="filter/group"><![CDATA[1]]></var>
        <var name="filter/adressType"><![CDATA[default_billing]]></var>
        <var name="filter/telephone"><![CDATA[1]]></var>
        <var name="filter/postcode"><![CDATA[7]]></var>
        <var name="filter/country"><![CDATA[BS]]></var>
        <var name="filter/region"><![CDATA[WA]]></var>
        <var name="filter/created_at/from"><![CDATA[09/22/09]]></var>
        <var name="filter/created_at/to"><![CDATA[09/24/09]]></var>
    </action>

    In the same way you can load and filter products from the database with the following variables:

    • filter/name - to load only products with name starting with value of this variable
    • filter/sku - to load only products with sku starting with value of this variable
    • filter/type - to load only products with type defined as value of this variable; valid values are: simple, configurable, grouped, bundle, virtual, downloadable
    • filter/attribute_set - to load only products with attribute set id equal to value of this variable
    • filter/price/from - to load only products with price starting from value of this variable
    • filter/price/to - to load only products with price up to value of this variable
    • filter/qty/from - to load only products with quantity starting from value of this variable
    • filter/qty/to - to load only products with quantity up to value of this variable
    • filter/visibility - to load only products with visibility id equal to value of this variable
    • filter/status - to load only products with status id equal to value of this variable

    Example:

    <action type="catalog/convert_adapter_product" method="load">
        <var name="store"><![CDATA[0]]></var>
        <var name="filter/name"><![CDATA[a]]></var>
        <var name="filter/sku"><![CDATA[1]]></var>
        <var name="filter/type"><![CDATA[simple]]></var>
        <var name="filter/attribute_set"><![CDATA[29]]></var>
        <var name="filter/price/from"><![CDATA[1]]></var>
        <var name="filter/price/to"><![CDATA[2]]></var>
        <var name="filter/qty/from"><![CDATA[1]]></var>
        <var name="filter/qty/to"><![CDATA[2]]></var>
        <var name="filter/visibility"><![CDATA[2]]></var>
        <var name="filter/status"><![CDATA[1]]></var>
    </action>

It may seem a little bit frightening when you see all those id values you have to provide for filters. Fortunately, for these two entities - customers and products - there is a wizard-like profile generator that allows you to define filters with simple select boxes.

In the next part I will describe the use of parsers, and of adapters in the context of parsers.

Magento DataFlow - Data Exchange Made Flexible [Part 1]

One of the major features of e-commerce websites is the possibility to share data with offline sales management systems. Magento made data exchange flexible and quite easy with the DataFlow module.

Magento DataFlow is a data exchange framework that uses four types of components: adapter, parser, mapper and validator. At the current state of development validators are not implemented, but are reserved for future use.

The dataflow of a data exchange process is defined as an XML structure and called a profile. Magento provides a simple wizard-like tool for generating some basic import/export profiles operating on product or customer entities. An advanced profiles manager is also provided for advanced users who are able to write the XML defining a profile without the wizard tool and who need more custom dataflow operations, also related to other entities.

Adapters are responsible for plugging into an external data resource and fetching requested and filtered data. They can be used, for example, to get data from a local or remote file, web services, a database and more.

For example to load data from csv file you can put in XML profile the following code:

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[products.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>

To load data from remote FTP server you can use same adapter, but with these parameters:

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">ftp</var>
    <var name="host"><![CDATA[ftp.server.com]]></var>
    <var name="passive">true</var>
    <var name="user"><![CDATA[user]]></var>
    <var name="password"><![CDATA[password]]></var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[products.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>

Parsers are responsible for transforming one data format to another. They can be used, for example, to convert CSV file content to a two-dimensional array, or the opposite.

To parse CSV file content into database product entities you can use this code in your profile:

<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
    <var name="adapter">catalog/convert_adapter_product</var>
    <var name="method">parse</var>
</action>

The adapter defined within the parser variables as <var name="adapter"> and the adapter's method <var name="method"> are responsible for processing the loaded data. In this particular case the parser converts the CSV file content to a two-dimensional array and calls the adapter's method "parse" to process it.

The simplest way of customizing an import is to create your own adapter and give it as a variable within the parser definition. In most cases you will need to override one of the existing adapters and modify or write your own parsing method (in most cases it will be an overridden saveRow() method).

Mappers are responsible for altering data values from one to another. They are useful for mapping one field to another.

In the example below the source's 'reference' column is mapped into the 'sku' column and the variable '_only_specified' is set to true, so only the listed columns will be imported/exported:

<action type="dataflow/convert_mapper_column" method="map">
    <var name="map">
        <map name="sku"><![CDATA[reference]]></map>
        <map name="name"><![CDATA[name]]></map>
        <map name="price"><![CDATA[price]]></map>
        <map name="qty"><![CDATA[qty]]></map>
    </var>
    <var name="_only_specified">true</var>
</action>

This is just the tip of the iceberg of possibilities Magento DataFlow module offers. Come back later to read more.