magento dataflow

Customizing Magento Dataflow - import of custom data.

The flexibility of the Magento Dataflow module lies in the fact that you can easily create your own adapters, parsers and mappers and apply them to your specific dataflow needs.

A basic case you may wonder about is importing data for your custom module. Let's do this by example. Imagine you need to display a list of stores on your e-shop. You have created a custom module, a database table and the data model part; all you need now is to populate this table with the data you have in a csv file.

Read the file

First you have to read the file. As you already know (if you read Magento Dataflow - Default Adapters [Part 2]), you can use the dataflow/convert_adapter_io adapter for this.

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[stores.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>

Parse the file content

Now that you have read the file content, you should parse it using dataflow/convert_parser_csv.

<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
</action>

Process rows of data

Now comes the custom part of this process. Within your custom module you have to create a custom adapter that will create a database row for each processed row of the parsed file. Within your module's root directory create the file ./Model/Convert/Adapter/Store.php with this content:

class Baobaz_Store_Model_Convert_Adapter_Store
    extends Mage_Dataflow_Model_Convert_Adapter_Abstract
{
    protected $_storeModel;

    public function load()
    {
        // required by Mage_Dataflow_Model_Convert_Adapter_Interface; not called in this profile
    }

    public function save()
    {
        // required by Mage_Dataflow_Model_Convert_Adapter_Interface; not called in this profile
    }

    /**
     * Returns the store model, created once and reused for every row.
     */
    public function getStoreModel()
    {
        if (is_null($this->_storeModel)) {
            $storeModel = Mage::getModel('baobaz_store/store');
            $this->_storeModel = Mage::objects()->save($storeModel);
        }
        return Mage::objects()->load($this->_storeModel);
    }

    /**
     * Called by the parser for each parsed row; creates or updates one store.
     */
    public function saveRow(array $importData)
    {
        if (empty($importData['code'])) {
            $message = Mage::helper('catalog')->__('Skip import row, required field "%s" not defined', 'code');
            Mage::throwException($message);
        }

        $store = $this->getStoreModel();
        // The model instance is shared between rows, so clear the previous row's data first.
        $store->unsetData();
        // Load an existing store by its code, so that it is updated instead of duplicated.
        $store->load($importData['code'], 'code');

        $store->setCode($importData['code']);
        $store->setName(isset($importData['name']) ? $importData['name'] : null);
        $store->save();

        return true;
    }
}

Now that you have created this file, you can slightly modify the parser declaration by adding the adapter and method variables:

<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
    <var name="adapter">baobaz_store/convert_adapter_store</var>
    <var name="method">saveRow</var>
</action>

Having done this, the XML definition of your custom dataflow profile should look like this:

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[stores.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>
<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
    <var name="adapter">baobaz_store/convert_adapter_store</var>
    <var name="method">saveRow</var>
</action>
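
To make the per-row contract concrete, here is a minimal standalone sketch in plain PHP (no Magento classes) of what the parser and the adapter do together: each CSV row becomes an associative array keyed by the header names and is handed to a saveRow()-style method. The InMemoryStoreAdapter class, the importStoresCsv() function and the array used as storage are hypothetical stand-ins for the real store model and its table, and collecting errors while continuing with the next row is a choice of this sketch only.

```php
<?php
// Hypothetical stand-in for the custom adapter: rows are kept in an array
// keyed by 'code' instead of being written to a database table.
class InMemoryStoreAdapter
{
    public $stores = array();

    public function saveRow(array $importData)
    {
        if (empty($importData['code'])) {
            throw new InvalidArgumentException('Skip import row, required field "code" not defined');
        }
        // An existing code is updated, a new code creates a new entry.
        $this->stores[$importData['code']] = array(
            'code' => $importData['code'],
            'name' => isset($importData['name']) ? $importData['name'] : null,
        );
        return true;
    }
}

// Mimics the parser: the first line holds the field names, every other line
// is turned into an associative array and passed to the adapter method.
function importStoresCsv($csv, InMemoryStoreAdapter $adapter, $method = 'saveRow')
{
    $lines = array_values(array_filter(array_map('trim', explode("\n", $csv)), 'strlen'));
    $fieldNames = str_getcsv(array_shift($lines), ',', '"', '\\');
    $errors = array();
    foreach ($lines as $line) {
        $values = str_getcsv($line, ',', '"', '\\');
        $row = array();
        foreach ($fieldNames as $i => $field) {
            $row[$field] = isset($values[$i]) ? $values[$i] : null;
        }
        try {
            $adapter->$method($row);
        } catch (InvalidArgumentException $e) {
            $errors[] = $e->getMessage(); // in this sketch a bad row is recorded and the loop goes on
        }
    }
    return $errors;
}

$adapter = new InMemoryStoreAdapter();
$errors = importStoresCsv("code,name\nparis,Paris Store\n,No Code\nparis,Paris Centre\n", $adapter);
```

In this run the second data row has no code and is rejected, while the two "paris" rows end up as a single entry, because the second one updates the first.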

You can now enjoy your custom dataflow.

Magento Dataflow - standard parsers and mapping values [part 4]

As promised in Magento Dataflow - Default Adapters [Part 2], today I will write about the standard parsers in the Magento DataFlow module and about mapping values with mappers.

  1. Parser definition

    Parsers are responsible for transforming data from one format to another. The parser interface Mage_Dataflow_Model_Convert_Parser_Interface defines two methods required in each parser: parse() and unparse(). The definition of a parser within a profile's XML can be as simple as:

    <action type="dataflow/convert_parser_serialize" method="parse" />

    As with an adapter, we define an action tag with two attributes: type, which tells which class we want to use, and method, which names the method of that class we want to call. We can also pass variables to the parser within the action tag's body, as you will see below.

  2. Standard parsers

    Magento DataFlow includes a few standard parsers, which you can find in app/code/core/Mage/Dataflow/Model/Convert/Parser.

    The simplest of the standard parsers is dataflow/convert_parser_serialize (Mage_Dataflow_Model_Convert_Parser_Serialize), which doesn't require any variables. It does require, though, that one of the previous actions has set data within the profile's container. The parse() method unserializes the data stored within the profile's container and replaces it with the result. The unparse() method does the opposite: it serializes the data stored within the profile's container and replaces it with the result.
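
    The round trip described above can be sketched in a few lines of standalone PHP. ProfileContainerSketch and the two functions are hypothetical names used only for illustration; they are not the module's classes, they just show data in a shared container being replaced by its serialized or unserialized form.

```php
<?php
// Hypothetical container: a plain object holding the data that
// consecutive profile actions read and replace.
class ProfileContainerSketch
{
    public $data;
}

// Counterpart of unparse(): value -> serialized string.
function unparseSerializeSketch(ProfileContainerSketch $container)
{
    $container->data = serialize($container->data);
}

// Counterpart of parse(): serialized string -> original value.
function parseSerializeSketch(ProfileContainerSketch $container)
{
    $container->data = unserialize($container->data);
}

$container = new ProfileContainerSketch();
$container->data = array(array('sku' => 'A1', 'qty' => 3));

unparseSerializeSketch($container);
$serialized = $container->data; // a string at this point

parseSerializeSketch($container); // back to the original array
```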

    One of the most often used standard parsers is dataflow/convert_parser_csv, which transforms data from (with the parse() method) or to (with the unparse() method) a CSV file. Example of definition:

    <action type="dataflow/convert_parser_csv" method="parse">
        <var name="delimiter"><![CDATA[,]]></var>
        <var name="enclose"><![CDATA["]]></var>
        <var name="fieldnames">true</var>
        <var name="store"><![CDATA[0]]></var>
        <var name="decimal_separator"><![CDATA[.]]></var>
        <var name="adapter">catalog/convert_adapter_product</var>
        <var name="method">parse</var>
    </action>

    If you want to call the parse method, this parser requires that you call an io adapter prior to its execution (for example dataflow/convert_adapter_io, to read a csv file). If you want to store data in a CSV file, you have to do both: call an action that sets data within the profile's container prior to the parser's execution, and call an io adapter after the parser's execution to store the data in a file.

    The following variables let you customize csv file parsing:

    • delimiter - defines delimiter used in csv file; defaults to comma (,) character
    • enclose - defines what character is used to enclose data values; defaults to empty character
    • escape - defines escape character for csv file; defaults to \\
    • decimal_separator - defines decimal separator sign
    • fieldnames - if set to true, it is assumed first row of csv file contains field names; if set to false map variable is used
    • map - defines fieldnames for files where first row doesn't contain fieldnames; to see how to define a map take a look at section of this article related to mapping values
    • adapter - tells which adapter's method should be called on each row
    • method - tells which method of adapter should be called on each row; defaults to saveRow

    All variables defined within parser's action body are passed to the defined adapter, so if you need to pass something to it, you can simply set required variable within parser's action body.
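
    As an illustration of how these variables interact, here is a standalone PHP sketch of CSV parsing driven by an array of variables. It is not the module's parser: parseCsvSketch() is a hypothetical function, the map is assumed to be a plain list of field names, and the sketch falls back to a double quote for enclose because PHP's str_getcsv() needs a one-character enclosure.

```php
<?php
// Sketch of CSV parsing driven by profile-like variables.
// The keys of $vars mirror the variable names listed above.
function parseCsvSketch($csv, array $vars)
{
    $delimiter = isset($vars['delimiter']) ? $vars['delimiter'] : ',';
    $enclose   = isset($vars['enclose']) ? $vars['enclose'] : '"'; // sketch default
    $escape    = isset($vars['escape']) ? $vars['escape'] : '\\';

    $lines = array_values(array_filter(explode("\n", $csv), 'strlen'));

    $useHeader = isset($vars['fieldnames']) && $vars['fieldnames'] === 'true';
    if ($useHeader) {
        // The first row of the file provides the field names.
        $fieldNames = str_getcsv(array_shift($lines), $delimiter, $enclose, $escape);
    } else {
        // No header row: the field names come from the map variable.
        $fieldNames = $vars['map'];
    }

    $rows = array();
    foreach ($lines as $line) {
        $values = str_getcsv($line, $delimiter, $enclose, $escape);
        $row = array();
        foreach ($fieldNames as $i => $field) {
            $row[$field] = isset($values[$i]) ? $values[$i] : null;
        }
        $rows[] = $row;
    }
    return $rows;
}

// Header row, semicolon delimiter, single quote as enclosure.
$withHeader = parseCsvSketch(
    "sku;name\nA1;'Red; large'\n",
    array('delimiter' => ';', 'enclose' => "'", 'fieldnames' => 'true')
);

// No header row: field names are supplied by the map.
$withMap = parseCsvSketch(
    "A1,Red\n",
    array('fieldnames' => 'false', 'map' => array('sku', 'name'))
);
```

    Note how the enclosure lets a value contain the delimiter: 'Red; large' stays one field.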

    The last of the standard parsers included in the DataFlow module is dataflow/convert_parser_xml_excel (Mage_Dataflow_Model_Convert_Parser_Xml_Excel), which converts data from and to an Excel XML file. Example of definition:

    <action type="dataflow/convert_parser_xml_excel" method="unparse">
        <var name="single_sheet"><![CDATA[products]]></var>
        <var name="fieldnames">true</var>
    </action>

    Use requirements are the same as for dataflow/convert_parser_csv.

    The following variables let you customize Excel XML file parsing:

    • fieldnames - if set to true, it is assumed the first row of the sheet contains field names; if set to false, the map variable is used
    • map - defines field names for files where the first row doesn't contain field names
    • single_sheet - tells whether one sheet or all sheets should be parsed; should contain the name of the sheet to be parsed
    • adapter - tells which adapter's method should be called on each row
    • method - tells which method of the adapter should be called on each row; defaults to saveRow

  3. Standard customer and product entity parsers

    For the most commonly exchanged entities - customer and product - Magento also provides standard parsers: customer/convert_parser_customer (Mage_Customer_Model_Convert_Parser_Customer) and catalog/convert_parser_product (Mage_Catalog_Model_Convert_Parser_Product). Both inherit from Mage_Eav_Model_Convert_Parser_Abstract.

    Since the standard adapters' load() methods return only an array of entity id values, you need to call the parser's unparse() method if you want to get the related data. Both parsers take this array and, for each entity, parse the content of its data variable, ignore system fields, objects and non-attribute fields, and create an associative array from the rest. Additionally, the product parser adds the result of parsing the product's stock item object to the array, and the customer parser adds the result of parsing the shipping and billing addresses and the newsletter subscription information.

    Both entity parsers have deprecated parse() methods, since their function is now mostly performed by parser actions, with standard adapter methods called within the parser's context. Here is an example of a product parser definition that parses only products from a selected store:

    <action type="catalog/convert_parser_product" method="unparse">
        <var name="store"><![CDATA[1]]></var>
    </action>

  4. Mapping values

    The DataFlow module also provides the mapper concept: a class with a map() method that is responsible for mapping processed fields from one to another. A mapper definition looks, for example, like this:

    <action type="dataflow/convert_mapper_column" method="map">
        <var name="map">
            <map name="category_ids"><![CDATA[categorie]]></map>
            <map name="sku"><![CDATA[reference]]></map>
            <map name="name"><![CDATA[titre]]></map>
            <map name="description"><![CDATA[description]]></map>
            <map name="price"><![CDATA[prix]]></map>
            <map name="special_price"><![CDATA[special_price]]></map>
            <map name="manufacturer"><![CDATA[marque]]></map>
        </var>
        <var name="_only_specified">true</var>
    </action>

    Again we have an action tag with two attributes: type, set to the mapper's class alias, and method, which is called to do the mapping. The dataflow/convert_mapper_column mapper is a standard mapper that you can find in the Magento DataFlow module in the app/code/core/Mage/Dataflow/Model/Convert/Mapper/ folder. Its purpose is to map one array into another, renaming fields and optionally limiting the fields in the result. The name attribute of a map tag tells which source field should be renamed, and the content of the map tag is that field's name in the new array. If the named field doesn't exist in the source array, the value of the target array's field is set to null. The _only_specified variable tells whether only the fields specified in the map definition should appear in the resulting array.
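
    The mapping rules just described can be sketched as a small standalone PHP function. mapColumnsSketch() is a hypothetical name, not the module's mapper. The behaviour without _only_specified (unmapped fields are carried over unchanged and a renamed field drops its old name) is an assumption of this sketch.

```php
<?php
// $map is source field name => target field name, as in
// <map name="source">target</map>.
function mapColumnsSketch(array $row, array $map, $onlySpecified)
{
    // Assumption: without _only_specified, unmapped fields pass through.
    $result = $onlySpecified ? array() : $row;

    foreach ($map as $sourceField => $targetField) {
        if ($sourceField !== $targetField) {
            unset($result[$sourceField]); // the old name disappears after renaming
        }
        // A field missing from the source row yields null in the target.
        $result[$targetField] = array_key_exists($sourceField, $row)
            ? $row[$sourceField]
            : null;
    }
    return $result;
}

$row = array('sku' => 'A1', 'name' => 'Chair', 'weight' => '4.2');
$map = array('sku' => 'reference', 'name' => 'titre', 'price' => 'prix');

$strict = mapColumnsSketch($row, $map, true);  // only mapped fields
$loose  = mapColumnsSketch($row, $map, false); // mapped fields plus 'weight'
```

    With these inputs, $strict holds reference, titre and prix (the last one null, since the row has no price), while $loose additionally keeps weight.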

This article closes the overview of the standard features of the DataFlow module and the basics of its usage.

Magento Dataflow - Optimized Product Import [Part 3]

The Magento Dataflow module comes with a standard product adapter (see Magento Dataflow - Default Adapters [Part 2]). Sometimes, though, the default solution is not enough and you may want to create your own adapter for processing products.

Creating your own adapter is not hard, but if you forget two lines of code, you may be unpleasantly surprised by its performance. You should add these two lines before calling $product->save():

$product->setIsMassupdate(true);
$product->setExcludeUrlRewrite(true);

The first line sets the $data variable 'is_massupdate', which can be checked later to skip time-consuming postprocessing. Some observers of the catalog_product_save_after event check this value (e.g. the CatalogRule module's observer, which skips applying catalog rules to the product if $product->getIsMassupdate() returns true).

The second line sets the $data variable 'exclude_url_rewrite', which is used by the afterSave method of Mage_Catalog_Model_Product_Attribute_Backend_Urlkey to check whether the catalog URL rewrite cache should be refreshed.

These two lines can save a few seconds per processed product row, so don't forget them.
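
The mechanism behind the first flag can be sketched in standalone PHP. ProductSketch is a hypothetical stand-in for a model with magic set*/get* accessors, and applyRulesAfterSaveSketch() plays the role of an after-save observer; neither is Magento code, they only illustrate how a flag set before saving lets later processing be skipped.

```php
<?php
// Hypothetical model with magic accessors: setIsMassupdate(true) stores
// 'is_massupdate' => true, getIsMassupdate() reads it back.
class ProductSketch
{
    private $data = array();

    public function __call($method, $args)
    {
        $prefix = substr($method, 0, 3);
        // CamelCase -> snake_case, e.g. ExcludeUrlRewrite -> exclude_url_rewrite
        $key = strtolower(preg_replace('/(.)([A-Z])/', '$1_$2', substr($method, 3)));
        if ($prefix === 'set') {
            $this->data[$key] = isset($args[0]) ? $args[0] : null;
            return $this;
        }
        if ($prefix === 'get') {
            return isset($this->data[$key]) ? $this->data[$key] : null;
        }
        throw new BadMethodCallException($method);
    }
}

// Stand-in for an observer of an after-save event: expensive work is
// skipped when the product is flagged as part of a mass update.
function applyRulesAfterSaveSketch(ProductSketch $product, array &$log)
{
    if ($product->getIsMassupdate()) {
        return;
    }
    $log[] = 'rules applied';
}

$log = array();

$single = new ProductSketch();
applyRulesAfterSaveSketch($single, $log); // not flagged: work is done

$flagged = new ProductSketch();
$flagged->setIsMassupdate(true);
$flagged->setExcludeUrlRewrite(true);
applyRulesAfterSaveSketch($flagged, $log); // flagged: work is skipped
```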

Magento Dataflow - Default Adapters [Part 2]

"Magento DataFlow - Data Exchange Made Flexible" article introduced global concept of data exchange framework implemented in Magento. Today I would like to tell more about default adapters implemented in DataFlow module.

  1. Adapter definition

    Adapters are responsible for plugging into an external data resource and fetching requested data from it or saving given data into it. For this purpose all adapters implement the interface Mage_Dataflow_Model_Convert_Adapter_Interface, which contains two methods: load() and save(). The data exchange concept introduced in the DataFlow module uses adapters in 3 contexts:

    • to load data from resource - using load() method
    • to save data to resource - using save() method
    • to process one parsed row - when defined as adapter/method pair of variables of parser

    For the first two contexts the adapter's XML definition looks like this:

    <action type="dataflow/convert_adapter_io" method="load">
        ...
    </action>

    The action tag has two attributes: type and method. Type tells us which adapter class is to be used in this action; it is defined using the class alias. Method tells us which method of this adapter class the action should call. As mentioned before, by default there are two available methods: load and save. Children of the action tag define variables, which are the parameters used when executing the adapter's method. Variables are defined as in the example below:

    <action type="dataflow/convert_adapter_io" method="load">
        <var name="type">file</var>
        <var name="path">var/import</var>
        <var name="filename"><![CDATA[products.csv]]></var>
        <var name="format"><![CDATA[csv]]></var>
    </action>
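
    To show how such a definition breaks down into a class alias, a method name and a set of variables, here is a standalone PHP sketch that reads the action above with SimpleXML. readActionSketch() is a hypothetical helper written for this illustration; it is not how the DataFlow module parses profiles internally.

```php
<?php
$xml = <<<'XML'
<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[products.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>
XML;

// Extracts the type and method attributes and the <var> children
// of a single <action> element.
function readActionSketch($xml)
{
    // LIBXML_NOCDATA merges CDATA sections into plain text nodes.
    $action = new SimpleXMLElement($xml, LIBXML_NOCDATA);

    $vars = array();
    foreach ($action->children() as $child) {
        if ($child->getName() === 'var') {
            $vars[(string) $child['name']] = trim((string) $child);
        }
    }

    return array(
        'type'   => (string) $action['type'],
        'method' => (string) $action['method'],
        'vars'   => $vars,
    );
}

$parsed = readActionSketch($xml);
```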

  2. Magento DataFlow default adapters

    The Magento DataFlow module contains a few default adapter classes, which you can find in app/code/core/Mage/Dataflow/Model/Convert/Adapter. Not all of them implement the load() and save() methods yet.

    For the common case of reading data from or saving data to a local or remote file you will use dataflow/convert_adapter_io (Mage_Dataflow_Model_Convert_Adapter_Io).

    The following variables let you define a local or remote file as the data source:

    • type - defines type of io source we want to process. Valid values: file, ftp
    • path - defines relative path to the file
    • filename - defines data source file's name
    • host - for ftp type it defines the ftp host
    • port - for ftp type it defines the ftp port; if not given, default value is 21
    • user - for ftp type it defines the ftp user; if not given, the default value is 'anonymous' and the password is then 'anonymous@noserver.com'
    • password - for ftp type it defines the ftp user's password
    • timeout - for ftp type it defines connection timeout; default value is 90
    • file_mode - for ftp type it defines file mode; default value is FTP_BINARY
    • ssl - for ftp type if it is not empty, then ftp ssl connection is used
    • passive - for ftp type it defines connection mode; default value is false
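
    The defaults listed above can be summarised in a standalone PHP sketch. resolveIoVarsSketch() is a hypothetical function, not part of the adapter; it covers only a subset of the variables (file_mode and ssl are left out) and simply applies the documented default values to an ftp definition.

```php
<?php
// Fills in the documented defaults for an ftp source; a file source
// is returned unchanged.
function resolveIoVarsSketch(array $vars)
{
    if (!isset($vars['type']) || !in_array($vars['type'], array('file', 'ftp'), true)) {
        throw new InvalidArgumentException('type must be "file" or "ftp"');
    }
    if ($vars['type'] === 'file') {
        return $vars;
    }

    // Explicitly given variables win over the defaults.
    $resolved = array_merge(
        array('port' => 21, 'timeout' => 90, 'passive' => false),
        $vars
    );

    // Without a user, anonymous credentials are used.
    if (!isset($resolved['user'])) {
        $resolved['user'] = 'anonymous';
        $resolved['password'] = 'anonymous@noserver.com';
    }
    return $resolved;
}

$ftp = resolveIoVarsSketch(array(
    'type'     => 'ftp',
    'host'     => 'ftp.server.com',
    'passive'  => true,
    'path'     => 'var/import',
    'filename' => 'products.csv',
));
```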

  3. Customer and Product adapters

    For most commonly exchanged entities - customer and product - Magento provides default adapters: customer/convert_adapter_customer (Mage_Customer_Model_Convert_Adapter_Customer) and catalog/convert_adapter_product (Mage_Catalog_Model_Convert_Adapter_Product). Both inherit from Mage_Eav_Model_Convert_Adapter_Entity.

    To simply load all customers data for selected store you can use the following xml:

    <action type="customer/convert_adapter_customer" method="load">
        <var name="store">default</var>
    </action>

    Sometimes you may not want to load all the customers in the database. The following filter variables help with this:

    • filter/firstname - to load only customers with firstname starting with value of this variable
    • filter/lastname - to load only customers with lastname starting with value of this variable
    • filter/email - to load only customers with email starting with value of this variable
    • filter/group - to load only customers from group with id equal to value of this variable
    • filter/adressType - to export only selected addressType; valid values are: both, default_billing, default_shipping
    • filter/telephone - to load only customers with telephone starting with value of this variable
    • filter/postcode - to load only customers with postcode starting with value of this variable
    • filter/country - to load only customers with country iso code equal to value of this variable
    • filter/region - to load only customers with region equal to value of this variable (for US just 2-letter state names)
    • filter/created_at/from - to load only customers created after a date defined as value of this variable
    • filter/created_at/to - to load only customers created before a date defined as value of this variable

    For example:

    <action type="customer/convert_adapter_customer" method="load">
        <var name="store"><![CDATA[0]]></var>
        <var name="filter/firstname"><![CDATA[a]]></var>
        <var name="filter/lastname"><![CDATA[a]]></var>
        <var name="filter/email"><![CDATA[a]]></var>
        <var name="filter/group"><![CDATA[1]]></var>
        <var name="filter/adressType"><![CDATA[default_billing]]></var>
        <var name="filter/telephone"><![CDATA[1]]></var>
        <var name="filter/postcode"><![CDATA[7]]></var>
        <var name="filter/country"><![CDATA[BS]]></var>
        <var name="filter/region"><![CDATA[WA]]></var>
        <var name="filter/created_at/from"><![CDATA[09/22/09]]></var>
        <var name="filter/created_at/to"><![CDATA[09/24/09]]></var>
    </action>

    In the same way you can load and filter products from the database, using the following variables:

    • filter/name - to load only products with name starting with value of this variable
    • filter/sku - to load only products with sku starting with value of this variable
    • filter/type - to load only products with type defined as value of this variable; valid values are: simple, configurable, grouped, bundle, virtual, downloadable
    • filter/attribute_set - to load only products with attribute set id equal to value of this variable
    • filter/price/from - to load only products with price starting from value of this variable
    • filter/price/to - to load only products with price up to value of this variable
    • filter/qty/from - to load only products with quantity starting from value of this variable
    • filter/qty/to - to load only products with quantity up to value of this variable
    • filter/visibility - to load only products with visibility id equal to value of this variable
    • filter/status - to load only products with status id equal to value of this variable

    Example:

    <action type="catalog/convert_adapter_product" method="load">
        <var name="store"><![CDATA[0]]></var>
        <var name="filter/name"><![CDATA[a]]></var>
        <var name="filter/sku"><![CDATA[1]]></var>
        <var name="filter/type"><![CDATA[simple]]></var>
        <var name="filter/attribute_set"><![CDATA[29]]></var>
        <var name="filter/price/from"><![CDATA[1]]></var>
        <var name="filter/price/to"><![CDATA[2]]></var>
        <var name="filter/qty/from"><![CDATA[1]]></var>
        <var name="filter/qty/to"><![CDATA[2]]></var>
        <var name="filter/visibility"><![CDATA[2]]></var>
        <var name="filter/status"><![CDATA[1]]></var>
    </action>
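
    The three kinds of filters used above ("starting with", "equal to" and "from/to") can be illustrated with a standalone PHP sketch working on plain arrays instead of a database collection. The function names and the rule format are hypothetical. Treating prefix matches as case-sensitive and range bounds as inclusive are assumptions of this sketch, not statements about the adapters.

```php
<?php
// Checks one value against one rule: array('prefix', 'a'),
// array('equals', 'simple') or array('range', 1, 2).
function matchesFilterSketch($actual, array $rule)
{
    switch ($rule[0]) {
        case 'prefix':
            return strncmp((string) $actual, $rule[1], strlen($rule[1])) === 0;
        case 'equals':
            return (string) $actual === (string) $rule[1];
        case 'range':
            return $actual >= $rule[1] && $actual <= $rule[2];
    }
    throw new InvalidArgumentException('Unknown rule ' . $rule[0]);
}

// Keeps only the rows that satisfy every filter.
function filterRowsSketch(array $rows, array $filters)
{
    $matched = array();
    foreach ($rows as $row) {
        $keep = true;
        foreach ($filters as $field => $rule) {
            $actual = isset($row[$field]) ? $row[$field] : null;
            if (!matchesFilterSketch($actual, $rule)) {
                $keep = false;
                break;
            }
        }
        if ($keep) {
            $matched[] = $row;
        }
    }
    return $matched;
}

$products = array(
    array('name' => 'apple',   'type' => 'simple',  'price' => 1.5),
    array('name' => 'avocado', 'type' => 'simple',  'price' => 3.0),
    array('name' => 'banana',  'type' => 'simple',  'price' => 1.2),
    array('name' => 'apricot', 'type' => 'virtual', 'price' => 1.8),
);

$matched = filterRowsSketch($products, array(
    'name'  => array('prefix', 'a'),
    'type'  => array('equals', 'simple'),
    'price' => array('range', 1, 2),
));
```

    All filters are combined, so a row has to pass every one of them; here only 'apple' does.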

All those id values you have to provide for the filters may seem a little frightening. Fortunately, for these two entities - customers and products - there is a wizard-like profile generator that lets you define filters with simple select boxes.

In the next part I will describe the use of parsers, and of adapters in the context of parsers.

Magento DataFlow - Data Exchange Made Flexible [Part 1]

One of major features of e-commerce websites is the possibility to share data with offline sale management systems. Magento made data exchange flexible and quite easy with DataFlow module.

Magento DataFlow is a data exchange framework that uses four types of components: adapter, parser, mapper and validator. At the current state of development validators are not implemented, but are reserved for future use.

The dataflow of a data exchange process is defined as an XML structure called a profile. Magento provides a simple wizard-like tool for generating basic import/export profiles that operate on product or customer entities. An advanced profiles manager is also provided for advanced users who can write the profile XML without the wizard and who need more custom dataflow operations, including ones related to other entities.

Adapters are responsible for plugging into an external data resource and fetching requested and filtered data. They can be used, for example, to get data from a local or remote file, web services, a database and more.

For example to load data from csv file you can put in XML profile the following code:

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">file</var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[products.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>

To load data from remote FTP server you can use same adapter, but with these parameters:

<action type="dataflow/convert_adapter_io" method="load">
    <var name="type">ftp</var>
    <var name="host"><![CDATA[ftp.server.com]]></var>
    <var name="passive">true</var>
    <var name="user"><![CDATA[user]]></var>
    <var name="password"><![CDATA[password]]></var>
    <var name="path">var/import</var>
    <var name="filename"><![CDATA[products.csv]]></var>
    <var name="format"><![CDATA[csv]]></var>
</action>

Parsers are responsible for transforming one data format to another. They can be used, for example, to convert CSV file content to a two-dimensional array, or the opposite.

To parse CSV file content into database product entities you can use this code in your profile:

<action type="dataflow/convert_parser_csv" method="parse">
    <var name="delimiter"><![CDATA[,]]></var>
    <var name="enclose"><![CDATA["]]></var>
    <var name="fieldnames">true</var>
    <var name="store"><![CDATA[0]]></var>
    <var name="number_of_records">1</var>
    <var name="decimal_separator"><![CDATA[.]]></var>
    <var name="adapter">catalog/convert_adapter_product</var>
    <var name="method">parse</var>
</action>

The adapter defined within the parser variables as <var name="adapter"> and the adapter's method <var name="method"> are responsible for processing the loaded data. In this particular case the parser converts the CSV file content to a two-dimensional array and calls the adapter's "parse" method to process it.

The simplest way to customize an import is to create your own adapter and pass it as a variable within the parser definition. In most cases you will need to extend one of the existing adapters and modify or write your own row-processing method (usually an overridden saveRow() method).

Mappers are responsible for altering data values from one to another. They are useful for mapping one field to another.

In the example below the source's 'sku' column is mapped to the 'reference' column (the name attribute is the source field, the tag content is the target field), and the variable '_only_specified' is set to true, so only the listed columns will be imported/exported:

<action type="dataflow/convert_mapper_column" method="map">
    <var name="map">
        <map name="sku"><![CDATA[reference]]></map>
        <map name="name"><![CDATA[name]]></map>
        <map name="price"><![CDATA[price]]></map>
        <map name="qty"><![CDATA[qty]]></map>
    </var>
    <var name="_only_specified">true</var>
</action>
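
To tie the three component types together, here is a standalone PHP sketch of a profile as an ordered list of actions that share one piece of data, each action replacing what the previous one produced. Everything in it is hypothetical: runProfileSketch(), the inline CSV text used instead of a file, and the column names 'ref' and 'prix'. It shows the flow adapter, parser, mapper, and is not the module's implementation.

```php
<?php
// Runs the actions in order; each one receives the current container
// data and returns the new data.
function runProfileSketch(array $actions, $data = null)
{
    foreach ($actions as $action) {
        $data = call_user_func($action, $data);
    }
    return $data;
}

$actions = array(
    // Adapter: "loads" raw CSV text (inline here instead of reading a file).
    function ($data) {
        return "ref,prix\nA1,10\nB2,20";
    },
    // Parser: CSV text -> two-dimensional array keyed by the header names.
    function ($data) {
        $lines = explode("\n", $data);
        $header = str_getcsv(array_shift($lines), ',', '"', '\\');
        $rows = array();
        foreach ($lines as $line) {
            $rows[] = array_combine($header, str_getcsv($line, ',', '"', '\\'));
        }
        return $rows;
    },
    // Mapper: renames columns, source name => target name.
    function ($data) {
        $map = array('ref' => 'sku', 'prix' => 'price');
        $mapped = array();
        foreach ($data as $row) {
            $newRow = array();
            foreach ($row as $field => $value) {
                $newRow[isset($map[$field]) ? $map[$field] : $field] = $value;
            }
            $mapped[] = $newRow;
        }
        return $mapped;
    },
);

$result = runProfileSketch($actions);
```

After the run, $result holds two rows with 'sku' and 'price' keys, ready to be handed to a saving step.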

This is just the tip of the iceberg of possibilities Magento DataFlow module offers. Come back later to read more.