Need a bit of help with a query

(2 posts) (2 voices)

Tags:

  1. ajdjackson, Member

    Hi

    I really could do with a bit of advice/help.

    I'm currently in the process of moving a myDBR instance from MySQL to SQL Server.

    I have several crosstab reports and they are all give me a headache in converting - primarily to do with the group by statement.

    For example

    Here is the MySQL code:


    select b.tblGroup_Cat_Desc as "[MCat]",
    c.subcategory as "[SCat]",
    d.tblcompanies_comp as "Company[Comp]",
    if(b.tblGroup_CAT_ID in (20),ifnull(sum(a.pl_val),0)*-1,ifnull(sum(a.pl_val),0)) as "Value, $[Val]",
    a.pl_company as "[bcomp]",
    a.pl_period as "[bdate]",
    a.pl_subcat as "[scat]",
    a.pl_maincat as "[mcat]",
    f.b_maincat as "[bcat]",
    inRType as "[rType]",
    if(f.b_maincat=20,ifnull(f.b_val,0),ifnull(f.b_val,0)*-1) as "Budget, $[BVal]",
    null as "Variance[bVar]",
    null as "% Var.[pVar]"
    from tmp_pandl a
    join tblgroup_coa_category b on a.pl_maincat=b.tblGroup_CAT_ID
    left outer join tblgroup_coa_subcat c on a.pl_subcat=c.subcatID
    join tblcompanies d on a.pl_company=d.tblcompanies_ID
    join (SELECT pl_subcat as pcat FROM tmp_pandl where iuser=inLogin group by 1 HAVING sum(pl_val) !=0) e on a.pl_subcat=e.pcat
    left outer join budget_sum f on a.pl_subcat=f.b_subcat
    where a.iuser=inLogin /*and a.pl_val!=0*/ and d.Ops=0
    group by a.pl_company,a.pl_subcat
    order by d.tblcompanies_ID,b.tblGroup_CAT_ID,c.subcatID;

    The issue I'm getting is that T-SQL requires me to add nearly every field in the above query to the group by statement. All the fields from Value to Budget are hidden are are used as parameters for linked reports. The above works as expected in MySQL but I have to add so many fields to the group by statement in T-SQL I end up getting duplicate lines in the report.

    As I said I have so many similar reports to I was hoping you could give me a steer on the best way to rewrite the above.

    Many thanks

    Jake

  2. myDBR Team, Key Master

    SQL Server is correct here and older MySQL's had an unfortunate "feature" by allowing incorrect use of GROUP BY.

    A GROUP BY clause is intended to be used when you mix normal columns with aggregate functions (SUM, MIN, MAX, COUNT...). The purpose of the GROUP BY is to divide the result set into groups into which the aggregate function is applied to. Sample use of GROUP BY would be the following:

    A list of customer purchases is grouped by the customer and the summary value for each customer is calculated.

    select c.id, c.name, sum( p.value )
    from customer c
    join purchases p on p.customer_id = c.id
    group by c.id, c.name

    SQL Server does require for you to list all non-aggregated columns (c.id, c.name) in the GROUP BY. Older MySQL versions allowed any combination of columns to be used, often resulting in erroneous results. Also, MySQL allowed reference by a column number while SQL Server requires the actual used columns. MySQL also allows for you to skip columns that can be derived from the others (like customer name is always the same as the corresponding customer id, so group by c.id would be enough for MySQL).

    To make it easy, use GROUP BY when you mix normal columns with aggregate functions and list all the columns in GROUP BY that are not used in an aggregate functions.

    In your query you do not have any aggregate functions, so you should not use GROUP BY at all. What if there were multiple f.b_maincat's per a.pl_subcat? Which one should the result set to show?

    You should write the query using DISTINCT. You may still come up with more rows than you expect and then you have to decide which of those rows you expect to get and fix the rest of the query.

    select distinct
    b.tblGroup_Cat_Desc as "[MCat]",
    c.subcategory as "[SCat]",
    d.tblcompanies_comp as "Company[Comp]",
    ...

    and the join:

    SELECT pl_subcat as pcat FROM tmp_pandl where iuser=inLogin group by 1 HAVING sum(pl_val) !=0

    should be written as

    SELECT pl_subcat as pcat FROM tmp_pandl WHERE iuser=inLogin GROUP BY pl_subcat HAVING sum(pl_val) !=0

    or using an IN-subquery (MS SQL Server's optimizer is way better than the MySQL's).

    WHERE a.pl_subcat in (
    SELECT pl_subcat as pcat
    FROM tmp_pandl
    WHERE iuser=inLogin
    GROUP BY pl_subcat
    HAVING sum(pl_val) !=0
    )

    --
    myDBR Team


Reply

You must log in to post.