Finding the most frequent words across articles and finding duplicate content
I am trying to get all the words from every article and count how often each word occurs. With tags this is straightforward, but how do I find the most frequent words across all the articles combined? I am using PHP and MySQL, with CodeIgniter as the PHP framework.
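    public function find_words()
    {
        // Fetch every article title.
        $query = $this->db->query('SELECT title FROM articles');
        $result = $query->result_array();

        // Join all titles into one string; it stays empty if there are no rows.
        $wordsup = '';
        foreach ($result as $v) {
            $wordsup .= $v['title'] . ' ';
        }

        // Split into words, count each word, and sort by count, highest first.
        $words = $this->utf8_str_word_count($wordsup, 1);
        $words = array_count_values($words);
        arsort($words);

        echo '<pre>';
        print_r($words);
        exit();
    }

    // UTF-8 aware replacement for str_word_count().
    // $format = 0 returns the number of words; any other value returns the list of words.
    function utf8_str_word_count($string, $format = 0, $charlist = null)
    {
        $matches = array();

        // A word is a run of letters, combining marks, dashes, and apostrophes,
        // plus any extra characters passed in $charlist.
        $pattern = '~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote((string) $charlist, '~') . ']+~u';
        preg_match_all($pattern, $string, $matches);

        // $matches[0] holds the full matches; fall back to an empty list if there are none.
        $result = isset($matches[0]) ? $matches[0] : array();

        if ($format == 0) {
            $result = count($result);
        }

        return $result;
    }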
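To try the counting step without CodeIgniter or a database, here is a minimal standalone sketch. The helper name `top_words`, the `$limit` parameter, and the sample titles are made up for illustration. Lowercasing the text before counting is my own addition, so that "PHP" and "php" count as one word; drop it if case matters to you.

```php
<?php
// Hypothetical helper: returns the $limit most frequent words across $titles.
function top_words(array $titles, int $limit = 10): array
{
    $text = implode(' ', $titles);

    // Lowercase so "PHP" and "php" count as the same word (not in the original code).
    // mb_strtolower() needs the mbstring extension; strtolower() is an ASCII-only fallback.
    $text = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);

    // Same character classes as utf8_str_word_count():
    // letters, combining marks, dashes, and apostrophes.
    preg_match_all('~[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+~u', $text, $matches);

    $counts = array_count_values($matches[0]);
    arsort($counts); // highest count first

    return array_slice($counts, 0, $limit, true);
}

// Sample data, made up for illustration.
$titles = array(
    'Learning PHP and MySQL',
    'PHP tips for CodeIgniter',
    'MySQL indexing tips for PHP',
);

print_r(top_words($titles, 3));
```

With real data you would probably also want to filter out common words such as "and" or "for" before sorting, otherwise they tend to dominate the top of the list.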
Or use a query like this to find duplicate rows. In this example, rows count as duplicates when they share the same sku:
    SELECT *
    FROM 2009_product_catalog
    WHERE sku IN (
        SELECT sku
        FROM 2009_product_catalog
        GROUP BY sku
        HAVING COUNT(sku) > 1
    )
    ORDER BY sku
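If the rows are already loaded in PHP, the same grouping idea can be sketched in memory. This is only an illustration of what the query does, not a replacement for it. The helper name `find_duplicates` and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper: returns every row whose $key value occurs more than once,
// ordered by that value, similar to GROUP BY ... HAVING COUNT(...) > 1.
function find_duplicates(array $rows, string $key): array
{
    // Count how often each value of $key appears (values must be strings or integers).
    $counts = array_count_values(array_column($rows, $key));

    // Keep only rows whose value is shared with at least one other row.
    $duplicates = array_filter($rows, function ($row) use ($counts, $key) {
        return $counts[$row[$key]] > 1;
    });

    // Order by the key, as ORDER BY sku does in the query.
    usort($duplicates, function ($a, $b) use ($key) {
        return strcmp((string) $a[$key], (string) $b[$key]);
    });

    return $duplicates;
}

// Sample rows, made up for illustration.
$rows = array(
    array('id' => 1, 'sku' => 'B-200'),
    array('id' => 2, 'sku' => 'A-100'),
    array('id' => 3, 'sku' => 'B-200'),
    array('id' => 4, 'sku' => 'C-300'),
    array('id' => 5, 'sku' => 'A-100'),
);

print_r(find_duplicates($rows, 'sku'));
```

For a large table the SQL version is the better choice, since the database can do the grouping without sending every row to PHP.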