Finding the most frequent words and duplicate content
I was trying to work out how to collect the words from all articles and count how often each one occurs. With tags this is straightforward, but what if I want the most frequent words across all articles combined? I'm using PHP and MySQL, with CodeIgniter as the PHP framework.
public function find_words()
{
    // Fetch every article title.
    $query = $this->db->query('SELECT title FROM articles');
    $result = $query->result_array();

    // Join all titles into one string; it stays empty if there are no rows.
    $wordsup = '';
    foreach ($result as $row) {
        $wordsup .= $row['title'] . ' ';
    }

    // Split into words, count each word, and sort by count, highest first.
    $words = $this->utf8_str_word_count($wordsup, 1);
    $words = array_count_values($words);
    arsort($words);

    echo '<pre>';
    print_r($words);
    echo '</pre>';
    exit();
}
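function utf8_str_word_count($string, $format = 0, $charlist = null)
{
    // A word is a run of letters, combining marks, dashes, apostrophes,
    // or any extra characters passed in $charlist.
    $pattern = '~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote((string) $charlist, '~') . ']+~u';
    $matches = array();
    preg_match_all($pattern, $string, $matches);
    // $matches[0] holds the full matches; it is an empty array when nothing matched.
    $words = isset($matches[0]) ? $matches[0] : array();
    if ($format == 0)
    {
        // Format 0: return only the number of words.
        return count($words);
    }
    // Any other format: return the list of words.
    return $words;
}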
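For trying the idea outside CodeIgniter, here is a minimal standalone sketch. The function name top_words() and the sample titles are made up for illustration, and the lower-casing step is an addition that the code above does not do. It uses mb_strtolower() when the mbstring extension is available and falls back to strtolower() otherwise.

```php
<?php
// Hypothetical helper, not part of the original code: counts word
// frequencies across a list of titles and returns the $limit most common.
function top_words(array $titles, int $limit = 10): array
{
    $counts = [];
    foreach ($titles as $title) {
        // Same word definition as utf8_str_word_count() without a $charlist:
        // runs of letters, combining marks, dashes, and apostrophes.
        preg_match_all('~[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+~u', $title, $matches);
        foreach ($matches[0] as $word) {
            // Assumption: fold case so "PHP" and "php" count as one word.
            $key = function_exists('mb_strtolower')
                ? mb_strtolower($word, 'UTF-8')
                : strtolower($word);
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    // Sort by count, highest first, keeping the word as the array key.
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

// Sample data, invented for this example.
$titles = ['PHP and MySQL tips', 'More php tips', 'MySQL indexes'];
print_r(top_words($titles, 3));
```

With real data you would probably also want to drop very common words such as "and" or "the" before counting, otherwise they tend to dominate the top of the list.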
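Or use a query like this to find duplicate rows. This example matches on the sku column of a 2009_product_catalog table, so swap in your own table and column: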
SELECT *
FROM 2009_product_catalog
WHERE sku IN (
    SELECT sku
    FROM 2009_product_catalog
    GROUP BY sku
    HAVING COUNT(sku) > 1
)
ORDER BY sku;
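The query above only catches rows whose column value matches exactly. If duplicates may differ in letter case or spacing, one option is to compare normalized values in PHP instead. This is only a rough sketch: find_duplicate_titles() is a made-up name, the rows are assumed to be arrays with id and title keys (for example from result_array()), and strtolower() only folds ASCII letters.

```php
<?php
// Hypothetical helper: groups article ids by normalized title and keeps
// only the groups that contain more than one id.
function find_duplicate_titles(array $rows): array
{
    $groups = [];
    foreach ($rows as $row) {
        // Normalize: collapse whitespace runs, trim the ends, lower-case.
        $key = strtolower(trim(preg_replace('~\s+~u', ' ', $row['title'])));
        $groups[$key][] = $row['id'];
    }
    // Keep only titles shared by at least two articles.
    return array_filter($groups, function ($ids) {
        return count($ids) > 1;
    });
}

// Sample rows, invented for this example.
$rows = [
    ['id' => 1, 'title' => 'Hello World'],
    ['id' => 2, 'title' => 'hello   world '],
    ['id' => 3, 'title' => 'Something else'],
];
print_r(find_duplicate_titles($rows));
```

For large tables it is usually better to let MySQL do the grouping, as in the query above, and keep the PHP pass for the fuzzy cases.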